Data Analysis: MySQL Basics - Querying, Sorting & Filtering Data

To Lean basic MySQL statements

Posted by Mengran on August 21, 2021

MySQL Basics

More details can be found here:

MySQL Tutorial

SQL is case-insensitive, you can write the SQL statement in lowercase, uppercase, etc.

Example:

1
2
select select_list
from table_name;

Or:

1
2
SELECT select_list
FROM table_name;

Querying Data

SELECT FROM

1.SELECT FROM – query the data from a single table.

1
2
SELECT select_list
FROM table_name;

To query data from multiple columns:

1
2
3
4
5
6
SELECT 
    lastName, 
    firstName, 
    jobTitle
FROM
    employees;

To retrieve data from all columns:

  • Use the asterisk (*) which is the shorthand for all columns
1
2
SELECT * 
FROM employees;

2.SELECT – to use the SELECT statement without referencing a table.

MySQL does not require the FROM clause. It means that you can have a SELECT statement without the FROM clause

1
2
3
SELECT select_list;
SELECT NOW();
SELECT CONCAT('John',' ','Doe');

3.Assign an alias to a column to make it more readable.

To change a column name of the result set, you can use a column alias:

1
2
3
4
5
6
7
SELECT expression AS column_alias;

# or
SELECT expression column_alias;

# for example:
SELECT CONCAT('John',' ','Doe') AS name;

Sorting Data

ORDER BY

To sort the rows in the result set, you add the ORDER BY clause to the SELECT statement.

1.When executing the SELECT statement with an ORDER BY clause, MySQL always evaluates the ORDER BY clause after the FROM and SELECT clauses:

1
2
3
4
5
6
7
8
SELECT 
   select_list
FROM 
   table_name
ORDER BY 
   column1 [ASC|DESC], 
   column2 [ASC|DESC],
   ...;

By default, the ORDER BY clause uses ASC if you don’t explicitly specify any option.

Use ASC to sort the result set in ascending order:

1
ORDER BY column1 ASC;

Use DESC to sort the result set in descending order:

1
ORDER BY column1 DESC;

2.Sort the result set by **multiple columns:**

1
2
3
ORDER BY
   column1,
   column2;

3.Sort the result set by a column **in ascending order and then by another column in descending order:**

1
2
3
ORDER BY
    column1 ASC,
    column2 DESC;

4.Using ORDER BY clause to sort a result set by an expression:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
SELECT orderNumber,
       orderlinenumber,
	   quantityOrdered,
	   priceEach,
	   quantityOrdered * priceEach as totalPrice
FROM orderdetails
ORDER BY totalPrice;

>>>
orderNumber	orderlinenumber	quantityOrdered	priceEach	totalPrice
10419	                 7	          15	32.10	        481.50
10420	                 3	          15	35.29	        529.35
10322	                 3	          20	26.55	        531.00
10407	                 3	           6	91.11	        546.66

FIELD()

5.Using ORDER BY clause to sort data using a custom list

The FIELD() function returns the position of the str in the str1, str2, … list.

If the str is not in the list, the FIELD() function returns 0.

For example, the following query returns 1 because the position of the string ‘A’ is the first position on the list ‘A’, ‘B’, and ‘C’:

1
2
3
FIELD(str, str1, str2, ...)

SELECT FIELD('A', 'A', 'B','C');

Example:

Suppose that you want to sort the sales orders based on their statuses in the following order:

  • In Process
  • On Hold
  • Canceled
  • Resolved
  • Disputed
  • Shipped

you can use the FIELD() function to map each order status to a number and sort the result by the result of the FIELD() function:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
SELECT 
    orderNumber, status
FROM
    orders
ORDER BY FIELD(status,
        'In Process',
        'On Hold',
        'Cancelled',
        'Resolved',
        'Disputed',
        'Shipped');
        
>>>
orderNumber	status
10420	In Process
10425	In Process
10334	On Hold
10401	On Hold
10167	Cancelled
10179	Cancelled
10248	Cancelled
10253	Cancelled
10260	Cancelled
10262	Cancelled
10164	Resolved
10327	Resolved
10367	Resolved
10386	Resolved
10406	Disputed
10415	Disputed
10417	Disputed
10100	Shipped
10101	Shipped

6.MySQL ORDER BY and NULL

In MySQL, NULL comes before non-NULL values.

Therefore, when you the ORDER BY clause with the ASC option, NULLs appear first in the result set.

If you use the ORDER BY with the DESC option, NULLs will appear last in the result set

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
SELECT 
    firstName, lastName, reportsTo
FROM
    employees
ORDER BY reportsTo DESC;

>>>
firstName | lastName  | reportsTo |
+-----------+-----------+-----------+
| Yoshimi   | Kato      |      1621 |
| Leslie    | Jennings  |      1143 |
| Leslie    | Thompson  |      1143 |
| Julie     | Firrelli  |      1143 |
| ....
| Mami      | Nishi     |      1056 |
| Mary      | Patterson |      1002 |
| Jeff      | Firrelli  |      1002 |
| Diane     | Murphy    |      NULL 

Filtering Data

Basic Operators:

  • WHERE – learn how to use the WHERE clause to filter rows based on specified conditions.
  • SELECT DISTINCT – show you how to use the DISTINCT operator in the SELECT statement to eliminate duplicate rows in a result set.
  • AND – introduce you to the AND operator to combine Boolean expressions to form a complex condition for filtering data.
  • OR– introduce you to the OR operator and show you how to combine the OR operator with the AND operator to filter data.
  • IN – show you how to use the IN operator in the WHERE clause to determine if a value matches any value in a set.
  • NOT IN – negate the IN operator using the NOT operator to check if a value doesn’t match any value in a set.
  • BETWEEN – show you how to query data based on a range using BETWEEN operator.
  • LIKE – provide you with technique to query data based on a pattern.
  • LIMIT – use LIMIT to constrain the number of rows returned by SELECT statement
  • IS NULL – test whether a value is NULL or not by using IS NULL operator.

WHERE clause

The WHERE clause allows you to specify a search condition for the rows returned by a query.

The search_condition is a combination of one or more expressions using the logical operator AND, OR and NOT.

  • WHERE jobtitle = 'Sales Rep';
  • WHERE jobtitle = 'Sales Rep' AND officeCode = 1;
  • WHERE jobtitle = 'Sales Rep' OR officeCode = 1;
  • WHERE officeCode BETWEEN 1 AND 3
  • WHERE lastName LIKE '%son'
  • WHERE officeCode IN (1 , 2, 3)
1
2
3
4
5
6
SELECT 
    select_list
FROM
    table_name
WHERE
    search_condition;

1.Using WHERE clause with equality operator

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
SELECT 
    lastname, 
    firstname, 
    jobtitle
FROM
    employees
WHERE
    jobtitle = 'Sales Rep';
    
>>>
lastname  | firstname | jobtitle  |
+-----------+-----------+-----------+
| Jennings  | Leslie    | Sales Rep |
| Thompson  | Leslie    | Sales Rep |
| Firrelli  | Julie     | Sales Rep |
| Patterson | Steve     | Sales Rep |
| Tseng     | Foon Yue  | Sales Rep |

2.Using WHERE clause with the AND operator

Use the AND operator to combine two Boolean expressions. The AND operator returns true when both expressions are true; otherwise, it returns false.

Use the AND operator to form conditions in the WHERE clause of the SELECT statement.

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
SELECT 
    lastname, 
    firstname, 
    jobtitle,
    officeCode
FROM
    employees
WHERE
    jobtitle = 'Sales Rep' AND 
    officeCode = 1;
    
>>>
lastname | firstname | jobtitle  | officeCode |
+----------+-----------+-----------+------------+
| Jennings | Leslie    | Sales Rep | 1          |
| Thompson | Leslie    | Sales Rep | 1          

3.Using WHERE clause with OR operator

The OR operator evaluates to TRUE only if one of the expressions evaluates to TRUE:

The query returns any employee who has the job title Sales Rep or office code 1.

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
SELECT 
    lastName, 
    firstName, 
    jobTitle, 
    officeCode
FROM
    employees
WHERE
    jobtitle = 'Sales Rep' OR 
    officeCode = 1
ORDER BY 
    officeCode, 
    jobTitle;
    
>>>
lastName  | firstName | jobTitle           | officeCode |
+-----------+-----------+--------------------+------------+
| Murphy    | Diane     | President          | 1          |
| Bow       | Anthony   | Sales Manager (NA) | 1          |
| Jennings  | Leslie    | Sales Rep          | 1          |
| Thompson  | Leslie    | Sales Rep          | 1          |
| Firrelli  | Jeff      | VP Marketing       | 1          |
| Patterson | Mary      | VP Sales           | 1          |
| Firrelli  | Julie     | Sales Rep          | 2          |
| Patterson | Steve     | Sales Rep          | 2          |
| Tseng     | Foon Yue  | Sales Rep          | 3          |  

Use both AND and OR Operators:

The OR operator combines two Boolean expressions and returns true when either expression is true. Otherwise, it returns false.

MySQL evaluates the OR operator after the AND operator if an expression contains both AND and OR operators.

Use parentheses to change the order of evaluation.

Example: uses the OR operator to select the customers who locate in the USA or France and have a credit limit greater than 100,000.

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
SELECT   
	customername, 
	country, 
	creditLimit
FROM   
	customers
WHERE(country = 'USA'
		OR country = 'France')
	  AND creditlimit > 100000;
	  
>>>
| customername                 | country | creditLimit |
+------------------------------+---------+-------------+
| La Rochelle Gifts            | France  |   118200.00 |
| Mini Gifts Distributors Ltd. | USA     |   210500.00 |
| Land of Toys Inc.            | USA     |   114900.00 |
| Saveley & Henriot, Co.       | France  |   123900.00 |
| Muscle Machine Inc           | USA     |   138500.00 |

4.Using WHERE clause with the BETWEEN operator

The BETWEEN operator returns TRUE if a value is in a range of values:

expression BETWEEN low AND high

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
SELECT 
    firstName, 
    lastName, 
    officeCode
FROM
    employees
WHERE
    officeCode BETWEEN 1 AND 3
ORDER BY officeCode;

>>>
firstName | lastName  | officeCode |
+-----------+-----------+------------+
| Diane     | Murphy    | 1          |
| Mary      | Patterson | 1          |
| Jeff      | Firrelli  | 1          |
| Anthony   | Bow       | 1          |
| Leslie    | Jennings  | 1          |
| Leslie    | Thompson  | 1          |
| Julie     | Firrelli  | 2          |
| Steve     | Patterson | 2          |
| Foon Yue  | Tseng     | 3          |
| George    | Vanauf    | 3          |
+-----------+-----------+------------+

Alternative:This query uses the greater than or equal ( >= ) and less than or equal ( <= ) operators instead of the BETWEEN operator to get the same result:

WHERE officeCode >= 1 AND officeCode <= 3;

NOT BETWEEN clause:

Example: To find the products whose buy prices are not between $20 and $100.

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
SELECT 
    productCode, 
    productName, 
    buyPrice
FROM
    products
WHERE
    buyPrice NOT BETWEEN 20 AND 100;

>>>
productCode	productName	                         buyPrice
S24_2840	1958 Chevy Corvette Limited Edition	  15.91
S24_2972	1982 Lamborghini Diablo	                  16.24
S18_2238	1998 Chrysler Plymouth Prowler	          101.51
S10_4962	1962 LanciaA Delta 16V	                  103.42

Alternative:

WHERE buyPrice < 20 OR buyPrice > 100;

Using BETWEEN operator with dates

Example: returns the orders with the required dates between 01/01/2003 to 01/31/2003:

  • use the CAST() to cast the literal string '2003-01-01' into a DATE value:
1
2
3
4
5
6
7
8
9
10
11
12
SELECT 
    orderNumber, requiredDate, status
FROM
    orders
WHERE
    requireddate BETWEEN CAST('2003-01-01' AS DATE) AND CAST('2003-01-31' AS DATE);
    
>>>
orderNumber	requiredDate	status
10100	        2003-01-13	Shipped
10101	        2003-01-18	Shipped
10102	        2003-01-18	Shipped

5.Using WHERE clause with the LIKE operator

The LIKE operator evaluates to TRUE if a value matches a specified pattern.

To form a pattern, you use the % and _ wildcards(replacements/placeholders):

  • The % wildcard matches any string of zero or more characters
  • The _ wildcard matches any single character

The following query finds the employees whose last names end with the string ‘son’:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
SELECT 
    firstName, 
    lastName
FROM
    employees
WHERE
    lastName LIKE '%son'
ORDER BY firstName;

>>>
+-----------+-----------+
| firstName | lastName  |
+-----------+-----------+
| Leslie    | Thompson  |
| Mary      | Patterson |
| Steve     | Patterson |
| William   | Patterson |
+-----------+-----------+

More examples:

1
2
3
4
5
6
7
WHERE
    firstName LIKE 'a%';
    
>>>
employeeNumber	lastName	firstName
1143	             Bow	Anthony
1611	           Fixter	Andy

uses the LIKE operator to find all employees whose last names contain the substring ‘on’:

1
2
WHERE
    lastname LIKE '%on%';

To find employees whose first names start with the letter T , end with the letter m, and contain any single character between e.g., Tom, Tim:

1
2
WHERE
    firstname LIKE 'T_m';

NOT LIKE clause:

Example: to search for employees whose last names don’t start with the letter B:

1
2
3
4
5
6
7
8
SELECT 
    employeeNumber, 
    lastName, 
    firstName
FROM
    employees
WHERE
    lastName NOT LIKE 'B%';

LIKE operator with the ESCAPE clause

you can use the ESCAPE clause to specify the escape character so that the LIKE operator interprets the wildcard character as a literal character

The backslash character (\) is the default escape character.

Example 1: to find products whose product codes contain the string _20 :

1
2
3
4
5
6
7
8
WHERE
    productCode LIKE '%\_20%';
    
>>>
productCode	productName
S10_2016	1996 Moto Guzzi 1100i
S24_2000	1960 BSA Gold Star DBD34
S24_2011	18th century schooner

Alternatively, you can specify a different escape character e.g., $ using the ESCAPE clause:

1
2
WHERE
    productCode LIKE '%$_20%' ESCAPE '$';

6.Using WHERE clause with the IN operator

The IN operator returns TRUE if a value matches any value in a list.

Use the IN operator to check if a value is in a set of values.

Use the IN operator to form a condition for the WHERE clause.

value IN (value1, value2,...)

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
SELECT 
    firstName, 
    lastName, 
    officeCode
FROM
    employees
WHERE
    officeCode IN (1 , 2, 3)
ORDER BY 
    officeCode;
    
>>>
+-----------+-----------+------------+
| firstName | lastName  | officeCode |
+-----------+-----------+------------+
| Diane     | Murphy    | 1          |
| Mary      | Patterson | 1          |
| Jeff      | Firrelli  | 1          |
| Anthony   | Bow       | 1          |
| Leslie    | Jennings  | 1          |
| Leslie    | Thompson  | 1          |
| Julie     | Firrelli  | 2          |
| Steve     | Patterson | 2          |
| Foon Yue  | Tseng     | 3          |
| George    | Vanauf    | 3          |
+-----------+-----------+------------+

Example 2: uses the IN operator to find the offices located in the USA and France:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
SELECT 
    officeCode, 
    city, 
    phone, 
    country
FROM
    offices
WHERE
    country IN ('USA' , 'France');

>>>
| officeCode | city          | phone           | country |
+------------+---------------+-----------------+---------+
| 1          | San Francisco | +1 650 219 4782 | USA     |
| 2          | Boston        | +1 215 837 0825 | USA     |
| 3          | NYC           | +1 212 555 3000 | USA     |
| 4          | Paris         | +33 14 723 4404 | France  |

The same result but alternative way:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
SELECT 
    officeCode, 
    city, 
    phone
FROM
    offices
WHERE
    country = 'USA' OR country = 'France';
    
>>>
| officeCode | city          | phone           | country |
+------------+---------------+-----------------+---------+
| 1          | San Francisco | +1 650 219 4782 | USA     |
| 2          | Boston        | +1 215 837 0825 | USA     |
| 3          | NYC           | +1 212 555 3000 | USA     |
| 4          | Paris         | +33 14 723 4404 | France  |

NOT IN operator

Use the NOT IN to check if a value doesn’t match any value in a list.

Example: uses the NOT IN operator to find the offices that do not locate in France and the USA:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
SELECT 
    officeCode, 
    city, 
    phone
FROM
    offices
WHERE
    country NOT IN ('USA' , 'France')
ORDER BY 
    city;

>>>
| officeCode | city   | phone            |
+------------+--------+------------------+
| 7          | London | +44 20 7877 2041 |
| 6          | Sydney | +61 2 9264 2451  |
| 5          | Tokyo  | +81 33 224 5000  |

7.UsingLIMIT clause to constrain the number of rows returned by a query.

The LIMIT clause is used in the SELECT statement to constrain the number of rows to return.

The LIMIT clause accepts one or two arguments.

The values of both arguments must be zero or positive integers.

  • The offset specifies the offset of the first row to return. The offset of the first row is 0, not 1.
  • The row_count specifies the maximum number of rows to return.
1
2
3
4
5
SELECT 
    select_list
FROM
    table_name
LIMIT [offset,] row_count;

Example 1: uses the LIMIT clause to get the top five customers who have the highest credit:

  • the ORDER BY clause sorts the customers by credits in high to low.
  • the LIMIT clause returns the first 5 rows.
1
2
3
4
5
6
7
8
SELECT 
    customerNumber, 
    customerName, 
    creditLimit
FROM
    customers
ORDER BY creditLimit DESC
LIMIT 5;

To calculate the number of pages:

1
2
3
4
5
6
7
8
9
10
11
SELECT 
    COUNT(*) 
FROM 
    customers;
    
>>>
+----------+
| COUNT(*) |
+----------+
|      122 |
+----------+

Then figure out how many pages you need if each page can only contain 10 rows:

To get rows of page 1 which contains the first 10 customers sorted by the customer name:

1
2
3
4
5
6
7
SELECT 
    customerNumber, 
    customerName
FROM
    customers
ORDER BY customerName    
LIMIT 10;

To get the rows of the second page that include rows 11 – 20:

1
2
3
4
5
6
7
SELECT 
    customerNumber, 
    customerName
FROM
    customers
ORDER BY customerName    
LIMIT 10, 10;

8.Using WHERE clause with the IS NULL operator

To check if a value is NULL or not, you use the IS NULL operator, not the equal operator (=).

The IS NULL operator returns TRUE if a value is NULL.

NULL is a marker that indicates that a value is missing or unknown.

NULL is not equivalent to the number 0 or an empty string.

value IS NULL

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
SELECT 
    lastName, 
    firstName, 
    reportsTo
FROM
    employees
WHERE
    reportsTo IS NULL;
    
>>>
----------+-----------+-----------+
| lastName | firstName | reportsTo |
+----------+-----------+-----------+
| Murphy   | Diane     |      NULL |
+----------+-----------+-----------+

IS NOT NULL clause:

1
2
3
4
5
6
7
8
9
10
11
12
SELECT 
    customerName, 
    country, 
    salesrepemployeenumber
FROM
    customers
WHERE
    salesrepemployeenumber IS NOT NULL
ORDER BY 
   customerName;
>>>

9.Using WHERE clause with comparison operators

Comparison Operators:

  • = - Equal to.You can use it with almost any data type.
  • <> or != - Not equal to.
  • < - Less than. You typically use it with numeric and date/time data types.
  • > - Greater than.
  • <= - Less than or equal to.
  • >= - Greater than or equal to.

SELECT DISTINCT clause

When querying data from a table, you may get duplicate rows.

1.SELECT DISTINCT

To remove these duplicate rows, you use the DISTINCT clause in the SELECT statement.

SELECT DISTINCT lastname

1
2
3
4
5
6
SELECT 
    DISTINCT lastname
FROM
    employees
ORDER BY 
    lastname;

2.DISTINCT and NULL values

When you specify a column that has NULL values in the DISTINCT clause, the DISTINCT clause will keep only one NULL value because it considers all NULL values are the same.

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
SELECT DISTINCT state
FROM customers;

>>>
| state         |
+---------------+
| NULL          |
| NV            |
| Victoria      |
| CA            |
| NY            |
| PA            |
...
| Co. Cork      |
| Pretoria      |
| NH            |
| Tokyo         |
+---------------+

3.DISTINCT with multiple columns

When you specify multiple columns in the DISTINCT clause, the DISTINCT clause will use the combination of values in these columns to determine the uniqueness of the row in the result set.

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
SELECT DISTINCT
    state, city
FROM
    customers
WHERE
    state IS NOT NULL
ORDER BY 
    state, 
    city;
    
>>>
| state         | city           |
+---------------+----------------+
| BC            | Tsawassen      |
| BC            | Vancouver      |
| CA            | Brisbane       |
| CA            | Burbank        |
| CA            | Burlingame     |
| CA            | Glendale       |
| CA            | Los Angeles    |
| CA            | Pasadena       |
| CA            | San Diego      |

Compared to the results without DISTINCT clause in the query:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
SELECT 
    state, city
FROM
    customers
WHERE
    state IS NOT NULL
ORDER BY 
    state , 
    city;

>>>
| state         | city           |
+---------------+----------------+
| BC            | Tsawassen      |
| BC            | Vancouver      |
| CA            | Brisbane       |
| CA            | Burbank        |
..
| CA            | San Francisco  |
| CA            | San Francisco  |
...
| MA            | Boston         |
| MA            | Boston         |
| MA            | Brickhaven     |
| MA            | Brickhaven     |
| MA            | Brickhaven     |
...
| NY            | NYC            |
| NY            | NYC            |
| NY            | NYC            |
| NY            | NYC            |
| NY            | NYC            |