Subqueries
Basic clauses:
Subquery
– show you how to nest a query (inner query) within another query (outer query) and use the result of the inner query for the outer query.Derived table
– introduce you to the derived table concept and show you how to use it to simplify complex queries.EXISTS
– test for the existence of rows.
Subquery
A subquery is a query nested within another query such as SELECT
, INSERT
, UPDATE
or DELETE
.
Also, a subquery can be nested within another subquery.
- inner query - A subquery is called an inner query
- outer query - a query that contains the subquery
A subquery can be used anywhere that expression is used and must be closed in parentheses.
Example: uses a subquery to return the employees who work in the offices located in the USA
- The subquery returns all office codes of the offices located in the USA.
- The outer query selects the last name and first name of employees who work in the offices whose office codes are in the result set returned by the subquery.
1
2
3
4
5
6
7
8
9
10
11
12
SELECT
lastName, firstName
FROM
employees
WHERE
officeCode IN (SELECT
officeCode
FROM
offices
WHERE
country = 'USA');
subquery with comparison operators
You can use comparison operators e.g., =
, >
, <
to compare a single value returned by the subquery with the expression in the WHERE
clause.
Example 1: returns the customer who has the highest payment.
1
2
3
4
5
6
7
8
9
10
11
12
SELECT
customerNumber,
checkNumber,
amount
FROM
payments
WHERE
amount = (SELECT MAX(amount) FROM payments);
>>>
customerNumber checkNumber amount
141 JE105477 120166.58
Example 2: find customers whose payments are greater than the average payment using a subquery
- First, get the average payment by using a subquery.
- Then, select the payments that are greater than the average payment returned by the subquery in the outer query.
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
SELECT
customerNumber,
checkNumber,
amount
FROM
payments
WHERE
amount > (SELECT
AVG(amount)
FROM
payments);
>>>
customerNumber checkNumber amount
112 HQ55022 32641.98
112 ND748579 33347.88
114 GG31455 45864.03
subquery with IN and NOT IN operators
If a subquery returns more than one value, you can use other operators such as IN
or NOT IN
operator in the WHERE
clause.
Example: use a subquery with NOT IN
operator to find the customers who have not placed any orders
1
2
3
4
5
6
7
8
9
10
11
12
13
14
SELECT
customerName
FROM
customers
WHERE
customerNumber NOT IN (SELECT DISTINCT
customerNumber
FROM
orders);
>>>
customerName
Havel & Zbyszek Co
American Souvenirs Inc
Porto Imports Co.
subquery in the FROM
clause
When you use a subquery in the FROM
clause, the result set returned from a subquery is used as a temporary table.
This table is referred to as a derived table or materialized subquery.
Example: finds the maximum, minimum, and average number of items in sale orders
FLOOR()
is used to remove decimal places from the average values of items
1
2
3
4
5
6
7
8
9
10
11
12
13
14
SELECT
MAX(items),
MIN(items),
FLOOR(AVG(items))
FROM
(SELECT
orderNumber, COUNT(orderNumber) AS items
FROM
orderdetails
GROUP BY orderNumber) AS lineitems;
>>>
MAX(items) MIN(items) FLOOR(AVG(items))
18 1 9
correlated subquery
subquery is independent.
It means that you can execute the subquery as a standalone query.
1
2
3
4
5
6
7
SELECT
orderNumber,
COUNT(orderNumber) AS items
FROM
orderdetails
GROUP BY orderNumber;
A correlated subquery is a subquery that uses the data from the outer query.
In other words, a correlated subquery depends on the outer query. A correlated subquery is evaluated once for each row in the outer query.
Example: uses a correlated subquery to select products whose buy prices are greater than the average buy price of all products in each product line.
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
SELECT
productname,
buyprice
FROM
products p1
WHERE
buyprice > (SELECT
AVG(buyprice)
FROM
products
WHERE
productline = p1.productline)
>>>
productname buyprice
1952 Alpine Renault 1300 98.58
1996 Moto Guzzi 1100i 68.99
Both outer query and correlated subquery reference the same products table. Therefore, we need to use a table alias p1 for the products table in the outer query.
Unlike a regular subquery, you cannot execute a correlated subquery independently like this. If you do so, MySQL doesn’t know the p1 table and will issue an error.
- For each row in the products (or p1) table, the correlated subquery needs to execute once to get the average buy price of all products in the productline of that row.
- If the buy price of the current row is greater than the average buy price returned by the correlated subquery, the query includes the row in the result set.
1
2
3
4
5
6
7
SELECT
AVG(buyprice)
FROM
products
WHERE
productline = p1.productline;
subquery with EXISTS and NOT EXISTS
When a subquery is used with the EXISTS
or NOT EXISTS
operator, a subquery returns a Boolean
value of TRUE
or FALSE
.
The following query illustrates a subquery used with the EXISTS
operator:
1
2
3
4
5
6
SELECT
*
FROM
table_name
WHERE
EXISTS( subquery );
In the query above, if the subquery returns any rows, EXISTS
subquery returns TRUE
, otherwise, it returns FALSE
.
The EXISTS
and NOT EXISTS
are often used in the correlated subqueries.
Example: finds sales orders whose total values are greater than 60K
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
SELECT
orderNumber,
SUM(priceEach * quantityOrdered) total
FROM
orderdetails
INNER JOIN
orders USING (orderNumber)
GROUP BY orderNumber
HAVING SUM(priceEach * quantityOrdered) > 60000;
>>>
orderNumber total
10165 67392.85
10287 61402.00
10310 61234.67
Derived Tables
A derived table is a virtual table returned from a SELECT
statement.
A derived table is similar to a temporary table, but using a derived table in the SELECT
statement is much simpler than a temporary table because it does not require creating the temporary table.
The term derived table and subquery is often used interchangeably.
When a stand-alone subquery is used in the FROM
clause of a SELECT
statement, it is also called a derived table.
Unlike a subquery, a derived table must have an alias so that you can reference its name later in the query.
Basic Syntax:
1
2
3
4
5
6
7
8
9
SELECT
select_list
FROM
(SELECT
select_list
FROM
table_1) derived_table_name
WHERE
derived_table_name.c1 > 0;
Example: gets the top five products by sales revenue in 2003 from the orders
and orderdetails
tables, then use the result of this query as a derived table and join it with the products
table
- First, the subquery is executed to create a result set or derived table.
- Then, the outer query is executed that joined the top5product2003 derived table with the products table using the productCode column.
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
SELECT
productName, sales
FROM
(SELECT
productCode,
ROUND(SUM(quantityOrdered * priceEach)) sales
FROM
orderdetails
INNER JOIN orders USING (orderNumber)
WHERE
YEAR(shippedDate) = 2003
GROUP BY productCode
ORDER BY sales DESC
LIMIT 5) top5products2003
INNER JOIN
products USING (productCode);
>>>
productName sales
2001HelloKitty 123220
2010Hellowendy 232102
derived table example
Scenario: classify the customers who bought products in 2003 into 3 groups: platinum, gold, and silver. Also, you need to know the number of customers in each group with the following conditions:
- Platinum customers who have orders with the volume greater than 100K.
- Gold customers who have orders with the volume between 10K and 100K.
- Silver customers who have orders with the volume less than 10K.
First, need to put each customer into the respective group using CASE
expression and GROUP BY
clause:
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
SELECT
customerNumber,
ROUND(SUM(quantityOrdered * priceEach)) sales,
(CASE
WHEN SUM(quantityOrdered * priceEach) < 10000 THEN 'Silver'
WHEN SUM(quantityOrdered * priceEach) BETWEEN 10000 AND 100000 THEN 'Gold'
WHEN SUM(quantityOrdered * priceEach) > 100000 THEN 'Platinum'
END) customerGroup
FROM
orderdetails
INNER JOIN
orders USING (orderNumber)
WHERE
YEAR(shippedDate) = 2003
GROUP BY customerNumber;
>>>
customerNumber sales customerGroup
103 12323 Gold
112 23212 Gold
113 30403 Platinum
124 50232 Silver
...
Then, use this query as the derived table and perform grouping:
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
SELECT
customerGroup,
COUNT(cg.customerGroup) AS groupCount
FROM
(SELECT
customerNumber,
ROUND(SUM(quantityOrdered * priceEach)) sales,
(CASE
WHEN SUM(quantityOrdered * priceEach) < 10000 THEN 'Silver'
WHEN SUM(quantityOrdered * priceEach) BETWEEN 10000 AND 100000 THEN 'Gold'
WHEN SUM(quantityOrdered * priceEach) > 100000 THEN 'Platinum'
END) customerGroup
FROM
orderdetails
INNER JOIN orders USING (orderNumber)
WHERE
YEAR(shippedDate) = 2003
GROUP BY customerNumber) cg
GROUP BY cg.customerGroup;
>>>
customerGroup groupCount
Gold 61
Silver 8
Platinum 4
EXISTS
The EXISTS
operator is a Boolean operator that returns either true or false.
The EXISTS
operator is often used to test for the existence of rows returned by the subquery.
The EXISTS
operator terminates further processing immediately once it finds a matching row, which can help improve the performance of the query.
Basic Syntax:
1
2
3
4
5
6
SELECT
select_list
FROM
a_table
WHERE
[NOT] EXISTS(subquery);
Example 1: uses the EXISTS
operator to find the customer who has at least one order
- If the
customerNumber
, which appears in thecustomers
table, exists in theorders
table, the subquery returns the first matching row. - As a result, the
EXISTS
operator returns true and stops examining theorders
table.
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
SELECT
customerNumber,
customerName
FROM
customers
WHERE
EXISTS(
SELECT
1
FROM
orders
WHERE
orders.customernumber
= customers.customernumber);
>>>
customerNumber customerName
103 Atelier graphique
112 Signal Gift Stores
114 Australian Collectors, Co
Example 2: uses the NOT EXISTS
operator to find customers who do not have any orders
1
2
3
4
5
6
7
8
9
10
11
12
13
14
SELECT
customerNumber,
customerName
FROM
customers
WHERE
NOT EXISTS(
SELECT
1
FROM
orders
WHERE
orders.customernumber = customers.customernumber
);
UPDATE EXISTS
Scenario: have to update the phone’s extensions of the employees who work at the office in San Francisco.
First, finds employees who work at the office in San Franciso
:
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
SELECT
employeenumber,
firstname,
lastname,
extension
FROM
employees
WHERE
EXISTS(
SELECT
1
FROM
offices
WHERE
city = 'San Francisco' AND
offices.officeCode = employees.officeCode);
>>>
employeenumber firstname lastname extension
1002 Diane Murphy x5800
1056 Mary Patterson x4611
1076 Jeff Firrelli x9273
Then, adds the number 1 to the phone extension of employees who work at the office in San Francisco:
1
2
3
4
5
6
7
8
9
10
11
12
13
UPDATE employees
SET
extension = CONCAT(extension, '1')
WHERE
EXISTS(
SELECT
1
FROM
offices
WHERE
city = 'San Francisco'
AND offices.officeCode = employees.officeCode);
INSERT EXISTS
Scenario: to archive customers who don’t have any sales order in a separate table.
1)create a new table for archiving the customers
by copying the structure from the customers
table:
1
2
CREATE TABLE customers_archive
LIKE customers;
2)insert customers who do not have any sales order into the customers_archive
table using the following INSERT
statement.
1
2
3
4
5
6
7
8
9
10
11
INSERT INTO customers_archive
SELECT *
FROM customers
WHERE NOT EXISTS(
SELECT 1
FROM
orders
WHERE
orders.customernumber = customers.customernumber
);
3)query data from the customers_archive
table to verify the insert operation.
1
2
3
4
5
6
SELECT * FROM customers_archive;
customerNumber customerName
103 Atelier graphique
112 Signal Gift Stores
114 Australian Collectors, Co.
DELETE EXISTS
Scenario: to delete the customers that exist in the customers_archive
table from the customers
table.
use the EXISTS
operator in WHERE
clause of the DELETE
statement:
1
2
3
4
5
6
7
8
9
10
11
DELETE FROM customers
WHERE EXISTS(
SELECT
1
FROM
customers_archive a
WHERE
a.customernumber = customers.customerNumber);
EXISTS operator vs. IN operator
IN
operator:
To find the customer who has placed at least one order:
1
2
3
4
5
6
7
8
9
10
11
SELECT
customerNumber,
customerName
FROM
customers
WHERE
customerNumber IN (
SELECT
customerNumber
FROM
orders);
To compare the query that uses the IN
operator with the one that uses the EXISTS
operator by using the EXPLAIN
statement:
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
EXPLAIN SELECT
customerNumber,
customerName
FROM
customers
WHERE
EXISTS(
SELECT
1
FROM
orders
WHERE
orders.customernumber = customers.customernumber);
The query that uses the EXISTS
operator is much faster than the one that uses the IN
operator.
- The
EXISTS
operator works based on the “at least found” principle. - The
EXISTS
stops scanning the table when a matching row found. - When the
IN
operator is combined with a subquery, MySQL must process the subquery first - Then uses the result of the subquery to process the whole query.
If the subquery contains a large volume of data, the EXISTS
operator provides better performance.
However, the query that uses the IN
operator will perform faster if the result set returned from the subquery is very small.