Grouping Data
Basic Clauses:
GROUP BY
– show you how to group rows into groups based on columns or expressions.HAVING
– filter the groups by a specific condition.ROLLUP
– generate multiple grouping sets considering a hierarchy between columns specified in the GROUP BY clause.
GROUP BY clause
The GROUP BY
clause groups a set of rows into a set of summary rows by values of columns or expressions.
The GROUP BY
clause returns one row for each group. In other words, it reduces the number of rows in the result set.
The GROUP BY
clause is an optional clause of the SELECT statement.
1
2
3
4
5
6
7
SELECT
c1, c2,..., cn, aggregate_function(ci)
FROM
table
WHERE
where_conditions
GROUP BY c1 , c2,...,cn;
GROUP BY
example:
To group values of the order’s status into subgroups.
1
2
3
4
5
SELECT
status
FROM
orders
GROUP BY status;
GROUP BY with aggregate functions
The aggregate functions allow you to perform the calculation of a set of rows and return a single value.
The GROUP BY
clause is often used with an aggregate function to perform calculations and return a single value for each subgroup.
Example 1: COUNT(*)
- to know the number of orders in each status
1
2
3
4
5
6
7
8
9
10
11
12
13
SELECT
status, COUNT(*)
FROM
orders
GROUP BY status;
>>>
status COUNT(*)
Cancelled 6
Disputed 3
In Process 6
On Hold 4
Resolved 4
Example 2: INNER JOIN
- To get the total amount of all orders by status
- Use the
SUM
function to calculate the total amount
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
SELECT
status,
SUM(quantityOrdered * priceEach) AS amount
FROM
orders
INNER JOIN orderdetails
USING (orderNumber)
GROUP BY
status;
>>>
status amount
Cancelled 238854.18
Disputed 61158.78
In Process 135271.52
Example 3: SUM(quantityOrdered * priceEach) AS total
- returns the order numbers and the total amount of each order.
1
2
3
4
5
6
7
8
9
10
11
12
13
SELECT
orderNumber,
SUM(quantityOrdered * priceEach) AS total
FROM
orderdetails
GROUP BY
orderNumber;
>>>
status COUNT(*)
Cancelled 6
Disputed 3
In Process 6
GROUP BY with expression example
You can also group rows by expressions.
- used the
YEAR
function to extract year data from order date (orderDate
) - the expression which appears in the
SELECT
clause must be the same as the one in theGROUP BY
clause
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
SELECT
YEAR(orderDate) AS year,
SUM(quantityOrdered * priceEach) AS total
FROM
orders
INNER JOIN orderdetails
USING (orderNumber)
WHERE
status = 'Shipped'
GROUP BY
YEAR(orderDate);
>>>
year total
2003 3223095.80
2004 4300602.99
2005 1341395.85
GROUP BY with HAVING clause example
To filter the groups returned by GROUP BY
clause, you use a HAVING
clause
Example:
- uses the
HAVING
clause to select the total sales of the years after 2003
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
SELECT
YEAR(orderDate) AS year,
SUM(quantityOrdered * priceEach) AS total
FROM
orders
INNER JOIN orderdetails
USING (orderNumber)
WHERE
status = 'Shipped'
GROUP BY
year
HAVING
year > 2003;
>>>
year total
2004 4300602.99
2005 1341395.85
GROUP BY clause vs. DISTINCT clause
If you use the GROUP BY
clause in the SELECT
statement without using aggregate functions, the GROUP BY
clause behaves like the DISTINCT
clause.
The difference between DISTINCT
clause and GROUP BY
clause is that the GROUP BY
clause sorts the result set, whereas the DISTINCT
clause does not.
The following statement uses the GROUP BY
clause to select the unique states of customers from the customers table.
1
2
3
4
5
6
7
8
9
10
11
12
13
14
SELECT
state
FROM
customers
GROUP BY state;
>>>
state
null
BC
CA
Co. Cork
CT
Isle o
Alternative:
- You can achieve a similar result by using the
DISTINCT
clause:
1
2
3
4
5
6
7
8
9
10
11
12
13
SELECT DISTINCT
state
FROM
customers;
>>>
state
null
NV
Victoria
CA
NY
PA
Using DISTINCT
clause & GROUP BY
clause:
If you add the ORDER BY
clause to the statement that uses the DISTINCT
clause, the result set is sorted, and it is the same as the one returned by the statement that uses GROUP BY
clause.
1
2
3
4
5
6
7
8
9
10
11
12
13
SELECT DISTINCT
state
FROM
customers
ORDER BY
state;
>>>
state
null
BC
CA
Co. Cork
HAVING clause
The HAVING
clause is used in the SELECT
statement to specify filter conditions for a group of rows or aggregates.
The HAVING
clause is often used with the GROUP BY
clause to filter groups based on a specified condition.
If you omit the GROUP BY
clause, the HAVING
clause behaves like the WHERE
clause.
HAVING group_condition;
Difference between HAVING
clause vs WHERE
clause:
- the
HAVING
clause applies a filter condition to each group of rows - the
WHERE
clause applies the filter condition to each individual row
1
2
3
4
5
6
7
8
9
10
11
SELECT
select_list
FROM
table_name
WHERE
search_condition
GROUP BY
group_by_expression
HAVING
group_condition;
HAVING
clause example - greater than
Find which order has total sales greater than 1000 by using the HAVING
clause:
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
SELECT
ordernumber,
SUM(quantityOrdered) AS itemsCount,
SUM(priceeach*quantityOrdered) AS total
FROM
orderdetails
GROUP BY
ordernumber
HAVING
total > 1000;
>>>
ordernumber itemsCount total
10100 151 10223.83
10101 142 10549.01
10102 80 5494.78
HAVING
clause with logical operators OR
& AND
.
Uses the HAVING
clause to find orders that have total amounts greater than 1000 and contain more than 600 items:
HAVING total > 1000 AND itemsCount > 600;
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
SELECT
ordernumber,
SUM(quantityOrdered) AS itemsCount,
SUM(priceeach*quantityOrdered) AS total
FROM
orderdetails
GROUP BY ordernumber
HAVING
total > 1000 AND
itemsCount > 600;
>>>
ordernumber itemsCount total
10106 675 52151.81
10126 617 57131.92
10135 607 55601.84
To find all orders that already shipped and have a total amount greater than 1500:
- you can join the
orderdetails
table with the orders table using theINNER JOIN
clause and apply a condition onstatus
column andtotal
aggregate
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
SELECT
a.ordernumber,
status,
SUM(priceeach*quantityOrdered) total
FROM
orderdetails a
INNER JOIN orders b
ON b.ordernumber = a.ordernumber
GROUP BY
ordernumber,
status
HAVING
status = 'Shipped' AND
total > 1500;
>>>
ordernumber status total
10100 Shipped 10223.83
10101 Shipped 10549.01
10102 Shipped 5494.78
ROLLUP clause
You can use the ROLLUP
clause to generate subtotals and grand totals.
First, setup a sample table:
- creates a new table named
sales
that stores the order values summarized by product lines and years. - ehe data comes from the
products
,orders
, andorderDetails
tables in the sample database.
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
CREATE TABLE sales
SELECT
productLine,
YEAR(orderDate) orderYear,
SUM(quantityOrdered * priceEach) orderValue
FROM
orderDetails
INNER JOIN
orders USING (orderNumber)
INNER JOIN
products USING (productCode)
GROUP BY
productLine ,
YEAR(orderDate);
>>>
1
2
3
4
5
6
7
8
SELECT * FROM sales;
>>>
productLine orderYear orderValue
Classic Cars 2010 4080.00
Trains 2011 2770.00
Motorcycles 2011 1323.00
Vintage Cars 2012 5022.00
Secondly, a grouping set is a set of columns to which you want to group.
For example, the following query creates a grouping set denoted by (productline
):
1
2
3
4
5
6
7
8
9
10
11
12
13
14
SELECT
productline,
SUM(orderValue) totalOrderValue
FROM
sales
GROUP BY
productline;
>>>
productLine totalOrderValue
Classic Cars 15434.00
Trains 13232.00
Motorcycles 10445.00
Vintage Cars 9922.00
Then creates an empty grouping set denoted by the ()
:
1
2
3
4
5
6
7
8
SELECT
SUM(orderValue) totalOrderValue
FROM
sales;
>>>
totalOrderValue
89022.69
To generate two or more grouping sets together in one query, you may use the UNION ALL
operator:
- The
NULL
in the productLine column identifies the grand total super-aggregate line.
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
SELECT
productline,
SUM(orderValue) totalOrderValue
FROM
sales
GROUP BY
productline
UNION ALL
SELECT
NULL,
SUM(orderValue) totalOrderValue
FROM
sales;
>>>
productLine totalOrderValue
Classic Cars 15434.00
Trains 13232.00
Motorcycles 10445.00
Vintage Cars 9922.00
...
null 89022.69
Solution: use ROLLUP
clause
The ROLLUP
clause is an extension of the GROUP BY
clause with the following syntax:
1
2
3
4
5
6
SELECT
select_list
FROM
table_name
GROUP BY
c1, c2, c3 WITH ROLLUP;
The ROLLUP
generates multiple grouping sets based on the columns or expressions specified in the GROUP BY clause:
It generates not only the subtotals but also the grand total of the order values.
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
SELECT
productLine,
SUM(orderValue) totalOrderValue
FROM
sales
GROUP BY
productline WITH ROLLUP;
>>>
productLine totalOrderValue
Classic Cars 15434.00
Trains 13232.00
Motorcycles 10445.00
Vintage Cars 9922.00
...
null 89022.69
ROLLUP
clause with hierarchy
Basic syntax:
GROUP BY c1, c2, c3 WITH ROLLUP
Then the hierarchy is as below:
c1 > c2 > c3
Then the subtotal will be based off c1
Example:
- In this case,
productLine > orderYear
- If you reverse it to
GROUP BY orderYear, productline WITH ROLLUP;
it will beorderYear > productLine
, which means calculating the subtotal of total order value for each year group
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
SELECT
productLine,
orderYear,
SUM(orderValue) totalOrderValue
FROM
sales
GROUP BY
productline,
orderYear
WITH ROLLUP;
>>>
productLine orderYear totalOrderValue
Classic Cars 2012 10000.00
Classic Cars 2013 11000.00
Classic Cars 2014 12000.00
Classic Cars null 33000.00
Trains 2012 15000.00
Trains 2014 15000.00
Trains 2015 15000.00
Trains null 45000.00
...
null null 78000.00
GROUPING() function
Use the GROUPING()
function to check whether NULL
in the result set represents the subtotals or grand totals.
The GROUPING()
function returns 1 when NULL
occurs in a supper-aggregate row, otherwise, it returns 0.
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
SELECT
IF(GROUPING(orderYear),
'All Years',
orderYear) orderYear,
IF(GROUPING(productLine),
'All Product Lines',
productLine) productLine,
SUM(orderValue) totalOrderValue
FROM
sales
GROUP BY
orderYear ,
productline
WITH ROLLUP;
>>>
orderYear productLine totalOrderValue
2010 Classic Cars 10000.00
2010 Motorcycles 5000.00
2010 Trains 8000.00
2010 All Product Lines 23000.00
2011 Classic Cars 20000.00
2011 Motorcycles 8000.00
2011 Trains 9000.00
2011 All Product Lines 37000.00
All Years All Product Lines 60000.00