Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support Grouping Sets, Rollup and Cube to extend group by statement #2039

Closed
yangzhg opened this issue Oct 22, 2019 · 0 comments
Closed

Support Grouping Sets, Rollup and Cube to extend group by statement #2039

yangzhg opened this issue Oct 22, 2019 · 0 comments
Assignees
Labels
area/sql/compatibility Issues or PRs related to the SQL compatibililty kind/feature Categorizes issue or PR as related to a new feature. need more test Add more test
Milestone

Comments

@yangzhg
Copy link
Member

yangzhg commented Oct 22, 2019

1. GROUPING SETS Background

The CUBE, ROLLUP, and GROUPING SETS extensions to SQL make querying and reporting easier and faster. CUBE, ROLLUP, and grouping sets produce a single result set that is equivalent to a UNION ALL of differently grouped rows. ROLLUP calculates aggregations such as SUM, COUNT, MAX, MIN, and AVG at increasing levels of aggregation, from the most detailed up to a grand total. CUBE is an extension similar to ROLLUP, enabling a single statement to calculate all possible combinations of aggregations. The CUBE, ROLLUP, and the GROUPING SETS extension lets you specify just the groupings needed in the GROUP BY clause. This allows efficient analysis across multiple dimensions without performing a CUBE operation. Computing a CUBE creates a heavy processing load, so replacing cubes with grouping sets can significantly increase performance.
To enhance performance, CUBE, ROLLUP, and GROUPING SETS can be parallelized: multiple processes can simultaneously execute all of these statements. These capabilities make aggregate calculations more efficient, thereby enhancing database performance, and scalability.

The three GROUPING functions help you identify the group each row belongs to and enable sorting subtotal rows and filtering results.

1.1 GROUPING SETS Syntax

GROUPING SETS syntax lets you define multiple groupings in the same query. GROUP BY computes all the groupings specified and combines them with UNION ALL. For example, consider the following statement:

SELECT a, b, SUM( c ) FROM tab1 GROUP BY GROUPING SETS ( (a, b), (a), (b), ( ) );

This statement is equivalent to:

SELECT a, b, SUM( c ) FROM tab1 GROUP BY a, b
UNION
SELECT a, null, SUM( c ) FROM tab1 GROUP BY a
UNION
SELECT null, b, SUM( c ) FROM tab1 GROUP BY b
UNION
SELECT null, null, SUM( c ) FROM tab1

This is an example of real query:

mysql> SELECT * FROM t;
+------+------+------+
| k1   | k2   | k3   |
+------+------+------+
| a    | A    |    1 |
| a    | A    |    2 |
| a    | B    |    1 |
| a    | B    |    3 |
| b    | A    |    1 |
| b    | A    |    4 |
| b    | B    |    1 |
| b    | B    |    5 |
+------+------+------+
8 rows in set (0.01 sec)

mysql> SELECT k1, k2, SUM(k3) FROM t GROUP BY GROUPING SETS ( (k1, k2), (k2), (k1), ( ) );
+------+------+-----------+
| k1   | k2   | sum(`k3`) |
+------+------+-----------+
| b    | B    |         6 |
| a    | B    |         4 |
| a    | A    |         3 |
| b    | A    |         5 |
| NULL | B    |        10 |
| NULL | A    |         8 |
| a    | NULL |         7 |
| b    | NULL |        11 |
| NULL | NULL |        18 |
+------+------+-----------+
9 rows in set (0.06 sec)

1.2 ROLLUP Syntax

ROLLUP enables a SELECT statement to calculate multiple levels of subtotals across a specified group of dimensions. It also calculates a grand total. ROLLUP is a simple extension to the GROUP BY clause, so its syntax is extremely easy to use. The ROLLUP extension is highly efficient, adding minimal overhead to a query.

ROLLUP appears in the GROUP BY clause in a SELECT statement. Its form is:

SELECT a, b,c, SUM( d ) FROM tab1 GROUP BY ROLLUP(a,b,c)

This statement is equivalent to GROUPING SETS as followed:

GROUPING SETS (
(a,b,c),
( a, b ),
( a),
( )
)

1.3 CUBE Syntax

Like ROLLUP CUBE generates all the subtotals that could be calculated for a data cube with the specified dimensions.

SELECT a, b,c, SUM( d ) FROM tab1 GROUP BY CUBE(a,b,c)

e.g. CUBE ( a, b, c ) is equivalent to GROUPING SETS as followed:

GROUPING SETS (
( a, b, c ),
( a, b ),
( a,    c ),
( a       ),
(    b, c ),
(    b    ),
(       c ),
(         )
)

1.4 GROUPING_ID() Function

GROUPING_ID describes which of a list of expressions are grouped in a row produced by a GROUP BY query. The GROUPING_ID function simply returns the decimal equivalent of the binary value formed as a result of the concatenation of the values returned by the GROUPING functions.

Each GROUPING_ID argument must be an element of the GROUP BY list. GROUPING_ID () returns an integer bitmap whose lowest N bits may be lit. A lit bit indicates the corresponding argument is not a grouping column for the given output row. The lowest-order bit corresponds to argument N, and the N-1th lowest-order bit corresponds to argument 1. If the column is a grouping column the bit is 0 else is 1.

For example:

mysql> select * from t;
+------+------+------+
| k1   | k2   | k3   |
+------+------+------+
| a    | A    |    1 |
| a    | A    |    2 |
| a    | B    |    1 |
| a    | B    |    3 |
| b    | A    |    1 |
| b    | A    |    4 |
| b    | B    |    1 |
| b    | B    |    5 |
+------+------+------+

grouping sets result:

mysql> SELECT k1, k2, GROUPING_ID(k1,k2), SUM(k3) FROM t GROUP BY GROUPING SETS ( (k1, k2), (k2), (k1), ( ) );
+------+------+---------------+-----------+
| k1   | k2   | grouping_id(k1,k2) | sum(`k3`) |
+------+------+---------------+-----------+
| a    | A    |             0 |         3 |
| a    | B    |             0 |         4 |
| a    | NULL |             1 |         7 |
| b    | A    |             0 |         5 |
| b    | B    |             0 |         6 |
| b    | NULL |             1 |        11 |
| NULL | A    |             2 |         8 |
| NULL | B    |             2 |        10 |
| NULL | NULL |             3 |        18 |
+------+------+---------------+-----------+
9 rows in set (0.02 sec)

1.5 GROUPING Function

Indicates whether a specified column expression in a GROUP BY list is aggregated or not. GROUPING returns 1 for aggregated or 0 for not aggregated in the result set. GROUPING can be used only in the SELECT list, HAVING, and ORDER BY clauses when GROUP BY is specified.

2. Object

Support GROUPING SETSROLLUP and CUBE syntax,impliments 1.1, 1.2, 1.3 1.4, 1.5

2.1 GROUPING SETS Syntax

SELECT ...
FROM ...
[ ... ]
GROUP BY GROUPING SETS ( groupSet [ , groupSet [ , ... ] ] )
[ ... ]

groupSet ::= { ( expr  [ , expr [ , ... ] ] )}

<expr>
Expression,column name.

2.2 ROLLUP Syntax

SELECT ...
FROM ...
[ ... ]
GROUP BY ROLLUP ( expr  [ , expr [ , ... ] ] )
[ ... ]

<expr>
Expression,column name.

2.3 CUBE Syntax

SELECT ...
FROM ...
[ ... ]
GROUP BY CUBE ( expr  [ , expr [ , ... ] ] )
[ ... ]

<expr>
Expression,column name.

3. Implementation

3.1 Overall Design Approaches

For GROUPING SET is equivalent to the UNION of GROUP BY . So we can expand input rows, and run an GROUP BY on these rows。

For example:

SELECT a, b FROM src GROUP BY a, b GROUPING SETS ((a, b), (a), (b), ());

Data in table src :

1, 2
3, 4

Base on GROUPING SETS , we can expend the input to:

1, 2       (GROUPING_ID: a, b -> 00 -> 0)
1, null    (GROUPING_ID: a, null -> 01 -> 1)
null, 2    (GROUPING_ID: null, b -> 10 -> 2)
null, null (GROUPING_ID: null, null -> 11 -> 3)

3, 4       (GROUPING_ID: a, b -> 00 -> 0)
3, null    (GROUPING_ID: a, null -> 01 -> 1)
null, 4    (GROUPING_ID: null, b -> 10 -> 2)
null, null (GROUPING_ID: null, null -> 11 -> 3)

And then use those row as input, then GROUP BY a, b, GROUPING_ID

3.2 Example

Table t:

mysql> select * from t;
+------+------+------+
| k1   | k2   | k3   |
+------+------+------+
| a    | A    |    1 |
| a    | A    |    2 |
| a    | B    |    1 |
| a    | B    |    3 |
| b    | A    |    1 |
| b    | A    |    4 |
| b    | B    |    1 |
| b    | B    |    5 |
+------+------+------+
8 rows in set (0.01 sec)

for the query:

SELECT k1, k2, GROUPING_ID(k1,k2), SUM(k3) FROM t GROUP BY GROUPING SETS ((k1, k2), (k1), (k2), ());

First,expand the input,every row expand into 4 rows ( the size of GROUPING SETS), and insert GROUPING_ID column

e.g. a, A, 1 expanded to:

+------+------+------+---------------+
| k1   | k2   | k3   | GROUPING_ID(k1,k2) |
+------+------+------+---------------+
| a    | A    |    1 |             0 |
| a    | NULL |    1 |             1 |
| NULL | A    |    1 |             2 |
| NULL | NULL |    1 |             3 |
+------+------+------+---------------+

Finally, all rows expended as follows (32 rows):

+------+------+------+---------------+
| k1   | k2   | k3   | GROUPING_ID(k1,k2) |
+------+------+------+---------------+
| a    | A    |    1 |             0 |
| a    | A    |    2 |             0 |
| a    | B    |    1 |             0 |
| a    | B    |    3 |             0 |
| b    | A    |    1 |             0 |
| b    | A    |    4 |             0 |
| b    | B    |    1 |             0 |
| b    | B    |    5 |             0 |
| a    | NULL |    1 |             1 |
| a    | NULL |    1 |             1 |
| a    | NULL |    2 |             1 |
| a    | NULL |    3 |             1 |
| b    | NULL |    1 |             1 |
| b    | NULL |    1 |             1 |
| b    | NULL |    4 |             1 |
| b    | NULL |    5 |             1 |
| NULL | A    |    1 |             2 |
| NULL | A    |    1 |             2 |
| NULL | A    |    2 |             2 |
| NULL | A    |    4 |             2 |
| NULL | B    |    1 |             2 |
| NULL | B    |    1 |             2 |
| NULL | B    |    3 |             2 |
| NULL | B    |    5 |             2 |
| NULL | NULL |    1 |             3 |
| NULL | NULL |    1 |             3 |
| NULL | NULL |    1 |             3 |
| NULL | NULL |    1 |             3 |
| NULL | NULL |    2 |             3 |
| NULL | NULL |    3 |             3 |
| NULL | NULL |    4 |             3 |
| NULL | NULL |    5 |             3 |
+------+------+------+---------------+
32 rows in set.

now GROUP BY k1, k2, GROUPING_ID(k1,k2):

+------+------+---------------+-----------+
| k1   | k2   | grouping_id(k1,k2) | sum(`k3`) |
+------+------+---------------+-----------+
| a    | A    |             0 |         3 |
| a    | B    |             0 |         4 |
| a    | NULL |             1 |         7 |
| b    | A    |             0 |         5 |
| b    | B    |             0 |         6 |
| b    | NULL |             1 |        11 |
| NULL | A    |             2 |         8 |
| NULL | B    |             2 |        10 |
| NULL | NULL |             3 |        18 |
+------+------+---------------+-----------+
9 rows in set (0.02 sec)

The result is equivalent to the UNION ALL

select k1, k2, sum(k3) from t group by k1, k2
UNION ALL
select NULL, k2, sum(k3) from t group by k2
UNION ALL
select k1, NULL, sum(k3) from t group by k1
UNION ALL
select NULL, NULL, sum(k3) from t;

+------+------+-----------+
| k1   | k2   | sum(`k3`) |
+------+------+-----------+
| b    | B    |         6 |
| b    | A    |         5 |
| a    | A    |         3 |
| a    | B    |         4 |
| a    | NULL |         7 |
| b    | NULL |        11 |
| NULL | B    |        10 |
| NULL | A    |         8 |
| NULL | NULL |        18 |
+------+------+-----------+
9 rows in set (0.06 sec)

3.3 FE

3.3.1 Tasks

  1. Add GroupByClause, repalce groupingExprs.
  2. Add Grouping Sets, Cube and RollUp syntax.
  3. Add GroupByClause in SelectStmt.
  4. add virtual column GROUPING__ID,and insert into groupingExprs,
  5. Add a PlanNode, name as RepeatNode. For GroupingSets aggregation insert RepeatNode to the plan.

3.3.2 Tuple

In order to add GROUPING_ID to groupingExprs in GroupByClause, need to create virtual SlotRef, also, need tot create a tuple for this slot, named GROUPING__ID Tuple.

For the plannode RepeatNode, it's input is all the tuple of it's children, It's output tuple is the repeat data and GROUPING_ID.

3.3.3 Expression and Function Substitution

expr -> if(bitand(pos, grouping_id)=0, expr, null) for expr in extension grouping clause
grouping_id() -> grouping_id(grouping_id) for grouping_id function

3.4 BE

3.4.1 Tasks

  1. Add RepeatNode executor, expend the input data and append GROUPING_ID to every row
  2. Implements grouping_id() function.
@morningman morningman added the kind/feature Categorizes issue or PR as related to a new feature. label Oct 25, 2019
@morningman morningman added this to the 0.12.0 milestone Oct 25, 2019
yangzhg added a commit to yangzhg/doris that referenced this issue Oct 29, 2019
yangzhg added a commit to yangzhg/doris that referenced this issue Oct 31, 2019
yangzhg added a commit to yangzhg/doris that referenced this issue Nov 7, 2019
yangzhg added a commit to yangzhg/doris that referenced this issue Nov 7, 2019
yangzhg added a commit to yangzhg/doris that referenced this issue Nov 7, 2019
yangzhg added a commit to yangzhg/doris that referenced this issue Nov 11, 2019
…pache#2039)

compatible with hive grouping clause (apache#2039)

add unit test

fix group by will throw null pointer exception

Support Grouping Sets, Rollup and Cube to extend group by statement (apache#2039)

add grouping function parser

implements grouping function

fix some code style

add parameter check to  grouping_id function (apache#2039)

fix unit test failed (apache#2039)

support grouping functions in expr like grouping(a) + grouping(b) (apache#2039)
yangzhg added a commit to yangzhg/doris that referenced this issue Nov 11, 2019
…pache#2039)

compatible with hive grouping clause (apache#2039)

add unit test

fix group by will throw null pointer exception

Support Grouping Sets, Rollup and Cube to extend group by statement (apache#2039)

add grouping function parser

implements grouping function

fix some code style

add parameter check to  grouping_id function (apache#2039)

fix unit test failed (apache#2039)

support grouping functions in expr like grouping(a) + grouping(b) (apache#2039)
yangzhg added a commit to yangzhg/doris that referenced this issue Nov 11, 2019
…pache#2039)

[ADD] compatible with hive grouping clause (apache#2039)
[ADD] support grouping functions in expr like grouping(a) + grouping(b) (apache#2039)
yangzhg added a commit to yangzhg/doris that referenced this issue Nov 12, 2019
…pache#2039)

[ADD] compatible with hive grouping clause (apache#2039)
[ADD] support grouping functions in expr like grouping(a) + grouping(b) (apache#2039)
yangzhg added a commit to yangzhg/doris that referenced this issue Nov 12, 2019
…pache#2039)

[ADD] compatible with hive grouping clause (apache#2039)
[ADD] support grouping functions in expr like grouping(a) + grouping(b) (apache#2039)
yangzhg added a commit to yangzhg/doris that referenced this issue Nov 13, 2019
…pache#2039)

[ADD] compatible with hive grouping clause (apache#2039)
[ADD] support grouping functions in expr like grouping(a) + grouping(b) (apache#2039)
yangzhg added a commit to yangzhg/doris that referenced this issue Nov 13, 2019
yangzhg added a commit to yangzhg/doris that referenced this issue Nov 13, 2019
yangzhg added a commit to yangzhg/doris that referenced this issue Nov 14, 2019
yangzhg added a commit to yangzhg/doris that referenced this issue Nov 14, 2019
yangzhg added a commit to yangzhg/doris that referenced this issue Nov 14, 2019
yangzhg added a commit to yangzhg/doris that referenced this issue Nov 15, 2019
yangzhg added a commit to yangzhg/doris that referenced this issue Nov 15, 2019
yangzhg added a commit to yangzhg/doris that referenced this issue Nov 15, 2019
…pache#2039)

[ADD] compatible with hive grouping clause (apache#2039)
[ADD] support grouping functions in expr like grouping(a) + grouping(b) (apache#2039)
yangzhg added a commit to yangzhg/doris that referenced this issue Nov 15, 2019
yangzhg added a commit to yangzhg/doris that referenced this issue Nov 15, 2019
yangzhg added a commit to yangzhg/doris that referenced this issue Nov 18, 2019
yangzhg added a commit to yangzhg/doris that referenced this issue Nov 19, 2019
yangzhg added a commit to yangzhg/doris that referenced this issue Nov 25, 2019
yangzhg added a commit to yangzhg/doris that referenced this issue Nov 25, 2019
yangzhg added a commit to yangzhg/doris that referenced this issue Nov 25, 2019
yangzhg added a commit to yangzhg/doris that referenced this issue Nov 25, 2019
yangzhg added a commit to yangzhg/doris that referenced this issue Nov 25, 2019
…pache#2039)

[ADD] compatible with hive grouping clause (apache#2039)
[ADD] support grouping functions in expr like grouping(a) + grouping(b) (apache#2039)

[FIX] fix analyzer error in window function(apache#2039)
@EmmyMiao87 EmmyMiao87 added area/sql/compatibility Issues or PRs related to the SQL compatibililty need more test Add more test labels Nov 29, 2019
yangzhg added a commit to yangzhg/doris that referenced this issue Dec 9, 2019
…pache#2039)

[ADD] compatible with hive grouping clause (apache#2039)
[ADD] support grouping functions in expr like grouping(a) + grouping(b) (apache#2039)

[FIX] fix analyzer error in window function(apache#2039)
yangzhg added a commit to yangzhg/doris that referenced this issue Dec 9, 2019
…pache#2039)

[ADD] compatible with hive grouping clause (apache#2039)
[ADD] support grouping functions in expr like grouping(a) + grouping(b) (apache#2039)

[FIX] fix analyzer error in window function(apache#2039)

clean some code style

clean some code style

[FIX] fix grouping functions not substitute in subquery(apache#2039)

[ADD] add design docs

[FIX] fix expr in groupby clause remove duplicate error[apache#2039]

[FIX] fix code style problems

add some comments

[ADD] add fe unit test

[FIX] fix according comments and add syntax check in where clause

remve unused code

fix by comments
yangzhg added a commit to yangzhg/doris that referenced this issue Dec 9, 2019
…pache#2039)

[ADD] compatible with hive grouping clause (apache#2039)
[ADD] support grouping functions in expr like grouping(a) + grouping(b) (apache#2039)

[FIX] fix analyzer error in window function(apache#2039)

clean some code style

clean some code style

[FIX] fix grouping functions not substitute in subquery(apache#2039)

[ADD] add design docs

[FIX] fix expr in groupby clause remove duplicate error[apache#2039]

[FIX] fix code style problems

add some comments

[ADD] add fe unit test

[FIX] fix according comments and add syntax check in where clause

remve unused code

fix by comments

fix by comments

fix by comments
yangzhg added a commit to yangzhg/doris that referenced this issue Dec 16, 2019
� This is the 1st commit message:

Support Grouping Sets, Rollup and Cube to extend group by statement (apache#2039)

[ADD] compatible with hive grouping clause (apache#2039)
[ADD] support grouping functions in expr like grouping(a) + grouping(b) (apache#2039)

[FIX] fix analyzer error in window function(apache#2039)

� This is the commit message apache#2:

Support Grouping Sets, Rollup and Cube to extend group by statement (apache#2039)

[ADD] compatible with hive grouping clause (apache#2039)
[ADD] support grouping functions in expr like grouping(a) + grouping(b) (apache#2039)

[FIX] fix analyzer error in window function(apache#2039)

clean some code style

clean some code style

[FIX] fix grouping functions not substitute in subquery(apache#2039)

[ADD] add design docs

[FIX] fix expr in groupby clause remove duplicate error[apache#2039]

[FIX] fix code style problems

add some comments

[ADD] add fe unit test

[FIX] fix according comments and add syntax check in where clause

remve unused code

fix by comments
yangzhg added a commit to yangzhg/doris that referenced this issue Dec 16, 2019
…pache#2039)

[ADD] compatible with hive grouping clause (apache#2039)
[ADD] support grouping functions in expr like grouping(a) + grouping(b) (apache#2039)

[FIX] fix analyzer error in window function(apache#2039)

Support Grouping Sets, Rollup and Cube to extend group by statement (apache#2039)

[ADD] compatible with hive grouping clause (apache#2039)
[ADD] support grouping functions in expr like grouping(a) + grouping(b) (apache#2039)

[FIX] fix analyzer error in window function(apache#2039)

clean some code style

clean some code style

[FIX] fix grouping functions not substitute in subquery(apache#2039)

[ADD] add design docs

[FIX] fix expr in groupby clause remove duplicate error[apache#2039]

[FIX] fix code style problems

add some comments

[ADD] add fe unit test

[FIX] fix according comments and add syntax check in where clause

remve unused code

fix by comments

Support Grouping Sets, Rollup and Cube to extend group by statement (apache#2039)

[ADD] compatible with hive grouping clause (apache#2039)
[ADD] support grouping functions in expr like grouping(a) + grouping(b) (apache#2039)

[FIX] fix analyzer error in window function(apache#2039)

clean some code style

clean some code style

[FIX] fix grouping functions not substitute in subquery(apache#2039)

[ADD] add design docs

[FIX] fix expr in groupby clause remove duplicate error[apache#2039]

[FIX] fix code style problems

add some comments

[ADD] add fe unit test

[FIX] fix according comments and add syntax check in where clause

remve unused code

fix by comments

fix by comments

fix by comments

Support Grouping Sets, Rollup and Cube to extend group by statement (apache#2039)

[ADD] compatible with hive grouping clause (apache#2039)
[ADD] support grouping functions in expr like grouping(a) + grouping(b) (apache#2039)

[FIX] fix analyzer error in window function(apache#2039)

clean some code style

clean some code style

[FIX] fix grouping functions not substitute in subquery(apache#2039)

[ADD] add design docs

[FIX] fix expr in groupby clause remove duplicate error[apache#2039]

[FIX] fix code style problems

add some comments

[ADD] add fe unit test

[FIX] fix according comments and add syntax check in where clause

remve unused code

fix by comments

fix by comments

fix by comments

move some code to repeatnode

support subquery in grouping sets

rewrite select when using grouping set

fix some check condition
morningman pushed a commit that referenced this issue Jan 17, 2020
…ment

Support Grouping Sets, Rollup and Cube to extend group by statement
support GROUPING SETS syntax 
```
SELECT a, b, SUM( c ) FROM tab1 GROUP BY GROUPING SETS ( (a, b), (a), (b), ( ) );
```
cube  or rollup like 
```
SELECT a, b,c, SUM( d ) FROM tab1 GROUP BY ROLLUP|CUBE(a,b,c)
```

[ADD] support grouping functions in expr like grouping(a) + grouping(b) (#2039)
[FIX] fix analyzer error in window function(#2039)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/sql/compatibility Issues or PRs related to the SQL compatibililty kind/feature Categorizes issue or PR as related to a new feature. need more test Add more test
Projects
None yet
Development

No branches or pull requests

3 participants