[FIX] fix analyzer error in window function(apache#2039)
yangzhg committed Nov 25, 2019
1 parent c7d52af commit b79cb65
Showing 17 changed files with 574 additions and 303 deletions.
1 change: 1 addition & 0 deletions be/src/exec/repeat_node.cpp
@@ -146,6 +146,7 @@ Status RepeatNode::get_repeated_batch(

    for(size_t slot_idx = 0; slot_idx < _grouping_list.size(); ++slot_idx) {
        int64_t val = _grouping_list[slot_idx][repeat_id_idx];
        DCHECK_LT(slot_idx, _tuple_desc->slots().size()) << "TupleDescriptor: " << _tuple_desc->debug_string();
        const SlotDescriptor *slot_desc = _tuple_desc->slots()[slot_idx];
        tuple->set_not_null(slot_desc->null_indicator_offset());
        RawValue::write(&val, tuple, slot_desc, tuple_pool);
@@ -0,0 +1,173 @@
<!--
Licensed to the Apache Software Foundation (ASF) under one
or more contributor license agreements. See the NOTICE file
distributed with this work for additional information
regarding copyright ownership. The ASF licenses this file
to you under the Apache License, Version 2.0 (the
"License"); you may not use this file except in compliance
with the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing,
software distributed under the License is distributed on an
"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
KIND, either express or implied. See the License for the
specific language governing permissions and limitations
under the License.
-->

# GROUP BY

## description

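`GROUPING SETS`, `CUBE`, and `ROLLUP` are extensions to the GROUP BY clause. They compute aggregates over several grouping sets within a single GROUP BY clause. The result is equivalent to combining the corresponding individual GROUP BY queries with UNION.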

A plain GROUP BY clause is the special case of GROUP BY GROUPING SETS that contains a single grouping set.
For example, the GROUPING SETS statement:

```
SELECT a, b, SUM( c ) FROM tab1 GROUP BY GROUPING SETS ( (a, b), (a), (b), ( ) );
```

Its result is equivalent to:

```
SELECT a, b, SUM( c ) FROM tab1 GROUP BY a, b
UNION
SELECT a, null, SUM( c ) FROM tab1 GROUP BY a
UNION
SELECT null, b, SUM( c ) FROM tab1 GROUP BY b
UNION
SELECT null, null, SUM( c ) FROM tab1
```
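
The equivalence above can be sketched outside SQL. The following Python is only an illustration of the semantics, not how Doris evaluates the query: it runs one plain GROUP BY per grouping set and concatenates the results, reporting columns that are absent from a grouping set as `None` (SQL NULL). The rows mirror the sample table `t` from the example section. The helper names `group_by_sum` and `grouping_sets_sum` are hypothetical.

```python
from collections import defaultdict

# Rows mirror the sample table t from the example section: (k1, k2, k3).
ROWS = [
    ("a", "A", 1), ("a", "A", 2), ("a", "B", 1), ("a", "B", 3),
    ("b", "A", 1), ("b", "A", 4), ("b", "B", 1), ("b", "B", 5),
]
COLUMNS = ("k1", "k2")  # grouping columns; the third field (k3) is summed


def group_by_sum(rows, group_set):
    """Emulate SELECT k1, k2, SUM(k3) FROM t GROUP BY <group_set>.

    Columns that are not in group_set are reported as None (SQL NULL).
    """
    totals = defaultdict(int)
    for row in rows:
        key = tuple(
            row[i] if name in group_set else None
            for i, name in enumerate(COLUMNS)
        )
        totals[key] += row[len(COLUMNS)]
    return [key + (total,) for key, total in totals.items()]


def grouping_sets_sum(rows, group_sets):
    """Concatenate one GROUP BY result per grouping set."""
    result = []
    for group_set in group_sets:
        result.extend(group_by_sum(rows, group_set))
    return result


result = grouping_sets_sum(ROWS, [("k1", "k2"), ("k2",), ("k1",), ()])
for row in sorted(result, key=repr):
    print(row)
```

The sketch ignores edge cases such as an empty input table, where SQL still returns one row for the empty grouping set.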

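`GROUPING(expr)` indicates whether a column is aggregated (rolled up) in a result row. It returns 1 if the column is aggregated and 0 otherwise, which matches the `GROUPING_ID` values in the example output.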

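`GROUPING_ID(expr [ , expr [ , ... ] ])` is similar to GROUPING. It builds a bitmap over the listed columns, in the order given, in which each bit is the GROUPING value of the corresponding column. GROUPING_ID() returns the decimal value of that bit vector.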
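
As a rough sketch of the bit arithmetic, assuming that a column's bit is 1 when the column is rolled up (absent from the current grouping set) and that the first argument is the most significant bit, the value can be computed as below. Both assumptions agree with the `grouping_id(k1,k2)` column in the example output. The Python functions are hypothetical stand-ins for the SQL functions.

```python
def grouping(column, group_set):
    """Return 1 if column is rolled up (absent from group_set), else 0."""
    return 0 if column in group_set else 1


def grouping_id(columns, group_set):
    """Pack the GROUPING bits of columns into one integer.

    The first column becomes the most significant bit.
    """
    value = 0
    for column in columns:
        value = (value << 1) | grouping(column, group_set)
    return value


# GROUPING_ID(k1, k2) for each grouping set of the example query.
for group_set in [("k1", "k2"), ("k1",), ("k2",), ()]:
    print(group_set, grouping_id(("k1", "k2"), group_set))
```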

### Syntax

```
SELECT ...
FROM ...
[ ... ]
GROUP BY [
, ... |
GROUPING SETS [, ...] ( groupSet [ , groupSet [ , ... ] ] ) |
ROLLUP(expr [ , expr [ , ... ] ]) |
expr [ , expr [ , ... ] ] WITH ROLLUP |
CUBE(expr [ , expr [ , ... ] ]) |
expr [ , expr [ , ... ] ] WITH CUBE
]
[ ... ]
```

### Parameters

`groupSet` is a set of columns, aliases, or expressions from the select list. `groupSet ::= { ( expr [ , expr [ , ... ] ] )}`

`expr` is a column, alias, or expression from the select list.

### Note

Doris supports two syntax styles, one similar to PostgreSQL and one similar to Hive. Examples of both follow.

PostgreSQL-like syntax:

```
SELECT a, b, SUM( c ) FROM tab1 GROUP BY GROUPING SETS ( (a, b), (a), (b), ( ) );
SELECT a, b,c, SUM( d ) FROM tab1 GROUP BY ROLLUP(a,b,c)
SELECT a, b,c, SUM( d ) FROM tab1 GROUP BY CUBE(a,b,c)
```

Hive-like syntax:

```
SELECT a, b, SUM( c ) FROM tab1 GROUP BY a,b GROUPING SETS ( (a, b), (a), (b), ( ) );
SELECT a, b, c, SUM( d ) FROM tab1 GROUP BY a,b,c WITH ROLLUP
SELECT a, b, c, SUM( d ) FROM tab1 GROUP BY a,b,c WITH CUBE
```

`ROLLUP(a,b,c)` is equivalent to the following `GROUPING SETS` clause:

```
GROUPING SETS (
(a,b,c),
( a, b ),
( a),
( )
)
```
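
The expansion above follows a simple rule: ROLLUP yields every prefix of its argument list, from the full list down to the empty set. A minimal sketch of that rule, where `rollup` is a hypothetical helper and not a Doris API:

```python
def rollup(*exprs):
    """Expand ROLLUP(e1, ..., en) into its grouping sets.

    Returns every prefix of exprs, longest first, ending with the empty set.
    """
    return [tuple(exprs[:n]) for n in range(len(exprs), -1, -1)]


for group_set in rollup("a", "b", "c"):
    print(group_set)
```

For n expressions this produces n + 1 grouping sets.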

`CUBE ( a, b, c )` is equivalent to the following `GROUPING SETS` clause:

```
GROUPING SETS (
( a, b, c ),
( a, b ),
( a, c ),
( a ),
( b, c ),
( b ),
( c ),
( )
)
```
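
CUBE yields every subset of its argument list. The sketch below enumerates the subsets recursively, in the same order as the listing above. `cube` is a hypothetical helper used only for illustration.

```python
def cube(*exprs):
    """Expand CUBE(e1, ..., en) into all 2**n grouping sets.

    Subsets that contain the first expression come before those that do not.
    """
    if not exprs:
        return [()]
    rest = cube(*exprs[1:])
    return [(exprs[0],) + subset for subset in rest] + rest


for group_set in cube("a", "b", "c"):
    print(group_set)
```

Because the number of grouping sets doubles with each added expression, CUBE over many columns can produce a very large result.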

## example

The following example uses real data:

```
> SELECT * FROM t;
+------+------+------+
| k1   | k2   | k3   |
+------+------+------+
| a    | A    |    1 |
| a    | A    |    2 |
| a    | B    |    1 |
| a    | B    |    3 |
| b    | A    |    1 |
| b    | A    |    4 |
| b    | B    |    1 |
| b    | B    |    5 |
+------+------+------+
8 rows in set (0.01 sec)
> SELECT k1, k2, SUM(k3) FROM t GROUP BY GROUPING SETS ( (k1, k2), (k2), (k1), ( ) );
+------+------+-----------+
| k1   | k2   | sum(`k3`) |
+------+------+-----------+
| b    | B    |         6 |
| a    | B    |         4 |
| a    | A    |         3 |
| b    | A    |         5 |
| NULL | B    |        10 |
| NULL | A    |         8 |
| a    | NULL |         7 |
| b    | NULL |        11 |
| NULL | NULL |        18 |
+------+------+-----------+
9 rows in set (0.06 sec)
> SELECT k1, k2, GROUPING_ID(k1,k2), SUM(k3) FROM t GROUP BY GROUPING SETS ((k1, k2), (k1), (k2), ());
+------+------+--------------------+-----------+
| k1   | k2   | grouping_id(k1,k2) | sum(`k3`) |
+------+------+--------------------+-----------+
| a    | A    |                  0 |         3 |
| a    | B    |                  0 |         4 |
| a    | NULL |                  1 |         7 |
| b    | A    |                  0 |         5 |
| b    | B    |                  0 |         6 |
| b    | NULL |                  1 |        11 |
| NULL | A    |                  2 |         8 |
| NULL | B    |                  2 |        10 |
| NULL | NULL |                  3 |        18 |
+------+------+--------------------+-----------+
9 rows in set (0.02 sec)
```

## keyword

GROUP, GROUPING, GROUPING_ID, GROUPING_SETS, GROUPING SETS, CUBE, ROLLUP
@@ -18,7 +18,9 @@ under the License.
-->

# INSERT

## description

### Syntax

```
@@ -47,7 +49,7 @@ INSERT INTO table_name
> query: an ordinary query whose result is written to the target
>
> hint: indicators that control how `INSERT` executes. Both `streaming` and the default non-`streaming` mode execute the `INSERT` statement synchronously
> In `streaming` mode, a label is returned after execution completes so that users can check the load status with `SHOW LOAD`
### Note

@@ -0,0 +1,170 @@
<!--
Licensed to the Apache Software Foundation (ASF) under one
or more contributor license agreements. See the NOTICE file
distributed with this work for additional information
regarding copyright ownership. The ASF licenses this file
to you under the Apache License, Version 2.0 (the
"License"); you may not use this file except in compliance
with the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing,
software distributed under the License is distributed on an
"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
KIND, either express or implied. See the License for the
specific language governing permissions and limitations
under the License.
-->

# GROUP BY

## description

`GROUPING SETS`, `CUBE`, and `ROLLUP` are extensions to the GROUP BY clause. They let you define multiple groupings in the same query. GROUPING SETS produces a single result set that is equivalent to a UNION ALL of differently grouped rows.
For example, the GROUPING SETS clause:

```
SELECT a, b, SUM( c ) FROM tab1 GROUP BY GROUPING SETS ( (a, b), (a), (b), ( ) );
```

This statement is equivalent to:

```
SELECT a, b, SUM( c ) FROM tab1 GROUP BY a, b
UNION ALL
SELECT a, null, SUM( c ) FROM tab1 GROUP BY a
UNION ALL
SELECT null, b, SUM( c ) FROM tab1 GROUP BY b
UNION ALL
SELECT null, null, SUM( c ) FROM tab1
```

`GROUPING(expr)` indicates whether a specified column expression in a GROUP BY list is aggregated or not. GROUPING returns 1 for aggregated or 0 for not aggregated in the result set.

`GROUPING_ID(expr [ , expr [ , ... ] ])` describes which of a list of expressions are grouped in a row produced by a GROUP BY query. The GROUPING_ID function simply returns the decimal equivalent of the binary value formed as a result of the concatenation of the values returned by the GROUPING functions.

### Syntax

```
SELECT ...
FROM ...
[ ... ]
GROUP BY [
, ... |
GROUPING SETS [, ...] ( groupSet [ , groupSet [ , ... ] ] ) |
ROLLUP(expr [ , expr [ , ... ] ]) |
expr [ , expr [ , ... ] ] WITH ROLLUP |
CUBE(expr [ , expr [ , ... ] ]) |
expr [ , expr [ , ... ] ] WITH CUBE
]
[ ... ]
```

### Parameters

`groupSet` is a set of expressions, columns, or their aliases that appear in the query block's SELECT list. `groupSet ::= { ( expr [ , expr [ , ... ] ] )}`

`expr` is an expression, a column, or its alias that appears in the query block's SELECT list.

### Note

Doris supports two syntax styles, PostgreSQL-like and Hive-like. For example:
PostgreSQL-like syntax:

```
SELECT a, b, SUM( c ) FROM tab1 GROUP BY GROUPING SETS ( (a, b), (a), (b), ( ) );
SELECT a, b,c, SUM( d ) FROM tab1 GROUP BY ROLLUP(a,b,c)
SELECT a, b,c, SUM( d ) FROM tab1 GROUP BY CUBE(a,b,c)
```

Hive-like syntax:

```
SELECT a, b, SUM( c ) FROM tab1 GROUP BY a,b GROUPING SETS ( (a, b), (a), (b), ( ) );
SELECT a, b, c, SUM( d ) FROM tab1 GROUP BY a,b,c WITH ROLLUP
SELECT a, b, c, SUM( d ) FROM tab1 GROUP BY a,b,c WITH CUBE
```

`ROLLUP(a,b,c)` is equivalent to `GROUPING SETS` as follows:

```
GROUPING SETS (
(a,b,c),
( a, b ),
( a),
( )
)
```

`CUBE ( a, b, c )` is equivalent to `GROUPING SETS` as follows:

```
GROUPING SETS (
( a, b, c ),
( a, b ),
( a, c ),
( a ),
( b, c ),
( b ),
( c ),
( )
)
```

## example

The following is a simple example:

```
> SELECT * FROM t;
+------+------+------+
| k1   | k2   | k3   |
+------+------+------+
| a    | A    |    1 |
| a    | A    |    2 |
| a    | B    |    1 |
| a    | B    |    3 |
| b    | A    |    1 |
| b    | A    |    4 |
| b    | B    |    1 |
| b    | B    |    5 |
+------+------+------+
8 rows in set (0.01 sec)
> SELECT k1, k2, SUM(k3) FROM t GROUP BY GROUPING SETS ( (k1, k2), (k2), (k1), ( ) );
+------+------+-----------+
| k1   | k2   | sum(`k3`) |
+------+------+-----------+
| b    | B    |         6 |
| a    | B    |         4 |
| a    | A    |         3 |
| b    | A    |         5 |
| NULL | B    |        10 |
| NULL | A    |         8 |
| a    | NULL |         7 |
| b    | NULL |        11 |
| NULL | NULL |        18 |
+------+------+-----------+
9 rows in set (0.06 sec)
> SELECT k1, k2, GROUPING_ID(k1,k2), SUM(k3) FROM t GROUP BY GROUPING SETS ((k1, k2), (k1), (k2), ());
+------+------+--------------------+-----------+
| k1   | k2   | grouping_id(k1,k2) | sum(`k3`) |
+------+------+--------------------+-----------+
| a    | A    |                  0 |         3 |
| a    | B    |                  0 |         4 |
| a    | NULL |                  1 |         7 |
| b    | A    |                  0 |         5 |
| b    | B    |                  0 |         6 |
| b    | NULL |                  1 |        11 |
| NULL | A    |                  2 |         8 |
| NULL | B    |                  2 |        10 |
| NULL | NULL |                  3 |        18 |
+------+------+--------------------+-----------+
9 rows in set (0.02 sec)
```

## keyword

GROUP, GROUPING, GROUPING_ID, GROUPING_SETS, GROUPING SETS, CUBE, ROLLUP