[SPARK-39876][SQL] Add UNPIVOT to SQL syntax
### What changes were proposed in this pull request?
This adds the `UNPIVOT` clause to the SQL syntax. It follows the same syntax as [BigQuery](https://cloud.google.com/bigquery/docs/reference/standard-sql/query-syntax#unpivot_operator), [T-SQL](https://docs.microsoft.com/en-us/sql/t-sql/queries/from-using-pivot-and-unpivot?view=sql-server-ver15#unpivot-example) and [Oracle](https://www.oracletutorial.com/oracle-basics/oracle-unpivot/):

```
FROM ... [ unpivot_clause ]

unpivot_clause:
    UNPIVOT [ { INCLUDE | EXCLUDE } NULLS ] (
        { single_value_column_unpivot | multi_value_column_unpivot }
    ) [[AS] alias]

single_value_column_unpivot:
    values_column
    FOR name_column
    IN (unpivot_column [[AS] alias] [, ...])

multi_value_column_unpivot:
    (values_column [, ...])
    FOR name_column
    IN ((unpivot_column [, ...]) [[AS] alias] [, ...])

unpivot_column:
    multipartIdentifier
```
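The grammar nests the single-value and multi-value forms inside one clause. As an illustration only, the sketch below holds a parsed clause as plain Scala data and renders it back to SQL text. `UnpivotGrammar`, `InEntry` and `UnpivotClause` are hypothetical names, not Spark's parser classes, and treating more than one value column as the multi-value form is a simplification of the grammar's parenthesized form.

```scala
// Hypothetical plain-Scala model of the unpivot_clause grammar above.
// None of these names exist in Spark; they only mirror the grammar rules.
object UnpivotGrammar {
  sealed trait NullHandling
  case object IncludeNulls extends NullHandling
  case object ExcludeNulls extends NullHandling

  // One IN entry: a single unpivot_column (single-value form) or a tuple of
  // them (multi-value form), plus an optional alias.
  final case class InEntry(columns: Seq[String], alias: Option[String] = None)

  final case class UnpivotClause(
      nulls: Option[NullHandling], // None: no INCLUDE/EXCLUDE NULLS keyword given
      valueColumns: Seq[String],   // values_column, or (values_column [, ...])
      nameColumn: String,          // name_column
      in: Seq[InEntry],
      alias: Option[String] = None
  ) {
    // Simplification: more than one value column means the multi-value form.
    def isMultiValue: Boolean = valueColumns.size > 1

    // Renders the clause back into the textual shape described by the grammar.
    def toSql: String = {
      def aliasSql(a: Option[String]): String = a.map(" AS " + _).getOrElse("")
      def group(cols: Seq[String]): String =
        if (isMultiValue) cols.mkString("(", ", ", ")") else cols.head
      val nullsSql = nulls match {
        case Some(IncludeNulls) => " INCLUDE NULLS"
        case Some(ExcludeNulls) => " EXCLUDE NULLS"
        case None               => ""
      }
      val entries = in.map(e => group(e.columns) + aliasSql(e.alias)).mkString(", ")
      s"UNPIVOT$nullsSql (${group(valueColumns)} FOR $nameColumn IN ($entries))" + aliasSql(alias)
    }
  }
}
```

For instance, the clause of the third example below renders as `UNPIVOT EXCLUDE NULLS ((first_quarter, second_quarter) FOR half_of_the_year IN ((q1, q2) AS H1, (q3, q4) AS H2))`.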

For example:
```sql
CREATE TABLE sales_quarterly (year INT, q1 INT, q2 INT, q3 INT, q4 INT);
INSERT INTO sales_quarterly VALUES
    (2020, null, 1000, 2000, 2500),
    (2021, 2250, 3200, 4200, 5900),
    (2022, 4200, 3100, null, null);

SELECT * FROM sales_quarterly
    UNPIVOT (
        sales FOR quarter IN (q1, q2, q3, q4)
    );

SELECT up.* FROM sales_quarterly
    UNPIVOT INCLUDE NULLS (
        sales FOR quarter IN (q1 AS Q1, q2 AS Q2, q3 AS Q3, q4 AS Q4)
    ) AS up;

SELECT * FROM sales_quarterly
    UNPIVOT EXCLUDE NULLS (
        (first_quarter, second_quarter)
        FOR half_of_the_year IN (
            (q1, q2) AS H1,
            (q3, q4) AS H2
        )
    );
```
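The first two queries differ only in NULL handling and aliases. A minimal plain-Scala model of those semantics might look like the following, assuming rows are maps from column name to `Option[Int]` (`None` for NULL) and that the id columns (here `year`) are passed in explicitly rather than derived from the remaining columns. `SingleValueUnpivot` and its members are hypothetical; this is not Spark's implementation.

```scala
// Plain-Scala model of single-value-column UNPIVOT; not Spark's implementation.
object SingleValueUnpivot {
  // A row maps column names to nullable INT values (None stands for NULL).
  type Row = Map[String, Option[Int]]

  // One output row: the id column values, the name_column value, the values_column value.
  final case class OutRow(ids: Seq[Option[Int]], name: String, value: Option[Int])

  def unpivot(
      rows: Seq[Row],
      idColumns: Seq[String],    // passed explicitly here (a simplification)
      in: Seq[(String, String)], // (unpivot_column, text written to name_column)
      includeNulls: Boolean      // false models EXCLUDE NULLS, the default
  ): Seq[OutRow] =
    for {
      row <- rows
      (column, name) <- in
      value = row(column)
      if includeNulls || value.isDefined
    } yield OutRow(idColumns.map(row), name, value)

  // The sales_quarterly rows from the example above.
  val salesQuarterly: Seq[Row] = Seq(
    Map[String, Option[Int]]("year" -> Some(2020), "q1" -> None, "q2" -> Some(1000), "q3" -> Some(2000), "q4" -> Some(2500)),
    Map[String, Option[Int]]("year" -> Some(2021), "q1" -> Some(2250), "q2" -> Some(3200), "q3" -> Some(4200), "q4" -> Some(5900)),
    Map[String, Option[Int]]("year" -> Some(2022), "q1" -> Some(4200), "q2" -> Some(3100), "q3" -> None, "q4" -> None)
  )
}
```

With `includeNulls = false` the model yields nine rows for `sales_quarterly` (the `2020`/`q1` and `2022`/`q3`, `q4` NULLs are dropped); with `includeNulls = true` and the labels `Q1` to `Q4` it yields all twelve.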

### Why are the changes needed?
To support `Dataset.unpivot` in SQL queries.

### Does this PR introduce _any_ user-facing change?
Yes, adds `UNPIVOT` to SQL syntax.

### How was this patch tested?
Added end-to-end tests to `SQLQueryTestSuite`.

Closes #37407 from EnricoMi/branch-sql-unpivot.

Authored-by: Enrico Minack <github@enrico.minack.dev>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
EnricoMi authored and cloud-fan committed Oct 7, 2022
1 parent 1c0b096 commit 29e4552
Showing 33 changed files with 1,006 additions and 256 deletions.
12 changes: 12 additions & 0 deletions core/src/main/resources/error/error-classes.json
@@ -552,6 +552,12 @@
"Unable to acquire <requestedBytes> bytes of memory, got <receivedBytes>"
]
},
"UNPIVOT_REQUIRES_ATTRIBUTES" : {
"message" : [
"UNPIVOT requires all given <given> expressions to be columns when no <empty> expressions are given. These are not columns: [<expressions>]."
],
"sqlState" : "42000"
},
"UNPIVOT_REQUIRES_VALUE_COLUMNS" : {
"message" : [
"At least one value column needs to be specified for UNPIVOT, all columns specified as ids"
@@ -564,6 +570,12 @@
],
"sqlState" : "42000"
},
"UNPIVOT_VALUE_SIZE_MISMATCH" : {
"message" : [
"All unpivot value columns must have the same size as there are value column names (<names>)"
],
"sqlState" : "42000"
},
"UNRECOGNIZED_SQL_TYPE" : {
"message" : [
"Unrecognized SQL type <typeName>"
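The `UNPIVOT_VALUE_SIZE_MISMATCH` entry describes a shape check: every `IN` tuple must list as many columns as there are value column names. A hedged sketch of that check follows, with the `<names>` placeholder filled by plain string replacement. `UnpivotChecks` and its methods are hypothetical, and the exact formatting Spark uses for `<names>` may differ.

```scala
// Hypothetical check mirroring the UNPIVOT_VALUE_SIZE_MISMATCH entry above;
// it is not Spark's analyzer code.
object UnpivotChecks {
  // Message template copied from the error-classes.json entry above.
  val SizeMismatchTemplate =
    "All unpivot value columns must have the same size as there are value column names (<names>)"

  // Hypothetical helper: substitutes each <key> placeholder with its parameter value.
  def render(template: String, params: Map[String, String]): String =
    params.foldLeft(template) { case (msg, (key, value)) => msg.replace(s"<$key>", value) }

  // Every IN tuple must list as many columns as there are value column names.
  def checkValueSizes(valueColumnNames: Seq[String], inTuples: Seq[Seq[String]]): Option[String] =
    if (inTuples.forall(_.size == valueColumnNames.size)) None
    else Some(render(SizeMismatchTemplate, Map("names" -> valueColumnNames.mkString(", "))))
}
```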
6 changes: 3 additions & 3 deletions core/src/test/scala/org/apache/spark/SparkFunSuite.scala
@@ -301,15 +301,15 @@ abstract class SparkFunSuite
assert(exception.getErrorClass === errorClass)
sqlState.foreach(state => assert(exception.getSqlState === state))
val expectedParameters = exception.getMessageParameters.asScala
if (matchPVals == true) {
if (matchPVals) {
assert(expectedParameters.size === parameters.size)
expectedParameters.foreach(
exp => {
val parm = parameters.getOrElse(exp._1,
throw new IllegalArgumentException("Missing parameter" + exp._1))
if (!exp._2.matches(parm)) {
throw new IllegalArgumentException("(" + exp._1 + ", " + exp._2 +
") does not match: " + parm)
throw new IllegalArgumentException("For parameter '" + exp._1 + "' value '" + exp._2 +
"' does not match: " + parm)
}
}
)
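The `SparkFunSuite` change keeps the logic and only tidies it: `matchPVals == true` becomes `matchPVals`, and the mismatch message now names the parameter. The simplified stand-alone sketch below (hypothetical `ParameterCheck` object, not the test suite itself) shows that the parameter values a test supplies act as regular expressions: `String.matches` must match the whole actual value.

```scala
// Simplified, stand-alone sketch of the parameter check shown in the diff above;
// the object and method names are hypothetical.
object ParameterCheck {
  def checkParameters(actual: Map[String, String], expected: Map[String, String]): Unit = {
    assert(actual.size == expected.size)
    actual.foreach { case (name, value) =>
      val pattern = expected.getOrElse(name,
        throw new IllegalArgumentException("Missing parameter " + name))
      // String.matches treats `pattern` as a regular expression that must
      // match the entire actual value.
      if (!value.matches(pattern)) {
        throw new IllegalArgumentException(
          "For parameter '" + name + "' value '" + value + "' does not match: " + pattern)
      }
    }
  }
}
```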
3 changes: 3 additions & 0 deletions docs/sql-ref-ansi-compliance.md
@@ -428,6 +428,7 @@ Below is a list of all the keywords in Spark SQL.
|ESCAPED|non-reserved|non-reserved|non-reserved|
|EXCEPT|reserved|strict-non-reserved|reserved|
|EXCHANGE|non-reserved|non-reserved|non-reserved|
|EXCLUDE|non-reserved|non-reserved|non-reserved|
|EXISTS|non-reserved|non-reserved|reserved|
|EXPLAIN|non-reserved|non-reserved|non-reserved|
|EXPORT|non-reserved|non-reserved|non-reserved|
@@ -459,6 +460,7 @@ Below is a list of all the keywords in Spark SQL.
|IGNORE|non-reserved|non-reserved|non-reserved|
|IMPORT|non-reserved|non-reserved|non-reserved|
|IN|reserved|non-reserved|reserved|
|INCLUDE|non-reserved|non-reserved|non-reserved|
|INDEX|non-reserved|non-reserved|non-reserved|
|INDEXES|non-reserved|non-reserved|non-reserved|
|INNER|reserved|strict-non-reserved|reserved|
@@ -616,6 +618,7 @@ Below is a list of all the keywords in Spark SQL.
|UNIQUE|reserved|non-reserved|reserved|
|UNKNOWN|reserved|non-reserved|reserved|
|UNLOCK|non-reserved|non-reserved|non-reserved|
|UNPIVOT|non-reserved|non-reserved|non-reserved|
|UNSET|non-reserved|non-reserved|non-reserved|
|UPDATE|non-reserved|non-reserved|reserved|
|USE|non-reserved|non-reserved|non-reserved|
1 change: 1 addition & 0 deletions docs/sql-ref-syntax-qry-select-case.md
@@ -106,4 +106,5 @@ SELECT * FROM person
* [DISTRIBUTE BY Clause](sql-ref-syntax-qry-select-distribute-by.html)
* [LIMIT Clause](sql-ref-syntax-qry-select-limit.html)
* [PIVOT Clause](sql-ref-syntax-qry-select-pivot.html)
* [UNPIVOT Clause](sql-ref-syntax-qry-select-unpivot.html)
* [LATERAL VIEW Clause](sql-ref-syntax-qry-select-lateral-view.html)
1 change: 1 addition & 0 deletions docs/sql-ref-syntax-qry-select-clusterby.md
@@ -101,4 +101,5 @@ SELECT age, name FROM person CLUSTER BY age;
* [LIMIT Clause](sql-ref-syntax-qry-select-limit.html)
* [CASE Clause](sql-ref-syntax-qry-select-case.html)
* [PIVOT Clause](sql-ref-syntax-qry-select-pivot.html)
* [UNPIVOT Clause](sql-ref-syntax-qry-select-unpivot.html)
* [LATERAL VIEW Clause](sql-ref-syntax-qry-select-lateral-view.html)
1 change: 1 addition & 0 deletions docs/sql-ref-syntax-qry-select-distribute-by.md
@@ -96,4 +96,5 @@ SELECT age, name FROM person DISTRIBUTE BY age;
* [LIMIT Clause](sql-ref-syntax-qry-select-limit.html)
* [CASE Clause](sql-ref-syntax-qry-select-case.html)
* [PIVOT Clause](sql-ref-syntax-qry-select-pivot.html)
* [UNPIVOT Clause](sql-ref-syntax-qry-select-unpivot.html)
* [LATERAL VIEW Clause](sql-ref-syntax-qry-select-lateral-view.html)
1 change: 1 addition & 0 deletions docs/sql-ref-syntax-qry-select-groupby.md
@@ -316,4 +316,5 @@ SELECT FIRST(age IGNORE NULLS), LAST(id), SUM(id) FROM person;
* [LIMIT Clause](sql-ref-syntax-qry-select-limit.html)
* [CASE Clause](sql-ref-syntax-qry-select-case.html)
* [PIVOT Clause](sql-ref-syntax-qry-select-pivot.html)
* [UNPIVOT Clause](sql-ref-syntax-qry-select-unpivot.html)
* [LATERAL VIEW Clause](sql-ref-syntax-qry-select-lateral-view.html)
1 change: 1 addition & 0 deletions docs/sql-ref-syntax-qry-select-having.md
@@ -127,4 +127,5 @@ SELECT sum(quantity) AS sum FROM dealer HAVING sum(quantity) > 10;
* [LIMIT Clause](sql-ref-syntax-qry-select-limit.html)
* [CASE Clause](sql-ref-syntax-qry-select-case.html)
* [PIVOT Clause](sql-ref-syntax-qry-select-pivot.html)
* [UNPIVOT Clause](sql-ref-syntax-qry-select-unpivot.html)
* [LATERAL VIEW Clause](sql-ref-syntax-qry-select-lateral-view.html)
1 change: 1 addition & 0 deletions docs/sql-ref-syntax-qry-select-lateral-view.md
@@ -123,3 +123,4 @@ SELECT * FROM person
* [LIMIT Clause](sql-ref-syntax-qry-select-limit.html)
* [CASE Clause](sql-ref-syntax-qry-select-case.html)
* [PIVOT Clause](sql-ref-syntax-qry-select-pivot.html)
* [UNPIVOT Clause](sql-ref-syntax-qry-select-unpivot.html)
1 change: 1 addition & 0 deletions docs/sql-ref-syntax-qry-select-limit.md
@@ -106,4 +106,5 @@ org.apache.spark.sql.AnalysisException: The limit expression must evaluate to a
* [DISTRIBUTE BY Clause](sql-ref-syntax-qry-select-distribute-by.html)
* [CASE Clause](sql-ref-syntax-qry-select-case.html)
* [PIVOT Clause](sql-ref-syntax-qry-select-pivot.html)
* [UNPIVOT Clause](sql-ref-syntax-qry-select-unpivot.html)
* [LATERAL VIEW Clause](sql-ref-syntax-qry-select-lateral-view.html)
1 change: 1 addition & 0 deletions docs/sql-ref-syntax-qry-select-orderby.md
@@ -145,4 +145,5 @@ SELECT * FROM person ORDER BY name ASC, age DESC;
* [LIMIT Clause](sql-ref-syntax-qry-select-limit.html)
* [CASE Clause](sql-ref-syntax-qry-select-case.html)
* [PIVOT Clause](sql-ref-syntax-qry-select-pivot.html)
* [UNPIVOT Clause](sql-ref-syntax-qry-select-unpivot.html)
* [LATERAL VIEW Clause](sql-ref-syntax-qry-select-lateral-view.html)
1 change: 1 addition & 0 deletions docs/sql-ref-syntax-qry-select-pivot.md
@@ -98,4 +98,5 @@ SELECT * FROM person
* [DISTRIBUTE BY Clause](sql-ref-syntax-qry-select-distribute-by.html)
* [LIMIT Clause](sql-ref-syntax-qry-select-limit.html)
* [CASE Clause](sql-ref-syntax-qry-select-case.html)
* [UNPIVOT Clause](sql-ref-syntax-qry-select-unpivot.html)
* [LATERAL VIEW Clause](sql-ref-syntax-qry-select-lateral-view.html)
1 change: 1 addition & 0 deletions docs/sql-ref-syntax-qry-select-sortby.md
@@ -178,4 +178,5 @@ SELECT /*+ REPARTITION(zip_code) */ name, age, zip_code FROM person
* [LIMIT Clause](sql-ref-syntax-qry-select-limit.html)
* [CASE Clause](sql-ref-syntax-qry-select-case.html)
* [PIVOT Clause](sql-ref-syntax-qry-select-pivot.html)
* [UNPIVOT Clause](sql-ref-syntax-qry-select-unpivot.html)
* [LATERAL VIEW Clause](sql-ref-syntax-qry-select-lateral-view.html)
1 change: 1 addition & 0 deletions docs/sql-ref-syntax-qry-select-transform.md
@@ -263,4 +263,5 @@ WHERE zip_code > 94500;
* [LIMIT Clause](sql-ref-syntax-qry-select-limit.html)
* [CASE Clause](sql-ref-syntax-qry-select-case.html)
* [PIVOT Clause](sql-ref-syntax-qry-select-pivot.html)
* [UNPIVOT Clause](sql-ref-syntax-qry-select-unpivot.html)
* [LATERAL VIEW Clause](sql-ref-syntax-qry-select-lateral-view.html)
143 changes: 143 additions & 0 deletions docs/sql-ref-syntax-qry-select-unpivot.md
@@ -0,0 +1,143 @@
---
layout: global
title: UNPIVOT Clause
displayTitle: UNPIVOT Clause
license: |
Licensed to the Apache Software Foundation (ASF) under one or more
contributor license agreements. See the NOTICE file distributed with
this work for additional information regarding copyright ownership.
The ASF licenses this file to You under the Apache License, Version 2.0
(the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
---

### Description

The `UNPIVOT` clause transforms multiple columns into multiple rows, which can then be used in the `SELECT` clause.
It can be specified after a table name or a subquery.

### Syntax

```sql
UNPIVOT [ { INCLUDE | EXCLUDE } NULLS ] (
    { single_value_column_unpivot | multi_value_column_unpivot }
) [[AS] alias]

single_value_column_unpivot:
    values_column
    FOR name_column
    IN (unpivot_column [[AS] alias] [, ...])

multi_value_column_unpivot:
    (values_column [, ...])
    FOR name_column
    IN ((unpivot_column [, ...]) [[AS] alias] [, ...])
```

### Parameters

* **unpivot_column**

Specifies a column of the `FROM` clause to unpivot. If an alias is given, the alias rather than the column name is written to `name_column`.

* **name_column**

The name for the column that holds the names of the unpivoted columns.

* **values_column**

The name for the column that holds the values of the unpivoted columns.

### Examples

```sql
CREATE TABLE sales_quarterly (year INT, q1 INT, q2 INT, q3 INT, q4 INT);
INSERT INTO sales_quarterly VALUES
    (2020, null, 1000, 2000, 2500),
    (2021, 2250, 3200, 4200, 5900),
    (2022, 4200, 3100, null, null);

-- column names are used as unpivot columns
SELECT * FROM sales_quarterly
    UNPIVOT (
        sales FOR quarter IN (q1, q2, q3, q4)
    );
+------+---------+-------+
| year | quarter | sales |
+------+---------+-------+
| 2020 | q2 | 1000 |
| 2020 | q3 | 2000 |
| 2020 | q4 | 2500 |
| 2021 | q1 | 2250 |
| 2021 | q2 | 3200 |
| 2021 | q3 | 4200 |
| 2021 | q4 | 5900 |
| 2022 | q1 | 4200 |
| 2022 | q2 | 3100 |
+------+---------+-------+

-- NULL values are excluded by default, but they can be included
-- unpivot columns can be aliased
-- the unpivot result can be referenced via its alias
SELECT up.* FROM sales_quarterly
    UNPIVOT INCLUDE NULLS (
        sales FOR quarter IN (q1 AS Q1, q2 AS Q2, q3 AS Q3, q4 AS Q4)
    ) AS up;
+------+---------+-------+
| year | quarter | sales |
+------+---------+-------+
| 2020 | Q1 | NULL |
| 2020 | Q2 | 1000 |
| 2020 | Q3 | 2000 |
| 2020 | Q4 | 2500 |
| 2021 | Q1 | 2250 |
| 2021 | Q2 | 3200 |
| 2021 | Q3 | 4200 |
| 2021 | Q4 | 5900 |
| 2022 | Q1 | 4200 |
| 2022 | Q2 | 3100 |
| 2022 | Q3 | NULL |
| 2022 | Q4 | NULL |
+------+---------+-------+

-- multiple value columns can be unpivoted per row
SELECT * FROM sales_quarterly
    UNPIVOT EXCLUDE NULLS (
        (first_quarter, second_quarter)
        FOR half_of_the_year IN (
            (q1, q2) AS H1,
            (q3, q4) AS H2
        )
    );
+------+------------------+---------------+----------------+
| year | half_of_the_year | first_quarter | second_quarter |
+------+------------------+---------------+----------------+
| 2020 | H1 | NULL | 1000 |
| 2020 | H2 | 2000 | 2500 |
| 2021 | H1 | 2250 | 3200 |
| 2021 | H2 | 4200 | 5900 |
| 2022 | H1 | 4200 | 3100 |
+------+------------------+---------------+----------------+
```
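In the multi-value example, `EXCLUDE NULLS` keeps the `2020`/`H1` row even though `first_quarter` is NULL, but drops `2022`/`H2`, where both values are NULL. The sketch below models that behavior under the assumption, consistent with this output, that a row is dropped only when all of its values are NULL. `MultiValueUnpivot` is a hypothetical name and the code is not Spark's implementation.

```scala
// Plain-Scala model of multi-value-column UNPIVOT; not Spark's implementation.
object MultiValueUnpivot {
  // A row maps column names to nullable INT values (None stands for NULL).
  type Row = Map[String, Option[Int]]

  // One output row: the id value, the name_column value and one value per values_column.
  final case class OutRow(id: Option[Int], name: String, values: Seq[Option[Int]])

  def unpivot(
      rows: Seq[Row],
      idColumn: String,
      in: Seq[(Seq[String], String)], // (columns of one IN tuple, its alias)
      includeNulls: Boolean
  ): Seq[OutRow] =
    for {
      row <- rows
      (columns, name) <- in
      values = columns.map(row)
      // Assumed EXCLUDE NULLS rule: drop the row only when all of its values are NULL.
      if includeNulls || values.exists(_.isDefined)
    } yield OutRow(row(idColumn), name, values)

  // The sales_quarterly rows from the example above.
  val salesQuarterly: Seq[Row] = Seq(
    Map[String, Option[Int]]("year" -> Some(2020), "q1" -> None, "q2" -> Some(1000), "q3" -> Some(2000), "q4" -> Some(2500)),
    Map[String, Option[Int]]("year" -> Some(2021), "q1" -> Some(2250), "q2" -> Some(3200), "q3" -> Some(4200), "q4" -> Some(5900)),
    Map[String, Option[Int]]("year" -> Some(2022), "q1" -> Some(4200), "q2" -> Some(3100), "q3" -> None, "q4" -> None)
  )
}
```

With `EXCLUDE NULLS` the model returns the five rows shown in the table; with `INCLUDE NULLS` it would also return `2022`/`H2` with two NULL values.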

### Related Statements

* [SELECT Main](sql-ref-syntax-qry-select.html)
* [WHERE Clause](sql-ref-syntax-qry-select-where.html)
* [GROUP BY Clause](sql-ref-syntax-qry-select-groupby.html)
* [HAVING Clause](sql-ref-syntax-qry-select-having.html)
* [ORDER BY Clause](sql-ref-syntax-qry-select-orderby.html)
* [SORT BY Clause](sql-ref-syntax-qry-select-sortby.html)
* [DISTRIBUTE BY Clause](sql-ref-syntax-qry-select-distribute-by.html)
* [LIMIT Clause](sql-ref-syntax-qry-select-limit.html)
* [CASE Clause](sql-ref-syntax-qry-select-case.html)
* [PIVOT Clause](sql-ref-syntax-qry-select-pivot.html)
* [LATERAL VIEW Clause](sql-ref-syntax-qry-select-lateral-view.html)
1 change: 1 addition & 0 deletions docs/sql-ref-syntax-qry-select-where.md
@@ -127,4 +127,5 @@ SELECT * FROM person AS parent
* [LIMIT Clause](sql-ref-syntax-qry-select-limit.html)
* [CASE Clause](sql-ref-syntax-qry-select-case.html)
* [PIVOT Clause](sql-ref-syntax-qry-select-pivot.html)
* [UNPIVOT Clause](sql-ref-syntax-qry-select-unpivot.html)
* [LATERAL VIEW Clause](sql-ref-syntax-qry-select-lateral-view.html)
8 changes: 7 additions & 1 deletion docs/sql-ref-syntax-qry-select.md
@@ -44,6 +44,7 @@ While `select_statement` is defined as
SELECT [ hints , ... ] [ ALL | DISTINCT ] { [ [ named_expression | regex_column_names ] [ , ... ] | TRANSFORM (...) ] }
FROM { from_item [ , ... ] }
[ PIVOT clause ]
[ UNPIVOT clause ]
[ LATERAL VIEW clause ] [ ... ]
[ WHERE boolean_expression ]
[ GROUP BY expression [ , ... ] ]
@@ -75,7 +76,7 @@ SELECT [ hints , ... ] [ ALL | DISTINCT ] { [ [ named_expression | regex_column_

An expression with an assigned name. In general, it denotes a column expression.

**Syntax:** `expression [AS] [alias]`
**Syntax:** `expression [[AS] alias]`

* **from_item**

@@ -91,6 +92,10 @@ SELECT [ hints , ... ] [ ALL | DISTINCT ] { [ [ named_expression | regex_column_

The `PIVOT` clause is used for data perspective: it produces aggregated values based on specific column values.

* **UNPIVOT**

The `UNPIVOT` clause transforms columns into rows. It is the reverse of `PIVOT`, except that it does not undo the aggregation of values.

* **LATERAL VIEW**

The `LATERAL VIEW` clause is used in conjunction with generator functions such as `EXPLODE`, which will generate a virtual table containing one or more rows. `LATERAL VIEW` will apply the rows to each original output row.
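To illustrate how `UNPIVOT` relates to `PIVOT`, the sketch below unpivots `sales_quarterly` and then regroups the result by `year`. Because each (`year`, quarter) pair occurs once, the regrouping needs no aggregate function and the original rows come back. `PivotRoundTrip` is a hypothetical plain-Scala model, not Spark's `PIVOT`, which always applies an aggregate.

```scala
// Plain-Scala sketch of UNPIVOT followed by a PIVOT-like regrouping; not Spark code.
object PivotRoundTrip {
  type Row = Map[String, Option[Int]]
  type Cell = (Option[Int], String, Option[Int]) // (id value, name, value)

  // One cell per unpivoted column; NULL cells are dropped unless includeNulls.
  def unpivot(rows: Seq[Row], id: String, columns: Seq[String], includeNulls: Boolean): Seq[Cell] =
    for {
      row <- rows
      column <- columns
      value = row(column)
      if includeNulls || value.isDefined
    } yield (row(id), column, value)

  // Regroups the cells by id and turns each name back into a column. Every
  // (id, name) pair occurs at most once, so no aggregation is needed here;
  // a missing pair comes back as NULL (None).
  def pivot(cells: Seq[Cell], id: String, columns: Seq[String]): Seq[Row] =
    cells.map(_._1).distinct.map { idValue =>
      val byName = cells.collect { case (`idValue`, name, value) => name -> value }.toMap
      Map(id -> idValue) ++ columns.map(c => c -> byName.getOrElse(c, None))
    }

  // The sales_quarterly rows used in the UNPIVOT clause examples.
  val salesQuarterly: Seq[Row] = Seq(
    Map[String, Option[Int]]("year" -> Some(2020), "q1" -> None, "q2" -> Some(1000), "q3" -> Some(2000), "q4" -> Some(2500)),
    Map[String, Option[Int]]("year" -> Some(2021), "q1" -> Some(2250), "q2" -> Some(3200), "q3" -> Some(4200), "q4" -> Some(5900)),
    Map[String, Option[Int]]("year" -> Some(2022), "q1" -> Some(4200), "q2" -> Some(3100), "q3" -> None, "q4" -> None)
  )
}
```

In this sketch even the `EXCLUDE NULLS` variant round-trips for this data, because each dropped cell reappears as NULL; a `year` whose quarters were all NULL would vanish entirely.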
@@ -190,6 +195,7 @@ SELECT [ hints , ... ] [ ALL | DISTINCT ] { [ [ named_expression | regex_column_
* [Window Function](sql-ref-syntax-qry-select-window.html)
* [CASE Clause](sql-ref-syntax-qry-select-case.html)
* [PIVOT Clause](sql-ref-syntax-qry-select-pivot.html)
* [UNPIVOT Clause](sql-ref-syntax-qry-select-unpivot.html)
* [LATERAL VIEW Clause](sql-ref-syntax-qry-select-lateral-view.html)
* [TRANSFORM Clause](sql-ref-syntax-qry-select-transform.html)
* [LATERAL Subquery](sql-ref-syntax-qry-select-lateral-subquery.html)
1 change: 1 addition & 0 deletions docs/sql-ref-syntax.md
@@ -80,6 +80,7 @@ ability to generate logical and physical plan for a given query using
* [Window Function](sql-ref-syntax-qry-select-window.html)
* [CASE Clause](sql-ref-syntax-qry-select-case.html)
* [PIVOT Clause](sql-ref-syntax-qry-select-pivot.html)
* [UNPIVOT Clause](sql-ref-syntax-qry-select-unpivot.html)
* [LATERAL VIEW Clause](sql-ref-syntax-qry-select-lateral-view.html)
* [LATERAL SUBQUERY](sql-ref-syntax-qry-select-lateral-subquery.html)
* [TRANSFORM Clause](sql-ref-syntax-qry-select-transform.html)