Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Doc updates for union datasources. #11103

Merged
merged 1 commit into from
Apr 15, 2021
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
31 changes: 24 additions & 7 deletions docs/querying/datasource.md
Original file line number Diff line number Diff line change
Expand Up @@ -113,28 +113,45 @@ use table datasources.
### `union`

<!--DOCUSAURUS_CODE_TABS-->
<!--SQL-->
```sql
SELECT column1, column2
FROM (
SELECT column1, column2 FROM table1
UNION ALL
SELECT column1, column2 FROM table2
UNION ALL
SELECT column1, column2 FROM table3
)
```
<!--Native-->
```json
{
"queryType": "scan",
"dataSource": {
"type": "union",
"dataSources": ["<tableDataSourceName1>", "<tableDataSourceName2>", "<tableDataSourceName3>"]
"dataSources": ["table1", "table2", "table3"]
},
"columns": ["column1", "column2"],
"intervals": ["0000/3000"]
}
```
<!--END_DOCUSAURUS_CODE_TABS-->

Union datasources allow you to treat two or more table datasources as a single datasource. The datasources being unioned
do not need to have identical schemas. If they do not fully match up, then columns that exist in one table but not
another will be treated as if they contained all null values in the tables where they do not exist.
Unions allow you to treat two or more tables as a single datasource. In SQL, this is done with the UNION ALL operator
applied directly to tables, called a ["table-level union"](sql.md#table-level). In native queries, this is done with a
"union" datasource.

With SQL [table-level unions](sql.md#table-level) the same columns must be selected from each table in the same order,
and those columns must either have the same types, or types that can be implicitly cast to each other (such as different
numeric types). For this reason, it is more robust to write your queries to select specific columns.

The list of "dataSources" must be nonempty. If you want to query an empty dataset, use an [`inline` datasource](#inline)
instead.
With the native union datasource, the tables being unioned do not need to have identical schemas. If they do not fully
match up, then columns that exist in one table but not another will be treated as if they contained all null values in
the tables where they do not exist.

Union datasources are not available in Druid SQL.
In either case, features like expressions, column aliasing, JOIN, GROUP BY, ORDER BY, and so on cannot be used with
table unions.

Refer to the [Query execution](query-execution.md#union) page for more details on how queries are executed when you
use union datasources.
Expand Down
15 changes: 8 additions & 7 deletions docs/querying/sql.md
Original file line number Diff line number Diff line change
Expand Up @@ -153,8 +153,8 @@ top-level and table-level. Queries that use UNION ALL in any other way will not
#### Top-level

UNION ALL can be used at the very top outer layer of a SQL query (not in a subquery, and not in the FROM clause). In
this case, the underlying queries will be run separately, back to back, and their results will all be returned in
one result set.
this case, the underlying queries will be run separately, back to back. Their results will be concatenated together
and appear one after the other.

For example:

Expand All @@ -164,14 +164,15 @@ UNION ALL
SELECT COUNT(*) FROM tbl WHERE my_column = 'value2'
```

When UNION ALL occurs at the top level of a query like this, the results from the unioned queries are concatenated
together and appear one after the other.
With top-level UNION ALL, no further processing can be done after the UNION ALL. For example, the results of the
UNION ALL cannot have GROUP BY, ORDER BY, or any other operators applied to them.

#### Table-level

UNION ALL can be used to query multiple tables at the same time. In this case, it must appear in a subquery in the
FROM clause, and the lower-level subqueries that are inputs to the UNION ALL operator must be simple table SELECTs
(no expressions, column aliasing, etc). The query will run natively using a [union datasource](datasource.md#union).
FROM clause, and the lower-level subqueries that are inputs to the UNION ALL operator must be simple table SELECTs.
Features like expressions, column aliasing, JOIN, GROUP BY, ORDER BY, and so on cannot be used. The query will run
natively using a [union datasource](datasource.md#union).

The same columns must be selected from each table in the same order, and those columns must either have the same types,
or types that can be implicitly cast to each other (such as different numeric types). For this reason, it is generally
Expand All @@ -190,7 +191,7 @@ FROM (
GROUP BY col1
```

When UNION ALL occurs at the table level, the rows from the unioned tables are not guaranteed to be processed in
With table-level UNION ALL, the rows from the unioned tables are not guaranteed to be processed in
any particular order. They may be processed in an interleaved fashion. If you need a particular result ordering,
use [ORDER BY](#order-by) on the outer query.

Expand Down