This repository has been archived by the owner on Mar 30, 2021. It is now read-only.
-
Notifications
You must be signed in to change notification settings - Fork 92
Sample Queries
hbutani edited this page Aug 4, 2016
·
11 revisions
These are against the denormalized(flattened) TPCH sales star schema as described in the Quick Start Guide page.
Example List:
SQL
select l_returnflag, l_linestatus, count(*),
sum(l_extendedprice) as s, max(ps_supplycost) as m, avg(ps_availqty) as a,count(distinct o_orderkey)
from orderLineItemPartSupplier
group by l_returnflag, l_linestatus
Logical Plan
Aggregate [l_returnflag#69,l_linestatus#70], [l_returnflag#69,l_linestatus#70,COUNT(1) AS c2#109L,SUM(l_extendedprice#66) AS s#106,MAX(ps_supplycost#81) AS m#107,AVG(CAST(ps_availqty#80, LongType)) AS a#108,COUNT(DISTINCT o_orderkey#53) AS c6#110L]
Project [l_extendedprice#66,o_orderkey#53,ps_supplycost#81,l_returnflag#69,l_linestatus#70,ps_availqty#80]
Relation[o_orderkey#53,o_custkey#54,o_orderstatus#55,...
Physical Plan
Project [l_returnflag#69,l_linestatus#70,alias-1#112L AS c2#109L,alias-2#111 AS s#106,alias-3#115 AS m#107,alias-4#113 AS a#108,alias-7#114L AS c6#110L]
PhysicalRDD [alias-2#111,alias-3#115,alias-7#114L,alias-4#113,l_returnflag#69,l_linestatus#70,alias-1#112L], DruidRDD[2] at RDD at DruidRDD.scala:16
Druid Query
{
"jsonClass" : "GroupByQuerySpec",
"queryType" : "groupBy",
"dataSource" : "tpch",
"dimensions" : [ {
"jsonClass" : "DefaultDimensionSpec",
"type" : "default",
"dimension" : "l_returnflag",
"outputName" : "l_returnflag"
}, {
"jsonClass" : "DefaultDimensionSpec",
"type" : "default",
"dimension" : "l_linestatus",
"outputName" : "l_linestatus"
} ],
"granularity" : "all",
"aggregations" : [ {
"jsonClass" : "FunctionAggregationSpec",
"type" : "count",
"name" : "alias-1",
"fieldName" : "count"
}, {
"jsonClass" : "FunctionAggregationSpec",
"type" : "doubleSum",
"name" : "alias-2",
"fieldName" : "l_extendedprice"
}, {
"jsonClass" : "FunctionAggregationSpec",
"type" : "doubleMax",
"name" : "alias-3",
"fieldName" : "ps_supplycost"
}, {
"jsonClass" : "FunctionAggregationSpec",
"type" : "longSum",
"name" : "alias-5",
"fieldName" : "sum_ps_availqty"
}, {
"jsonClass" : "FunctionAggregationSpec",
"type" : "count",
"name" : "alias-6",
"fieldName" : "count"
}, {
"jsonClass" : "CardinalityAggregationSpec",
"type" : "cardinality",
"name" : "alias-7",
"fieldNames" : [ "o_orderkey" ],
"byRow" : true
} ],
"postAggregations" : [ {
"jsonClass" : "ArithmeticPostAggregationSpec",
"type" : "arithmetic",
"name" : "alias-4",
"fn" : "/",
"fields" : [ {
"jsonClass" : "FieldAccessPostAggregationSpec",
"type" : "fieldAccess",
"fieldName" : "alias-5"
}, {
"jsonClass" : "FieldAccessPostAggregationSpec",
"type" : "fieldAccess",
"fieldName" : "alias-6"
} ]
} ],
"intervals" : [ "1992-12-31T16:00:00.000-08:00/1997-12-31T16:00:00.000-08:00" ]
}
Rewrites:
- Grouping Expressions were translated to DefaultDimensionSpecs
- Count(*) translated to FunctionAggregationSpec(type=count)
- Sum/Min/Max translated to corresponding FunctionAggregationSpec
- Avg translated to ArithmeticPostAggregationSpec and a pair of FunctionAggregationSpecs
SQL
select f, s, count(*)
from (select l_returnflag f, l_linestatus s from orderLineItemPartSupplier) t
group by f, s
Logical Plan
Aggregate [f#122,s#123], [f#122,s#123,COUNT(1) AS c2#124L]
Project [l_returnflag#69 AS f#122,l_linestatus#70 AS s#123]
Relation[o_orderkey#53,o_custkey#54,....
Physical Plan
Project [f#122,s#123,alias-1#125L AS c2#124L]
PhysicalRDD [f#122,s#123,alias-1#125L], DruidRDD[3] at RDD at DruidRDD.scala:16
Druid Query
{
"jsonClass" : "GroupByQuerySpec",
"queryType" : "groupBy",
"dataSource" : "tpch",
"dimensions" : [ {
"jsonClass" : "DefaultDimensionSpec",
"type" : "default",
"dimension" : "l_returnflag",
"outputName" : "f"
}, {
"jsonClass" : "DefaultDimensionSpec",
"type" : "default",
"dimension" : "l_linestatus",
"outputName" : "s"
} ],
"granularity" : "all",
"aggregations" : [ {
"jsonClass" : "FunctionAggregationSpec",
"type" : "count",
"name" : "alias-1",
"fieldName" : "count"
} ],
"intervals" : [ "1992-12-31T16:00:00.000-08:00/1997-12-31T16:00:00.000-08:00" ]
}
Rewrites:
- Similar to basicAgg query
- Projection above PhysicalRDD Operator maps Druid output to original query Attributes. Details on mapping described here
SQL logic
val shipDtPredicate = dateTime('l_shipdate) <= (dateTime("1997-12-01") - 90.day)
val df = sql(
date"""
select f, s, count(*) as count_order
from
(
select l_returnflag as f, l_linestatus as s, l_shipdate, s_region, s_nation, c_nation
from orderLineItemPartSupplier
) t
where $shipDtPredicate and ((s_nation = 'FRANCE' and c_nation = 'GERMANY') or
(c_nation = 'FRANCE' and s_nation = 'GERMANY')
)
group by f,s
order by f,s
""")
SQL
select f, s, count(*) as count_order
from
(
select l_returnflag as f, l_linestatus as s, l_shipdate, s_region, s_nation, c_nation
from orderLineItemPartSupplier
) t
where dateIsBeforeOrEqual(dateTime(`l_shipdate`),dateMinus(dateTime("1997-12-01"),period("P90D"))) and ((s_nation = 'FRANCE' and c_nation = 'GERMANY') or
(c_nation = 'FRANCE' and s_nation = 'GERMANY')
)
group by f,s
order by f,s
Logical Plan
Sort [f#113 ASC,s#114 ASC], true
Aggregate [f#113,s#114], [f#113,s#114,COUNT(1) AS count_order#106L]
Project [l_returnflag#69 AS f#113,l_linestatus#70 AS s#114]
Filter (scalaUDF(scalaUDF(l_shipdate#71),scalaUDF(scalaUDF(1997-12-01),scalaUDF(P90D))) && (((s_nation#88 = FRANCE) && (c_nation#104 = GERMANY)) || ((c_nation#104 = FRANCE) && (s_nation#88 = GERMANY))))
Relation[o_orderkey#53,o_custkey#54,o_orderstatus#55,...
Physical Plan
Sort [f#113 ASC,s#114 ASC], true
Project [f#113,s#114,alias-1#115L AS count_order#106L]
PhysicalRDD [f#113,s#114,alias-1#115L], DruidRDD[2] at RDD at DruidRDD.scala:16
Druid Query
{
"jsonClass" : "GroupByQuerySpec",
"queryType" : "groupBy",
"dataSource" : "tpch",
"dimensions" : [ {
"jsonClass" : "DefaultDimensionSpec",
"type" : "default",
"dimension" : "l_returnflag",
"outputName" : "f"
}, {
"jsonClass" : "DefaultDimensionSpec",
"type" : "default",
"dimension" : "l_linestatus",
"outputName" : "s"
} ],
"granularity" : "all",
"filter" : {
"jsonClass" : "LogicalFilterSpec",
"type" : "or",
"fields" : [ {
"jsonClass" : "LogicalFilterSpec",
"type" : "and",
"fields" : [ {
"jsonClass" : "SelectorFilterSpec",
"type" : "selector",
"dimension" : "s_nation",
"value" : "FRANCE"
}, {
"jsonClass" : "SelectorFilterSpec",
"type" : "selector",
"dimension" : "c_nation",
"value" : "GERMANY"
} ]
}, {
"jsonClass" : "LogicalFilterSpec",
"type" : "and",
"fields" : [ {
"jsonClass" : "SelectorFilterSpec",
"type" : "selector",
"dimension" : "c_nation",
"value" : "FRANCE"
}, {
"jsonClass" : "SelectorFilterSpec",
"type" : "selector",
"dimension" : "s_nation",
"value" : "GERMANY"
} ]
} ]
},
"aggregations" : [ {
"jsonClass" : "FunctionAggregationSpec",
"type" : "count",
"name" : "alias-1",
"fieldName" : "count"
} ],
"intervals" : [ "1992-12-31T16:00:00.000-08:00/1997-09-02T01:00:00.001-07:00" ]
}
Rewrites:
- The predicate on the timeDimension column l_shipdate is converted to a Interval: "1992-12-31T16:00:00.000-08:00/1997-09-02T01:00:00.001-07:00"
- Predicates on s_nation and c_nation is translated to LogicalFilterSpec and SelectorFilterSpecs
SQL logic
val shipDtPredicate = dateTime('l_shipdate) <= (dateTime("1997-12-01") - 90.day)
val df = sql(
date"""
select s_nation, count(*) as count_order
from
(
select l_returnflag as f, l_linestatus as s, l_shipdate,
s_region, s_nation, c_nation, p_type
from orderLineItemPartSupplier
) t
where $shipDtPredicate and s_nation >= 'FRANCE'
group by s_nation
order by s_nation
""")
SQL
select s_nation, count(*) as count_order
from
(
select l_returnflag as f, l_linestatus as s, l_shipdate,
s_region, s_nation, c_nation, p_type
from orderLineItemPartSupplier
) t
where dateIsBeforeOrEqual(dateTime(`l_shipdate`),dateMinus(dateTime("1997-12-01"),period("P90D"))) and s_nation >= 'FRANCE'
group by s_nation
order by s_nation
Logical Plan
Sort [s_nation#88 ASC], true
Aggregate [s_nation#88], [s_nation#88,COUNT(1) AS count_order#106L]
Project [s_nation#88]
Filter (scalaUDF(scalaUDF(l_shipdate#71),scalaUDF(scalaUDF(1997-12-01),scalaUDF(P90D))) && (s_nation#88 >= FRANCE))
Relation[o_orderkey#53,o_custkey#54,o_orderstatus...
Physical Plan
Sort [s_nation#88 ASC], true
Project [s_nation#88,alias-1#115L AS count_order#106L]
PhysicalRDD [s_nation#88,alias-1#115L], DruidRDD[2] at RDD at DruidRDD.scala:16
Druid Query
{
"jsonClass" : "GroupByQuerySpec",
"queryType" : "groupBy",
"dataSource" : "tpch",
"dimensions" : [ {
"jsonClass" : "DefaultDimensionSpec",
"type" : "default",
"dimension" : "s_nation",
"outputName" : "s_nation"
} ],
"granularity" : "all",
"filter" : {
"jsonClass" : "JavascriptFilterSpec",
"type" : "javascript",
"dimension" : "s_nation",
"function" : "function(x) { return(x >= 'FRANCE') }"
},
"aggregations" : [ {
"jsonClass" : "FunctionAggregationSpec",
"type" : "count",
"name" : "alias-1",
"fieldName" : "count"
} ],
"intervals" : [ "1992-12-31T16:00:00.000-08:00/1997-09-02T01:00:00.001-07:00" ]
},
"intervalSplits" : [ {
"start" : 725846400000,
"end" : 873187200001
}
Rewrites:
- The predicate
s_nation >= 'FRANCE'
is translated to a JavascriptFilterSpec
- Overview
- Quick Start
-
User Guide
- [Defining a DataSource on a Flattened Dataset](https://github.com/SparklineData/spark-druid-olap/wiki/Defining-a Druid-DataSource-on-a-Flattened-Dataset)
- Defining a Star Schema
- Sample Queries
- Approximate Count and Spatial Queries
- Druid Datasource Options
- Sparkline SQLContext Options
- Using Tableau with Sparkline
- How to debug a Query Plan?
- Running the ThriftServer with Sparklinedata components
- [Setting up multiple Sparkline ThriftServers - Load Balancing & HA] (https://github.com/SparklineData/spark-druid-olap/wiki/Setting-up-multiple-Sparkline-ThriftServers-(Load-Balancing-&-HA))
- Runtime Views
- Sparkline SQL extensions
- Sparkline Pluggable Modules
- Dev. Guide
- Reference Architectures
- Releases
- Cluster Spinup Tool
- TPCH Benchmark