-
Notifications
You must be signed in to change notification settings - Fork 92
Introducing Logical Optimizer
In the upcoming release of SparklineData, we are introducing logical query optimizer. This improves OLAP query latency and reduces computational resources needed for query execution. OLAP queries, especially those generated by tools like Tableau, Microstrategy etc, can be complex and is often not optimized. Spark Catalyst framework performs some optimizations but these are limited and often does not optimize OLAP queries. Furthemore OLAP indexing technology introduces some additional invariants like non nullability of metrics and primary partition key.
As part of this change we now push Group By below Join. Often OLAP queries would include pattern that finds min/max of a table and then self joins (Cross Product) that with the table, which is then further aggregated. By Pushing secondary GB below Join we can improve the performance of join significantly.
The following query is one such sample query:
select c_name a, sum(l_quantity) b, mi c, ma d
from (select r1.c_name, r1.l_quantity, r2.mi, r2.ma
from orderLineItemPartSupplier r1
join (select min(l_quantity)mi, max(l_quantity) ma, count(1)
from orderLineItemPartSupplier
where not(l_shipdate is null) having count(1) > 0) r2
) r3
where not (c_name='NA')
group by c_name, mi, ma
order by a, b, c, d
The above query can be rewritten as
select a, b, mi c, ma d
from (select c_name a, sum(l_quantity) b
from orderLineItemPartSupplier
where not (c_name='NA')
group by c_name) r1
join
(select min(l_quantity)mi, max(l_quantity) ma, count(1)
from orderLineItemPartSupplier
where not(l_shipdate is null) having count(1) > 0
) r2
order by a, b, c, d
- Overview
- Quick Start
-
User Guide
- [Defining a DataSource on a Flattened Dataset](https://github.com/SparklineData/spark-druid-olap/wiki/Defining-a Druid-DataSource-on-a-Flattened-Dataset)
- Defining a Star Schema
- Sample Queries
- Approximate Count and Spatial Queries
- Druid Datasource Options
- Sparkline SQLContext Options
- Using Tableau with Sparkline
- How to debug a Query Plan?
- Running the ThriftServer with Sparklinedata components
- [Setting up multiple Sparkline ThriftServers - Load Balancing & HA] (https://github.com/SparklineData/spark-druid-olap/wiki/Setting-up-multiple-Sparkline-ThriftServers-(Load-Balancing-&-HA))
- Runtime Views
- Sparkline SQL extensions
- Sparkline Pluggable Modules
- Dev. Guide
- Reference Architectures
- Releases
- Cluster Spinup Tool
- TPCH Benchmark