Allow merging consecutive 'long' values in SortedRangeSet #3316

Closed
rzeyde-varada wants to merge 1 commit

Conversation

rzeyde-varada (Contributor) commented Apr 2, 2020

This can be useful when pushing down TupleDomains generated by dynamic filtering.

Fixes #6076
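
For illustration, here is a minimal standalone sketch of the merging idea (my own example, not the PR's actual code): consecutive long values collapse into inclusive ranges, so IN (1999, 2000, 2001) becomes the single range [1999, 2001].

import java.util.ArrayList;
import java.util.List;

public class MergeConsecutive
{
    // A [low, high] range over long values, both bounds inclusive.
    record LongRange(long low, long high) {}

    // Collapse sorted, distinct values into ranges by merging neighbors
    // that differ by exactly 1 (assumes no overflow at Long.MAX_VALUE).
    static List<LongRange> merge(long[] sortedDistinctValues)
    {
        List<LongRange> ranges = new ArrayList<>();
        for (long value : sortedDistinctValues) {
            int last = ranges.size() - 1;
            if (last >= 0 && ranges.get(last).high() + 1 == value) {
                ranges.set(last, new LongRange(ranges.get(last).low(), value));
            }
            else {
                ranges.add(new LongRange(value, value));
            }
        }
        return ranges;
    }

    public static void main(String[] args)
    {
        // Prints [LongRange[low=1999, high=2001], LongRange[low=2005, high=2005]]
        System.out.println(merge(new long[] {1999, 2000, 2001, 2005}));
    }
}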

cla-bot added the cla-signed label Apr 2, 2020
rzeyde-varada force-pushed the merge-longs branch 4 times, most recently from f5210e6 to 1f57c13 (April 3, 2020)
rzeyde-varada (Contributor, Author) commented Apr 4, 2020

Two of the TestTpcdsCostBasedPlan tests failed due to a join order flip (log): q46 and q68.

It seems that changing the ("date_dim"."d_year" IN (1999, (1999 + 1), (1999 + 2))) predicate to ("date_dim"."d_year" >= 1999 AND "date_dim"."d_year" <= 1999 + 2) causes the plan change.

rzeyde-varada (Contributor, Author) commented Apr 4, 2020

It also seems that statistics estimation is different after rewriting the query:

presto:tiny> EXPLAIN ANALYZE SELECT d_date_sk FROM date_dim WHERE d_year BETWEEN 1999 AND 2001;
                                                                                               Query Plan                                                                                               
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Fragment 1 [SOURCE]                                                                                                                                                                                    
     CPU: 497.52ms, Scheduled: 527.01ms, Input: 73049 rows (0B); per task: avg.: 73049.00 std.dev.: 0.00, Output: 1096 rows (9.63kB)                                                                    
     Output layout: [d_date_sk]                                                                                                                                                                         
     Output partitioning: SINGLE []                                                                                                                                                                     
     Stage Execution Strategy: UNGROUPED_EXECUTION                                                                                                                                                      
     ScanFilterProject[table = tpcds:date_dim:sf0.01, grouped = false, filterPredicate = ("d_year" BETWEEN 1999 AND 2001)]                                                                              
         Layout: [d_date_sk:bigint]                                                                                                                                                                     
         Estimates: {rows: 73049 (642.03kB), cpu: 998.72k, memory: 0B, network: 0B}/{rows: 730 (6.42kB), cpu: 1.95M, memory: 0B, network: 0B}/{rows: 730 (6.42kB), cpu: 1.96M, memory: 0B, network: 0B} 
         CPU: 497.00ms (100.00%), Scheduled: 526.00ms (100.00%), Output: 1096 rows (9.63kB)                                                                                                             
         Input avg.: 18262.25 rows, Input std.dev.: 173.21%                                                                                                                                             
         d_date_sk := tpcds:d_date_sk                                                                                                                                                                   
         d_year := tpcds:d_year                                                                                                                                                                         
         Input: 73049 rows (0B), Filtered: 98.50%                                                                                                                                                       


presto:tiny> EXPLAIN ANALYZE SELECT d_date_sk FROM date_dim WHERE d_year IN (1999, 2000, 2001);
                                                                                                Query Plan                                                                                                
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Fragment 1 [SOURCE]                                                                                                                                                                                      
     CPU: 591.47ms, Scheduled: 682.71ms, Input: 73049 rows (0B); per task: avg.: 73049.00 std.dev.: 0.00, Output: 1096 rows (9.63kB)                                                                      
     Output layout: [d_date_sk]                                                                                                                                                                           
     Output partitioning: SINGLE []                                                                                                                                                                       
     Stage Execution Strategy: UNGROUPED_EXECUTION                                                                                                                                                        
     ScanFilterProject[table = tpcds:date_dim:sf0.01, grouped = false, filterPredicate = ("d_year" IN (1999, 2000, 2001))]                                                                                
         Layout: [d_date_sk:bigint]                                                                                                                                                                       
         Estimates: {rows: 73049 (642.03kB), cpu: 998.72k, memory: 0B, network: 0B}/{rows: 1090 (9.58kB), cpu: 1.95M, memory: 0B, network: 0B}/{rows: 1090 (9.58kB), cpu: 1.96M, memory: 0B, network: 0B} 
         CPU: 591.00ms (100.00%), Scheduled: 682.00ms (100.00%), Output: 1096 rows (9.63kB)                                                                                                               
         Input avg.: 18262.25 rows, Input std.dev.: 173.21%                                                                                                                                               
         d_date_sk := tpcds:d_date_sk                                                                                                                                                                     
         d_year := tpcds:d_year                                                                                                                                                                           
         Input: 73049 rows (0B), Filtered: 98.50%                                                                                                                                                                 

For small IN predicates, the range-based estimation may underestimate the actual statistics:

presto:tiny> EXPLAIN SELECT d_year FROM date_dim WHERE d_year >= 2000 AND d_year <= 2001;
                                                                    Query Plan                                                                    
--------------------------------------------------------------------------------------------------------------------------------------------------
 Output[d_year]                                                                                                                                   
 │   Layout: [d_year:integer]                                                                                                                     
 │   Estimates: {rows: 365 (1.78kB), cpu: 713.37k, memory: 0B, network: 1.78kB}                                                                   
 └─ RemoteExchange[GATHER]                                                                                                                        
    │   Layout: [d_year:integer]                                                                                                                  
    │   Estimates: {rows: 365 (1.78kB), cpu: 713.37k, memory: 0B, network: 1.78kB}                                                                
    └─ ScanFilter[table = tpcds:date_dim:sf0.01, filterPredicate = ("d_year" BETWEEN 2000 AND 2001)]                                              
           Layout: [d_year:integer]                                                                                                               
           Estimates: {rows: 73049 (356.68kB), cpu: 356.68k, memory: 0B, network: 0B}/{rows: 365 (1.78kB), cpu: 713.37k, memory: 0B, network: 0B} 
           d_year := tpcds:d_year                                                                                                                 

presto:tiny> EXPLAIN SELECT d_year FROM date_dim WHERE d_year IN (2000, 2001);
                                                                    Query Plan                                                                    
--------------------------------------------------------------------------------------------------------------------------------------------------
 Output[d_year]                                                                                                                                   
 │   Layout: [d_year:integer]                                                                                                                     
 │   Estimates: {rows: 727 (3.55kB), cpu: 713.37k, memory: 0B, network: 3.55kB}                                                                   
 └─ RemoteExchange[GATHER]                                                                                                                        
    │   Layout: [d_year:integer]                                                                                                                  
    │   Estimates: {rows: 727 (3.55kB), cpu: 713.37k, memory: 0B, network: 3.55kB}                                                                
    └─ ScanFilter[table = tpcds:date_dim:sf0.01, filterPredicate = ("d_year" IN (2000, 2001))]                                                    
           Layout: [d_year:integer]                                                                                                               
           Estimates: {rows: 73049 (356.68kB), cpu: 356.68k, memory: 0B, network: 0B}/{rows: 727 (3.55kB), cpu: 713.37k, memory: 0B, network: 0B} 
           d_year := tpcds:d_year                                                                                                                 
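
A plausible reading of the estimates above (my reconstruction; I haven't traced the estimator code): assuming d_year in date_dim spans 1900-2100 as in standard TPC-DS (201 distinct values over a range of width 200, 73049 rows total), a range predicate appears to be estimated as rows * (high - low) / (max - min), while an IN list is estimated as rows * count / NDV:

  • d_year BETWEEN 1999 AND 2001: 73049 * 2 / 200 ≈ 730 rows
  • d_year IN (1999, 2000, 2001): 73049 * 3 / 201 ≈ 1090 rows
  • d_year BETWEEN 2000 AND 2001: 73049 * 1 / 200 ≈ 365 rows
  • d_year IN (2000, 2001): 73049 * 2 / 201 ≈ 727 rows

Since the actual output is 1096 rows (~365 per year), the discrete estimate is nearly exact, while the range estimate counts one fewer "year-width" than the number of discrete values it covers - hence the under-estimation noted below.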

martint (Member) commented Apr 5, 2020

> Two of the TestTpcdsCostBasedPlan tests failed due to a join order flip (log): q46 and q68.

I haven't looked at the plans, but is the new plan "better"?

martint (Member) commented Apr 5, 2020

> It seems that changing the ("date_dim"."d_year" IN (1999, (1999 + 1), (1999 + 2))) predicate to ("date_dim"."d_year" >= 1999 AND "date_dim"."d_year" <= 1999 + 2) causes the plan change.

That seems like a bug (or inconsistency) in the cost estimator. For integer types, it's clearly equivalent.

findepi (Member) commented Apr 5, 2020

> > It seems that changing the ("date_dim"."d_year" IN (1999, (1999 + 1), (1999 + 2))) predicate to ("date_dim"."d_year" >= 1999 AND "date_dim"."d_year" <= 1999 + 2) causes the plan change.
>
> That seems like a bug (or inconsistency) in the cost estimator. For integer types, it's clearly equivalent.

@martint there is no logic implemented that could recognize these are equivalent.
So, while indeed equivalent, they will hit different code paths today.

Of course, this is something to improve, so the obvious question is: how big a problem is this, and how urgent?

sopel39 (Member) commented Apr 6, 2020

Given that plans change, we should probably benchmark this PR.

rzeyde-varada (Contributor, Author) commented Apr 8, 2020

> I haven't looked at the plans, but is the new plan "better"?

I ran the queries on sf1000, and it seems that the new plan should be better, since the build-side table is smaller than the probe side.

  • The "build side" on q46 (results in 431k rows, ~25MB):
SELECT
  "ss_ticket_number"
, "ss_customer_sk"
, "ca_city" "bought_city"
, "sum"("ss_coupon_amt") "amt"
, "sum"("ss_net_profit") "profit"
FROM
  store_sales
, date_dim
, store
, household_demographics
, customer_address
WHERE ("store_sales"."ss_sold_date_sk" = "date_dim"."d_date_sk")
   AND ("store_sales"."ss_store_sk" = "store"."s_store_sk")
   AND ("store_sales"."ss_hdemo_sk" = "household_demographics"."hd_demo_sk")
   AND ("store_sales"."ss_addr_sk" = "customer_address"."ca_address_sk")
   AND (("household_demographics"."hd_dep_count" = 4)
      OR ("household_demographics"."hd_vehicle_count" = 3))
   AND ("date_dim"."d_dow" IN (6   , 0))
   AND ("date_dim"."d_year" IN (1999   , (1999 + 1)   , (1999 + 2)))
   AND ("store"."s_city" IN ('Fairview'   , 'Midway'   , 'Fairview'   , 'Fairview'   , 'Fairview'))
GROUP BY "ss_ticket_number", "ss_customer_sk", "ss_addr_sk", "ca_city";
  • The build side on q68 (results in 98k rows, ~7.3MB):
SELECT
  "ss_ticket_number"
, "ss_customer_sk"
, "ca_city" "bought_city"
, "sum"("ss_ext_sales_price") "extended_price"
, "sum"("ss_ext_list_price") "list_price"
, "sum"("ss_ext_tax") "extended_tax"
FROM
  store_sales
, date_dim
, store
, household_demographics
, customer_address
WHERE ("store_sales"."ss_sold_date_sk" = "date_dim"."d_date_sk")
   AND ("store_sales"."ss_store_sk" = "store"."s_store_sk")
   AND ("store_sales"."ss_hdemo_sk" = "household_demographics"."hd_demo_sk")
   AND ("store_sales"."ss_addr_sk" = "customer_address"."ca_address_sk")
   AND ("date_dim"."d_dom" BETWEEN 1 AND 2)
   AND (("household_demographics"."hd_dep_count" = 4)
      OR ("household_demographics"."hd_vehicle_count" = 3))
   AND ("date_dim"."d_year" IN (1999   , (1999 + 1)   , (1999 + 2)))
   AND ("store"."s_city" IN ('Midway'   , 'Fairview'));
  • The "probe side" in both queries (results in 12M rows, ~403MB):
SELECT
  "c_last_name"
, "c_first_name"
, "ca_city"
FROM
  customer
, customer_address
WHERE ("c_current_addr_sk" = "ca_address_sk");

rzeyde-varada (Contributor, Author) commented:

Squashed and rebased over the latest master.

martint self-requested a review May 15, 2020
sopel39 (Member) commented May 31, 2020

@martint is it still LGTM? I wanted to run benchmarks on this.

martint (Member) commented May 31, 2020

Yup, please do

sopel39 (Member) commented Jun 10, 2020

Benchmarks: Benchmarks merge.pdf
I'm rerunning them, as it seems odd that merging causes CPU regressions in so many queries.

However, I see two problems with this PR:

  1. Filter rules based on ranges are less accurate than ones based on discrete values (e.g., because for discrete values we use NDVs). This can and should be fixed for int/long types.
  2. Readers (e.g., the ORC reader: TupleDomainOrcPredicate#extractDiscreteValues) use discrete values to filter using Bloom filters. This should be fixed, and I think it could be causing the observed CPU regressions (see the sketch below).
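
For context, a rough sketch of the reader-side problem in (2) - paraphrasing from memory what TupleDomainOrcPredicate#extractDiscreteValues does, with simplified stand-in types (the real code works on io.trino.spi.predicate.Range), so details may differ:

import java.util.ArrayList;
import java.util.List;
import java.util.Optional;

// Simplified stand-in for io.trino.spi.predicate.Range.
record SimpleRange(long low, long high)
{
    boolean isSingleValue()
    {
        return low == high;
    }
}

class DiscreteValues
{
    // Discrete values can be extracted only if *every* range is a single
    // value; a merged range such as [1999, 2001] makes this return empty,
    // so a Bloom-filter-based reader can no longer prune by value.
    static Optional<List<Long>> extractDiscreteValues(List<SimpleRange> ranges)
    {
        List<Long> values = new ArrayList<>();
        for (SimpleRange range : ranges) {
            if (!range.isSingleValue()) {
                return Optional.empty();
            }
            values.add(range.low());
        }
        return Optional.of(values);
    }

    public static void main(String[] args)
    {
        List<SimpleRange> discrete = List.of(new SimpleRange(1999, 1999), new SimpleRange(2001, 2001));
        List<SimpleRange> merged = List.of(new SimpleRange(1999, 2001));
        System.out.println(extractDiscreteValues(discrete)); // Optional[[1999, 2001]]
        System.out.println(extractDiscreteValues(merged));   // Optional.empty
    }
}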

rzeyde-varada (Contributor, Author) commented:

Many thanks for the benchmarking!
Would you prefer to fix the problems above in separate PRs?

findepi (Member) commented Jun 11, 2020

> Would you prefer to fix the problems above in separate PRs?

Would that mean we have a regression until those new PRs are done & merged?

rzeyde-varada (Contributor, Author) commented Jun 11, 2020

> Would that mean we have a regression until those new PRs are done & merged?

No, that would indeed be undesirable.
I am suggesting first fixing those issues (in separate PRs), and then testing to make sure this PR doesn't introduce performance regressions before merging it.

sopel39 (Member) commented Jun 15, 2020

Here is another benchmark.

Benchmarks comparison-merge.pdf

> I am suggesting first fixing those issues (in separate PRs), and then testing to make sure this PR doesn't introduce performance regressions before merging it.

Good idea

rzeyde-varada (Contributor, Author) commented:

Marking this PR as draft, will unmark after fixing #4107 and #4108.

rzeyde-varada force-pushed the merge-longs branch 2 times, most recently from 9e42c28 to 63fe21b (November 20, 2021)
rzeyde-varada (Contributor, Author) commented:

Rebased over the latest version of #9868.

rzeyde-varada (Contributor, Author) commented:

Rebased over latest #9868.

rzeyde-varada (Contributor, Author) commented:

#9868 is merged - please take a look at this PR :)

raunaqmorarka (Member) left a comment:

It looks like the current functionality requires multiple flags to unlock.
We will still fall back to min/max DF collection by default when there are many adjacent distinct values, so enable-large-dynamic-filters also has to be enabled to avoid that.
In the current implementation of DFs, connectors already receive the full Domain without simplification. We'll save some memory and network communication costs with a more compact representation in Domain. But are these efficiency gains the main benefit, or is this more about preventing the Domain#simplify call in connectors from losing the granularity of information in the received Domain?
Would these changes make it worthwhile to raise the default DF thresholds in DynamicFilterConfig?

/**
 * Try to return a more compact representation (if possible).
 */
default ValueSet compact()
A reviewer (Member) commented:

The concept of compacting ranges seems very specific to SortedRangeSet and not easily applicable to every type of ValueSet.
So it seems more suitable to put this in the Ranges interface and use it through ValuesProcessor where needed.

A reviewer (Member) commented:

> The concept of compacting ranges seems very specific to SortedRangeSet

Agreed.

Note, however, that the interface

/**
 * Try to return a more compact representation (if possible).
 */
ValueSet compact();

is not range-specific and could live in ValueSet.
The current name is getCompactedRanges, but it is still a ValueSet-level operation, which could potentially be extended to other ValueSet types.

I think the important question to ask is: what does "compact" actually mean?

  • occupies less memory? (e.g., SortedRangeSet uses a dictionary for singleton ranges, but only if constructed explicitly as such)
    • in this case, it's not ranges-specific
  • fewer ranges?
    • in this case, it is ranges-specific, but then why does it return a ValueSet, losing the information that it's ranges?

private static boolean areConsecutive(Type type, Object low, Object high)
{
    return type
            .getDiscreteValues(new Type.Range(low, high))
A reviewer (Member) commented:

Can we use a comparison operator to detect adjacent values, the way tryMergeWithNext does?
That would avoid relying on getDiscreteValues, which isn't implemented for many types.

rzeyde-varada (Contributor, Author) commented:

Not sure - IIUC, tryMergeWithNext() relies on the ranges sharing the low/high bound values, i.e., [1,5] and (5,9] can be merged into [1,9].
However, if we try to merge [1,5] and [6,9], we have to make sure that 5 and 6 are "consecutive" (i.e., there is no other value between them), and this requires calling Type#getDiscreteValues.
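
A minimal sketch of that consecutiveness check (hypothetical helper over plain longs, not the PR's exact code): two values are consecutive exactly when the closed interval between them contains only the two endpoints.

import java.util.stream.LongStream;

class Consecutive
{
    // low and high are consecutive when [low, high] contains exactly two
    // discrete values. enumerate() stands in for Type#getDiscreteValues,
    // which streams the discrete values of a range for supporting types;
    // limit(3) keeps the check cheap even for huge intervals.
    static boolean areConsecutive(long low, long high)
    {
        return enumerate(low, high).limit(3).count() == 2;
    }

    private static LongStream enumerate(long low, long high)
    {
        return LongStream.rangeClosed(low, high);
    }

    public static void main(String[] args)
    {
        System.out.println(areConsecutive(5, 6)); // true
        System.out.println(areConsecutive(5, 8)); // false
    }
}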

A reviewer (Member) commented:

Yes, we can't use logic similar to tryMergeWithNext here.
But re-using Type#getDiscreteValues also seems awkward.
I'm thinking we should just add a boolean areConsecutive(Object low, Object high) to Type for this.
@findepi @martint thoughts?

A reviewer (Member) commented:

While working on Iceberg I had a need to find an adjacent value (not to check whether two values are adjacent).
Here is my API proposal for that: #12797

Here, it seems we don't need new methods on Type, so I'd hold off and not add them.

rzeyde-varada (Contributor, Author) commented:

Sorry for the delayed response - I fixed the issues and updated the PR.

> But are these efficiency gains the main benefit, or is this more about preventing the Domain#simplify call in connectors from losing the granularity of information in the received Domain?

We would prefer not to call Domain#simplify, so that the DF can be applied efficiently.
This way, our connector gets an "equivalent" predicate that can be evaluated much faster than a large list of discrete values; see the sketch below.
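
To illustrate the cost difference, a toy comparison (my example, not connector code): a merged range needs two comparisons per row, while a large discrete list needs a per-row set probe (or worse, an OR chain over all values, for engines that expand IN lists).

import java.util.Set;
import java.util.stream.Collectors;
import java.util.stream.LongStream;

class FilterCost
{
    // Merged range: two comparisons per value, no auxiliary structure.
    static boolean matchesRange(long value, long low, long high)
    {
        return low <= value && value <= high;
    }

    // Large discrete list: a hash set must be built and probed per value.
    static boolean matchesSet(long value, Set<Long> values)
    {
        return values.contains(value);
    }

    public static void main(String[] args)
    {
        Set<Long> discrete = LongStream.rangeClosed(1, 1_000_000)
                .boxed()
                .collect(Collectors.toSet());
        System.out.println(matchesRange(500_000, 1, 1_000_000)); // true
        System.out.println(matchesSet(500_000, discrete));       // true
    }
}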

rzeyde-varada force-pushed the merge-longs branch 2 times, most recently from 5a77f95 to 5ff4c69 (February 4, 2022)

Comment on lines +33 to +35
 * Try to return a more compact representation (if possible).
 */
ValueSet getCompactedRanges();
A reviewer (Member) commented:

Document whether the compacted version is supposed to:

  • be equal (as in Object.equals)
  • contain the same information, or whether it can e.g. be slightly lossy (widening) for more efficient compaction
  • Also: what is returned when compaction is not possible?

A reviewer (Member) commented:

Also, at a higher level:

  • why is this a method on Ranges instead of on ValueSet?
  • why do we want such a method at all? I would hope the ValueSet instance is always "as compact as possible"

rzeyde-varada (Contributor, Author) commented:

> Document whether the compacted version is supposed to:

Sounds good - done.

rzeyde-varada (Contributor, Author) commented:

> why is this a method on Ranges instead of on ValueSet?

Following the suggestion in the comment above (#3316 (comment)).

> why do we want such a method at all? I would hope the ValueSet instance is always "as compact as possible"

I agree, but such compaction may cause the CBO to change the plan (since we estimate statistics differently depending on whether the predicate is a range or a discrete set of values), so I preferred to keep the existing behavior and add predicate compaction only to DF-related code.

A reviewer (Member) commented:

> I agree, but such compaction may cause the CBO to change the plan (since we estimate statistics differently depending on whether the predicate is a range or a discrete set of values)

Is this something that could be improved on?

cc @sopel39

A reviewer (Member) commented:

> Is this something that could be improved on?

IMO it should be; I don't think there should be code duality (DF vs. CBO).

Enable it for integral types, to be used in DF.
colebow (Member) commented Oct 19, 2022

👋 @rzeyde-varada - this PR has become inactive. If you're still interested in working on it, please let us know, and we can try to get reviewers to help with that.

We're working on closing out old and inactive PRs, so if you're too busy or this has too many merge conflicts to be worth picking back up, we'll be making another pass to close it out in a few weeks.

colebow closed this Nov 11, 2022
Successfully merging this pull request may close these issues:

  • Simplify discrete domains for NOT IN