-
Notifications
You must be signed in to change notification settings - Fork 3.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Improve performance of parquet dictionary to domain conversion #15268
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
Quite considerable code duplication. is it worth it?
lib/trino-parquet/src/main/java/io/trino/parquet/predicate/TupleDomainParquetPredicate.java
Outdated
Show resolved
Hide resolved
lib/trino-parquet/src/main/java/io/trino/parquet/predicate/TupleDomainParquetPredicate.java
Outdated
Show resolved
Hide resolved
lib/trino-parquet/src/main/java/io/trino/parquet/predicate/TupleDomainParquetPredicate.java
Outdated
Show resolved
Hide resolved
lib/trino-parquet/src/main/java/io/trino/parquet/predicate/TupleDomainParquetPredicate.java
Outdated
Show resolved
Hide resolved
|
||
long minValue = min instanceof Binary ? getShortDecimalValue(((Binary) min).getBytes()) : asLong(min); | ||
long maxValue = min instanceof Binary ? getShortDecimalValue(((Binary) max).getBytes()) : asLong(max); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
there was a bug here, right?
can you please fix it as a prep commit?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's not possible for the min and max values to be of different types in the file, it's more of a consistency/readability fix
lib/trino-parquet/src/main/java/io/trino/parquet/predicate/TupleDomainParquetPredicate.java
Outdated
Show resolved
Hide resolved
lib/trino-parquet/src/main/java/io/trino/parquet/predicate/TupleDomainParquetPredicate.java
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm % @findepi comments %
SortedRangeSet#of(io.trino.spi.type.Type, java.util.Collection<?>)
and SortedRangeSet#buildFromUnsortedRanges
look similar (comparisons etc). Maybe it's possible to improve range based method
lib/trino-parquet/src/main/java/io/trino/parquet/predicate/TupleDomainParquetPredicate.java
Show resolved
Hide resolved
lib/trino-parquet/src/main/java/io/trino/parquet/predicate/TupleDomainParquetPredicate.java
Show resolved
Hide resolved
Provides compile time guarantee that all the cases are handled
8889bdf
to
a17d00d
Compare
I looked into what is slow here The problem is that |
a17d00d
to
2b328af
Compare
core/trino-spi/src/main/java/io/trino/spi/predicate/SortedRangeSet.java
Outdated
Show resolved
Hide resolved
{ | ||
private final Type type; | ||
private final MethodHandle rangeComparisonOperator; | ||
private final List<Range> ranges; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: We could probably optimize it further by using BlockBuilder
directly
Improved logic for constructing domain from dictionary values in TupleDomainParquetPredicate by providing builder in SortedRangeSet which caches comparisonOperator and re-uses it for Range construction Benchmark Mode Cnt Score Before Score After Units BenchmarkTupleDomainParquetPredicate.domainFromDictionary avgt 10 354.194 ± 29.631 246.769 ± 12.390 ms/op
2b328af
to
a29af5e
Compare
Description
Improved logic for constructing domain from dictionary values in TupleDomainParquetPredicate by using the faster code path for constructing ValueSet from singleton values rather than Ranges.
Benchmark Mode Cnt Score Before Score After Units
BenchmarkTupleDomainParquetPredicate.domainFromDictionary avgt 10 354.194 ± 29.631 211.831 ± 8.793 ms/op
Release notes
( ) This is not user-visible or docs only and no release notes are required.
( ) Release notes are required, please propose a release note for me.
(x) Release notes are required, with the following suggested text: