Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Push down predicate involving timestamp_tz in Delta Lake #19874

Merged
merged 5 commits into from
Nov 27, 2023

Conversation

kasiafi
Copy link
Member

@kasiafi kasiafi commented Nov 23, 2023

Description

Certain expressions involving TIMESTAMP WITH TIME ZONE can be optimized in Delta Lake connector based on the fact that columns of type TIMESTAMP WITH TIME ZONE are represented using the UTC time zone:

  • date_trunc('...', timeztamp_tz_column) <comparison> constant
  • CAST(timestamp_tz_column AS DATE) <comparison> constant
  • year(timestamp_tz_column) <comparison> constant

The optimization is based on similar optimization in the Iceberg connector.

Fixes #18664

Release notes

( ) This is not user-visible or is docs only, and no release notes are required.
( ) Release notes are required. Please propose a release note for me.
(x) Release notes are required, with the following suggested text:

# Delta Lake connector
* Pushdown filters involving columns of type TIMESTAMP WITH TIME ZONE. ({issue}`18664`)

@cla-bot cla-bot bot added the cla-signed label Nov 23, 2023
@github-actions github-actions bot added iceberg Iceberg connector delta-lake Delta Lake connector labels Nov 23, 2023
@kasiafi kasiafi force-pushed the 508OptimizeComparisonTimestampTz branch from 86b1abf to 71003a0 Compare November 23, 2023 12:01
@kasiafi kasiafi requested review from findinpath and ebyhr November 23, 2023 12:46
@kasiafi kasiafi force-pushed the 508OptimizeComparisonTimestampTz branch 2 times, most recently from 25e9cee to 869594b Compare November 24, 2023 21:21
* {@link TimestampWithTimeZoneType}. Such unwrap would not be monotonic. Within Iceberg, we know
* Test equivalent of {@code io.trino.sql.planner.iterative.rule.UnwrapCastInComparison} for {@link TimestampWithTimeZoneType}.
* {@code UnwrapCastInComparison} handles {@link DateType} and {@link TimestampType}, but cannot handle
* {@link TimestampWithTimeZoneType}. Such unwrap would not be monotonic. If we know
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
* {@link TimestampWithTimeZoneType}. Such unwrap would not be monotonic. If we know
* {@link TimestampWithTimeZoneType}. Such unwrap would not be monotonic. We know

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same for the other occurences

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the "if" was removed, it would suggest that this is always true that the time zone is UTC. It can be only guaranteed per connector, and the optimization only applies to those connectors.

long startOfDateUtcEpochMillis = someDate.atStartOfDay().toEpochSecond(UTC) * MILLISECONDS_PER_SECOND;
long startOfDateUtc = timestampTzMillisFromEpochMillis(startOfDateUtcEpochMillis);
long startOfNextDateUtc = timestampTzMillisFromEpochMillis(startOfDateUtcEpochMillis + MILLISECONDS_PER_DAY);

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reduce code redundancy with the method testExtractTimestampTzMicrosDateComparison

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When writing this, I experimented with extracting common code for similar test methods. I eventually settled for more repetition, as it improves readability a lot.

.matches("VALUES " +
"(3, TIMESTAMP '2023-11-21 07:19:00.000 UTC')," +
"(4, TIMESTAMP '2005-09-10 13:00:00.000 UTC')");

Copy link
Contributor

@findinpath findinpath Nov 26, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

optional: This assertion can be added as well.

        assertThat(query("SELECT * FROM " + tableName + " WHERE date_trunc('hour', part) >= TIMESTAMP '2005-09-10 00:00:00.001 +07:00'"))
                .isFullyPushedDown();

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added a similar test case, thanks!

@kasiafi kasiafi force-pushed the 508OptimizeComparisonTimestampTz branch from 869594b to 3964149 Compare November 27, 2023 08:14
@kasiafi kasiafi merged commit 69759e6 into trinodb:master Nov 27, 2023
@github-actions github-actions bot added this to the 434 milestone Nov 27, 2023
@mosabua
Copy link
Member

mosabua commented Nov 27, 2023

@kasiafi and others... do we need to add a pushdown section to the performance section in the Delta Lake connector docs .. some here https://trino.io/docs/current/connector/delta-lake.html#performance

Similar to how we do it for JDBC connectors like https://trino.io/docs/current/connector/postgresql.html#pushdown

@findinpath
Copy link
Contributor

do we need to add a pushdown section to the performance section in the Delta Lake connector docs ?

I tend to say no - the improvemenst covered by this PR (which btw exist for Iceberg connector as well) are goodies that should just be there - they are "obvious" from user perspective.

While we were working on these topics on Iceberg, we've published a blog post about them https://trino.io/blog/2023/04/11/date-predicates.html

@findepi
Copy link
Member

findepi commented Nov 28, 2023

goodies that should just be there - they are "obvious" from user perspective.

💯 user docs is not the place to brag about various optimization techniques we are using.
for example, we don't document all the optimizer rules that we ship

@mosabua
Copy link
Member

mosabua commented Nov 28, 2023

Okay .. thanks @findepi and @findinpath

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
cla-signed delta-lake Delta Lake connector iceberg Iceberg connector
Development

Successfully merging this pull request may close these issues.

Push down temporal filters for TIMESTAMP WITH TIME ZONE columns on Delta Lake
4 participants