Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Push down temporal filters for TIMESTAMP WITH TIME ZONE columns on Delta Lake #18664

Closed
findinpath opened this issue Aug 14, 2023 · 0 comments · Fixed by #19874
Closed

Push down temporal filters for TIMESTAMP WITH TIME ZONE columns on Delta Lake #18664

findinpath opened this issue Aug 14, 2023 · 0 comments · Fixed by #19874
Assignees
Labels
delta-lake Delta Lake connector enhancement New feature or request performance

Comments

@findinpath
Copy link
Contributor

findinpath commented Aug 14, 2023

Add the ability to subsume temporal filter expressions which can be subsumed to be performed on the parquet row group level over even on the metadata level instead of being applied for each row retrieved.

CAST("event_date" AS DATE) < DATE '2023-03-03')

is equivalent to

event_date < TIMESTAMP 2023-03-03 00:00:00.000000 UTC`

(still to be verified - delta.parquet.time-zone setting may play out a role in the translation of the expression above)

In a similar fashion as it is done for Iceberg with Iceberg on

assertThat(query("SELECT * FROM test_hour_transform_timestamptz WHERE d >= DATE '2015-05-15'"))
.isFullyPushedDown();
assertThat(query("SELECT * FROM test_hour_transform_timestamptz WHERE CAST(d AS date) >= DATE '2015-05-15'"))
.isFullyPushedDown();
assertThat(query("SELECT * FROM test_hour_transform_timestamptz WHERE d >= TIMESTAMP '2015-05-15 12:00:00 UTC'"))
.isFullyPushedDown();
assertThat(query("SELECT * FROM test_hour_transform_timestamptz WHERE d >= TIMESTAMP '2015-05-15 12:00:00.000001 UTC'"))
.isNotFullyPushedDown(FilterNode.class);
// date()
assertThat(query("SELECT * FROM test_hour_transform_timestamptz WHERE date(d) = DATE '2015-05-15'"))
.isFullyPushedDown();
// year()
assertThat(query("SELECT * FROM test_hour_transform_timestamptz WHERE year(d) = 2015"))
.isFullyPushedDown();
// date_trunc
assertThat(query("SELECT * FROM test_hour_transform_timestamptz WHERE date_trunc('hour', d) = TIMESTAMP '2015-05-15 12:00:00.000000 UTC'"))
.isFullyPushedDown();
assertThat(query("SELECT * FROM test_hour_transform_timestamptz WHERE date_trunc('day', d) = TIMESTAMP '2015-05-15 00:00:00.000000 UTC'"))
.isFullyPushedDown();
assertThat(query("SELECT * FROM test_hour_transform_timestamptz WHERE date_trunc('month', d) = TIMESTAMP '2015-05-01 00:00:00.000000 UTC'"))
.isFullyPushedDown();
assertThat(query("SELECT * FROM test_hour_transform_timestamptz WHERE date_trunc('year', d) = TIMESTAMP '2015-01-01 00:00:00.000000 UTC'"))
.isFullyPushedDown();

add similar logic (either on the engine or on Delta Lake connector DeltaLakeMetadata#applyFilter() method) for being able to subsume such filters.

See how it is done on the Iceberg connector:

private static Optional<Domain> unwrapTimestampTzToDateCast(IcebergColumnHandle column, FunctionName functionName, long date)
{
Type type = column.getType();
checkArgument(type.equals(TIMESTAMP_TZ_MICROS), "Column of unexpected type %s: %s", type, column);
// Verify no overflow. Date values must be in integer range.
verify(date <= Integer.MAX_VALUE, "Date value out of range: %s", date);
// In Iceberg, timestamp with time zone values are all in UTC
LongTimestampWithTimeZone startOfDate = LongTimestampWithTimeZone.fromEpochMillisAndFraction(date * MILLISECONDS_PER_DAY, 0, UTC_KEY);
LongTimestampWithTimeZone startOfNextDate = LongTimestampWithTimeZone.fromEpochMillisAndFraction((date + 1) * MILLISECONDS_PER_DAY, 0, UTC_KEY);
return createDomain(functionName, type, startOfDate, startOfNextDate);
}

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
delta-lake Delta Lake connector enhancement New feature or request performance
Development

Successfully merging a pull request may close this issue.

3 participants