Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Allow ValueSet expansion into a discrete set for Bloom filtering #9868

Merged
merged 3 commits into from
Jan 20, 2022

Conversation

rzeyde-varada
Copy link
Contributor

Fixes #4108.

@findepi
Copy link
Member

findepi commented Nov 4, 2021

Wonder whether user(s) of io.trino.spi.predicate.TupleDomain#extractDiscreteValues would benefit from a similar range unpacking.

@rzeyde-varada
Copy link
Contributor Author

Wonder whether user(s) of io.trino.spi.predicate.TupleDomain#extractDiscreteValues would benefit from a similar range unpacking.

IIUC, it has a single usage here:

Optional<Map<ColumnHandle, List<NullableValue>>> bindings = TupleDomain.extractDiscreteValues(effectivePredicate);

I think it should be OK to try range expansion here as well, but we need to apply a reasonable limit - maybe reuse/rename getOrcBloomFiltersExpandDiscreteValuesLimit(session)?
If it's too complex, we can do it in a separate PR...

@raunaqmorarka
Copy link
Member

Wonder whether user(s) of io.trino.spi.predicate.TupleDomain#extractDiscreteValues would benefit from a similar range unpacking.

IIUC, it has a single usage here:

Optional<Map<ColumnHandle, List<NullableValue>>> bindings = TupleDomain.extractDiscreteValues(effectivePredicate);

I think it should be OK to try range expansion here as well, but we need to apply a reasonable limit - maybe reuse/rename getOrcBloomFiltersExpandDiscreteValuesLimit(session)? If it's too complex, we can do it in a separate PR...

In the current code, we use getHiveBucketFilter before we call simplify on tuple domain. So it would not be beneficial unless there is some other way to encounter a compacted TupleDomain there

@rzeyde-varada
Copy link
Contributor Author

IIUC, it may help if we use an inequality predicate (e.g. col >= 1 AND col <= 5) over a bucketing column here:

public Optional<ConstraintApplicationResult<ConnectorTableHandle>> applyFilter(ConnectorSession session, ConnectorTableHandle tableHandle, Constraint constraint)
{
HiveTableHandle handle = (HiveTableHandle) tableHandle;
checkArgument(handle.getAnalyzePartitionValues().isEmpty() || constraint.getSummary().isAll(), "Analyze should not have a constraint");
HivePartitionResult partitionResult = partitionManager.getPartitions(metastore, new HiveIdentity(session), handle, constraint);
HiveTableHandle newHandle = partitionManager.applyPartitionResult(handle, partitionResult, constraint.getPredicateColumns());
if (handle.getPartitions().equals(newHandle.getPartitions()) &&
handle.getCompactEffectivePredicate().equals(newHandle.getCompactEffectivePredicate()) &&
handle.getBucketFilter().equals(newHandle.getBucketFilter()) &&
handle.getConstraintColumns().equals(newHandle.getConstraintColumns())) {
return Optional.empty();
}
return Optional.of(new ConstraintApplicationResult<>(newHandle, partitionResult.getUnenforcedConstraint(), false));
}

@rzeyde-varada rzeyde-varada force-pushed the expand-ranges branch 3 times, most recently from c9fd6b7 to 27a756c Compare November 16, 2021 11:33
@rzeyde-varada
Copy link
Contributor Author

Many thanks for the review!

@rzeyde-varada
Copy link
Contributor Author

I changed DEFAULT_EXPAND_DISCRETE_VALUES_LIMIT to be equal to DEFAULT_COMPACTION_THRESHOLD, following #9868 (comment).

@raunaqmorarka
Copy link
Member

I changed DEFAULT_EXPAND_DISCRETE_VALUES_LIMIT to be equal to DEFAULT_COMPACTION_THRESHOLD, following #9868 (comment).

Can we remove the new config altogether and just derive the limit from existing compaction threshold config ?

@rzeyde-varada
Copy link
Contributor Author

Can we remove the new config altogether and just derive the limit from existing compaction threshold config ?

Sounds good - this indeed simplifies 1edc70f significantly :)

@rzeyde-varada rzeyde-varada force-pushed the expand-ranges branch 2 times, most recently from b69aeb1 to 311a105 Compare November 18, 2021 12:18
Copy link
Member

@raunaqmorarka raunaqmorarka left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm % comments

@rzeyde-varada
Copy link
Contributor Author

Thanks for the review and the approval!
Updated the PR and rebased.

@rzeyde-varada rzeyde-varada requested a review from findepi January 8, 2022 08:50
@findepi
Copy link
Member

findepi commented Jan 10, 2022

rzeyde-varada force-pushed the expand-ranges branch from 7e204a1 to 94a1a1d 3 days ago

ouch, rebase. this gives me https://github.com/trinodb/trino/compare/7e204a1545ea3b739148cd2036fb6a428c2f4c69..94a1a1d247c188dd6deea2ac61bbc31575908f73 as the diff. Has anything changed there?

@rzeyde-varada
Copy link
Contributor Author

Oops.... un-rebased, so the following diff can be reviewed more easily:
https://github.com/trinodb/trino/compare/7e204a1545ea3b739148cd2036fb6a428c2f4c69..cafcdda533be11effecf7fdda350102957711d78

* For example: [1, 5] can be expanded into {1, 2, 3, 4, 5}.
* If the data type is not supported or the expansion results in too many values, {@code Optional.empty()} is returned.
*/
static Optional<Collection<Object>> tryExpandRanges(ValueSet valuesSet, int valuesLimit)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

remove static

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure - if I remove it, the build will fail (IIUC):

[ERROR] .../ValueSet.java:[176,5] interface abstract methods cannot have body

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The operation takes ValueSet and is defined in ValueSet. Why not instance/interface method?

Copy link
Contributor Author

@rzeyde-varada rzeyde-varada Jan 14, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You're right - my bad... 🤦

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pushed a (hopefully) better version in 7a8425f088.

@findepi
Copy link
Member

findepi commented Jan 11, 2022

Oops.... un-rebased, so the following diff can be reviewed more easily:

thanks!

@findepi
Copy link
Member

findepi commented Jan 15, 2022

@rzeyde-varada thanks for the update. Please see CI red.

@rzeyde-varada
Copy link
Contributor Author

Thanks!
Fixed the CI :)

Copy link
Member

@findepi findepi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@sopel39 PTAL

@rzeyde-varada
Copy link
Contributor Author

Simplified the code, following #9868 (comment).

@findepi findepi merged commit 3208171 into trinodb:master Jan 20, 2022
findepi pushed a commit that referenced this pull request Jan 20, 2022
Also, implement it for basic integral types.

Following #9868 (comment)
@findepi findepi mentioned this pull request Jan 20, 2022
@github-actions github-actions bot added this to the 369 milestone Jan 20, 2022
@rzeyde-varada rzeyde-varada deleted the expand-ranges branch January 21, 2022 13:45
@rzeyde-varada
Copy link
Contributor Author

Thanks!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Development

Successfully merging this pull request may close these issues.

Use bloom filters for int/long types range predicates
5 participants