feat: Type coercion for Dictionary(_, _) to Utf8 for regex conditions #5152

Merged
merged 1 commit into from
Feb 3, 2023

Conversation

@stuartcarnie stuartcarnie commented Feb 1, 2023

Which issue does this PR close?

Closes #5154

Rationale for this change

This change teaches DataFusion how to coerce Dictionary(_, string-like-type) to a string-like type for regex conditional expressions, so the comparison may succeed.

What changes are included in this PR?

Updated the type coercion rules when encountering one of the regular expression operators.
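As a rough illustration of the rule described above, the sketch below unwraps a dictionary type to its value type before deciding the common type for a regex comparison. It is not the code from this PR: the `DataType` enum is a simplified, hypothetical stand-in for Arrow's type, the function names `string_like` and `regex_coercion` are made up for this example, and widening to `LargeUtf8` when either side is large is an assumption.

```rust
// Hypothetical, simplified stand-in for Arrow's DataType (not the real arrow crate type).
#[allow(dead_code)]
#[derive(Debug, Clone, PartialEq)]
enum DataType {
    Int32,
    Utf8,
    LargeUtf8,
    Dictionary(Box<DataType>, Box<DataType>),
}

/// Returns the underlying string-like type, looking through a Dictionary wrapper.
/// Returns None for anything that is not string-like.
fn string_like(t: &DataType) -> Option<DataType> {
    match t {
        DataType::Utf8 | DataType::LargeUtf8 => Some(t.clone()),
        // Dictionary(_, value): only the value type matters for a regex match.
        DataType::Dictionary(_, value) => string_like(value),
        _ => None,
    }
}

/// Picks the type both operands of a regex operator are coerced to.
/// Assumption: if either side is LargeUtf8, widen to LargeUtf8.
fn regex_coercion(lhs: &DataType, rhs: &DataType) -> Option<DataType> {
    match (string_like(lhs)?, string_like(rhs)?) {
        (DataType::Utf8, DataType::Utf8) => Some(DataType::Utf8),
        _ => Some(DataType::LargeUtf8),
    }
}
```

Under these assumptions, a `Dictionary(Int32, Utf8)` column compared against a `Utf8` pattern resolves to `Utf8`, while a non-string operand such as `Int32` yields `None`, which a planner would report as a type error.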

Are these changes tested?

Yes, new tests were added to validate the changes.

Are there any user-facing changes?

Queries that previously failed with an error stating that the Dictionary type is incompatible with a regular expression condition will now succeed. This matters for IOx, which uses Dictionary(Int32, Utf8) columns for tags.

In some cases, regular expression queries succeed, such as:

SELECT usage_idle FROM cpu WHERE cpu ~ '9'

This works because the optimiser rewrites the filter from a regex condition to LIKE '%9%', and the LIKE operator already has code to coerce dictionary types:

https://github.com/apache/arrow-datafusion/blob/031534d94efb305eb26a7c16fd7e06ae3bcd88bb/datafusion/expr/src/type_coercion/binary.rs#L522

However, other cases unexpectedly fail:

SELECT usage_idle FROM cpu WHERE cpu ~ '9$'

because the optimiser does not rewrite this to a LIKE expression (see footnote 1).

Footnotes

  1. Incidentally, the optimiser could also rewrite this to LIKE '%9'
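To make the rewrite discussed here concrete, the following is a minimal sketch of how a plain-literal regex, optionally anchored with `^` or `$`, could map to a LIKE pattern. It is not DataFusion's optimiser code: `regex_to_like` is a hypothetical helper, and it deliberately bails out on anything other than ASCII letters and digits so that no regex metacharacter or LIKE wildcard (`%`, `_`) needs escaping.

```rust
/// Hypothetical helper: converts a very simple regex into an equivalent LIKE pattern.
/// Returns None when the regex is not a plain ASCII alphanumeric literal
/// with optional `^` / `$` anchors.
fn regex_to_like(pattern: &str) -> Option<String> {
    // A leading `^` pins the match to the start of the string.
    let (anchored_start, rest) = match pattern.strip_prefix('^') {
        Some(r) => (true, r),
        None => (false, pattern),
    };
    // A trailing `$` pins the match to the end of the string.
    let (anchored_end, literal) = match rest.strip_suffix('$') {
        Some(r) => (true, r),
        None => (false, rest),
    };
    // Conservative check: reject empty literals and anything that could be
    // a regex metacharacter, an escape, or a LIKE wildcard.
    if literal.is_empty() || !literal.chars().all(|c| c.is_ascii_alphanumeric()) {
        return None;
    }
    let mut like = String::new();
    if !anchored_start {
        like.push('%'); // unanchored start: allow any prefix
    }
    like.push_str(literal);
    if !anchored_end {
        like.push('%'); // unanchored end: allow any suffix
    }
    Some(like)
}
```

With this sketch, `regex_to_like("9")` gives `%9%` and `regex_to_like("9$")` gives `%9`, matching the two queries above, while a pattern such as `9.*` returns `None` and would stay a regex condition.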

@github-actions github-actions bot added the logical-expr Logical plan and expressions label Feb 1, 2023
@stuartcarnie stuartcarnie marked this pull request as ready for review February 1, 2023 23:00
@alamb alamb requested a review from liukun4515 February 2, 2023 17:47
@alamb alamb left a comment

makes sense to me -- thank you @stuartcarnie

@alamb alamb merged commit 100665c into apache:master Feb 3, 2023
ursabot commented Feb 3, 2023

Benchmark runs are scheduled for baseline = e69cd55 and contender = 100665c. 100665c is a master commit associated with this PR. Results will be available as each benchmark for each run completes.
Conbench compare runs links:
[Skipped ⚠️ Benchmarking of arrow-datafusion-commits is not supported on ec2-t3-xlarge-us-east-2] ec2-t3-xlarge-us-east-2
[Skipped ⚠️ Benchmarking of arrow-datafusion-commits is not supported on test-mac-arm] test-mac-arm
[Skipped ⚠️ Benchmarking of arrow-datafusion-commits is not supported on ursa-i9-9960x] ursa-i9-9960x
[Skipped ⚠️ Benchmarking of arrow-datafusion-commits is not supported on ursa-thinkcentre-m75q] ursa-thinkcentre-m75q
Buildkite builds:
Supported benchmarks:
ec2-t3-xlarge-us-east-2: Supported benchmark langs: Python, R. Runs only benchmarks with cloud = True
test-mac-arm: Supported benchmark langs: C++, Python, R
ursa-i9-9960x: Supported benchmark langs: Python, R, JavaScript
ursa-thinkcentre-m75q: Supported benchmark langs: C++, Java

@stuartcarnie stuartcarnie deleted the sgc/regex_coercion branch February 3, 2023 21:48
@andygrove andygrove added the enhancement New feature or request label Mar 12, 2023