feat: Type coercion for Dictionary(_, _) to Utf8 for regex conditions #5152
Which issue does this PR close?
Closes #5154
Rationale for this change
This change teaches DataFusion how to coerce Dictionary(_, string-like-type) to a string-like type for regex conditional expressions, so the comparison may succeed.
What changes are included in this PR?
Updated the type coercion rules when encountering one of the regular expression operators.
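As a rough illustration of the rule described above, the following is a minimal, self-contained sketch. It does not use the arrow or DataFusion crates: the `DataType` enum and the `unwrap_dictionary` and `regex_coercion` functions are hypothetical stand-ins written for this example, and the real implementation in this PR may differ in naming and detail.

```rust
// Toy model of the coercion rule. `DataType`, `unwrap_dictionary` and
// `regex_coercion` are hypothetical stand-ins, not the arrow/DataFusion items.
#[derive(Debug, Clone, PartialEq)]
enum DataType {
    Int32,
    Utf8,
    LargeUtf8,
    // Dictionary(key type, value type)
    Dictionary(Box<DataType>, Box<DataType>),
}

/// Strip one level of dictionary encoding, returning the value type.
/// Non-dictionary types are returned unchanged.
fn unwrap_dictionary(dt: &DataType) -> &DataType {
    match dt {
        DataType::Dictionary(_, value) => value.as_ref(),
        other => other,
    }
}

/// Return the type both operands of a regex operator should be cast to,
/// or `None` if the operands have no string-like common type.
fn regex_coercion(lhs: &DataType, rhs: &DataType) -> Option<DataType> {
    use DataType::*;
    match (unwrap_dictionary(lhs), unwrap_dictionary(rhs)) {
        (Utf8, Utf8) => Some(Utf8),
        (LargeUtf8, Utf8) | (Utf8, LargeUtf8) | (LargeUtf8, LargeUtf8) => Some(LargeUtf8),
        _ => None,
    }
}

fn main() {
    // A tag column encoded as Dictionary(Int32, Utf8), compared against a Utf8 pattern.
    let tag = DataType::Dictionary(Box::new(DataType::Int32), Box::new(DataType::Utf8));
    println!("{:?}", regex_coercion(&tag, &DataType::Utf8)); // prints Some(Utf8)
    println!("{:?}", regex_coercion(&DataType::Int32, &DataType::Utf8)); // prints None
}
```

The sketch unwraps the dictionary on either side before looking for a common string type, so the column and the pattern can appear in either operand position.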
Are these changes tested?
Yes, new tests were added to validate the changes.
Are there any user-facing changes?
Queries that previously would fail with an error that the Dictionary type is incompatible with a regular expression condition will now succeed. IOx uses `Dictionary(Int32, Utf8)` columns for tags. In some cases, regular expression queries succeed, such as:
This succeeds because the optimiser rewrites the filter from a regex conditional to `LIKE '%9%'`, and the `LIKE` operator has additional code to coerce dictionary types:
https://github.com/apache/arrow-datafusion/blob/031534d94efb305eb26a7c16fd7e06ae3bcd88bb/datafusion/expr/src/type_coercion/binary.rs#L522
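The kind of rewrite described above can be modelled with a small sketch. This is only an assumption about the general idea (an unanchored, purely literal regex matches the same strings as `LIKE '%literal%'`); `regex_to_like` is a hypothetical helper invented for this example and is not DataFusion's optimiser rule.

```rust
// Toy sketch. `regex_to_like` is a hypothetical helper, not DataFusion code.

/// If `pattern` is a non-empty plain literal (no regex metacharacters and no
/// LIKE wildcards), return the equivalent unanchored LIKE pattern.
/// Anything else is left as a regex (returns `None`).
fn regex_to_like(pattern: &str) -> Option<String> {
    const REGEX_META: &str = r"\.+*?()|[]{}^$";
    const LIKE_WILDCARDS: &str = "%_";
    let is_literal = !pattern.is_empty()
        && pattern
            .chars()
            .all(|c| !REGEX_META.contains(c) && !LIKE_WILDCARDS.contains(c));
    if is_literal {
        // An unanchored regex matches anywhere in the string, like '%literal%'.
        Some(format!("%{}%", pattern))
    } else {
        None
    }
}

fn main() {
    println!("{:?}", regex_to_like("9")); // prints Some("%9%")
    println!("{:?}", regex_to_like("9$")); // prints None
}
```

In this toy model, any pattern containing a metacharacter stays a regex condition, which is the path where dictionary coercion was previously missing.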
However, other cases unexpectedly fail:
This is because the optimiser does not rewrite this to a `LIKE` expression. [1]

Footnotes
1. Incidentally, the optimiser could also rewrite this to `LIKE '%9'`.