-
Notifications
You must be signed in to change notification settings - Fork 28.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[SPARK-33677][SQL] Skip LikeSimplification rule if pattern contains any escapeChar #30625
Conversation
Kubernetes integration test starting |
Kubernetes integration test status failure |
Test build #132290 has finished for PR 30625 at commit
|
if (pattern == null) { | ||
// If pattern is null, return null value directly, since "col like null" == null. | ||
Literal(null, BooleanType) | ||
} else { | ||
val escapeStr = String.valueOf(escapeChar) | ||
pattern.toString match { | ||
case _ if escapeChar == '_' || escapeChar == '%' => l |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please add tests in LikeSimplificationSuite
, too? Btw, does this simplification work correctly if escapeChar
is not a default one other than _
and %
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You are right! I think this rule works correctly only if pattern
is valid and escapeChar
does not escape itself. For simplicity, maybe it's better to skip it if pattern
constains any escapeChar
. I will fix this and change the title/description.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
oh... I see...
|
||
test("SPARK-33677: LikeSimplification should be skipped if escape is a wildcard character") { | ||
val e = intercept[AnalysisException] { | ||
sql("SELECT string(rand()) LIKE 'm%aca' ESCAPE '%'").collect() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
so the problem is, we should only apply like simplification if the pattern is valid?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This rule may not work even for the case that has a valid pattern: SELECT s LIKE 'maacaa' ESCAPE 'a' FROM t
Kubernetes integration test starting |
pattern.toString match { | ||
case startsWith(prefix) if !prefix.endsWith(escapeStr) => | ||
case p if p.contains(escapeChar) => l |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
correct me if I was wrong: we need to make sure this rule doesn't change result. If the pattern is invalid, Like
should fail. However, it's expensive to validate the pattern (need to compile it). Here we use p.contains(escapeChar)
as a shortcut to know if the pattern might be invalid.
Can we add some comments to explain it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
Kubernetes integration test status failure |
Test build #132333 has finished for PR 30625 at commit
|
Kubernetes integration test starting |
|
||
test("SPARK-33677: LikeSimplification should be skipped if pattern contains any escapeChar") { | ||
withTable("string_tbl") { | ||
sql("CREATE TABLE string_tbl USING parquet SELECT 'm@ca' AS s") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: could you use a temporary table view for better test performance?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
there is no temporary table in spark, you mean temporary view?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unfortunately Like
will be constant folded and not passed to LikeSimplification
if using temp view here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, yea, I meant a temporary view...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@luluorta have you tried temp view? AFAIK ConvertToLocalRelation
is disabled in tests.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It works if I use DataFrame to create temp view.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice catch!
Kubernetes integration test status success |
Test build #132366 has finished for PR 30625 at commit
|
Kubernetes integration test starting |
Kubernetes integration test status failure |
Test build #132401 has finished for PR 30625 at commit
|
retest this please |
Kubernetes integration test starting |
Kubernetes integration test status failure |
Test build #132413 has finished for PR 30625 at commit
|
…ny escapeChar ### What changes were proposed in this pull request? `LikeSimplification` rule does not work correctly for many cases that have patterns containing escape characters, for example: `SELECT s LIKE 'm%aca' ESCAPE '%' FROM t` `SELECT s LIKE 'maacaa' ESCAPE 'a' FROM t` For simpilicy, this PR makes this rule just be skipped if `pattern` contains any `escapeChar`. ### Why are the changes needed? Result corrupt. ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? Added Unit test. Closes #30625 from luluorta/SPARK-33677. Authored-by: luluorta <luluorta@gmail.com> Signed-off-by: Takeshi Yamamuro <yamamuro@apache.org> (cherry picked from commit 99613cd) Signed-off-by: Takeshi Yamamuro <yamamuro@apache.org>
…ny escapeChar ### What changes were proposed in this pull request? `LikeSimplification` rule does not work correctly for many cases that have patterns containing escape characters, for example: `SELECT s LIKE 'm%aca' ESCAPE '%' FROM t` `SELECT s LIKE 'maacaa' ESCAPE 'a' FROM t` For simpilicy, this PR makes this rule just be skipped if `pattern` contains any `escapeChar`. ### Why are the changes needed? Result corrupt. ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? Added Unit test. Closes #30625 from luluorta/SPARK-33677. Authored-by: luluorta <luluorta@gmail.com> Signed-off-by: Takeshi Yamamuro <yamamuro@apache.org> (cherry picked from commit 99613cd) Signed-off-by: Takeshi Yamamuro <yamamuro@apache.org>
Thanks! Merged to master/branch-3.1/branch-3.0. |
What changes were proposed in this pull request?
LikeSimplification
rule does not work correctly for many cases that have patterns containing escape characters, for example:SELECT s LIKE 'm%aca' ESCAPE '%' FROM t
SELECT s LIKE 'maacaa' ESCAPE 'a' FROM t
For simpilicy, this PR makes this rule just be skipped if
pattern
contains anyescapeChar
.Why are the changes needed?
Result corrupt.
Does this PR introduce any user-facing change?
No
How was this patch tested?
Added Unit test.