Parquet predicate pushdown doesn't seem to be working with timestamps #343
@kujon thanks for opening the ticket!
Notes on comparing encoders of Timestamps using Frameless and vanilla Spark:
One difference is that they generate schemas with different nullable flags.
Looking at the logical plans: vanilla Spark encodes the timestamp constant as a plain Literal, whereas Frameless encodes it as a FramelessLit.
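A minimal sketch of how that comparison can be made (the Parquet path and column name are assumptions): filter on the timestamp column, then look at the optimized logical plan and at the PushedFilters section of the physical plan.

```scala
// Sketch only: inspect how the timestamp constant shows up in the plan and whether
// the comparison is pushed down. Path and column name are made up for illustration.
import java.sql.Timestamp
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().master("local[*]").getOrCreate()
import spark.implicits._

val cutoff   = Timestamp.valueOf("2019-01-01 00:00:00")
val filtered = spark.read.parquet("/tmp/events.parquet").filter($"ts" >= cutoff)

println(filtered.queryExecution.optimizedPlan.numberedTreeString) // how the literal is encoded
filtered.explain(true)                                            // physical plan, check PushedFilters
```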
If the literal is encoded as FramelessLit, then Parquet has no idea how to compare it, and hence the predicate is never pushed down. I tried verifying that this is a FramelessLit issue by comparing two columns that don't involve literals, but that doesn't tell us anything: predicate push down never shows up when two columns are compared, for both Frameless and vanilla Spark. I checked that the predicate push down works for Long and other constants, so this seems to be an issue only with SQLTimestamp for now. Conclusion (as of 01/21/2019): the Parquet predicate push down will only work for specific literals. Better if we stick to a java.sql.Timestamp representation for now (?)
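For the Long / plain-constant check mentioned above, something along these lines is enough (it reuses the `spark` session and the made-up Parquet path from the sketch above):

```scala
// Filtering on a Long column of the same file: GreaterThanOrEqual(id, 100) does show
// up under PushedFilters, so push-down itself works and the problem is specific to
// how the timestamp literal is encoded.
spark.read.parquet("/tmp/events.parquet").filter($"id" >= 100L).explain(true)
```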
I tried to replace the SQLTimestamp with an encoder that operates on java.sql.Timestamp. As long as, in this code here (frameless/dataset/src/main/scala/frameless/functions/package.scala, lines 27 to 32 in 0d52c03), we avoid the FramelessLit path and instead generate a regular Spark Literal, the predicate works. The new encoder didn't work for serde at this point, so we can't really replace what we have with this temporary workaround, but at least it increased my confidence that the predicate push down fails when it encounters a FramelessLit, so I will be focusing more on understanding why.
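A minimal sketch of that workaround idea, not the frameless implementation: build an ordinary Catalyst Literal for the timestamp value instead of going through FramelessLit, so the data source can recognise it (`timestampLit`, `df` and `cutoff` are hypothetical names).

```scala
import java.sql.Timestamp
import org.apache.spark.sql.Column
import org.apache.spark.sql.catalyst.expressions.Literal
import org.apache.spark.sql.types.TimestampType

// Plain Catalyst Literal for a timestamp value: push-down friendly, unlike FramelessLit.
def timestampLit(ts: Timestamp): Column =
  new Column(Literal.create(ts, TimestampType))

// df.filter(df("ts") >= timestampLit(cutoff)).explain(true)
// => PushedFilters should now include GreaterThanOrEqual(ts, ...)
```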
The reason this doesn't work with FramelessLit is that Spark cannot recognise it as a Literal, so it won't get past the code that translates predicates into data-source filters, which only pushes down Literals. Possibly the only way to remediate this is to have a plan rule substitute the above code; I may attempt this, but it probably requires a full-blown spark.sql.extensions plugin, so it's not exactly transparent (e.g. it, and all of frameless with its dependencies, has to be on the Spark cluster classpath).
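For reference, that translation lives around Spark's DataSourceStrategy.translateFilter: the match only fires when one side is an attribute and the other a plain Literal, which is why a FramelessLit is silently skipped. A paraphrase of the GreaterThanOrEqual case, not the exact Spark source:

```scala
import org.apache.spark.sql.catalyst.CatalystTypeConverters.convertToScala
import org.apache.spark.sql.catalyst.expressions
import org.apache.spark.sql.catalyst.expressions.{Attribute, Expression, Literal}
import org.apache.spark.sql.sources

// Only attribute-vs-Literal comparisons translate to a data-source Filter; anything
// else (a FramelessLit on either side included) falls through and is never pushed down.
def translateGte(predicate: Expression): Option[sources.Filter] = predicate match {
  case expressions.GreaterThanOrEqual(a: Attribute, Literal(v, t)) =>
    Some(sources.GreaterThanOrEqual(a.name, convertToScala(v, t)))
  case expressions.GreaterThanOrEqual(Literal(v, t), a: Attribute) =>
    Some(sources.LessThanOrEqual(a.name, convertToScala(v, t)))
  case _ => None
}
```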
Yea, we could add an optimizer rule; it should not be hard.
Indeed it was not, as it's so simple it even works with session.sqlContext.experimental.extraOptimizations |
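A sketch of what such a rule can look like when registered through the experimental hook (the rule body here is an assumption for illustration, not necessarily what frameless ships): fold any foldable, non-Literal expression back into a plain Literal so the comparison becomes pushable.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.catalyst.expressions.{Expression, Literal}
import org.apache.spark.sql.catalyst.plans.logical.LogicalPlan
import org.apache.spark.sql.catalyst.rules.Rule

// Collapse foldable, non-Literal expressions (e.g. a wrapped timestamp constant)
// into plain Literals so the data-source strategy can push the predicate down.
object FoldLiterals extends Rule[LogicalPlan] {
  override def apply(plan: LogicalPlan): LogicalPlan = plan transformAllExpressions {
    case e: Expression if e.foldable && !e.isInstanceOf[Literal] =>
      Literal(e.eval(), e.dataType)
  }
}

val session = SparkSession.builder().master("local[*]").getOrCreate()
session.sqlContext.experimental.extraOptimizations ++= Seq(FoldLiterals)
```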
fyi / NB for searchers - Spark 3.2.4 cannot push down predicates for anything that uses InvokeLike (as SQLTimestamp, structs, etc. do), because the fixes from SPARK-40380 and SPARK-39106 aren't present in that release line. Frameless built against Spark versions greater than 3.3.2 will push down the predicates.
* #343 - unpack to Literals
* #343 - add struct test showing difference between extension and experimental rules
* #343 - toString test to stop the patch complaint
* #343 - sample docs
* #343 - package rename and adding logging that the extension is injected
* Apply suggestions from code review (Co-authored-by: Cédric Chantepie <cchantep@users.noreply.github.com>)
* Refactor LitRule and LitRules tests by making them slightly more generic, adjust docs, add negative tests
* #343 - disable the rule, foldable and eval evals
* #343 - cleaned up
* #343 - true with link for 3.2 support
* #343 - bring back code gen with lazy to stop recompiles
* #343 - more compat and a foldable only backport of SPARK-39106 and SPARK-40380
* #343 - option 3 - let 3.2 fail as per oss impl, seperate tests

Co-authored-by: Cédric Chantepie <cchantep@users.noreply.github.com>
Co-authored-by: Grigory Pomadchin <grigory.pomadchin@disneystreaming.com>
Vanilla Spark: both IsNotNull and GreaterThanOrEqual get pushed down nicely.
Frameless: IsNotNull gets pushed down, while GreaterThanOrEqual doesn't.
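A rough reconstruction of the kind of comparison behind this report (the Parquet path, the Event schema, and the cut-off value are assumptions, not the reporter's original code):

```scala
import frameless.{SQLTimestamp, TypedDataset}
import org.apache.spark.sql.SparkSession

case class Event(id: Long, ts: SQLTimestamp)

val spark = SparkSession.builder().master("local[*]").getOrCreate()
import spark.implicits._

// Vanilla Spark: PushedFilters lists both IsNotNull(ts) and GreaterThanOrEqual(ts, ...).
spark.read.parquet("/tmp/events.parquet")
  .filter($"ts" >= java.sql.Timestamp.valueOf("2019-01-01 00:00:00"))
  .explain(true)

// Frameless: only IsNotNull(ts) shows up in PushedFilters; the >= comparison stays
// as a post-scan Filter because the timestamp constant is not a plain Literal.
val ds = TypedDataset.createUnsafe[Event](spark.read.parquet("/tmp/events.parquet"))
ds.filter(ds('ts) >= SQLTimestamp(1546300800000000L)) // microseconds since the epoch
  .dataset
  .explain(true)
```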