[QST/DOC]: Join key matching semantics for expression keys #17184
Comments
I completely missed this one.
This should definitely raise. We should check if the join key expression is elementwise during conversion from DSL to IR.
I think we should warn if a user asks for explicit coalescing and passes keys of which some are non-column expressions, as this requires more complicated coalescing logic that isn't supported at the moment.
I am also surprised by that one. Seems like a bug. 🤔 Would have to investigate.
Thanks: opened #17517 for the first two points; I didn't manage to dig into the last one at all.
After the merge of #17061, expression-based join keys do not appear in the output of the join. However, there are still a few open issues (especially around matching with pl.lit join keys, e.g. #9603).
A few questions about what the desired behaviour should be in some corner cases (note: these arise because in the cudf-polars work I had written slightly different broadcasting semantics for these edge cases compared to polars, so I'm trying to figure out what is "right"):
Multiple join keys provided, but they don't all produce columns of the same length
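As a rough illustration of the check in question, here is a plain-Python model (not polars code) that evaluates each join key and rejects keys whose results differ in length. The helper name evaluate_keys, the lambda stand-ins for expressions, and the use of ValueError in place of polars' ComputeError are all assumptions made for this sketch.

```python
def evaluate_keys(columns, key_exprs):
    """Evaluate each key expression against a dict of column lists.

    Hypothetical helper: raises if the evaluated keys differ in length.
    ValueError stands in for polars' ComputeError here.
    """
    evaluated = [expr(columns) for expr in key_exprs]
    lengths = {len(col) for col in evaluated}
    if len(lengths) > 1:
        raise ValueError(f"join keys have mismatched lengths: {sorted(lengths)}")
    return evaluated


columns = {"a": [1, 2, 3], "b": [1, 2, 3]}
sliced_a = lambda cols: cols["a"][0:2]  # stands in for pl.col("a").slice(0, 2)
plain_b = lambda cols: cols["b"]        # stands in for pl.col("b")

try:
    evaluate_keys(columns, [sliced_a, plain_b])
    mismatch_rejected = False
except ValueError:
    mismatch_rejected = True
```

In this model the sliced key has length 2 and the plain key has length 3, so the pair is rejected before any matching takes place.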
Since pl.col("a").slice(0, 2) and pl.col("b") produce columns of different lengths when evaluated, I was expecting a ComputeError here.
Join key coalescing
#17601 turns off join key coalescing if any key is not a column reference. But this seems a bit over-eager: if we join on multiple keys, only some of which are expressions, then I might expect that the column references are still coalesced.
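A minimal sketch of the per-key coalescing described here, written as a plain-Python model rather than polars code. The function inner_join, its signature, and the "_right" suffix handling are assumptions for illustration only: key pairs that are plain column names on both sides are coalesced, while expression keys (modelled as callables) leave the right-hand columns in the output.

```python
def inner_join(left, right, left_on, right_on, suffix="_right"):
    """Inner join of two lists of row dicts (hypothetical model).

    Each key is either a column name (str) or a callable computing a
    value from a row, standing in for an expression key.
    """
    def key_of(row, keys):
        return tuple(row[k] if isinstance(k, str) else k(row) for k in keys)

    # Right-hand key columns are dropped only for pairs where both
    # sides are plain column references.
    coalesced = {
        r for l, r in zip(left_on, right_on)
        if isinstance(l, str) and isinstance(r, str)
    }

    out = []
    for lrow in left:
        for rrow in right:
            if key_of(lrow, left_on) != key_of(rrow, right_on):
                continue
            row = dict(lrow)
            for name, value in rrow.items():
                if name in coalesced:
                    continue
                row[name + suffix if name in row else name] = value
            out.append(row)
    return out


left = [{"a": 1, "b": 2}, {"a": 2, "b": 4}]
right = [{"a": 1, "b": 1}, {"a": 2, "b": 3}]
double_b = lambda row: row["b"] * 2  # stands in for an expression key

result = inner_join(left, right, ["a", "b"], ["a", double_b])
```

Here the first key pair is a column reference on both sides and the second pair involves an expression, so the model keeps a single a column and retains the right-hand b under a suffixed name.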
I might expect that a is coalesced.
Difference in behaviour when literals are keys
I was expecting these two to produce the same result because the first is (to my mind) equivalent to:
left.filter(pl.col("b") == pl.lit(5).cast(int))
and the second is:
left.filter(pl.all_horizontal(pl.col("a") == pl.col("a"), pl.col("b") == pl.lit(5).cast(int)))
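The expected equivalence of the two filters can be modelled in plain Python (not polars code), assuming column a contains no null values. The sample rows and variable names are made up for this sketch.

```python
# Rows of a hypothetical left table with no nulls in "a".
left = [{"a": 1, "b": 5}, {"a": 2, "b": 3}, {"a": 3, "b": 5}]
literal = 5  # stands in for pl.lit(5).cast(int)

# Model of: filter on b == literal only.
by_b_only = [row for row in left if row["b"] == literal]

# Model of: filter on (a == a) AND (b == literal).
by_a_and_b = [
    row for row in left
    if all([row["a"] == row["a"], row["b"] == literal])
]
```

Because a == a holds for every non-null value, the extra condition does not remove any rows, and both filters select the same subset.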