Extend the parameter on of the join function for more complex operations #4207
Comments
Thanks for your suggestions. These are non-trivial requests. Non-equi joins might be on the roadmap in the future, but they are currently not on it, as they do not fit the join architectures that are implemented.
OK, thank you for the information. Do you want me to close the ticket for now?
This is a duplicate of #3438, please add your vote/support there. Closing this issue.
@ritchie46 just curious to know if you could clarify how non-equi joins aren't supported, yet there does appear to be support for them in https://stackoverflow.com/a/74392766/1080804. So are cross joins an exception because they exhaustively create every combination, whereas other joins don't? And are cross joins the best path forward for those of us who need custom logic like this? The only other approach I could think of was filtering from a dynamic list of conditions (where the list is collected from a separate
Added non-equi joins in #18365
This is partially related to the other ticket I created, #4206, since this would also be a solution to that problem, but I think it would require a major restructuring of the underlying code.
So my suggestion is to allow complex expressions in the on parameter of the join function, to give the user more flexibility and a more SQL-like feel.
To show what I mean by a more complex expression, here is an example of how it could look. The example is the same as in #4206.
Example
Data
So the aim is to join on id, with the additional condition that dates lies between start and end.
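To make the intended result concrete, here is a plain-Python reference of the join semantics described above. It is only an illustration with made-up rows and a hypothetical helper name (`between_join`), not how Polars would implement it.

```python
from datetime import date

# Hypothetical rows; only the field names come from the issue.
rows_1 = [
    {"id": 1, "start": date(2022, 1, 1), "end": date(2022, 1, 31)},
    {"id": 2, "start": date(2022, 2, 1), "end": date(2022, 2, 28)},
]
rows_2 = [
    {"id": 1, "dates": date(2022, 1, 15)},
    {"id": 1, "dates": date(2022, 3, 1)},
    {"id": 2, "dates": date(2022, 2, 10)},
]


def between_join(events, windows):
    """Pair each event with every window that has the same id and
    whose [start, end] range (inclusive) contains the event's date."""
    return [
        {**event, "start": window["start"], "end": window["end"]}
        for event in events
        for window in windows
        if event["id"] == window["id"]
        and window["start"] <= event["dates"] <= window["end"]
    ]


matches = between_join(rows_2, rows_1)
for row in matches:
    print(row)
```

The row with id 1 and date 2022-03-01 is dropped because it falls outside every range for that id.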
Solution
So the feature request is to allow a syntax like this
(df_2.select(pl.col("dates")) >= df_1.select(pl.col("start")))
which results in a series of TRUE and FALSE values, in the on parameter.