Description
According to the documentation, the `validate` parameter of `polars.DataFrame.join` checks whether join keys are unique in the left/right/both datasets.
Consequently, the example join fails with
`polars.exceptions.ComputeError: join keys did not fulfill 1:m validation`,
but this is clearly a one-to-many join.
Note: nulls are not joined (`join_nulls=False`), so the left dataframe is unique (except for the nulls).
Use case
Real-world datasets often have null values for missing data (the example above is just greatly simplified). On such datasets, no cardinality validation can be performed on a left join.
I googled a bit and tried to find similar issues but could not find one. I am also not sure if my question here makes sense (the current behaviour is as documented), but in my opinion the example above is clearly a one-to-many join with some missing data.
Expected behaviour
Exclude null values from the uniqueness check if `join_nulls=False` (null values will not produce matches).
The example above should therefore not raise a `ComputeError`.
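The proposed rule can be sketched in plain Python, independent of Polars. The helper name `keys_are_unique` and its signature are hypothetical and only illustrate the idea: when nulls are not joined, they are skipped before the uniqueness check on the "one" side of a 1:m join. This is not how Polars implements validation.

```python
from collections import Counter
from typing import Hashable, Iterable, Optional


def keys_are_unique(
    keys: Iterable[Optional[Hashable]], join_nulls: bool = False
) -> bool:
    """Check the "one" side of a 1:m validation.

    Hypothetical helper, not part of Polars. When join_nulls is False,
    None keys can never produce a match, so they are left out of the
    uniqueness check.
    """
    counts = Counter(k for k in keys if join_nulls or k is not None)
    return all(n == 1 for n in counts.values())


left_keys = [1, 2, None, None]  # unique apart from the nulls

print(keys_are_unique(left_keys))                   # True
print(keys_are_unique(left_keys, join_nulls=True))  # False
print(keys_are_unique([1, 1, None]))                # False
```

With this rule, duplicated non-null keys still fail the check, so the validation keeps its purpose while tolerating missing data.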
(Note: I am not sure whether the current behaviour is only there because of pandas. The original issue that added join cardinality validation is #9263.)