-
-
Notifications
You must be signed in to change notification settings - Fork 2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
maintain_order
in top_k
is inconsistent with maintain_order
in other places
#15238
Comments
maintain_order
in top_k
is inconsistent with maintain_order
in order placesmaintain_order
in top_k
is inconsistent with maintain_order
in other places
It looks like since #16041 the https://github.com/pola-rs/polars/blob/main/py-polars/polars/expr/expr.py#L2096-L2097 |
I seem to recall we have discussed this in the issue triage before. Not sure what conclusion we arrived at. Both |
It seems the current |
We have decided to remove the following flags:
We will not give any guarantees about the order in which the top/bottom elements are returned. In both In the future, we may include the option to maintain the original order or to return the elements in ascending/descending order. We will deprecate the parameters in |
Thanks for the update! This
slightly concerns me, because it means that #10054 isn't addressed by having a |
Regarding #10054 (comment) : I don't see an issue with this. Operations on multi-column expressions always operate on the individual columns. If you want a guarantee that entire rows are preserved, you must use In any case, we do want to support a 'stable' Currently there is a |
Agree on removing
Agree! The trouble is that That would've been fine if there was some ordering guarantee. If there isn't, then I'd like to make the case that #10054 should be reopened |
You're correct. I think in this case I think the issue should be "Add stable |
Thanks, have opened an issue Regarding
Without consulting the docs, if I read pl.col('a').top_k_by('b', k=2, descending=True) then it looks like it means "take the elements of pl.col('a').sort_by('b', descending=True).head(2) But, it's not. It does the opposite: df = pl.DataFrame({'a': [1,2,3], 'b': [5,4,6]})
print(df.select(pl.col('a').top_k_by('b', k=2, descending=True)))
print(df.select(pl.col('a').sort_by('b', descending=True).head(2)))
It actually matches In that sense, I (and others) have made the case the behaviour of "descending" is backwards. It's going to get harder to course-correct later. As hard as it may be, I suggest biting the bullet and reversing the behaviour of |
I believe the existing behavior is correct. In your |
Sorting 'a' by 'b' is the intention - I've revised the example to use
Just reading these, would you expect 1. and 2. to match, or 1. and 3.? The current answer is that 1. and 3. match. |
Ok, I see now what you mean. I think the parameter should be called EDIT: We're keeping it this way. It's a bit confusing but there is no real good way to do this. Changing True <-> False isn't going to help here. You could just as well think about taking the |
Checks
Reproducible example
Log output
Issue description
maintain_order
here refers to how ties are broken, whereas in other places in Polars it refers to whether the original input order is preservedExpected behavior
Either
or for
maintain_order
to be renamedInstalled versions
The text was updated successfully, but these errors were encountered: