Auto-sort column order when concat vertical and vertical_relaxed
#9891
Comments
It might be worth mentioning this requirement in the documentation in the meantime. |
This might help you from previous discussions: #7866 (comment) |
And a great source of silent bugs. If you want them sorted, why not do it yourself?

    columns = df1.columns
    pl.concat([df.select(columns) for df in [df1, df2]])

This has been asked before and this remains my answer, as this is super solvable as the snippet above shows. |
I will clarify this in the docs and then close these issues as not planned. |
I'm happy to do this unless you have a reason not to want it. It's much more conducive to how a lot of people work to have this as an option. I like the way data.table does it where everything is opt-in and the default is very strict. Having to do 3 operations (get the columns, sort the columns for each one, concat) is just a slight bit of cognitive burden that we don't need. |
Opt-in sorting seems ok. What I am wary of, is cases where schema width differs. @magarick if you want to add opt-in sorting (or using the first df as guiding schema), can you add that logic on the rust side? |
Yeah, happy to do it. But to be very clear, the default is that all names and types have to match. In that case I think sorting is fine by default, but I'm ok being extra strict and requiring a perfect match with no flags set. |
Yes, that's what strategy @magarick hold your horses! I realize we already have this. 🙈 Strategy We could have a |
So we won't change the |
It's getting a bit messy. If the doc allows, I hope we could put a chart like the following to help users to find their desired
|
Maybe we should have a separate boolean |
I don't like arguments influencing each other. Especially if you have cases where the relaxed isn't supported. Currently that is the case with diagonal. What if we get a new strategy for which relaxed isn't supported? Having arguments as sum types (an enum) of strategies keeps the arguments orthogonal. |
Note that diagonal allows for missing columns as well. |
If I understand you correctly, maybe something like this?
|
Looking at this more, I think a larger rework could be better.
I'm happy to work on this if y'all are alright with something more in depth. |
"diagonal" does more than the schema order. It grows in a diagonal matter. New columns are added if they don't exist. So you grow horizontal and vertically -> diagonal. I think that makes sense. :) "relaxed" is mean to be the opposite of "strict". Open to a better name of that opposite. |
@ritchie46 This |
@stevenlis: "diagonal_relaxed" will be available as a first-class strategy option in the upcoming |
@alexander-beedie Thanks! I hope you could also consider adding the table above to the document because it's becoming a bit complicated, or perhaps finding another way to differentiate them. |
Problem description
As of 0.18.7, Polars returns an error when the frames' columns are not in the same order during a vertical concat. It would be great if Polars could sort the columns itself.