pl.DataFrame loads in 2D lists in unexpected way #16818
Comments
You can control the way
x2 = [[1, 2], [3, 4]]
df1 = pl.DataFrame(x2, schema=['c0', 'c1'], orient="row")
Note that for a square matrix like yours, both orientations are valid and make sense. Although the default orientation for
Yup, this is not a bug. Polars/Arrow are column-oriented by design, so when there is ambiguity (same number of rows/columns and the schema types don't help and you have not set the "orient" parameter), "col" will be the default. This is detailed in the
Note that, if you have the option, column data will load more efficiently; otherwise, set
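To make the two orientations discussed above concrete, here is a minimal sketch. The helper `as_columns` is hypothetical and written in plain Python only to illustrate what "row" and "col" mean for a list of lists; it is not how Polars implements this. The Polars calls are guarded, and they assume the `orient` parameter accepts "row" and "col" as used in this thread.

```python
def as_columns(data, names, orient):
    """Pure-Python illustration of the two orientations (not Polars code)."""
    if orient == "row":
        # Each inner list is one row, so transpose to obtain the columns.
        cols = [list(col) for col in zip(*data)]
    elif orient == "col":
        # Each inner list is already one column.
        cols = [list(col) for col in data]
    else:
        raise ValueError(f"unknown orient: {orient!r}")
    return dict(zip(names, cols))


x2 = [[1, 2], [3, 4]]
by_row = as_columns(x2, ["c0", "c1"], "row")  # {'c0': [1, 3], 'c1': [2, 4]}
by_col = as_columns(x2, ["c0", "c1"], "col")  # {'c0': [1, 2], 'c1': [3, 4]}

try:
    import polars as pl
except ImportError:  # the comparison below is optional
    pl = None

if pl is not None:
    # Passing orient explicitly removes the ambiguity of a square input.
    df_row = pl.DataFrame(x2, schema=["c0", "c1"], orient="row")
    df_col = pl.DataFrame(x2, schema=["c0", "c1"], orient="col")
    assert df_row.to_dict(as_series=False) == by_row
    assert df_col.to_dict(as_series=False) == by_col
```

For a square input such as `x2`, both readings produce a valid 2x2 frame, which is why nothing in the data itself can settle the question and an explicit `orient` is the safer choice.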
@alexander-beedie Another (relevant) confusion as a
x2 = [[1, 2], [3, 4]]
pl.DataFrame(x2, schema=['c0', 'c1'])
# shape: (2, 2)
# ┌─────┬─────┐
# │ c0 ┆ c1 │
# │ --- ┆ --- │
# │ i64 ┆ i64 │
# ╞═════╪═════╡
# │ 1 ┆ 3 │
# │ 2 ┆ 4 │
# └─────┴─────┘
pl.DataFrame(np.asarray(x2), schema=['c0', 'c1'])
# shape: (2, 2)
# ┌─────┬─────┐
# │ c0 ┆ c1 │
# │ --- ┆ --- │
# │ i64 ┆ i64 │
# ╞═════╪═════╡
# │ 1 ┆ 2 │
# │ 3 ┆ 4 │
# └─────┴─────┘
If I may make a suggestion: such a silent behavior change is often dangerous (as a full-time ML engineer, I have spent far more time debugging silent behavior changes than fixing noisy warning spam; a silent killer is a true evil...). The behavior should be consistent across input classes, or at least well documented.
Thank you both for clarifying! I'm leaving the issue open due to @cjackal's observation about the inconsistency. Once that is resolved, whoever would like can close the issue.
People bump their heads on this one all the time. This must be the 5th issue with this exact complaint. I think it's time to flip the switch and use row-orientation by default for sequence-of-sequences (if we cannot infer that it should be column-oriented). It just makes sense to parse these as rows - we have the dict format for column-oriented input. And we do the same for NumPy inputs. @ritchie46 What do you think?
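As an illustration of the dict format mentioned above, the sketch below converts row-oriented data into a column mapping, which leaves nothing to infer about orientation. The helper name `rows_to_column_dict` is hypothetical (it is not a Polars function), and the Polars call is guarded so the snippet also runs without the library installed.

```python
def rows_to_column_dict(rows, names):
    """Hypothetical helper: turn a list of rows into {column name: values}."""
    for i, row in enumerate(rows):
        if len(row) != len(names):
            raise ValueError(f"row {i} has {len(row)} values, expected {len(names)}")
    if not rows:
        return {name: [] for name in names}
    # zip(*rows) transposes the rows into per-column tuples.
    return {name: list(col) for name, col in zip(names, zip(*rows))}


rows = [[1, 2], [3, 4]]
columns = rows_to_column_dict(rows, ["c0", "c1"])  # {'c0': [1, 3], 'c1': [2, 4]}

try:
    import polars as pl
except ImportError:  # the DataFrame check below is optional
    pl = None

if pl is not None:
    # A dict is always read as column name -> column values.
    df = pl.DataFrame(columns)
    assert df.rows() == [(1, 2), (3, 4)]
```

The length check is a design choice for this sketch: ragged rows raise an error instead of being silently truncated by `zip`.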
Checks
Reproducible example
Log output
No response
Issue description
I would expect a polars DataFrame to view each constituent list of a 2D list as a row. Judging by the output of df1, this does not happen.
Expected behavior
In the above example, df2 and df3 behave as I would expect.
Installed versions