fix(python): raise error when pandas df's some index name duplicates some column name #16023

Closed
wants to merge 5 commits into from

piri-p commented May 2, 2024

Fixes #15938: raise a ValueError when converting a pandas DataFrame whose index name duplicates a column name. @MarcoGorelli

github-actions bot added the fix (Bug fix) and python (Related to Python Polars) labels May 2, 2024

codecov bot commented May 2, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 80.92%. Comparing base (864e750) to head (2a5dccf).

Additional details and impacted files
@@           Coverage Diff           @@
##             main   #16023   +/-   ##
=======================================
  Coverage   80.91%   80.92%           
=======================================
  Files        1385     1385           
  Lines      178224   178227    +3     
  Branches     3050     3051    +1     
=======================================
+ Hits       144212   144227   +15     
+ Misses      33522    33512   -10     
+ Partials      490      488    -2     
Flag Coverage Δ
python 74.42% <100.00%> (+0.02%) ⬆️
rust 78.13% <0.00%> (+<0.01%) ⬆️



piri-p commented May 2, 2024

Thanks for your feedback, @mcrumiller! I have addressed your concerns: I updated the code to handle MultiIndex, moved the tests to test_interop.py, and added tests for MultiIndex.
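
A minimal sketch of the kind of check described here, not the PR's actual code. It works on plain lists of names so that it runs without pandas. The helper name check_index_column_collision is hypothetical; in the real conversion the two inputs would correspond to data.index.names (one entry per MultiIndex level) and data.columns. Skipping unnamed levels is an assumption of this sketch.

```python
def check_index_column_collision(index_names, column_names):
    """Raise ValueError if any index level name is also a column name.

    Hypothetical helper for illustration only. Unnamed index levels
    (None) are skipped in this sketch.
    """
    named_levels = {name for name in index_names if name is not None}
    duplicates = named_levels.intersection(column_names)
    if duplicates:
        msg = f"index name(s) duplicate column name(s): {sorted(duplicates, key=str)}"
        raise ValueError(msg)


# Single index named "a" alongside columns "a" and "b": collision.
try:
    check_index_column_collision(["a"], ["a", "b"])
except ValueError as exc:
    print(exc)

# Two-level MultiIndex whose names do not appear among the columns: passes.
check_index_column_collision(["x", "y"], ["a", "b"])
```

Because every level name goes through the same set intersection, a single named index and a MultiIndex are handled by one code path.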

piri-p changed the title from "fix(python): raise error when pandas df's index name duplicates column name" to "fix(python): raise error when pandas df's some index name duplicates some column name" May 2, 2024

MarcoGorelli left a comment

Thanks for your PR. I'm not sure this is general enough.

@@ -1020,6 +1020,11 @@ def pandas_to_pydf(
) -> PyDataFrame:
    """Construct a PyDataFrame from a pandas DataFrame."""
    convert_index = include_index and not _pandas_has_default_index(data)
    if convert_index and set(data.index.names).intersection(data.columns):

Polars converts non-string column names to strings, so comparing the raw pandas names can miss duplicates that only appear after that conversion.

I think a more generic solution is needed, to handle #16025 too.
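
To illustrate the concern, here is a small sketch in plain Python (no pandas or Polars needed) of how a comparison on raw names can miss a clash that appears once names are converted to strings. Both helper names are hypothetical, and str() is only an assumed stand-in for whatever conversion Polars actually applies.

```python
def raw_collisions(index_names, column_names):
    """Names shared by index and columns, compared as-is."""
    return set(index_names).intersection(column_names)


def stringified_collisions(index_names, column_names):
    """Names shared by index and columns after converting each to str.

    Assumption: str() approximates the name conversion done on import.
    """
    index_as_str = {str(name) for name in index_names if name is not None}
    columns_as_str = {str(name) for name in column_names}
    return index_as_str & columns_as_str


index_names = ["1"]      # index level named with the string "1"
column_names = [1, "b"]  # a column named with the integer 1

print(raw_collisions(index_names, column_names))          # set()
print(stringified_collisions(index_names, column_names))  # {'1'}
```

The raw comparison finds nothing because the string "1" and the integer 1 are different objects, while the stringified comparison reports the clash that would exist after conversion.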

Successfully merging this pull request may close these issues.

pl.from_pandas(df, include_index=True) silently ignores index if it is named the same way as a column
3 participants