GH-33346: [Python] DataFrame Interchange Protocol for pyarrow Table #14804
Merged: jorisvandenbossche merged 49 commits into apache:master from AlenkaF:ARROW-18152-second on Jan 13, 2023
Commits
ca526a7 Produce a __dataframe__ object - squshed commits from #14613 (AlenkaF)
d0ca2b1 Add column convert methods (AlenkaF)
c356cd1 Fix linter errors (AlenkaF)
8fb50c5 Add from_dataframe method details (AlenkaF)
4fab43b Add tests for from_dataframe / pandas roundtrip (AlenkaF)
47ce2d6 Skip from_dataframe tests for older pandas versions (AlenkaF)
1c2955f Add support for LargeStringArrays in Column class (AlenkaF)
0517a6d Add test for uint and make changes to test_offset_of_sliced_array() a… (AlenkaF)
a7313fb Prefix table metadata with pyarrow. (AlenkaF)
9583672 Update from_dataframe method (AlenkaF)
9e8733c Try to add warnings to places where copies of data are being made (AlenkaF)
855ec8a Update python/pyarrow/interchange/column.py (AlenkaF)
beec5aa Expose from_dataframe in interchange/__init__.py (AlenkaF)
b11d84e Add lost whitespace lines (AlenkaF)
6c9dce4 Revert commented categories in CategoricalDescription, column.py (AlenkaF)
8d91b67 Add _dtype attribute to __inti__ of the Column class and move all the… (AlenkaF)
4643f9b Raise an error if nan_as_null=True (AlenkaF)
a93a46e Linter corrections (AlenkaF)
0b231ea Add better test coverage for test_mixed_dtypes and test_dtypes (AlenkaF)
d8ab902 Add better test coverage for test_pandas_roundtrip and add large_memo… (AlenkaF)
21af8fb Add pyarrow roundtrip tests and make additional corrections to the co… (AlenkaF)
d6140d4 Correct large string handling and make smaller corrections in convert… (AlenkaF)
e0d1e63 Change dict arguments in protocol_df_chunk_to_pyarrow (AlenkaF)
6067fb3 Update dataframe.num_chunks() method to use to_batches (AlenkaF)
c6eb5f3 Check for sentinel values in the datetime more efficently (AlenkaF)
1a67177 Make bigger changes to how masks and arrays are constructed (AlenkaF)
51dcc49 Import from pandas.api.interchange (AlenkaF)
4879ef2 Add a check for use_nan, correct test using np.nan and put back check… (AlenkaF)
1cbd594 Add test coverage for pandas -> pyarrow conversion (AlenkaF)
a6b6e54 Rename test_extra.py to test_conversion.py (AlenkaF)
2e36185 Skip pandas -> pyarrow tests for older versions of pandas (AlenkaF)
4ca948d Add test coverage for sliced table in pyarrow roundtrip (AlenkaF)
719ab88 Correct the handling of bitpacked booleans (AlenkaF)
91ea335 Small change in slicing parametrization (AlenkaF)
c74eb45 Add a RuntimeError for boolean and categorical columns in from_datafr… (AlenkaF)
c137337 Optimize datetime handling in from_dataframe (AlenkaF)
1e9cef9 Optimize buffers_to_array in from_dataframe.py (AlenkaF)
0c539a0 Apply suggestions from code review - Joris (AlenkaF)
6399be3 Add string column back to test_pandas_roundtrip for pandas versions 2… (AlenkaF)
9f68fe7 Fix linter error (AlenkaF)
b926066 Remove pandas specific comment for nan_as_null in dataframe.py (AlenkaF)
5c5d25e Fix typo boolen -> categorical in categorical_column_to_dictionary (AlenkaF)
f2a65a6 Add a comment for float16 NotImplementedError in validity_buffer_nan_… (AlenkaF)
075e888 Update validity_buffer_nan_sentinel in python/pyarrow/interchange/fro… (AlenkaF)
efa12d6 Make change to the offset buffers part of buffers_to_array (AlenkaF)
858cadb Linter correction (AlenkaF)
e937b4c Update the handling of allow_copy keyword (AlenkaF)
1b5f248 Fix failing nightly test (AlenkaF)
9139444 Fix the fix for the failing test (AlenkaF)
Is there an `assert table.equals(result)` missing here (like there is in the test above)?
Because pandas defines `int64` offsets for what is in our case a normal string (not a large one), the dtype at the end of the roundtrip becomes `large_string`. For that reason, the values are compared via pylist and the dtype is asserted separately (first normal string, then large string).