feat(python): Add visitor pattern + builders for column sequences (#454)
Assembling columns from chunked data is rather difficult to do, and it is a valid thing that somebody might want to build from Arrow data. This PR adds a "visitor" pattern that can be extended to build "column"s, which are currently just `list()`s.

Before trimming this PR down to a manageable set of changes, I also implemented the visitor that concatenates data buffers for single data buffer types ( https://gist.github.com/paleolimbot/17263e38b5d97c770e44d33b11181eaf ), which will be needed before `to_columns()` can be used in any kind of serious way.

To support the "visitor" pattern, I moved some of the `PyIterator`-specific pieces into the `PyIterator` so that the visitor can re-use the relevant pieces of `ArrayViewBaseIterator`. This pattern also solves one of the problems I had when attempting a "repr" iterator, which is that I was trying to build something rather than iterate over it.

```python
import nanoarrow as na
import pandas as pd
from nanoarrow import visitor

url = "https://github.com/apache/arrow-experiments/raw/main/data/arrow-commits/arrow-commits.arrows"
array = na.ArrayStream.from_url(url).read_all()

# to_columns() doesn't (and won't) produce anything numpy or pandas-related
names, columns = visitor.to_columns(array)

# ...but lets data frames be built rather compactly
pd.DataFrame({k: v for k, v in zip(names, columns)})
```
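The difference between iterating over chunks and building something from them can be sketched in plain Python. This is only an illustration of the general idea, not the code added in this PR: `ChunkVisitor`, `ListBuilder`, and `to_columns_sketch` are hypothetical names, and each chunk is assumed to be a plain `dict` mapping a column name to a list of values rather than an Arrow array.

```python
from typing import Any, Dict, Iterable, List, Sequence, Tuple


class ChunkVisitor:
    """Hypothetical base class: sees chunks one at a time, then returns a result."""

    def begin(self) -> None:
        """Called once before the first chunk."""

    def visit_chunk(self, values: Sequence[Any]) -> None:
        """Called once per chunk, in order."""
        raise NotImplementedError

    def finish(self) -> Any:
        """Called once after the last chunk; returns whatever was built."""
        raise NotImplementedError


class ListBuilder(ChunkVisitor):
    """Builds a single contiguous list() out of chunked values."""

    def begin(self) -> None:
        self._items: List[Any] = []

    def visit_chunk(self, values: Sequence[Any]) -> None:
        self._items.extend(values)

    def finish(self) -> List[Any]:
        return self._items


def to_columns_sketch(
    names: Sequence[str], chunks: Iterable[Dict[str, Sequence[Any]]]
) -> Tuple[List[str], List[List[Any]]]:
    """Assemble one list per column from chunks of {name: values} (assumed layout)."""
    builders = [ListBuilder() for _ in names]
    for builder in builders:
        builder.begin()
    for chunk in chunks:
        # Each column's builder only ever sees its own slice of the chunk.
        for name, builder in zip(names, builders):
            builder.visit_chunk(chunk[name])
    return list(names), [builder.finish() for builder in builders]


# Two chunks of a two-column table, standing in for a chunked Arrow struct array.
example_chunks = [
    {"id": [1, 2], "message": ["first", "second"]},
    {"id": [3], "message": ["third"]},
]
names, columns = to_columns_sketch(["id", "message"], example_chunks)
```

Because the builder owns the result, a different subclass (for example, one that concatenates raw buffers instead of extending a list) could be swapped in without changing the loop that walks the chunks.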
1 parent 197f117, commit 490b980
Showing 7 changed files with 414 additions and 43 deletions.