Standardize on Arrow RecordBatchReader in function parameters and return types #66

Closed
kylebarron opened this issue Jun 21, 2024 · 0 comments · Fixed by #69

Comments

@kylebarron
Collaborator

This should be a pretty easy change that I want to make before the next release, and it comes with a couple of benefits.

  • Allows you to access the schema of the stream without having to "peek" at the stream manually, as we currently do with (see the sketch after this list)
    first_batch = next(batches_iter)
  • Allows direct zero-copy interop with compiled code. E.g. @gadomski 's stac-rs arrow support would be able to directly use the result of stac_geoparquet.arrow.parse_stac_items_to_arrow without having to call back into Python to iterate over the batches. It also means you'd be able to pass the result of parse_stac_items_to_arrow directly into, say, pyogrio.write_arrow to write via GDAL. This is all thanks to the Arrow PyCapsule Interface.
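
For illustration, here's a minimal sketch of both benefits in plain pyarrow; make_batches is a hypothetical stand-in (not part of stac_geoparquet) for a function that, like parse_stac_items_to_arrow today, returns a bare iterator of RecordBatches:

    import pyarrow as pa

    # Hypothetical stand-in for a function that currently returns a bare
    # iterator of RecordBatches.
    def make_batches():
        schema = pa.schema([("id", pa.string()), ("value", pa.int64())])
        yield pa.record_batch([pa.array(["a"]), pa.array([1])], schema=schema)
        yield pa.record_batch([pa.array(["b"]), pa.array([2])], schema=schema)

    # Today: learning the schema consumes the first batch, which then has
    # to be stitched back in front of the remaining batches.
    batches_iter = make_batches()
    first_batch = next(batches_iter)
    schema = first_batch.schema

    # After the change: a RecordBatchReader exposes the schema up front,
    # before any batch has been read.
    reader = pa.RecordBatchReader.from_batches(schema, make_batches())
    print(reader.schema)

    # The reader also implements the Arrow PyCapsule Interface
    # (__arrow_c_stream__, available in pyarrow >= 14), so capsule-aware
    # consumers such as pyogrio or compiled Rust code can ingest the
    # stream zero-copy, without iterating over batches in Python.
    capsule = reader.__arrow_c_stream__()

With a reader in hand, a consumer like pyogrio.write_arrow can pull batches straight through the C stream interface rather than round-tripping through Python iteration.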