Add `ThriftMetadataWriter` for writing Parquet metadata #6197
…ally copies data (apache#6043) * deprecate auto copy, ask explicit reference * update comments * make cargo doc happy
* improve display for interval. * update test in pretty, and fix display problem. * tmp * fix tests in arrow-cast. * fix tests in pretty. * fix style.
* update to latest thrift (as of 11 Jul 2024) from parquet-format * pass None for optional size statistics * escape HTML tags * don't need to escape brackets in arrays
… end (#…" (apache#5933) This reverts commit 22e0b44.
This reverts commit 756b1fb.
* Update pyo3 requirement from 0.21.1 to 0.22.1 Updates the requirements on [pyo3](https://github.com/pyo3/pyo3) to permit the latest version. - [Release notes](https://github.com/pyo3/pyo3/releases) - [Changelog](https://github.com/PyO3/pyo3/blob/main/CHANGELOG.md) - [Commits](PyO3/pyo3@v0.21.1...v0.22.1) --- updated-dependencies: - dependency-name: pyo3 dependency-type: direct:production ... Signed-off-by: dependabot[bot] <support@github.com> * refactor: remove deprecated `FromPyArrow::from_pyarrow` "GIL Refs" are being phased out. * chore: update `pyo3` in integration tests --------- Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
* update to latest thrift (as of 11 Jul 2024) from parquet-format * pass None for optional size statistics * escape HTML tags * don't need to escape brackets in arrays * add support for unencoded_byte_array_data_bytes * add comments * change sig of ColumnMetrics::update_variable_length_bytes() * rename ParquetOffsetIndex to OffsetSizeIndex * rename some functions * suggestion from review Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org> * add Default trait to ColumnMetrics as suggested in review * rename OffsetSizeIndex to OffsetIndexMetaData --------- Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>
Updates the requirements on [pyo3](https://github.com/pyo3/pyo3) to permit the latest version. - [Release notes](https://github.com/pyo3/pyo3/releases) - [Changelog](https://github.com/PyO3/pyo3/blob/v0.22.2/CHANGELOG.md) - [Commits](PyO3/pyo3@v0.21.1...v0.22.2) --- updated-dependencies: - dependency-name: pyo3 dependency-type: direct:production ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
…MetaData` (apache#6095) * deprecate read_page_locations * add to_thrift() to OffsetIndexMetaData
Co-authored-by: Ed Seidl <etseidl@users.noreply.github.com>
* Update FlightSql.proto to version 17.0 Adds new message CommandStatementIngest and removes `experimental` from other messages. * Regenerate flight sql protocol This upgrades the file to version 17.0 of the protobuf definition.
Co-authored-by: Ed Seidl <etseidl@users.noreply.github.com>
Add test for metadata equivalence
separate tests that require arrow into a separate module
Fix checks and merge with master
Thanks @adriangb. I'll do a review later this afternoon, but I think this is looking pretty much ready to merge.
ThriftMetadataWriter
for writing Parquet metadata
Thank you @adriangb and @etseidl
I took the liberty of merging up from main to resolve a conflict
I found a few things I think we can/should improve, most notably I don't think `ThriftMetadataWriter` is marked as `pub` and thus can't be used yet.
Since this PR has been hanging out for so long, however, what I think we should do is merge it in to master and then iterate there. I plan to do so once CI passes
Among other things I would like to move `ThriftMetadataWriter` to its own module and add some documentation examples. I will also update #6184 to mention this code.
Thanks again 🙏
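For readers less familiar with Rust visibility, the point above about a type that is not marked `pub` can be sketched as follows. This is only an illustration: `metadata_writer`, `LowLevelWriter`, `encode_footer`, and `footer_len` are hypothetical names, not the parquet crate's code or API, and the "footer" here is just four little-endian bytes.

```rust
// Hypothetical names throughout; this is not the parquet crate's code.
pub mod metadata_writer {
    // No `pub` on the struct: it is private to this module, so code
    // outside `metadata_writer` cannot name or construct it.
    struct LowLevelWriter {
        buf: Vec<u8>,
    }

    impl LowLevelWriter {
        // Appends the row count as four little-endian bytes.
        fn write_footer(&mut self, num_rows: u32) {
            self.buf.extend_from_slice(&num_rows.to_le_bytes());
        }
    }

    // Marked `pub`: this function is the only entry point that callers
    // outside the module can reach.
    pub fn encode_footer(num_rows: u32) -> Vec<u8> {
        let mut writer = LowLevelWriter { buf: Vec::new() };
        writer.write_footer(num_rows);
        writer.buf
    }
}

// Code outside the module can only go through the public function.
pub fn footer_len(num_rows: u32) -> usize {
    metadata_writer::encode_footer(num_rows).len()
}
```

Referring to `metadata_writer::LowLevelWriter` from outside the module fails to compile with a privacy error, which is why a non-`pub` type cannot be used by downstream crates until it, or a public wrapper around it, is exported.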
let's keep hacking on main. Next PR with some more docs, etc coming soon
Amazing, thank you both for pushing this forward!
I see after some more study that ThriftMetadataWriter is likely too low level to expose publicly. I will focus on
I made #6202 with some small tweaks and improved documentation in case anyone has a chance to look at it
BTW check out this code in action: #6081 🚀
Continue #6000
Related to