-
Notifications
You must be signed in to change notification settings - Fork 738
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Move ParquetMetadataWriter
to its own module, update documentation
#6202
Conversation
@@ -730,281 +729,6 @@ impl<'a, W: Write + Send> PageWriter for SerializedPageWriter<'a, W> { | |||
} | |||
} | |||
|
|||
/// Writes `crate::file::metadata` structures to a thrift encdoded byte streams |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This code was mostly moved without modification - see the second commit to see what changed
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks nice! Just a few minor suggestions/questions.
/// | ||
/// [`FileMetaData`]: crate::format::FileMetaData | ||
/// | ||
/// ```text |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This ascii art will get interesting when the new footer extension work begins 😅.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am ready!
/// │┌ ─ ─ ─ ─ ─ ─ ─ ─ ─ ┐ │ │ FileMetadata | ||
/// │ OffsetIndex │ contains embedded | ||
/// ││ (Optional) │◀┼ ─ │ offsets to | ||
/// │ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ │ │ ColumnIndex and |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Where do Bloom filters go? I've never used them, so genuinely don't know.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Bloom filters can theoretically be written anywhere, in practice they're either written after each row groups or just before the column index
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ahh, nice.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I tried to explain more how you would write BloomFilters in d51c697. Any corrections / comments would be appreciated
/// Note: this writer does not directly write BloomFilters. In order to write
/// BloomFilters, write the bloom filters into the buffer before creating the
/// metadata writer. Then set the corresponding `bloom_filter_offset` and
/// `bloom_filter_length` on [`ColumnChunkMetaData`] passed to this writer.
Co-authored-by: Ed Seidl <etseidl@users.noreply.github.com>
…o alamb/make_writer_pub
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One last nit. Looks ready to ship 📦
Co-authored-by: Ed Seidl <etseidl@users.noreply.github.com>
Thanks again @etseidl for the review |
Which issue does this PR close?
Closes #.
Rationale for this change
@AdrianB added
ParquetMetadataWriter
in #6197 but the documentation was onThriftMetadataWriter
.I also think this is a very important API and thus deserves special attention to documentation
What changes are included in this PR?
This PR sdds some documentation / examples (which also ensure
the code compiles and can be used by crate users), and makes a few tweaks:
ParquetMetadataWriter::write_page_index
(the page index is written based on its presence)ParquetMetadataWriter::finish(&mut self)
toParquetMetadataWriter::finish(mut self)
so it can be used without beingmut
note I did the changes in two commits so it is clearer what I changed.
Are there any user-facing changes?
Yes there are some API changes in an as yet unreleased API