-
Notifications
You must be signed in to change notification settings - Fork 38
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: add Footer decoding #598
Conversation
bkietz
commented
Aug 27, 2024
- Adds ArrowIpcDecoderPeekFooter(), ArrowIpcDecoderVerifyFooter(), and ArrowIpcDecoderDecodeFooter()
- Uses these to read IPC files in the integration test executable
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Other than a few questions / nitpicks this looks good to me
NANOARROW_RETURN_NOT_OK(ArrowIpcDecoderVerifyFooter( | ||
decoder.get(), {{bytes.data()}, static_cast<int64_t>(bytes.size())}, error)); | ||
NANOARROW_RETURN_NOT_OK(ArrowIpcDecoderDecodeFooter( | ||
decoder.get(), {{bytes.data()}, static_cast<int64_t>(bytes.size())}, error)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should verify and decode be the same function rather than separate functions?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm following the pattern of schema decoding, where the two are separated. Verification checks offsets to ensure all flatbuffers are in bounds, while decode actually copies flatbuffers into nanoarrow data structures. I'd be fine consolidating them but that should be a follow up
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Definitely possible to consolidate them...I think I separated them initially in case somebody owned their whole pipeline and wanted to skip it as an optimization (which is probably not actually any faster, and probably nobody will want do do). For some flatbuffer uses this matters ( http://flatgeobuf.org/#why-am-i-not-getting-expected-performance-in-gdal ).
NANOARROW_RETURN_NOT_OK( | ||
ArrowIpcDecoderSetSchema(decoder.get(), &decoder->footer->schema, error)); | ||
NANOARROW_RETURN_NOT_OK_WITH_ERROR( | ||
ArrowIpcDecoderSetEndianness(decoder.get(), decoder->endianness), error); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Isn't endianness a flag in the schema?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Endianness is a flag in the IPC schema message, but not in ArrowSchema which is what decoder->footer->schema
is, so we need to set it up separately
data.data.as_uint8 + data.size_bytes - footer_and_size_and_magic_size; | ||
|
||
// Run flatbuffers verification | ||
if (ns(Footer_verify_as_root(footer_data, decoder->header_size_bytes) != |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What's ns
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
flatcc generates symbols like org_apache_arrow_flatbuf_Footer_verify_as_root
. ns()
is a macro which prepends that namespace