Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: add Footer decoding #598

Merged
merged 3 commits into from
Aug 30, 2024
Merged

feat: add Footer decoding #598

merged 3 commits into from
Aug 30, 2024

Conversation

bkietz
Copy link
Member

@bkietz bkietz commented Aug 27, 2024

  • Adds ArrowIpcDecoderPeekFooter(), ArrowIpcDecoderVerifyFooter(), and ArrowIpcDecoderDecodeFooter()
  • Uses these to read IPC files in the integration test executable

@bkietz bkietz requested a review from zeroshade August 27, 2024 19:24
Copy link
Member

@zeroshade zeroshade left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Other than a few questions / nitpicks this looks good to me

Comment on lines +190 to +193
NANOARROW_RETURN_NOT_OK(ArrowIpcDecoderVerifyFooter(
decoder.get(), {{bytes.data()}, static_cast<int64_t>(bytes.size())}, error));
NANOARROW_RETURN_NOT_OK(ArrowIpcDecoderDecodeFooter(
decoder.get(), {{bytes.data()}, static_cast<int64_t>(bytes.size())}, error));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should verify and decode be the same function rather than separate functions?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm following the pattern of schema decoding, where the two are separated. Verification checks offsets to ensure all flatbuffers are in bounds, while decode actually copies flatbuffers into nanoarrow data structures. I'd be fine consolidating them but that should be a follow up

@paleolimbot

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Definitely possible to consolidate them...I think I separated them initially in case somebody owned their whole pipeline and wanted to skip it as an optimization (which is probably not actually any faster, and probably nobody will want do do). For some flatbuffer uses this matters ( http://flatgeobuf.org/#why-am-i-not-getting-expected-performance-in-gdal ).

NANOARROW_RETURN_NOT_OK(
ArrowIpcDecoderSetSchema(decoder.get(), &decoder->footer->schema, error));
NANOARROW_RETURN_NOT_OK_WITH_ERROR(
ArrowIpcDecoderSetEndianness(decoder.get(), decoder->endianness), error);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Isn't endianness a flag in the schema?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Endianness is a flag in the IPC schema message, but not in ArrowSchema which is what decoder->footer->schema is, so we need to set it up separately

data.data.as_uint8 + data.size_bytes - footer_and_size_and_magic_size;

// Run flatbuffers verification
if (ns(Footer_verify_as_root(footer_data, decoder->header_size_bytes) !=
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's ns?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

flatcc generates symbols like org_apache_arrow_flatbuf_Footer_verify_as_root. ns() is a macro which prepends that namespace

@bkietz bkietz merged commit cf38896 into apache:main Aug 30, 2024
34 checks passed
@bkietz bkietz deleted the ipc-footer-reading branch September 3, 2024 21:51
@paleolimbot paleolimbot added this to the nanoarrow 0.6.0 milestone Sep 17, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants