IPC big endian offsets are not translated #859
Comments
The reference here suggests it is acceptable for implementations to simply not support files with non-native byte order
I therefore think an acceptable approach would be to return an error if attempting to read a file with a non-native byte order
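A minimal sketch of that approach, for illustration only. The `Endianness` enum, `native_endianness`, and `check_endianness` are hypothetical stand-ins and not arrow-rs API; a real reader would take the byte order from the IPC schema metadata.

```rust
/// Hypothetical stand-in for the byte order recorded in an IPC schema.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Endianness {
    Little,
    Big,
}

/// Byte order of the machine this code was compiled for.
fn native_endianness() -> Endianness {
    if cfg!(target_endian = "little") {
        Endianness::Little
    } else {
        Endianness::Big
    }
}

/// Reject a file whose byte order differs from the native one,
/// instead of building arrays from buffers that would be misread.
fn check_endianness(file: Endianness) -> Result<(), String> {
    let native = native_endianness();
    if file == native {
        Ok(())
    } else {
        Err(format!(
            "unsupported byte order: file is {:?}, native is {:?}",
            file, native
        ))
    }
}
```

Failing early like this trades support for big-endian files for the guarantee that no invalid array is ever produced.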
I agree an error would be better than an invalid array. Later on if it is
important we can add support for little endian
On Sat, Apr 16, 2022 at 12:28 PM Raphael Taylor-Davies wrote:
The reference here
<https://arrow.apache.org/docs/format/Columnar.html#byte-order-endianness>
suggests it is acceptable for implementations to simply not support files
with non-native byte order
At first we will return an error when trying to read a Schema with an
endianness that does not match the underlying system. The reference
implementation is focused on Little Endian and provides tests for it.
Eventually we may provide automatic conversion via byte swapping.
I therefore think an acceptable approach would be to return an error if
attempting to read a file with a non-native byte order
Describe the bug
../arrow-ipc-stream/integration/1.0.0-bigendian/generated_dictionary.arrow_file
contains a UTF8 Arrow array somewhere encoded in big endian. When this file is read by the arrow-rs implementation, the offsets buffer remains big endian, even though the code assumes the offsets buffer holds values in native endianness (e.g. the offsets of the created arrow-rs buffer are incorrect on little-endian machines like x86).
To Reproduce
See test read_dictionary_be_not_implemented in #810.
It fails with: Length spanned by offsets in Utf8 (687865856) is larger than the values array size (41)
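The two numbers in that message are consistent with a byte-order mix-up: 41 is 0x00000029 and 687865856 is 0x29000000, the same four bytes in reverse order. The sketch below reproduces the arithmetic; `misread_be_as_le` is a hypothetical helper, and the assumption that the final offset should equal 41 is inferred from the error message rather than checked against the file.

```rust
/// Take the big-endian encoding of `value` and decode those same bytes
/// as little-endian, which is what a little-endian reader does when it
/// assumes the buffer is already in native byte order.
fn misread_be_as_le(value: i32) -> i32 {
    i32::from_le_bytes(value.to_be_bytes())
}

/// The misread value for an offset of 41 (0x00000029 -> 0x29000000).
fn misread_final_offset() -> i32 {
    misread_be_as_le(41)
}
```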
Expected behavior
The test should pass (likely by translating offsets from big endian to native endianness)
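One way the translation could look, as a sketch under stated assumptions: the offsets are 32-bit (as for Utf8, not LargeUtf8), the raw buffer is available as a byte slice, and `offsets_from_be_bytes` is a hypothetical helper and not part of arrow-rs.

```rust
/// Decode a raw big-endian offsets buffer into native-endian i32 offsets.
/// Returns None if the buffer length is not a multiple of 4 bytes.
fn offsets_from_be_bytes(raw: &[u8]) -> Option<Vec<i32>> {
    if raw.len() % 4 != 0 {
        return None;
    }
    Some(
        raw.chunks_exact(4)
            // from_be_bytes yields the correct value on any host byte order.
            .map(|c| i32::from_be_bytes([c[0], c[1], c[2], c[3]]))
            .collect(),
    )
}
```

Decoding with `from_be_bytes` rather than unconditionally swapping bytes keeps the helper correct on big-endian hosts as well, where the conversion is a no-op.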
Additional context
Found while adding validation in #810