-
Notifications
You must be signed in to change notification settings - Fork 18
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Correct nullable string validity map treatment #517
Conversation
This pull request has been linked to Shortcut Story #25027: R unable to write arrow field with nullable strings. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Moved to draft pending discussion.
b5bfc23
to
9373b19
Compare
42094c9
to
00df835
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Per discussion, let's
- remove use of the copy-then-delete in the conversion script
- add copying of metadata from the old to new array
Otherwise LGTM. Let's get an ACK from @Shelnutt2 as well.
3fa3de3
to
b70ced5
Compare
Work on the TileDB SOMA package and its independent implementation of the read-path (relying on Arrow buffers) revealed that we encoded missingness incorrectly for nullable strings. (This happened for read and write so round-turn tests passed. This is also not a concern for numeric values were encoded correctly but only concerns strings.)
This PR corrects this. It also tightens one test on deletion by switching to a triple || condition.
Example from https://app.shortcut.com/tiledb-inc/story/25027/r-unable-to-write-arrow-field-with-nullable-strings below the fold.
generates
and Python is happy too: