-
Notifications
You must be signed in to change notification settings - Fork 3.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
GH-41609: [Java] Initialize non empty offset buffer for variable-size layout before exporting #41610
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I added the tests first to run them in CI to verify.
The fix is ready and will be submitted later.
Marked it as a draft and will make it ready for review once I submit the fix. |
@viirya just to make sure I understand the change here. Are we introducing the required change in this PR itself right? And these test cases supposed to fail without that fix? |
@vibhatha I have some fix locally but I am unable to run the C module tests locally on a Mac. I am trying to write these tests that are supposed to fail in CI to verify that. Once the tests are verified and the fix are submitted, I will make this ready for review. |
Thanks for explaining @viirya . If you need help please let me know. |
Thank you @vibhatha |
Hmm, I think I know why the tests are not failed in Java Arrow. This is how Java Arrow imports an Utf8 array:
So even the offset buffer is not initialized, for empty array with one element offset buffer, But in arrow-rs, it takes the last value of the offset buffer as the length of data buffer, i.e., That is what I found for the first offset value from the spec:
It looks like the first value doesn't have to be 0, although generally it is. So seems Java Arrow's approach is correct without issue? |
Let me open an issue at arrow-rs and see if it makes more sense to fix it there. |
@viirya just for my curiosity, so the error you experience comes when Arrow Java vector is imported in Rust? |
Yes |
I fixed the issue at arrow-rs by apache/arrow-rs#5756. So I am closing this. Thanks @vibhatha for review. |
Rationale for this change
This is a follow up of #40038. In #40038, we fixed null offset buffer issue for BaseVariableWidthVector. For empty vector, instead of a empty offset buffer which turns to be a null buffer through C Data Interface, we export a non-empty offset which is supposed to contain zero value.
But the initialization code has a bug in the PR #40043, so the offset buffer is not initialized. Note that this is not a regression because the exported vector never works due to null offset buffer.
What changes are included in this PR?
Fixed the uninitialized offset buffer of empty
BaseVariableWidthVector
.Are these changes tested?
Added test cases.
Are there any user-facing changes?
No