[C++/Java] Error when reading inner lists within a struct in empty outer lists from C++/Python in Java #31396
Comments
Arrow User: The memory layout for the list items is shown as:
C++ explicitly fails on zero-length offsets: https://github.com/apache/arrow/blob/master/cpp/src/arrow/array/array_nested.cc#L111. This means a list builder that is never appended to is still expected to populate a single int32 offset before it is finished into an array. C++ also relies on the offsets length (offsets length - 1) to determine the length of the array to construct in several read paths (e.g. ListArrayFromArrays).

  public void setPosition(int index) {
    super.setPosition(index);
-   if (vector.getOffsetBuffer().capacity() == 0) {
+   if (vector.getOffsetBuffer().capacity() <= OFFSET_WIDTH) {
      currentOffset = 0;
      maxOffset = 0;
    } else { ...
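To see why the `== 0` check is not enough, here is a minimal plain-Java sketch that models the offsets buffer with `java.nio.ByteBuffer` instead of Arrow's own buffer class. The class and method names (`OffsetGuardSketch`, `childRange`) are hypothetical, and the read of offsets `index` and `index + 1` is an assumption about what the reader does after the guard, based on the out-of-bounds failure described in this issue.

```java
import java.nio.ByteBuffer;

/** Plain-Java model of the offset guard; it does not use any Arrow classes. */
final class OffsetGuardSketch {
    // Width in bytes of one int32 list offset (same name as in the patch).
    static final int OFFSET_WIDTH = 4;

    /**
     * Returns {start, end} of the child range for the list at the given index.
     * With patched == false the guard is "capacity == 0" (old behaviour);
     * with patched == true it is "capacity <= OFFSET_WIDTH" (proposed behaviour).
     */
    static int[] childRange(ByteBuffer offsets, int index, boolean patched) {
        boolean noLists = patched
                ? offsets.capacity() <= OFFSET_WIDTH
                : offsets.capacity() == 0;
        if (noLists) {
            return new int[] {0, 0};
        }
        // Assumed reader behaviour: read offsets[index] and offsets[index + 1].
        int start = offsets.getInt(index * OFFSET_WIDTH);
        int end = offsets.getInt((index + 1) * OFFSET_WIDTH);
        return new int[] {start, end};
    }

    public static void main(String[] args) {
        // A length-0 list array as written by C++: exactly one int32 offset (value 0).
        ByteBuffer singleOffset = ByteBuffer.allocate(OFFSET_WIDTH);

        int[] range = childRange(singleOffset, 0, true);
        System.out.println("patched guard: start=" + range[0] + " end=" + range[1]);

        try {
            childRange(singleOffset, 0, false);
            System.out.println("old guard: no exception");
        } catch (IndexOutOfBoundsException e) {
            System.out.println("old guard: out-of-bounds read of the second offset");
        }
    }
}
```

With a single 4-byte offset, the old guard falls through to the `else` branch and tries to read a second offset that does not exist; the patched guard treats the buffer as holding no lists.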
Alessandro Molina / @amol-:
David Li / @lidavidm:
So for a List type, a length-0 list should have one 32-bit offset value, but Java appears to assume that the offsets buffer is either empty or holds at least two values.
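The relationship between list count and offset count can be illustrated with a small plain-Java sketch. It uses `java.nio.ByteBuffer` rather than Arrow buffers, and the names `ListOffsetsSketch`, `buildOffsets`, and `listCount` are hypothetical helpers introduced only for illustration.

```java
import java.nio.ByteBuffer;

/** Plain-Java model of a List offsets buffer; it does not use any Arrow classes. */
final class ListOffsetsSketch {
    // Width in bytes of one int32 offset.
    static final int OFFSET_WIDTH = 4;

    /** Builds the offsets for lists with the given sizes: n lists produce n + 1 offsets. */
    static ByteBuffer buildOffsets(int... listSizes) {
        ByteBuffer buf = ByteBuffer.allocate((listSizes.length + 1) * OFFSET_WIDTH);
        int running = 0;
        buf.putInt(running); // the leading offset is always written, even for zero lists
        for (int size : listSizes) {
            running += size;
            buf.putInt(running);
        }
        buf.flip(); // reset position to 0 so the buffer can be read
        return buf;
    }

    /** Derives the number of lists from the number of offsets (offset count - 1). */
    static int listCount(ByteBuffer offsets) {
        int offsetCount = offsets.limit() / OFFSET_WIDTH;
        return Math.max(offsetCount - 1, 0);
    }

    public static void main(String[] args) {
        ByteBuffer none = buildOffsets();        // zero lists, one offset
        ByteBuffer three = buildOffsets(2, 0, 3); // three lists, four offsets
        System.out.println("bytes=" + none.limit() + " lists=" + listCount(none));
        System.out.println("bytes=" + three.limit() + " lists=" + listCount(three));
    }
}
```

In this model a zero-length list array occupies exactly `OFFSET_WIDTH` bytes of offsets, which is the case that falls between "empty buffer" and "at least two values".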
David Li / @lidavidm:
import org.apache.arrow.memory.RootAllocator;
import org.apache.arrow.vector.ipc.ArrowStreamReader;
import org.apache.arrow.vector.VectorSchemaRoot;
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;

var allocator = new RootAllocator();
File file = new File("foo.arrows");
FileInputStream inputStream = new FileInputStream(file);
ArrowStreamReader reader = new ArrowStreamReader(inputStream, allocator);
reader.loadNextBatch()
reader.getVectorSchemaRoot().getVector("a").getReader()
var listReader = reader.getVectorSchemaRoot().getVector("a").getReader();
listReader.setPosition(0); // this call fails
When using C++ (or Python) to construct a null or empty outer array of type array_1: list<item: struct<array_sub_col: list<item: string>>>, either:
an out-of-bounds exception (see stack trace below) is thrown when later retrieving the field reader for the inner list (array_sub_col) in Java, at the point where it tries to access the subsequent offset in the offset buffer: https://github.com/apache/arrow/blob/master/java/vector/src/main/java/org/apache/arrow/vector/complex/impl/UnionListReader.java#L64
Reproduction
Java: 7.0.0
C++: 7.0.0
Python: 7.0.0
Creating a stream in C++ of type array_1: list<item: struct<array_sub_col: list<item: string>>> with an empty (or null) outer list:
As expected, Python holds the same memory layout for the field vectors as the C++ code above:
Java fails when then trying to access the inner list's field reader:
Stack trace:
Reporter: Arrow User
Note: This issue was originally created as ARROW-15971. Please see the migration documentation for further details.