ARROW-62: Clarify null bitmap interpretation, indicate bit-endianness, add null count, remove non-nullable physical distinction #34
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -89,8 +89,9 @@ maximum of 2^31 - 1 elements. We choose a signed int32 for a couple reasons: | |
|
||
Any relative type can be nullable or non-nullable. | ||
|
||
- Nullable arrays have a contiguous memory buffer, known as the validity (or | |
- null) bitmap, whose length is large enough to have 1 bit for each array slot. | |
+ Nullable arrays have a contiguous memory buffer, known as the null (or | |
+ validity) bitmap, whose length is large enough to have 1 bit for each array | |
+ slot. | |
|
||
Whether any array slot is valid (non-null) is encoded in the respective bits of | ||
this bitmap. A 1 (set bit) for index `j` indicates that the value is not null, | ||
|
@@ -113,9 +114,9 @@ j mod 8 7 6 5 4 3 2 1 0 | |
0 0 1 0 1 0 1 1 | ||
``` | ||
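As a rough illustration of the encoding in this hunk, the sketch below reads the example byte `0 0 1 0 1 0 1 1` with least-significant-bit numbering, so that slot `j` maps to bit `j mod 8` of byte `j / 8`. The function names `is_valid` and `null_count` are hypothetical, and the array length of 6 is an assumption made only for this example. The null count mentioned in the PR title is derived here by counting unset bits among the first `length` slots.

```python
def is_valid(bitmap: bytes, j: int) -> bool:
    """Return True if slot j is non-null, i.e. its bit is set."""
    return bool(bitmap[j // 8] & (1 << (j % 8)))


def null_count(bitmap: bytes, length: int) -> int:
    """Count the slots in [0, length) whose bit is unset."""
    return sum(1 for j in range(length) if not is_valid(bitmap, j))


# The example byte from the table above: bits 7..0 are 0 0 1 0 1 0 1 1.
bitmap = bytes([0b00101011])
length = 6  # assumed array length, for illustration only

validity = [is_valid(bitmap, j) for j in range(length)]
nulls = null_count(bitmap, length)
```

Under these assumptions, slots 2 and 4 are null. Bits beyond `length` are ignored, so they do not contribute to the count.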
|
||
- Physically, non-nullable (NN) arrays do not have a bitmap. | |
+ Physically, non-nullable (NN) arrays do not have a null bitmap. | |
Review comment: I thought our thinking was to avoid the concept of non-nullable arrays? Reply: Totally. I'm going to make another patch to strike nullability from the memory layout, but didn't want to pack too many changes into this patch: https://issues.apache.org/jira/browse/ARROW-76 |
||
|
||
- For nested types, if the top-level nested type is nullable, it has its own | |
+ For nested types, if the top-level nested type is nullable, it has its own null | |
bitmap regardless of whether the child types have any nulls. | ||
|
||
## Primitive value arrays | ||
|
@@ -128,8 +129,8 @@ Internally, the array contains a contiguous memory buffer whose total size is | |
equal to the slot width multiplied by the array length. For bit-packed types, | ||
the size is rounded up to the nearest byte. | ||
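The size rule in this paragraph can be sketched as follows. This is a minimal illustration, assuming the slot width is given in bits; the name `values_buffer_size` is hypothetical and not taken from the spec.

```python
def values_buffer_size(bit_width: int, length: int) -> int:
    """Bytes needed for `length` slots of `bit_width` bits each.

    The total is the slot width times the array length; for bit-packed
    types (widths that are not a multiple of 8) it is rounded up to the
    nearest whole byte.
    """
    total_bits = bit_width * length
    return (total_bits + 7) // 8


int32_bytes = values_buffer_size(32, 5)   # five 4-byte slots
packed_bytes = values_buffer_size(1, 10)  # ten 1-bit slots, rounded up
```

For byte-aligned widths the rounding has no effect; it only matters for bit-packed types such as a 1-bit width.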
|
||
- The associated validity bitmap (for nullable types) is contiguously allocated | |
- (as described above) but does not need to be adjacent in memory to the values | |
+ The associated null bitmap (for nullable types) is contiguously allocated (as | |
+ described above) but does not need to be adjacent in memory to the values | |
buffer. | ||
|
||
(diagram not to scale) | ||
|
@@ -210,7 +211,7 @@ type. Here is a diagram showing the full physical layout of this struct: | |
|
||
While a struct does not have physical storage for each of its semantic slots | ||
(i.e. each scalar C-like struct), an entire struct slot can be set to null via | ||
- the bitmap. Whether each of the child field arrays can have null values | |
+ the null bitmap. Whether each of the child field arrays can have null values | |
depends on whether or not the respective relative type is nullable. | ||
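To make the struct semantics concrete, here is a small sketch in which the struct has its own null bitmap and each child array optionally has one. The representation (a dict of `(bitmap, values)` pairs), the field names, and the function names are all hypothetical, chosen only for illustration; a child with `None` as its bitmap stands in for a non-nullable child.

```python
from typing import Optional


def bit_is_set(bitmap: bytes, j: int) -> bool:
    """Return True if bit j (least-significant-bit numbering) is set."""
    return bool(bitmap[j // 8] & (1 << (j % 8)))


def read_struct_slot(struct_bitmap: bytes, children: dict, j: int) -> Optional[dict]:
    """Read struct slot j, or None if the whole struct slot is null."""
    if not bit_is_set(struct_bitmap, j):
        return None  # the struct's own bitmap nulls the entire slot
    row = {}
    for name, (child_bitmap, values) in children.items():
        if child_bitmap is None:
            row[name] = values[j]  # child without a null bitmap
        else:
            row[name] = values[j] if bit_is_set(child_bitmap, j) else None
    return row


children = {
    "name": (bytes([0b00000101]), ["joe", "", "mark"]),  # slot 1 is null
    "age": (None, [1, 2, 3]),                            # no null bitmap
}
struct_bitmap = bytes([0b00000011])  # struct slot 2 is null

rows = [read_struct_slot(struct_bitmap, children, j) for j in range(3)]
```

Note that slot 2 reads as null even though both children hold a value there: the struct's bitmap is consulted first, independently of the children.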
|
||
## Dense union type | ||
|
Review comment: I would propose that the null bitmap always be a multiple of 8 bytes in length. This simplifies some code by avoiding the need to manage partial-word conditions.
Reply: I agree. There's also the SIMD question: if these buffers are word-aligned, then presumably there won't be concerns with aligned allocations (someone with more expertise should opine).
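For concreteness, the padding proposal could be computed along these lines. This is only a sketch of the suggestion in this thread, not something the patch implements; the name `bitmap_size` and the `pad_to` parameter are hypothetical.

```python
def bitmap_size(length: int, pad_to: int = 8) -> int:
    """Bytes for a null bitmap covering `length` slots.

    First take the minimum (1 bit per slot, rounded up to a byte), then
    round that up to a multiple of `pad_to` bytes, as proposed above.
    """
    min_bytes = (length + 7) // 8
    return ((min_bytes + pad_to - 1) // pad_to) * pad_to


small = bitmap_size(6)        # 1 byte minimum, padded to 8
exact = bitmap_size(64)       # exactly one 8-byte word
spill = bitmap_size(65)       # 9 bytes minimum, padded to 16
unpadded = bitmap_size(65, pad_to=1)  # minimum size, no padding
```

With the padded size, a reader can process the bitmap one 64-bit word at a time without a special case for a trailing partial word. The size alone says nothing about the alignment of the allocation's start address, which is a separate question.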