ARROW-62: Clarify null bitmap interpretation, indicate bit-endianness, add null count, remove non-nullable physical distinction #34

Closed
wants to merge 6 commits into from
15 changes: 8 additions & 7 deletions format/Layout.md
@@ -89,8 +89,9 @@ maximum of 2^31 - 1 elements. We choose a signed int32 for a couple reasons:

Any relative type can be nullable or non-nullable.

Nullable arrays have a contiguous memory buffer, known as the validity (or
null) bitmap, whose length is large enough to have 1 bit for each array slot.
Nullable arrays have a contiguous memory buffer, known as the null (or
validity) bitmap, whose length is large enough to have 1 bit for each array
slot.

Contributor

I would propose that the null bitmap always be a multiple of 8 bytes in length. This simplifies code that processes the bitmap one 64-bit word at a time, because it never has to handle a partial final word.

Member Author

I agree. There's also the SIMD question: if these buffers are word-aligned, then I believe there won't be concerns with aligned allocations (someone with more expertise should weigh in).
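A minimal sketch of the sizing rule proposed in this thread, assuming the null bitmap is padded up to a multiple of 8 bytes (the reviewer's proposal, not yet part of the text under review). The helper name `null_bitmap_size` and its `pad_to` parameter are hypothetical.

```python
def null_bitmap_size(length: int, pad_to: int = 8) -> int:
    """Bytes for a null bitmap with one bit per slot, padded to a multiple of `pad_to` bytes.

    Hypothetical helper; the 8-byte padding is the proposal from this review thread.
    """
    if length < 0:
        raise ValueError("length must be non-negative")
    min_bytes = (length + 7) // 8  # one bit per slot, rounded up to whole bytes
    return ((min_bytes + pad_to - 1) // pad_to) * pad_to  # round up to the padding multiple


print(null_bitmap_size(6))   # 1 byte needed, padded to 8
print(null_bitmap_size(65))  # 9 bytes needed, padded to 16
```

With this padding, a reader can always consume the bitmap in whole 64-bit words; bits past the array length are simply ignored.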


Whether any array slot is valid (non-null) is encoded in the respective bits of
this bitmap. A 1 (set bit) for index `j` indicates that the value is not null,
@@ -113,9 +114,9 @@ j mod 8 7 6 5 4 3 2 1 0
0 0 1 0 1 0 1 1
```
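A minimal sketch of reading the bitmap as the diagram above lays it out: slot `j` lives in byte `j // 8`, at bit `j mod 8` counted from the least significant bit. The helper name `is_valid` is hypothetical. Applied to the example byte in the diagram (bits 7..0 = `0 0 1 0 1 0 1 1`), it reports slots 0, 1, 3 and 5 as non-null.

```python
def is_valid(bitmap: bytes, j: int) -> bool:
    """Return True if slot j is non-null, i.e. its bit is set (LSB-first within each byte)."""
    return bool((bitmap[j // 8] >> (j % 8)) & 1)


# The example byte from the diagram: bits 7..0 = 0 0 1 0 1 0 1 1
example = bytes([0b00101011])
print([j for j in range(8) if is_valid(example, j)])  # [0, 1, 3, 5]
```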

Physically, non-nullable (NN) arrays do not have a bitmap.
Physically, non-nullable (NN) arrays do not have a null bitmap.

Contributor

I thought our thinking was to avoid the concept of non-nullable arrays?

Member Author

Totally. I'm going to make another patch to strike nullability from the memory layout, but didn't want to pack too many changes into this one: https://issues.apache.org/jira/browse/ARROW-76


For nested types, if the top-level nested type is nullable, it has its own
For nested types, if the top-level nested type is nullable, it has its own null
bitmap regardless of whether the child types have any nulls.

## Primitive value arrays
@@ -128,8 +129,8 @@ Internally, the array contains a contiguous memory buffer whose total size is
equal to the slot width multiplied by the array length. For bit-packed types,
the size is rounded up to the nearest byte.

The associated validity bitmap (for nullable types) is contiguously allocated
(as described above) but does not need to be adjacent in memory to the values
The associated null bitmap (for nullable types) is contiguously allocated (as
described above) but does not need to be adjacent in memory to the values
buffer.
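A minimal sketch of the values-buffer sizing rule in this paragraph: slot width multiplied by array length, with bit-packed types (one bit per slot, such as booleans in Arrow) rounded up to the nearest byte. The function name `values_buffer_size` and the `bit_width` parameter are assumptions for illustration; the sketch ignores any alignment or padding, and the null bitmap is sized separately because it is a separate buffer.

```python
def values_buffer_size(length: int, bit_width: int) -> int:
    """Bytes for a contiguous values buffer of `length` slots, each `bit_width` bits wide.

    Hypothetical helper: exact for byte-sized slots, rounds up for bit-packed types.
    """
    total_bits = length * bit_width
    return (total_bits + 7) // 8  # round up to whole bytes


print(values_buffer_size(10, 32))  # 10 int32 slots: 40 bytes
print(values_buffer_size(10, 1))   # 10 bit-packed slots: 10 bits round up to 2 bytes
```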

(diagram not to scale)
@@ -210,7 +211,7 @@ type. Here is a diagram showing the full physical layout of this struct:

While a struct does not have physical storage for each of its semantic slots
(i.e. each scalar C-like struct), an entire struct slot can be set to null via
the bitmap. Whether each of the child field arrays can have null values
the null bitmap. Whether each of the child field arrays can have null values
depends on whether or not the respective relative type is nullable.
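A minimal sketch of how a reader might combine the two levels of null bitmaps described above, assuming a struct array is represented as its own null bitmap plus one values list and null bitmap per child field. All names here (`StructArray`, `Child`, `get_field`) are hypothetical and not an Arrow API; a bitmap of `None` stands for a non-nullable array, which has no null bitmap.

```python
from dataclasses import dataclass
from typing import Any, Dict, List, Optional


def is_valid(bitmap: Optional[bytes], j: int) -> bool:
    # No bitmap means the array is non-nullable, so every slot is valid.
    if bitmap is None:
        return True
    return bool((bitmap[j // 8] >> (j % 8)) & 1)


@dataclass
class Child:  # hypothetical child field array
    values: List[Any]
    null_bitmap: Optional[bytes]


@dataclass
class StructArray:  # hypothetical struct array: own bitmap plus child fields
    null_bitmap: Optional[bytes]
    fields: Dict[str, Child]

    def get_field(self, j: int, name: str) -> Optional[Any]:
        """Field `name` of struct slot j, or None if the struct slot or the child slot is null."""
        if not is_valid(self.null_bitmap, j):
            return None  # whole struct slot is null, regardless of child contents
        child = self.fields[name]
        if not is_valid(child.null_bitmap, j):
            return None  # struct slot is valid, but this child value is null
        return child.values[j]


arr = StructArray(
    null_bitmap=bytes([0b101]),  # slots 0 and 2 valid, slot 1 null
    fields={
        "x": Child([1, 2, 3], None),            # non-nullable child
        "y": Child([4, 5, 6], bytes([0b011])),  # slot 2 null in y
    },
)
print([arr.get_field(j, "x") for j in range(3)])  # [1, None, 3]
print([arr.get_field(j, "y") for j in range(3)])  # [4, None, None]
```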

## Dense union type