Refine documentation to Array::is_null
#4838
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -184,41 +184,76 @@ pub trait Array: std::fmt::Debug + Send + Sync { | |
/// | ||
/// In most cases this will be the same as [`Array::nulls`], except for: | ||
/// | ||
/// * DictionaryArray where [`DictionaryArray::values`] contains nulls | ||
/// * RunArray where [`RunArray::values`] contains nulls | ||
/// * NullArray where all indices are nulls | ||
/// * [`DictionaryArray`] where [`DictionaryArray::values`] contains nulls | ||
/// * [`RunArray`] where [`RunArray::values`] contains nulls | ||
/// * [`NullArray`] where all indices are nulls | ||
/// | ||
/// In these cases a logical [`NullBuffer`] will be computed, encoding the logical nullability | ||
/// of these arrays, beyond what is encoded in [`Array::nulls`] | ||
fn logical_nulls(&self) -> Option<NullBuffer> { | ||
self.nulls().cloned() | ||
} | ||
|
||
/// Returns whether the element at `index` is null. | ||
/// When using this function on a slice, the index is relative to the slice. | ||
/// Returns whether the element at `index` is null, according to [`Array::nulls`] | ||
/// | ||
/// Note: this method returns the physical nullability, i.e. that encoded in [`Array::nulls`] | ||
/// see [`Array::logical_nulls`] for logical nullability | ||
/// # Notes | ||
/// 1. This method returns false for [`NullArray`] as explained below. See | ||
/// [`Self::is_logical_null`] for an implementation that returns the logical | ||
/// null value. | ||
/// | ||
/// 2. When using this function on a slice, the index is relative to the slice. | ||
/// | ||
/// 3. This method reports nullability according to the **physical** validity bitmap | ||
/// returned by [`Array::nulls`]. If there is no validity bitmap, it returns `false`. | ||
/// See [`Array::logical_nulls`] for logical nullability. | ||
/// | ||
/// # Example: | ||
/// | ||
/// ``` | ||
/// use arrow_array::{Array, Int32Array}; | ||
/// use arrow_array::{Array, Int32Array, NullArray}; | ||
/// | ||
/// let array = Int32Array::from(vec![Some(1), None]); | ||
/// | ||
/// assert_eq!(array.is_null(0), false); | ||
/// assert_eq!(array.is_null(1), true); | ||
/// | ||
/// // NullArrays do not have a validity mask | ||
/// let array = NullArray::new(1); | ||
/// assert_eq!(array.is_null(0), false); | ||
/// ``` | ||
fn is_null(&self, index: usize) -> bool { | ||
self.nulls().map(|n| n.is_null(index)).unwrap_or_default() | ||
} | ||
|
||
/// Returns whether the element at `index` is not null. | ||
/// When using this function on a slice, the index is relative to the slice. | ||
/// Returns whether the element at `index` contains a logical null | ||
/// according to [`Array::logical_nulls`]. | ||
/// | ||
/// Note: this method returns the physical nullability, i.e. that encoded in [`Array::nulls`] | ||
/// see [`Array::logical_nulls`] for logical nullability | ||
/// See [`Self::is_null`] for an implementation that returns physical | ||
/// nullability and for details on the differences between | ||
/// logical and physical nullability. | ||
/// | ||
/// # Example: | ||
/// | ||
/// ``` | ||
/// use arrow_array::{Array, Int32Array, NullArray}; | ||
/// | ||
/// let array = Int32Array::from(vec![Some(1), None]); | ||
/// | ||
/// assert_eq!(array.is_logical_null(0), false); | ||
/// assert_eq!(array.is_logical_null(1), true); | ||
/// | ||
/// // NullArrays are always logically null | ||
/// let array = NullArray::new(1); | ||
/// assert_eq!(array.is_logical_null(0), true); | ||
/// ``` | ||
fn is_logical_null(&self, index: usize) -> bool { | ||
I think at the very least we should provide an efficient implementation of this, instead of computing `logical_nulls`, which could be very expensive. In general I am really not a fan of adding this method; it is a fairly major potential performance footgun.
Removed
|
||
self.logical_nulls() | ||
.map(|n| n.is_null(index)) | ||
.unwrap_or_default() | ||
} | ||
|
||
/// Returns whether the element at `index` is *not* null, the | ||
/// opposite of [`Self::is_null`]. | ||
/// | ||
/// # Example: | ||
/// | ||
|
I've said this before, but I think from a user PoV, having `is_null` and `is_logical_null` is confusing as hell. Which NULL is `is_null`?! Yeah, historically this is the physical null, but do most users really care about the physical repr.? I would argue that at least this method should be called `is_physical_null` to force users to think about what kind of null they want, instead of tricking them into using the wrong implicit default for their use case.
Are you suggesting that `is_null` should always return logical nullability? What about for `RunArray`, where this would have `O(log(n))` complexity? What about `null_count`? The only consistent thing I can see is to only ever return physical nullability...
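To illustrate the `O(log(n))` concern, here is a minimal sketch of a logical null lookup on a run-end encoded layout. `ToyRunArray` and its methods are hypothetical stand-ins written with the standard library only; they are not the arrow-rs `RunArray` API, and the real implementation may differ.

```rust
/// Toy stand-in for a run-end encoded array (NOT the arrow-rs `RunArray` type).
/// `run_ends[i]` is the exclusive logical end index of run `i`;
/// `values[i]` is the value shared by every element of that run.
struct ToyRunArray {
    run_ends: Vec<usize>,
    values: Vec<Option<i32>>,
}

impl ToyRunArray {
    /// Physical view: this toy array carries no validity bitmap of its own,
    /// so no element is ever physically null.
    fn is_physical_null(&self, _index: usize) -> bool {
        false
    }

    /// Logical view: binary-search for the run containing `index`
    /// (O(log n) in the number of runs), then inspect that run's value.
    /// Panics if `index` is past the last run end.
    fn is_logical_null(&self, index: usize) -> bool {
        // Position of the first run whose end is strictly greater than `index`.
        let run = self.run_ends.partition_point(|&end| end <= index);
        self.values[run].is_none()
    }
}

/// Usage example. Logical contents: [7, 7, null, null, null, 9].
fn toy_run_array_example() {
    let array = ToyRunArray {
        run_ends: vec![2, 5, 6],
        values: vec![Some(7), None, Some(9)],
    };
    assert!(!array.is_physical_null(3)); // no physical bitmap
    assert!(array.is_logical_null(3)); // index 3 falls in the null run
    assert!(!array.is_logical_null(5)); // index 5 falls in the run holding 9
}
```

The physical check is constant time, while the logical check needs a search over the run ends for every call, which is the cost the comment above points at.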
I suggest that `is_null` should be renamed to `is_physical_null` (potentially w/ a soft deprecation period) to avoid that users accidentally pick the wrong method.

You make a good point regarding `null_count`. My argument would be: rename that one as well, to `physical_null_count`. Then it's clear which semantic you're referring to.
Filed #4840 to track
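
For readers following the thread, the physical versus logical distinction for a dictionary-encoded array can be sketched as below. This is a toy model under stated assumptions: `ToyDictionaryArray` is hypothetical and uses only the standard library, and the method names `is_physical_null` and `physical_null_count` are the renames proposed in the comments above, not existing arrow-rs methods.

```rust
/// Toy stand-in for a dictionary-encoded array (NOT the arrow-rs `DictionaryArray`).
/// `keys[i]` is `None` when slot `i` is physically null; otherwise it indexes `values`.
struct ToyDictionaryArray {
    keys: Vec<Option<usize>>,
    values: Vec<Option<&'static str>>,
}

impl ToyDictionaryArray {
    /// Physical nullability: consult only the validity of the key.
    fn is_physical_null(&self, index: usize) -> bool {
        self.keys[index].is_none()
    }

    /// Logical nullability for a single slot, without building a full mask:
    /// a slot is null if its key is null or the key points at a null value.
    fn is_logical_null(&self, index: usize) -> bool {
        match self.keys[index] {
            None => true,
            Some(key) => self.values[key].is_none(),
        }
    }

    /// Materialized logical null mask (`true` means null). Building it is O(n),
    /// so calling it once per element lookup would be wasteful.
    fn logical_nulls(&self) -> Vec<bool> {
        (0..self.keys.len())
            .map(|i| self.is_logical_null(i))
            .collect()
    }

    /// Number of physically null slots (hypothetical name from the discussion).
    fn physical_null_count(&self) -> usize {
        self.keys.iter().filter(|key| key.is_none()).count()
    }
}

/// Usage example. Logical contents: ["a", null (via null value), null (via null key)].
fn toy_dictionary_example() {
    let array = ToyDictionaryArray {
        keys: vec![Some(0), Some(1), None],
        values: vec![Some("a"), None],
    };
    // Slot 1 has a valid key, so it is not physically null, but it is logically null.
    assert!(!array.is_physical_null(1));
    assert!(array.is_logical_null(1));
    assert_eq!(array.logical_nulls(), vec![false, true, true]);
    assert_eq!(array.physical_null_count(), 1);
}
```

In this sketch the physical null count (1) and the number of logical nulls (2) disagree, which is the ambiguity the proposed renames are meant to surface.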