Refine documentation to Array::is_null #4838

Merged
merged 7 commits into from
Sep 20, 2023
59 changes: 47 additions & 12 deletions arrow-array/src/array/mod.rs
@@ -184,41 +184,76 @@ pub trait Array: std::fmt::Debug + Send + Sync {
///
/// In most cases this will be the same as [`Array::nulls`], except for:
///
/// * DictionaryArray where [`DictionaryArray::values`] contains nulls
/// * RunArray where [`RunArray::values`] contains nulls
/// * NullArray where all indices are nulls
/// * [`DictionaryArray`] where [`DictionaryArray::values`] contains nulls
/// * [`RunArray`] where [`RunArray::values`] contains nulls
/// * [`NullArray`] where all indices are nulls
///
/// In these cases a logical [`NullBuffer`] will be computed, encoding the logical nullability
/// of these arrays, beyond what is encoded in [`Array::nulls`]
fn logical_nulls(&self) -> Option<NullBuffer> {
self.nulls().cloned()
}
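
The difference between physical and logical nulls in the dictionary case can be sketched with a toy model. `ToyDictionary` and its fields are hypothetical stand-ins built on the standard library only; they are not the arrow-array API. The sketch assumes that a slot is logically null when its key is null or when its key points at a null dictionary value.

```rust
/// Hypothetical stand-in for a dictionary-encoded array (not arrow-array's type).
struct ToyDictionary {
    /// Keys into `values`; `None` is a physical null in the key validity mask.
    keys: Vec<Option<usize>>,
    /// Dictionary values; a `None` here is invisible to the key validity mask.
    values: Vec<Option<&'static str>>,
}

impl ToyDictionary {
    /// Physical nullability: only looks at the keys' own validity.
    fn is_null(&self, index: usize) -> bool {
        self.keys[index].is_none()
    }

    /// Logical nullability: a slot is null if its key is null
    /// or if the key points at a null dictionary value.
    fn logical_nulls(&self) -> Vec<bool> {
        self.keys
            .iter()
            .map(|key| match key {
                None => true,
                Some(k) => self.values[*k].is_none(),
            })
            .collect()
    }
}

fn main() {
    let dict = ToyDictionary {
        keys: vec![Some(0), Some(1), None],
        values: vec![Some("a"), None],
    };
    // Index 1 is physically valid but logically null.
    assert!(!dict.is_null(1));
    assert_eq!(dict.logical_nulls(), vec![false, true, true]);
}
```

In this model the two notions disagree only at index 1, which is the kind of case the list above describes.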

/// Returns whether the element at `index` is null.
/// When using this function on a slice, the index is relative to the slice.
/// Returns whether the element at `index` is null, according to [`Array::nulls`]
///
/// Note: this method returns the physical nullability, i.e. that encoded in [`Array::nulls`]
/// see [`Array::logical_nulls`] for logical nullability
/// # Notes
/// 1. This method returns false for [`NullArray`] as explained below. See
/// [`Self::is_logical_null`] for an implementation that returns the logical
/// null value.
///
/// 2. When using this function on a slice, the index is relative to the slice.
///
/// 3. This method returns the value in the **physical** validity bitmap for an element,
/// as returned by [`Array::nulls`]. If there is no validity bitmap, returns `false`.
/// See [`Array::logical_nulls`] for logical nullability.
///
/// # Example:
///
/// ```
/// use arrow_array::{Array, Int32Array};
/// use arrow_array::{Array, Int32Array, NullArray};
///
/// let array = Int32Array::from(vec![Some(1), None]);
///
/// assert_eq!(array.is_null(0), false);
/// assert_eq!(array.is_null(1), true);
///
/// // NullArrays do not have a validity mask
/// let array = NullArray::new(1);
/// assert_eq!(array.is_null(0), false);
/// ```
fn is_null(&self, index: usize) -> bool {
Contributor

I've said this before but I think from a user PoV, having is_null and is_logical_null is confusing as hell. Which NULL is is_null?! Yeah, historically this is the physical null but do most users really care about the physical repr.? I would argue that at least this method should be called is_physical_null to force users to think about what kind of null they want, instead of tricking them into using the wrong implicit default for their use case.

Contributor

@tustvold tustvold Sep 19, 2023

Are you suggesting that is_null should always return logical nullability? What about for RunArray where this would have O(log(n)) complexity? What about null_count? The only consistent thing I can see is to only ever return physical nullability...

Contributor

I suggest that is_null should be renamed to is_physical_null (potentially w/ a soft deprecation period) so that users don't accidentally pick the wrong method.

You make a good point regarding null_count. My argument would be: rename that one as well, to physical_null_count. Then it's clear which semantics you're referring to.

Contributor Author

Filed #4840 to track
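
The O(log(n)) cost mentioned in this thread for run-encoded arrays can be illustrated with a small sketch. `ToyRunArray` is a hypothetical standard-library model, not arrow-array's `RunArray`; it assumes each run stores an exclusive end index and one value, so answering a per-index question means first locating the run by binary search.

```rust
/// Hypothetical stand-in for a run-end-encoded array (not arrow-array's `RunArray`).
struct ToyRunArray {
    /// Exclusive end index of each run, strictly increasing.
    run_ends: Vec<usize>,
    /// One value per run; `None` marks a run of logical nulls.
    values: Vec<Option<i32>>,
}

impl ToyRunArray {
    /// Finds the run containing `index` by binary search over `run_ends`,
    /// which is where the logarithmic cost comes from.
    fn is_logical_null(&self, index: usize) -> bool {
        // Position of the first run whose end is strictly greater than `index`.
        let run = self.run_ends.partition_point(|&end| end <= index);
        self.values[run].is_none()
    }
}

fn main() {
    // Logical contents: [7, 7, null, null, null, 9]
    let array = ToyRunArray {
        run_ends: vec![2, 5, 6],
        values: vec![Some(7), None, Some(9)],
    };
    assert!(!array.is_logical_null(1));
    assert!(array.is_logical_null(2));
    assert!(array.is_logical_null(4));
    assert!(!array.is_logical_null(5));
}
```

A physical validity bitmap, by contrast, can be read with a single indexed lookup, which is why the two kinds of check have different costs.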

self.nulls().map(|n| n.is_null(index)).unwrap_or_default()
}

/// Returns whether the element at `index` is not null.
/// When using this function on a slice, the index is relative to the slice.
/// Returns whether the element at `index` contains a logical null
/// according to [`Array::logical_nulls`].
///
/// Note: this method returns the physical nullability, i.e. that encoded in [`Array::nulls`]
/// see [`Array::logical_nulls`] for logical nullability
/// See [`Self::is_null`] for an implementation that returns physical
/// nullability and for details on the differences between logical and
/// physical nullability.
///
/// # Example:
///
/// ```
/// use arrow_array::{Array, Int32Array, NullArray};
///
/// let array = Int32Array::from(vec![Some(1), None]);
///
/// assert_eq!(array.is_logical_null(0), false);
/// assert_eq!(array.is_logical_null(1), true);
///
/// // NullArrays are always logically null
/// let array = NullArray::new(1);
/// assert_eq!(array.is_logical_null(0), true);
/// ```
fn is_logical_null(&self, index: usize) -> bool {
Contributor

@tustvold tustvold Sep 19, 2023

I think at the very least we should provide an efficient implementation of this, instead of computing logical_nulls which could be very expensive.

In general I am really not a fan of adding this method, it is a fairly major potential performance footgun

Contributor Author

Removed
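
The performance concern in this thread can be sketched as follows. `ToyArray` and `ToyNullArray` are hypothetical standard-library models, not the arrow-array types. The sketch assumes a default method that materializes the whole logical null mask to answer for one index, and an override that answers directly.

```rust
/// Hypothetical trait mirroring the shape of the discussion (not arrow-array's `Array`).
trait ToyArray {
    fn len(&self) -> usize;

    /// Materializes a full logical null mask: O(len) work and an allocation.
    fn logical_nulls(&self) -> Option<Vec<bool>>;

    /// Naive default: rebuilds the whole mask to answer for a single index.
    fn is_logical_null(&self, index: usize) -> bool {
        self.logical_nulls().map(|n| n[index]).unwrap_or_default()
    }
}

/// Hypothetical all-null array, analogous in spirit to `NullArray`.
struct ToyNullArray {
    len: usize,
}

impl ToyArray for ToyNullArray {
    fn len(&self) -> usize {
        self.len
    }

    fn logical_nulls(&self) -> Option<Vec<bool>> {
        (self.len != 0).then(|| vec![true; self.len])
    }

    /// Override: every slot is null, so no mask needs to be built.
    fn is_logical_null(&self, _index: usize) -> bool {
        true
    }
}

fn main() {
    let array = ToyNullArray { len: 3 };
    assert!(array.is_logical_null(0));
    // With the default method this loop would do O(len^2) work in total;
    // with the override it stays O(len).
    let null_count = (0..array.len())
        .filter(|&i| array.is_logical_null(i))
        .count();
    assert_eq!(null_count, 3);
}
```

Calling the default method inside a loop is the case that makes the cost visible, since each call repeats work proportional to the array length.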

self.logical_nulls()
.map(|n| n.is_null(index))
.unwrap_or_default()
}

/// Returns whether the element at `index` is *not* null, the
/// opposite of [`Self::is_null`].
///
/// # Example:
///
4 changes: 4 additions & 0 deletions arrow-array/src/array/null_array.rs
@@ -113,6 +113,10 @@ impl Array for NullArray {
(self.len != 0).then(|| NullBuffer::new_null(self.len))
}

fn is_logical_null(&self, _index: usize) -> bool {
true
}

fn is_nullable(&self) -> bool {
!self.is_empty()
}