Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: Allow for zero-width fixed size lists #18940

Merged
merged 6 commits into from
Sep 27, 2024

Conversation

coastalwhite
Copy link
Collaborator

@coastalwhite coastalwhite commented Sep 26, 2024

This PR properly implements polars.Array(..., 0). This does make a few
fundamental changes to how FixedSizeListArray works internally. Mainly, it
now keeps a length field that actually determines the length instead of
relying on the values.len() / size.

Fixes #18921, #16878.

@github-actions github-actions bot added enhancement New feature or an improvement of an existing feature python Related to Python Polars rust Related to Rust Polars labels Sep 26, 2024
This PR properly implements `polars.Array(..., 0)`. This does make a few
fundamental changes to how `FixedSizeListArray` works internally. Mainly, it
now keeps a `length` field that actually determines the length instead of
relying the `values.len() / size`.

Fixes pola-rs#18921, pola-rs#16878.
} else {
unreachable!("`GrowableFixedSizeList` expects `DataType::FixedSizeList`")
};
let size = arrays[0].size();
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

drive-by: we can already get this size from the array, and the array has the invariant that the dtype must be FSL

crates/polars-arrow/src/array/static_array.rs Outdated Show resolved Hide resolved
Comment on lines -177 to +190
let offsets = list.offsets().buffer().iter();
let expected =
(list.offsets().first().to_usize()..list.len()).map(|ix| O::from_as_usize(ix * size));

match offsets
.zip(expected)
.find(|(actual, expected)| *actual != expected)
{
Some(_) => polars_bail!(ComputeError:
"not all elements have the specified width {size}"
),
None => {
let sliced_values = list.values().sliced(
list.offsets().first().to_usize(),
list.offsets().range().to_usize(),
);
cast(sliced_values.as_ref(), inner.dtype(), options)?
},
let start_offset = list.offsets().first().to_usize();
let offsets = list.offsets().buffer();

let mut is_valid = true;
for (i, offset) in offsets.iter().enumerate() {
is_valid &= offset.to_usize() == start_offset + i * size;
}

polars_ensure!(is_valid, ComputeError: "not all elements have the specified width {size}");

let sliced_values = list
.values()
.sliced(start_offset, list.offsets().range().to_usize());
cast(sliced_values.as_ref(), inner.dtype(), options)?
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

drive-by: I feel like this is much clearer

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

let is_valid = offsets.iter().enumerate().all(|(i, offset)| offset.to_usize() == start_offset + i * size);

Comment on lines +571 to +580
pub fn to_fixed_size_list(self, size: usize, is_nullable: bool) -> ArrowDataType {
ArrowDataType::FixedSizeList(
Box::new(Field::new(
PlSmallStr::from_static("item"),
self,
is_nullable,
)),
size,
)
}
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

added so that we don't constantly have to call .to_physical in reshape.

@@ -41,6 +41,7 @@ pub fn read_fixed_size_list<R: Read + Seek>(
)?;

let (field, size) = FixedSizeListArray::get_child_and_size(&dtype);
polars_ensure!(size > 0, nyi = "Cannot read zero sized arrays from IPC");
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not sure if this should be implemented, so I just left it like this for now.

Comment on lines +76 to +79
if *self_width == 0 {
return Bitmap::new_with_value(false, self.len());
}

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[] != [] = False

Comment on lines +192 to +193
// # Try to Chunked Arrays
pub fn try_bool(&self) -> Option<&BooleanChunked> {
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

drive-by: this allows us to avoid instantiating the errors.

ArrowDataType::FixedSizeList(field, size) => {
feature_gated!("dtype-array", {
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

drive-by: This is already feature-gated, no real sense in feature gating it again

@@ -924,7 +923,7 @@ def test_parquet_array_dtype_nulls() -> None:
[[]],
[[None]],
[[[None], None]],
[[[None], []]],
[[[None], [None]]],
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was a clear error

Comment on lines -103 to -113
@pytest.mark.parametrize("shape", [(0, 1), (1, -1), (-1, 1)])
def test_reshape_empty_invalid_2d(shape: tuple[int, ...]) -> None:
s = pl.Series("a", [], dtype=pl.Int64)
with pytest.raises(
InvalidOperationError,
match=re.escape(
f"cannot reshape empty array into shape {display_shape(shape)}"
),
):
s.reshape(shape)

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

None of the cases now fail

@ritchie46 ritchie46 merged commit a7845c4 into pola-rs:main Sep 27, 2024
26 checks passed
@coastalwhite coastalwhite deleted the feat/zero-width-arrays branch September 27, 2024 06:44
@etiennebacher
Copy link
Contributor

Hi, just FYI you need to repeat the "Fixes" for each issue number to automatically close multiple issues. Currently #16878 is not closed ;)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or an improvement of an existing feature python Related to Python Polars rust Related to Rust Polars
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Zero-sized Array caused panic exception
4 participants