Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add AsArray trait for more ergonomic downcasting #3912

Merged
merged 3 commits into from
Mar 23, 2023

Conversation

tustvold
Copy link
Contributor

@tustvold tustvold commented Mar 23, 2023

Which issue does this PR close?

Closes #.

Rationale for this change

We currently provide downcasting utility functions, however, they have a couple of issues:

  • Lack of method chaining, makes for more nested and potentially less legible code
  • Methods are named inconsistently and are unnecessarily verbose - e.g as_largestring_array but as_large_list_array
  • Not all methods can easily be used in generic context, e.g. no as_byte_array
  • When called on &ArrayRef result in two dynamic dispatches, as ArrayRef is itself coerced to dyn Array
  • Lack of autoderef means you often need to add & as necessary
  • Each method has to be imported separately

What changes are included in this PR?

Are there any user-facing changes?

@github-actions github-actions bot added arrow Changes to the arrow crate parquet Changes to the parquet crate labels Mar 23, 2023
}

/// Downcast this to a [`GenericByteArray`] returning `None` if not possible
fn as_bytes_opt<T: ByteArrayType>(&self) -> Option<&GenericByteArray<T>>;
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In hindsight I probably should have called this GenericBytesArray but hey ho

Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Some other suggestions:

  1. Mark as_XXX_array functions deprecated (or maybe make a note to do so in a later release)
  2. Consider updating the example in the main README: https://docs.rs/arrow/35.0.0/arrow/#type-erasure--trait-objects

@@ -368,7 +368,7 @@ pub type LargeStringRunBuilder<K> = GenericByteRunBuilder<K, LargeUtf8Type>;
///
/// // Values are polymorphic and so require a downcast.
/// let av = array.values();
/// let ava: &BinaryArray = as_generic_binary_array::<i32>(av.as_ref());
/// let ava: &BinaryArray = av.as_binary();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is much nicer -- I hit this with @appletreeisyellow when trying to downcast to a BinaryArray (having to know to put i32 was non obvious)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Awesome!

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Err... The i32 is just being inferred because of the : &BinaryArray type constraint...

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am thinking as a user here 🤔

Copy link
Contributor Author

@tustvold tustvold Mar 23, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is also as_bytes::<Utf8Type>() / as_bytes::<BinaryType>, etc... if that is more obvious for users? The as_binary and as_string are just short-cuts for the common case of only caring about one

Edit: Ultimately I'm not sure there is a way to abstract away caring about the offsets, if you're dealing with variable length types, be they strings, binary or lists, you need to type the offset...

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't necessarily think the code needs to change. I think what helps the most is having some examples around that you can start from. So as always more docs / examples I think the better

let left = as_generic_binary_array::<i64>(left);
lt_eq_binary_scalar(left, right)
}
DataType::Binary => lt_eq_binary_scalar(left.as_binary::<i32>(), right),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it sure looks nicer

/// ```
pub trait AsArray: private::Sealed {
/// Downcast this to a [`BooleanArray`] returning `None` if not possible
fn as_boolean_opt(&self) -> Option<&BooleanArray>;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like how you have provided both _opt and non _opt variants so users can choose

impl private::Sealed for dyn Array + '_ {}
impl AsArray for dyn Array + '_ {
fn as_boolean_opt(&self) -> Option<&BooleanArray> {
self.as_any().downcast_ref()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is very clever 👍

@tustvold
Copy link
Contributor Author

Mark as_XXX_array functions deprecated (or maybe make a note to do so in a later release)

I will make a note to do so in a future PR

@tustvold tustvold merged commit 8b7bfa6 into apache:master Mar 23, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
arrow Changes to the arrow crate parquet Changes to the parquet crate
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants