Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support arithmetic scalar operation with DictionaryArray #5151

Merged
merged 3 commits into from
Feb 3, 2023

Conversation

viirya
Copy link
Member

@viirya viirya commented Feb 1, 2023

Which issue does this PR close?

Closes #5150.

Rationale for this change

What changes are included in this PR?

Are these changes tested?

Are there any user-facing changes?

@github-actions github-actions bot added logical-expr Logical plan and expressions physical-expr Physical Expressions labels Feb 1, 2023
@alamb
Copy link
Contributor

alamb commented Feb 1, 2023

Thanks @viirya -- it is great to see someone else using Dictionary arrays ❤️

Looks like there are some compilation errors on this PR. Marking as draft until those are resolved.

error[E0425]: cannot find function `divide_decimal_scalar` in this scope
   --> datafusion/physical-expr/src/expressions/binary/kernels_arrow.rs:465:19
    |
[231](https://github.com/apache/arrow-datafusion/actions/runs/4068186587/jobs/7006440910#step:8:232) | / pub(crate) fn divide_decimal_dyn_scalar(
[232](https://github.com/apache/arrow-datafusion/actions/runs/4068186587/jobs/7006440910#step:8:233) | |     left: &dyn Array,
[233](https://github.com/apache/arrow-datafusion/actions/runs/4068186587/jobs/7006440910#step:8:234) | |     right: i128,
[234](https://github.com/apache/arrow-datafusion/actions/runs/4068186587/jobs/7006440910#step:8:235) | | ) -> Result<Decimal128Array> {
...   |
242 | |     Ok(decimal_array)
243 | | }
    | |_- similarly named function `divide_decimal_dyn_scalar` defined here
...
465 |           let err = divide_decimal_scalar(&left_decimal_array, 0).unwrap_err();
    |                     ^^^^^^^^^^^^^^^^^^^^^ help: a function with a similar name exists: `divide_decimal_dyn_scalar`

@alamb alamb marked this pull request as draft February 1, 2023 22:16
@viirya
Copy link
Member Author

viirya commented Feb 1, 2023

Thanks @alamb . Yea, I'm trying to fix them locally.

@viirya viirya force-pushed the arithmetic_dyn_scalar branch 2 times, most recently from e8669d4 to 699d7c2 Compare February 1, 2023 23:07
@viirya viirya marked this pull request as ready for review February 1, 2023 23:57
Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me -- it would be great to eventually be able to write sql level tests for this feature, but there is no good way to make dictionary arrays yet from SQL (but I am hoping to get #5016 which would allow it).

Thank you for the contribution @viirya

cc @liukun4515 as this also impacts Decimals (though I don't expect any behavior change)

(DataType::Dictionary(_, value_type), _) => {
is_numeric(value_type) && is_numeric(rhs_type)
}
(_, DataType::Dictionary(_, value_type)) => {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if this need to handle the case when both lhs and rhs are Dictionaries 🤔

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually I'm wondering what Dictionary for ScalaValue(rhs here) means although I have seen DataFusion has it.

It sounds reasonable to add the following:

(DataType::Dictionary(_, lhs_value_type), DataType::Dictionary(_, rhs_value_type2)) => {
  is_numeric(lhs_value_type) && is_numeric(rhs_value_type2)
}

datafusion/physical-expr/src/expressions/binary.rs Outdated Show resolved Hide resolved

let mut dict_builder = PrimitiveDictionaryBuilder::<Int8Type, Int32Type>::new();

dict_builder.append(2)?;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

}
Operator::Modulo => {
// todo: change to binary_primitive_array_op_dyn_scalar! once modulo is implemented
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

modulo change: apache/arrow-rs#3649

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we can remove binary_primitive_array_op_scalar once this is done?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yea, once modulo is ready to do.

@viirya
Copy link
Member Author

viirya commented Feb 2, 2023

Looks good to me -- it would be great to eventually be able to write sql level tests for this feature, but there is no good way to make dictionary arrays yet from SQL (but I am hoping to get #5016 which would allow it).

Thank you @alamb. Yea, we have internal e2e tests that use dictionary arrays but seems I don't find similar stuff on DataFusion so far. We can definitely expand test coverage for this in the future.

@viirya
Copy link
Member Author

viirya commented Feb 2, 2023

BTW, I think array op array is also not supported yet on dictionary arrays. I will work on it after this.

datafusion/physical-expr/src/expressions/binary.rs Outdated Show resolved Hide resolved
macro_rules! binary_primitive_array_op_dyn_scalar {
($LEFT:expr, $RIGHT:expr, $OP:ident) => {{
// unwrap underlying (non dictionary) value
let right = unwrap_dict_value($RIGHT);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's interesting that we have "scalar dictionary":

    /// Dictionary type: index type and value
    Dictionary(Box<DataType>, Box<ScalarValue>),

wonder how useful this is ...

}
Operator::Modulo => {
// todo: change to binary_primitive_array_op_dyn_scalar! once modulo is implemented
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we can remove binary_primitive_array_op_scalar once this is done?

datafusion/physical-expr/src/expressions/binary.rs Outdated Show resolved Hide resolved
Copy link
Contributor

@liukun4515 liukun4515 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@liukun4515 liukun4515 merged commit c759865 into apache:master Feb 3, 2023
@ursabot
Copy link

ursabot commented Feb 3, 2023

Benchmark runs are scheduled for baseline = 224c682 and contender = c759865. c759865 is a master commit associated with this PR. Results will be available as each benchmark for each run completes.
Conbench compare runs links:
[Skipped ⚠️ Benchmarking of arrow-datafusion-commits is not supported on ec2-t3-xlarge-us-east-2] ec2-t3-xlarge-us-east-2
[Skipped ⚠️ Benchmarking of arrow-datafusion-commits is not supported on test-mac-arm] test-mac-arm
[Skipped ⚠️ Benchmarking of arrow-datafusion-commits is not supported on ursa-i9-9960x] ursa-i9-9960x
[Skipped ⚠️ Benchmarking of arrow-datafusion-commits is not supported on ursa-thinkcentre-m75q] ursa-thinkcentre-m75q
Buildkite builds:
Supported benchmarks:
ec2-t3-xlarge-us-east-2: Supported benchmark langs: Python, R. Runs only benchmarks with cloud = True
test-mac-arm: Supported benchmark langs: C++, Python, R
ursa-i9-9960x: Supported benchmark langs: Python, R, JavaScript
ursa-thinkcentre-m75q: Supported benchmark langs: C++, Java

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
logical-expr Logical plan and expressions physical-expr Physical Expressions
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Arithmetic scalar operation doesn't work with DictionaryArray
5 participants