Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix comparison of dictionary arrays #1606

Merged
merged 2 commits into from
Jan 19, 2022

Conversation

alamb
Copy link
Contributor

@alamb alamb commented Jan 18, 2022

Which issue does this PR close?

Closes #1605

See ticket for more details

@github-actions github-actions bot added the datafusion Changes in the datafusion crate label Jan 18, 2022
assert_batches_eq!(expected, &actual);

// comparison with another dictionary column
let sql = "SELECT d1 FROM test WHERE d1 = d2";
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this test fails without the change

@@ -77,7 +77,7 @@ pub(crate) fn coerce_types(
}

fn comparison_eq_coercion(lhs_type: &DataType, rhs_type: &DataType) -> Option<DataType> {
if lhs_type == rhs_type {
if lhs_type == rhs_type && !is_dictionary(lhs_type) {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the arrow eq kernels don't support DictionaryArray <==> DictionaryArray comparisons yet, so we need to handle this case specially. I will also file a ticket to add native DictionaryArray comparison support.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@@ -147,6 +147,10 @@ pub fn temporal_coercion(lhs_type: &DataType, rhs_type: &DataType) -> Option<Dat
}
}

pub(crate) fn is_dictionary(t: &DataType) -> bool {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The duplication of code between this module and binary_rule.rs is not good -- I am going to try and consolidate it as a follow on PR cc @liukun4515

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

#1607 <-- doesn't remove duplication but it at least consolidates the logic into a single module

@@ -77,7 +77,7 @@ pub(crate) fn coerce_types(
}

fn comparison_eq_coercion(lhs_type: &DataType, rhs_type: &DataType) -> Option<DataType> {
if lhs_type == rhs_type {
if lhs_type == rhs_type && !is_dictionary(lhs_type) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you add TODO or some comments for this change?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will do so -- it is a good point

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

in e27110f

@@ -90,7 +90,7 @@ fn comparison_order_coercion(
lhs_type: &DataType,
rhs_type: &DataType,
) -> Option<DataType> {
if lhs_type == rhs_type {
if lhs_type == rhs_type && !is_dictionary(lhs_type) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's same with above.

@@ -147,6 +147,10 @@ pub fn temporal_coercion(lhs_type: &DataType, rhs_type: &DataType) -> Option<Dat
}
}

pub(crate) fn is_dictionary(t: &DataType) -> bool {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

another point I want to point out is that the arrow-rs has some codes to check data type, and the logic is also in datafusion, for example, https://docs.rs/arrow/7.0.0/arrow/datatypes/enum.DataType.html#method.is_numeric, Maybe we should refactor them and make the codebase clear.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Filed as #1613

Copy link
Contributor

@liukun4515 liukun4515 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM
Thanks @alamb

@alamb alamb merged commit ad392fd into apache:master Jan 19, 2022
@alamb alamb deleted the alamb/compare_dict_arrays branch January 19, 2022 13:39
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
datafusion Changes in the datafusion crate
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Data type Dictionary(Int32, Utf8) not supported for binary operation 'eq' on dyn arrays
3 participants