Comparison for nested type #5407

jayzhan211 · 2024-02-18T12:47:53Z

Is your feature request related to a problem or challenge? Please describe what you are trying to do.

I want to be able to compare ListArray values. Comparison proceeds element by element from left to right; if one list is a prefix of the other, the shorter list is considered lesser.

For example,

List(1,2,3) < List(1,2,3,4)
List(List(1,2,3), List(10)) < List(List(1,2,4), List(1))
List(List(1,2,3), List(1)) > List(List(1,2,2), List(2))
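
To make the intended ordering concrete, here is a minimal sketch in plain Rust using slices and `Vec` rather than Arrow arrays. The names `cmp_list` and `issue_examples_hold` are made up for illustration and are not part of arrow-rs; nulls are ignored entirely here.

```rust
use std::cmp::Ordering;

/// Compare two lists element by element, left to right.
/// If one list is a prefix of the other, the shorter one is lesser.
fn cmp_list<T: Ord>(a: &[T], b: &[T]) -> Ordering {
    for (x, y) in a.iter().zip(b.iter()) {
        match x.cmp(y) {
            Ordering::Equal => continue,
            other => return other,
        }
    }
    // All shared positions are equal: the shorter list sorts first.
    a.len().cmp(&b.len())
}

/// Checks the three examples listed in this issue.
/// Hypothetical helper, only here to show how `cmp_list` is called.
fn issue_examples_hold() -> bool {
    let flat = cmp_list(&[1, 2, 3], &[1, 2, 3, 4]) == Ordering::Less;
    // `Vec<T>` is itself ordered lexicographically, so nested lists
    // go through the same function without extra code.
    let nested_lt =
        cmp_list(&[vec![1, 2, 3], vec![10]], &[vec![1, 2, 4], vec![1]]) == Ordering::Less;
    let nested_gt =
        cmp_list(&[vec![1, 2, 3], vec![1]], &[vec![1, 2, 2], vec![2]]) == Ordering::Greater;
    flat && nested_lt && nested_gt
}
```

In both nested examples the first inner list already decides the result, so the second inner list is never inspected.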

The behavior uses DuckDB's array sort order as a reference.

Describe the solution you'd like

Support nested types in compare_op, which currently fails at this check:

    if l_t != r_t || l_t.is_nested() {
        return Err(ArrowError::InvalidArgumentError(format!(
            "Invalid comparison operation: {l_t} {op} {r_t}"
        )));
    }

Describe alternatives you've considered

Additional context

We can support most of the ops in the future; lt and eq are the two I would like to work on first.


tustvold · 2024-06-23T09:23:06Z

I'm not sure we want to support this within these kernels; in particular, the null behaviour is not well-defined. I added support for this to the comparator, which will allow systems like DataFusion to define an ordering for nulls based on a config setting, as Spark and Postgres use different orderings.

There is an example of this here - https://docs.rs/arrow-ord/latest/arrow_ord/ord/fn.make_comparator.html#postgres-compatible-nested-comparison

This will also make it more obvious that the nested comparison is not vectorised

Edit: Filed #5942 to make this more discoverable
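
To illustrate why the null behaviour matters, here is a rough sketch in plain Rust, not using arrow-rs at all. `NullOrder`, `cmp_nullable` and `cmp_nullable_list` are hypothetical names invented for this example. It only shows that the same pair of lists can compare differently depending on where nulls sort.

```rust
use std::cmp::Ordering;

/// Hypothetical stand-in for a null-ordering setting (not an arrow-rs type).
#[derive(Clone, Copy)]
struct NullOrder {
    nulls_first: bool,
}

/// Compare two nullable scalars under the given null ordering.
fn cmp_nullable(a: Option<i32>, b: Option<i32>, order: NullOrder) -> Ordering {
    match (a, b) {
        (None, None) => Ordering::Equal,
        (None, Some(_)) => {
            if order.nulls_first { Ordering::Less } else { Ordering::Greater }
        }
        (Some(_), None) => {
            if order.nulls_first { Ordering::Greater } else { Ordering::Less }
        }
        (Some(x), Some(y)) => x.cmp(&y),
    }
}

/// Compare two lists of nullable values left to right; a shorter prefix is lesser.
fn cmp_nullable_list(a: &[Option<i32>], b: &[Option<i32>], order: NullOrder) -> Ordering {
    for (x, y) in a.iter().zip(b.iter()) {
        let ord = cmp_nullable(*x, *y, order);
        if ord != Ordering::Equal {
            return ord;
        }
    }
    a.len().cmp(&b.len())
}
```

With `a = [1, NULL]` and `b = [1, 5]`, this sketch gives `a < b` when nulls sort first and `a > b` when they sort last, so a single hard-coded `lt` kernel would have to pick one of the two.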

jayzhan211 · 2024-06-23T11:41:06Z

Are you suggesting that if DataFusion requires nested comparison, we should implement it in DataFusion with make_comparator, as I did? And that arrow-rs will only support nested types through make_comparator?

https://github.com/apache/datafusion/blob/d32747d09add5e1c670aa32fbe3294ecee15e3b7/datafusion/physical-expr/src/expressions/datum.rs#L28-L58

This is the code that uses compare_op; nested types are not handled there.

tustvold · 2024-06-23T12:29:23Z

Yes, DataFusion should decide what null semantics it wants / make this configurable, and implement it using make_comparator if appropriate
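
As a rough sketch of what that could look like on the caller's side, the function below derives `lt` and `eq` results from a row comparator. It assumes the comparator is a closure of the shape `Fn(usize, usize) -> Ordering` over row indices; `lt_eq_masks` is a hypothetical helper, not an arrow-rs or DataFusion function, and it returns plain `Vec<bool>` rather than a boolean array with validity.

```rust
use std::cmp::Ordering;

/// Build `lt` and `eq` masks by calling a row comparator once per row.
/// `cmp(i, j)` is assumed to compare row `i` of the left input
/// with row `j` of the right input.
fn lt_eq_masks<F>(len: usize, cmp: F) -> (Vec<bool>, Vec<bool>)
where
    F: Fn(usize, usize) -> Ordering,
{
    let mut lt = Vec::with_capacity(len);
    let mut eq = Vec::with_capacity(len);
    // One comparator call per row: this is a scalar loop, not a vectorised kernel.
    for i in 0..len {
        let ord = cmp(i, i);
        lt.push(ord == Ordering::Less);
        eq.push(ord == Ordering::Equal);
    }
    (lt, eq)
}
```

For example, with `left` and `right` as `Vec<Vec<i32>>`, calling `lt_eq_masks(left.len(), |i, j| left[i].cmp(&right[j]))` yields one boolean per row. Whatever null semantics the caller wants live entirely inside the comparator it passes in.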

jayzhan211 added the enhancement label Feb 18, 2024

jayzhan211 self-assigned this Feb 19, 2024

This was referenced Feb 19, 2024

arrow-ord: lt and eq for nested list #5408

Closed

arrow-ord: Comparison for struct #5411

Open

Blizzara mentioned this issue Jun 10, 2024

Support comparison operators on nested data types (Struct, List, ..) apache/datafusion#10856

Open

jayzhan211 mentioned this issue Jun 23, 2024

Basic comparison for List #5941

Closed

tustvold mentioned this issue Jun 23, 2024

Better document support for nested comparison #5942

Merged

tustvold closed this as completed in #5942 Jun 24, 2024

jayzhan211 mentioned this issue Jun 26, 2024

Implement eq comparison for StructArray #5960

Closed
