Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Minor: Add multi ordering test for array agg order #8439

Merged
merged 1 commit into from
Dec 11, 2023

Conversation

jayzhan211
Copy link
Contributor

@jayzhan211 jayzhan211 commented Dec 6, 2023

Which issue does this PR close?

Rationale for this change

I found the reason why we need level 3 vec in convert_array_agg_to_orderings https://github.com/apache/arrow-datafusion/pull/7629/files#r1357637090. For multi ordering cases, we have len > 1.

Add multi ordering test to improve the test coverage, also simply the code.

What changes are included in this PR?

Are these changes tested?

Are there any user-facing changes?

@github-actions github-actions bot added physical-expr Physical Expressions core Core DataFusion crate labels Dec 6, 2023
7,30,6
8,30,7
9,30,8
10,10,9
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The reason why we need csv test is that only csv file trigger merge_batch, the normal table go to update_batch only.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Normal table

logical_plan
Aggregate: groupBy=[[]], aggr=[[ARRAY_AGG(arrays.column1) ORDER BY [arrays.column1 ASC NULLS LAST]]]
--TableScan: arrays projection=[column1]
physical_plan
AggregateExec: mode=Single, gby=[], aggr=[ARRAY_AGG(arrays.column1)]
--SortExec: expr=[column1@0 ASC NULLS LAST]
----MemoryExec: partitions=1, partition_sizes=[1]

AggregateExec: mode is Single, so run update_batch.

partition_ordering_values.push(v);
// Get the inner struct array
let ordering_array = agg_orderings.value(0);
let orderings = ScalarValue::convert_array_to_scalar_vec(&ordering_array)?
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we always have ListArray(StructArray) for agg_orderings, so we can safely get orderings with convert_array_to_scalar_vec only.

@github-actions github-actions bot added the sqllogictest SQL Logic Tests (.slt) label Dec 6, 2023
Aggregate: groupBy=[[]], aggr=[[ARRAY_AGG(agg_order.c1) ORDER BY [agg_order.c2 DESC NULLS FIRST, agg_order.c3 ASC NULLS LAST]]]
--TableScan: agg_order projection=[c1, c2, c3]
physical_plan
AggregateExec: mode=Final, gby=[], aggr=[ARRAY_AGG(agg_order.c1)]
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

AggregateExec mode is final, so run merge_batch

@@ -329,6 +319,8 @@ impl OrderSensitiveArrayAggAccumulator {
.collect();
let struct_type = DataType::Struct(Fields::from(fields));

// TODO: Since ScalarValue::Struct is not yet includes StructArray,
// so we need to convert it to ListArray for ScalarValue::List
let arr = ScalarValue::new_list(&orderings, &struct_type);
Ok(ScalarValue::List(arr))

This comment was marked as outdated.

Signed-off-by: jayzhan211 <jayzhan211@gmail.com>
@jayzhan211 jayzhan211 force-pushed the array_agg_order_improve_doc_and_test branch from 22f3288 to 984551e Compare December 7, 2023 01:23
@jayzhan211 jayzhan211 marked this pull request as ready for review December 7, 2023 01:24
for v in other_ordering_values.into_iter() {
partition_ordering_values.push(v);

for partition_ordering_rows in orderings.into_iter() {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think convert_array_agg_to_orderings is doing quite specific thing, so remove the function and do things here.

@jayzhan211 jayzhan211 changed the title Add multi ordering test for array agg order Minor: Add multi ordering test for array agg order Dec 7, 2023
Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The code looks reasonable to me. Thank you @jayzhan211

I wonder if @mustafasrepo could give this a look as I think he contributed the original implementation of this aggregate

Copy link
Contributor

@mustafasrepo mustafasrepo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is LGTM!. Thanks @jayzhan211 for this PR.

@alamb alamb merged commit ff65dee into apache:main Dec 11, 2023
23 checks passed
@alamb
Copy link
Contributor

alamb commented Dec 11, 2023

Thanks again

appletreeisyellow pushed a commit to appletreeisyellow/datafusion that referenced this pull request Dec 15, 2023
Signed-off-by: jayzhan211 <jayzhan211@gmail.com>
appletreeisyellow pushed a commit to appletreeisyellow/datafusion that referenced this pull request Jan 3, 2024
Signed-off-by: jayzhan211 <jayzhan211@gmail.com>
appletreeisyellow pushed a commit to appletreeisyellow/datafusion that referenced this pull request Jan 3, 2024
Signed-off-by: jayzhan211 <jayzhan211@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
core Core DataFusion crate physical-expr Physical Expressions sqllogictest SQL Logic Tests (.slt)
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants