Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Refactor aggregate function handling #8358

Merged
merged 8 commits into from
Nov 30, 2023
Merged

Conversation

Weijun-H
Copy link
Member

Which issue does this PR close?

Closes #8346

Rationale for this change

What changes are included in this PR?

Are these changes tested?

Are there any user-facing changes?

@github-actions github-actions bot added sql SQL Planner logical-expr Logical plan and expressions optimizer Optimizer rules core Core DataFusion crate substrait labels Nov 29, 2023
@Weijun-H Weijun-H force-pushed the udf-aggregation branch 2 times, most recently from cb469eb to 95a868f Compare November 29, 2023 15:47
@Weijun-H Weijun-H marked this pull request as ready for review November 29, 2023 16:09
Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you @Weijun-H -- this looks great. I had some code style suggestion that could reduce duplication but I don't think they are necessary as the duplication already exists and we could potentially do it as a follow on (though this PR would be stronger 🦾 if they were done too)

It looks like there are some conflicts to resolve

But all in all, great work and thank you

Comment on lines 254 to 257
let mut names = Vec::with_capacity(args.len());
for e in args {
names.push(create_physical_name(e, false)?);
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I know it is the same as was here, but I think you can write this more concisely like

Suggested change
let mut names = Vec::with_capacity(args.len());
for e in args {
names.push(create_physical_name(e, false)?);
}
let names = args.iter()
.map(|e| create_physical_name(e, false))
.collect::<Result<Vec<_>>()?

)?),
None => None,
};
let order_by = match order_by {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it would be possible to do the order_by and filter conversions once rather than for both arms

@@ -477,11 +475,36 @@ impl Sort {
}
}

#[derive(Debug, Clone, PartialEq, Eq, Hash)]
/// Defines which implementation of a function for DataFusion to call.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
/// Defines which implementation of a function for DataFusion to call.
/// Defines which implementation of an aggregate function DataFusion should call.

fun: aggregate_function::AggregateFunction,
name: Arc<str>,
},
/// Resolved to a user defined function
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
/// Resolved to a user defined function
/// Resolved to a user defined aggregate function

datafusion/expr/src/expr.rs Outdated Show resolved Hide resolved
}) => match func_def {
AggregateFunctionDefinition::BuiltIn { fun: _, name: _ }
| AggregateFunctionDefinition::Name(_) => {
let mut name = create_function_name(func_def.name(), *distinct, args)?;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You might be able to combine the arms together here (the AggregateFunctionDefinition::UDF appears to do the same thing, though I realize it is not entirely the same)

Expr::AggregateFunction(AggregateFunction { func_def, args, .. }) => {
match func_def {
AggregateFunctionDefinition::BuiltIn { fun, .. } => {
let data_types = args
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The data_types calculation could be hoisted above as well

@alamb
Copy link
Contributor

alamb commented Nov 29, 2023

@Weijun-H could you also apply the change from #8365 to this PR as well (no need to store name for the built in variant)?

@Weijun-H Weijun-H requested a review from alamb November 29, 2023 21:33
Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks great -- thanks again @Weijun-H

#[derive(Debug, Clone, PartialEq, Eq, Hash)]
/// Defines which implementation of an aggregate function DataFusion should call.
pub enum AggregateFunctionDefinition {
BuiltIn(aggregate_function::AggregateFunction),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

❤️

@alamb alamb added the api change Changes the API exposed to users of the crate label Nov 29, 2023
@alamb
Copy link
Contributor

alamb commented Nov 29, 2023

Since this is an API change I will leave it open for another day before merging. Really nice work @Weijun-H

@alamb alamb merged commit a49740f into apache:main Nov 30, 2023
22 checks passed
@alamb
Copy link
Contributor

alamb commented Nov 30, 2023

Thanks again @Weijun-H -- this is really nice

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
api change Changes the API exposed to users of the crate core Core DataFusion crate logical-expr Logical plan and expressions optimizer Optimizer rules sql SQL Planner substrait
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Unify Expr::AggregateFunction and Expr::AggregateUDF
2 participants