-
Notifications
You must be signed in to change notification settings - Fork 1.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add AggregateNode::isPreGrouped() API #7763
Conversation
✅ Deploy Preview for meta-velox canceled.
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@ulysses-you Thanks. Curious, what was the motivation for this improvement.
@mbasmanova has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator. |
thank you @mbasmanova for the review. Mostly, it is used to help Gluten integrate Velox. e.g., make sure the streaming aggregate is planned as expected, show detailed information about the native plan tree string. |
@ulysses-you Got it. Thank you for clarifying. |
@mbasmanova merged this pull request in 93478ae. |
Conbench analyzed the 1 benchmark run on commit There were no benchmark performance regressions. 🎉 The full Conbench report has more details. |
@@ -573,6 +573,19 @@ class AggregationNode : public PlanNode { | |||
return preGroupedKeys_; | |||
} | |||
|
|||
bool isPreGrouped() const { | |||
return !preGroupedKeys_.empty() && | |||
std::equal( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This does not semantically equal to aggregationNode->preGroupedKeys().size() == aggregationNode->groupingKeys().size()
, for example if they have different orders, previous returns true but this one returns false. Not sure if this can happen. cc @ulysses-you @mbasmanova
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@waitinfuture This is a good point. preGroupedKeys should be a subset of groupingKeys, but there is not requirements for the order of keys in preGroupedKeys to match the order of keys in groupingKeys.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yeah, it depends on how to develop with this interface. If the developer knows the input data are pre-grouped, then just set all grouping keys as preGroupedKeys (that's what Gluten did). Before, it is a bit flaky by checking grouping key size, e.g., developer may set atrribute which is not a grouping key.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the safe way is to check every key in preGroupedKeys_
is in groupingKeys_
, but not necessarily with the same position.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Perhaps, a better way is to require that preGroupedKeys is a subset of groupingKeys and keys appear in the same order. We can add a check to the constructor to enforce that.
Add AggregateNode::isPreGrouped() convenience API to easily tell whether
aggregation can be executed in streaming mode.
Also add "STREAMING" label to the output of AggregateNode.toString() if
applicable.