Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add AggregateNode::isPreGrouped() API #7763

Closed
wants to merge 1 commit into from

Conversation

ulysses-you
Copy link
Contributor

@ulysses-you ulysses-you commented Nov 28, 2023

Add AggregateNode::isPreGrouped() convenience API to easily tell whether
aggregation can be executed in streaming mode.

Also add "STREAMING" label to the output of AggregateNode.toString() if
applicable.

Copy link

netlify bot commented Nov 28, 2023

Deploy Preview for meta-velox canceled.

Name Link
🔨 Latest commit ce45aab
🔍 Latest deploy log https://app.netlify.com/sites/meta-velox/deploys/6565a0ec7c5a020009a27bbb

@facebook-github-bot facebook-github-bot added the CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. label Nov 28, 2023
Copy link
Contributor

@mbasmanova mbasmanova left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ulysses-you Thanks. Curious, what was the motivation for this improvement.

@mbasmanova mbasmanova changed the title Add isPreGrouped for AggregateNode Add AggregateNode::isPreGrouped() API Nov 28, 2023
@facebook-github-bot
Copy link
Contributor

@mbasmanova has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

@ulysses-you
Copy link
Contributor Author

thank you @mbasmanova for the review. Mostly, it is used to help Gluten integrate Velox. e.g., make sure the streaming aggregate is planned as expected, show detailed information about the native plan tree string.

@mbasmanova
Copy link
Contributor

@ulysses-you Got it. Thank you for clarifying.

@facebook-github-bot
Copy link
Contributor

@mbasmanova merged this pull request in 93478ae.

Copy link

Conbench analyzed the 1 benchmark run on commit 93478ae9.

There were no benchmark performance regressions. 🎉

The full Conbench report has more details.

@ulysses-you ulysses-you deleted the pregrouped branch November 29, 2023 01:38
@@ -573,6 +573,19 @@ class AggregationNode : public PlanNode {
return preGroupedKeys_;
}

bool isPreGrouped() const {
return !preGroupedKeys_.empty() &&
std::equal(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This does not semantically equal to aggregationNode->preGroupedKeys().size() == aggregationNode->groupingKeys().size(), for example if they have different orders, previous returns true but this one returns false. Not sure if this can happen. cc @ulysses-you @mbasmanova

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@waitinfuture This is a good point. preGroupedKeys should be a subset of groupingKeys, but there is not requirements for the order of keys in preGroupedKeys to match the order of keys in groupingKeys.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah, it depends on how to develop with this interface. If the developer knows the input data are pre-grouped, then just set all grouping keys as preGroupedKeys (that's what Gluten did). Before, it is a bit flaky by checking grouping key size, e.g., developer may set atrribute which is not a grouping key.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the safe way is to check every key in preGroupedKeys_ is in groupingKeys_, but not necessarily with the same position.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps, a better way is to require that preGroupedKeys is a subset of groupingKeys and keys appear in the same order. We can add a check to the constructor to enforce that.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. Merged
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants