-
Notifications
You must be signed in to change notification settings - Fork 28.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[SPARK-14830][SQL] Add RemoveRepetitionFromGroupExpressions optimizer. #12590
Conversation
Test build #56613 has finished for PR 12590 at commit
|
Rebased. |
Test build #56772 has finished for PR 12590 at commit
|
Hi, @rxin . |
Rebased. |
Test build #56891 has finished for PR 12590 at commit
|
Hi, @rxin . |
Test build #56970 has finished for PR 12590 at commit
|
Hi, @marmbrus . |
Test build #57274 has finished for PR 12590 at commit
|
Test build #57411 has finished for PR 12590 at commit
|
object RemoveRepetitionFromGroupExpressions extends Rule[LogicalPlan] { | ||
def apply(plan: LogicalPlan): LogicalPlan = plan transform { | ||
case a @ Aggregate(grouping, _, _) => | ||
val newGrouping = grouping.distinct |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is going to miss cases like GROUP BY A, a
. I think you want to use an ExpressionSet
instead of normal .distinct
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, thank you for review. @marmbrus .
I'll fix that.
Test build #57540 has finished for PR 12590 at commit
|
@marmbrus . Now, it's ready for review again. |
Thanks, merging to master and 2.0 |
## What changes were proposed in this pull request? This PR aims to optimize GroupExpressions by removing repeating expressions. `RemoveRepetitionFromGroupExpressions` is added. **Before** ```scala scala> sql("select a+1 from values 1,2 T(a) group by a+1, 1+a, A+1, 1+A").explain() == Physical Plan == WholeStageCodegen : +- TungstenAggregate(key=[(a#0 + 1)#6,(1 + a#0)#7,(A#0 + 1)#8,(1 + A#0)#9], functions=[], output=[(a + 1)#5]) : +- INPUT +- Exchange hashpartitioning((a#0 + 1)#6, (1 + a#0)#7, (A#0 + 1)#8, (1 + A#0)#9, 200), None +- WholeStageCodegen : +- TungstenAggregate(key=[(a#0 + 1) AS (a#0 + 1)#6,(1 + a#0) AS (1 + a#0)#7,(A#0 + 1) AS (A#0 + 1)#8,(1 + A#0) AS (1 + A#0)#9], functions=[], output=[(a#0 + 1)#6,(1 + a#0)#7,(A#0 + 1)#8,(1 + A#0)#9]) : +- INPUT +- LocalTableScan [a#0], [[1],[2]] ``` **After** ```scala scala> sql("select a+1 from values 1,2 T(a) group by a+1, 1+a, A+1, 1+A").explain() == Physical Plan == WholeStageCodegen : +- TungstenAggregate(key=[(a#0 + 1)#6], functions=[], output=[(a + 1)#5]) : +- INPUT +- Exchange hashpartitioning((a#0 + 1)#6, 200), None +- WholeStageCodegen : +- TungstenAggregate(key=[(a#0 + 1) AS (a#0 + 1)#6], functions=[], output=[(a#0 + 1)#6]) : +- INPUT +- LocalTableScan [a#0], [[1],[2]] ``` ## How was this patch tested? Pass the Jenkins tests (with a new testcase) Author: Dongjoon Hyun <dongjoon@apache.org> Closes #12590 from dongjoon-hyun/SPARK-14830. (cherry picked from commit 6e63201) Signed-off-by: Michael Armbrust <michael@databricks.com>
Thank you, @marmbrus ! |
What changes were proposed in this pull request?
This PR aims to optimize GroupExpressions by removing repeating expressions.
RemoveRepetitionFromGroupExpressions
is added.Before
After
How was this patch tested?
Pass the Jenkins tests (with a new testcase)