-
Notifications
You must be signed in to change notification settings - Fork 28.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[SPARK-47007][SQL] Add the MapSort
expression
#45639
Conversation
…sSuite.scala Co-authored-by: Maxim Gekk <max.gekk@gmail.com>
…sSuite.scala Co-authored-by: Maxim Gekk <max.gekk@gmail.com>
Co-authored-by: Ruifeng Zheng <ruifengz@foxmail.com>
* ascendingOrder - A boolean value describing the order in which the map will be sorted. | ||
This can be either be ascending (true) or descending (false). | ||
""", | ||
examples = """ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we probably don't need this part
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Removed ExpressionDescription
.
| Object $o1 = (($simpleEntryType) $o1entry).getKey(); | ||
| Object $o2 = (($simpleEntryType) $o2entry).getKey(); | ||
| $comp; | ||
| return $order ? $c : -$c; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sorry for just seeing this, but maybe we should do here the same thing that ArraySort
does which is put the ordering into a variable outside of compare and just multiply it with the result?
this way we avoid branching in every comparison
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM if tests pass!
@@ -888,6 +888,158 @@ case class MapFromEntries(child: Expression) | |||
copy(child = newChild) | |||
} | |||
|
|||
case class MapSort(base: Expression, ascendingOrder: Expression) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
case class MapSort(base: Expression, ascendingOrder: Expression) | |
case class MapSort(base: Expression, ascendingOrder: Boolean) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Doesn't the BinaryExpression
require two expressions here? Do we demote this to UnaryExpression?
EDIT: Expression
for ascendingOrder
in array sorting has been set as well.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What's is the internal use-cases for the expression? Do we need this parameter at all?
Seems like you are going to pass true
as ascendingOrder
always at
https://github.com/apache/spark/pull/45549/files#diff-11264d807efa58054cca2d220aae8fba644ee0f0f2a4722c46d52828394846efR2488
case a @ Aggregate(groupingExpr, x, b) =>
val newGrouping = groupingExpr.map { expr =>
(expr, expr.dataType) match {
case (_: MapSort, _) => expr
case (_, _: MapType) =>
MapSort(expr, Literal.TrueLiteral)
case _ => expr
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
From the point of internal use, we don't need it. Refactored expression as UnaryExpression
and removed ordering altogether.
|""".stripMargin | ||
} | ||
|
||
override def prettyName: String = "map_sort" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Remove this since the expression hasn't been bound to the function name.
@@ -888,6 +888,158 @@ case class MapFromEntries(child: Expression) | |||
copy(child = newChild) | |||
} | |||
|
|||
case class MapSort(base: Expression, ascendingOrder: Expression) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What's is the internal use-cases for the expression? Do we need this parameter at all?
Seems like you are going to pass true
as ascendingOrder
always at
https://github.com/apache/spark/pull/45549/files#diff-11264d807efa58054cca2d220aae8fba644ee0f0f2a4722c46d52828394846efR2488
case a @ Aggregate(groupingExpr, x, b) =>
val newGrouping = groupingExpr.map { expr =>
(expr, expr.dataType) match {
case (_: MapSort, _) => expr
case (_, _: MapType) =>
MapSort(expr, Literal.TrueLiteral)
case _ => expr
case Literal(_: Boolean, BooleanType) => | ||
TypeCheckResult.TypeCheckSuccess |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you have to be so strict on this argument? For example, could you imagine a case where you want to select the sort order based on the values in another column or the result of an expression? Is this needlessly restrictive?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For example, could you imagine a case where you want to select the sort order based on the values in another column or the result of an expression?
I can imagine the case but so far we are going to use the expression internally for one case only. Support of ascendingOrder = false
or even an arbitrary boolean expression just overcomplicates the code.
MapSort
expression
+1, LGTM. Merging to master. |
### What changes were proposed in this pull request? Changes proposed in this PR include: - Relaxed checks that prevent aggregating of map types - Added new analyzer rule that uses `MapSort` expression proposed in [this PR](#45639) - Created codegen that compares two sorted maps ### Why are the changes needed? Adding new functionality to GROUP BY map types ### Does this PR introduce _any_ user-facing change? Yes, ability to use `GROUP BY MapType` ### How was this patch tested? With new UTs ### Was this patch authored or co-authored using generative AI tooling? No Closes #45549 from stevomitric/stevomitric/map-group-by. Lead-authored-by: Stevo Mitric <stevo.mitric@databricks.com> Co-authored-by: Stefan Kandic <stefan.kandic@databricks.com> Co-authored-by: Stevo Mitric <stevomitric2000@gmail.com> Signed-off-by: Wenchen Fan <wenchen@databricks.com>
### What changes were proposed in this pull request? Added the new `MapSort` expression in `CollationOperations` alongside new UTs. ### Why are the changes needed? In order to add the ability to do GROUP BY on map types we first have to be able to sort the maps by their key ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? New tests in `CollectionExpressionSuite` ### Was this patch authored or co-authored using generative AI tooling? No Closes apache#45639 from stevomitric/stevomitric/map-expr. Lead-authored-by: Stevo Mitric <stevo.mitric@databricks.com> Co-authored-by: Stefan Kandic <stefan.kandic@databricks.com> Co-authored-by: Stevo Mitric <stevomitric2000@gmail.com> Signed-off-by: Max Gekk <max.gekk@gmail.com>
### What changes were proposed in this pull request? Changes proposed in this PR include: - Relaxed checks that prevent aggregating of map types - Added new analyzer rule that uses `MapSort` expression proposed in [this PR](apache#45639) - Created codegen that compares two sorted maps ### Why are the changes needed? Adding new functionality to GROUP BY map types ### Does this PR introduce _any_ user-facing change? Yes, ability to use `GROUP BY MapType` ### How was this patch tested? With new UTs ### Was this patch authored or co-authored using generative AI tooling? No Closes apache#45549 from stevomitric/stevomitric/map-group-by. Lead-authored-by: Stevo Mitric <stevo.mitric@databricks.com> Co-authored-by: Stefan Kandic <stefan.kandic@databricks.com> Co-authored-by: Stevo Mitric <stevomitric2000@gmail.com> Signed-off-by: Wenchen Fan <wenchen@databricks.com>
What changes were proposed in this pull request?
Added the new
MapSort
expression inCollationOperations
alongside new UTs.Why are the changes needed?
In order to add the ability to do GROUP BY on map types we first have to be able to sort the maps by their key
Does this PR introduce any user-facing change?
No
How was this patch tested?
New tests in
CollectionExpressionSuite
Was this patch authored or co-authored using generative AI tooling?
No