Skip to content

Commit

Permalink
[SPARK-35014] Fix the PhysicalAggregation pattern to not rewrite fold…
Browse files Browse the repository at this point in the history
…able expressions

### What changes were proposed in this pull request?

Fix PhysicalAggregation to not transform a foldable expression.

### Why are the changes needed?

It can potentially break certain queries like the added unit test shows.

### Does this PR introduce _any_ user-facing change?

Yes, it fixes undesirable errors caused by a returned TypeCheckFailure from places like RegExpReplace.checkInputDataTypes.

Closes apache#32113 from sigmod/foldable.

Authored-by: Yingyi Bu <yingyi.bu@databricks.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
  • Loading branch information
sigmod authored and cloud-fan committed Apr 13, 2021
1 parent 49618c9 commit 9cd25b4
Show file tree
Hide file tree
Showing 2 changed files with 55 additions and 1 deletion.
Original file line number Diff line number Diff line change
Expand Up @@ -333,7 +333,7 @@ object PhysicalAggregation {
case ue: PythonUDF if PythonUDF.isGroupedAggPandasUDF(ue) =>
equivalentAggregateExpressions.getEquivalentExprs(ue).headOption
.getOrElse(ue).asInstanceOf[PythonUDF].resultAttribute
case expression =>
case expression if !expression.foldable =>
// Since we're using `namedGroupingAttributes` to extract the grouping key
// columns, we need to replace grouping key expressions with their corresponding
// attributes. We do not rely on the equality check at here since attributes may
Expand Down
Original file line number Diff line number Diff line change
@@ -0,0 +1,54 @@
/*
* Licensed to the Apache Software Foundation (ASF) under one or more
* contributor license agreements. See the NOTICE file distributed with
* this work for additional information regarding copyright ownership.
* The ASF licenses this file to You under the Apache License, Version 2.0
* (the "License"); you may not use this file except in compliance with
* the License. You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/

package org.apache.spark.sql.catalyst.util

import org.apache.spark.sql.catalyst.analysis.SimpleAnalyzer
import org.apache.spark.sql.catalyst.dsl.expressions._
import org.apache.spark.sql.catalyst.dsl.plans._
import org.apache.spark.sql.catalyst.expressions._
import org.apache.spark.sql.catalyst.planning.PhysicalAggregation
import org.apache.spark.sql.catalyst.plans.PlanTest
import org.apache.spark.sql.catalyst.plans.logical.LocalRelation

class PhysicalAggregationSuite extends PlanTest {
val testRelation = LocalRelation('a.int, 'b.int)

test("SPARK-35014: a foldable expression should not be replaced by an AttributeReference") {
val query = testRelation
.groupBy('a, Literal.create(1) as 'k)(
'a, Round(Literal.create(1.2), Literal.create(1)) as 'r, count('b) as 'c)
val analyzedQuery = SimpleAnalyzer.execute(query)

val PhysicalAggregation(
groupingExpressions,
aggregateExpressions,
resultExpressions,
_ /* child */
) = analyzedQuery

assertResult(2)(groupingExpressions.length)
assertResult(1)(aggregateExpressions.length)
assertResult(3)(resultExpressions.length)

// Verify that Round's scale parameter is a Literal.
resultExpressions(1) match {
case Alias(Round(_, _: Literal), _) =>
case other => fail("unexpected result expression: " + other)
}
}
}

0 comments on commit 9cd25b4

Please sign in to comment.