[SPARK-47569][SQL] Disallow comparing variant. #45726

Closed. Wants to merge 5 commits; the diff below shows changes from 2 commits.
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/OrderUtils.scala

@@ -16,14 +16,15 @@
  */
 package org.apache.spark.sql.catalyst.expressions

-import org.apache.spark.sql.types.{ArrayType, AtomicType, DataType, NullType, StructType, UserDefinedType}
+import org.apache.spark.sql.types.{ArrayType, AtomicType, DataType, NullType, StructType, UserDefinedType, VariantType}

 object OrderUtils {
   /**
    * Returns true iff the data type can be ordered (i.e. can be sorted).
    */
   def isOrderable(dataType: DataType): Boolean = dataType match {
     case NullType => true
+    case VariantType => false
     case dt: AtomicType => true
     case struct: StructType => struct.fields.forall(f => isOrderable(f.dataType))
     case array: ArrayType => isOrderable(array.elementType)
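
For reference, a minimal sketch of how the updated check behaves. This is illustrative only and assumes the catalyst classes above are on the classpath:

import org.apache.spark.sql.catalyst.expressions.OrderUtils
import org.apache.spark.sql.types._

// Variant itself is rejected; the new case is matched before the generic AtomicType case.
OrderUtils.isOrderable(VariantType)                                     // false

// The existing recursion also rejects variants nested inside structs and arrays.
OrderUtils.isOrderable(StructType(Seq(StructField("v", VariantType)))) // false
OrderUtils.isOrderable(ArrayType(VariantType))                          // false

// Ordinary atomic types are unaffected.
OrderUtils.isOrderable(StringType)                                      // true
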
ExprUtils.scala

@@ -28,7 +28,7 @@ import org.apache.spark.sql.catalyst.expressions.aggregate.AggregateExpression
 import org.apache.spark.sql.catalyst.plans.logical.Aggregate
 import org.apache.spark.sql.catalyst.util.{ArrayBasedMapData, CharVarcharUtils}
 import org.apache.spark.sql.errors.{QueryCompilationErrors, QueryErrorsBase, QueryExecutionErrors}
-import org.apache.spark.sql.types.{DataType, MapType, StringType, StructType}
+import org.apache.spark.sql.types.{DataType, MapType, StringType, StructType, VariantType}
 import org.apache.spark.unsafe.types.UTF8String

 object ExprUtils extends QueryErrorsBase {

@@ -194,7 +194,8 @@ object ExprUtils extends QueryErrorsBase {
     }

     // Check if the data type of expr is orderable.
-    if (expr.dataType.existsRecursively(_.isInstanceOf[MapType])) {
+    if (expr.dataType.existsRecursively(
+        t => t.isInstanceOf[MapType] || t.isInstanceOf[VariantType])) {
       expr.failAnalysis(
         errorClass = "GROUP_EXPRESSION_TYPE_IS_NOT_ORDERABLE",
         messageParameters = Map(
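
To illustrate the effect of this check end to end, here is a hedged sketch (not part of the PR) assuming a running SparkSession named `spark`; the helper `errorClassOf` is made up for this example:

import org.apache.spark.sql.AnalysisException

// Hypothetical helper, only for this sketch: runs a query and reports its analysis error class.
def errorClassOf(query: String): String =
  try { spark.sql(query).collect(); "NONE" }
  catch { case e: AnalysisException => e.getErrorClass }

// Grouping by a variant directly, and by a struct that merely contains one,
// should both trip the recursive check above.
errorClassOf("select parse_json('1') as v group by v")
// expected: GROUP_EXPRESSION_TYPE_IS_NOT_ORDERABLE
errorClassOf("select named_struct('f', parse_json('1')) as s group by s")
// expected: GROUP_EXPRESSION_TYPE_IS_NOT_ORDERABLE
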
sql/core/src/test/scala/org/apache/spark/sql/VariantSuite.scala (29 additions, 0 deletions)

@@ -192,4 +192,33 @@ class VariantSuite extends QueryTest with SharedSparkSession {
      }
    }
  }

  test("group/order/join variant are disabled") {
Reviewer comment (Contributor):

Thanks for the tests!

What about sort by? Or window partition by, or window order by? Some examples:

select parse_json('') sort by 1

with t as (select 1 as a, parse_json('') as v)
select rank() over (partition by v order by a) from t

with t as (select 1 as a, parse_json('') as v)
select rank() over (partition by a order by v) from t

Author reply (@chenhao-db, Mar 26, 2024):

This is a very good point. Actually, the second statement (partition by v) will not fail as expected, and it is really an error in the type check for Window. Both partitionSpec and orderSpec must be orderable, but the current type check only covers orderSpec. If the partitionSpec contains a map type, the query fails later in an optimizer rule with INTERNAL_ERROR. I will create another PR to fix the type check for Window.
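
The gap described here can also be seen from the DataFrame API. A hedged sketch, assuming a SparkSession named `spark` with variant support:

import org.apache.spark.sql.expressions.Window
import org.apache.spark.sql.functions.{col, rank}

val df = spark.sql("select 1 as a, parse_json('1') as v")

// Ordering a window by a variant is rejected by the existing orderSpec check,
// failing analysis with DATATYPE_MISMATCH.INVALID_ORDERING_TYPE.
df.select(rank().over(Window.partitionBy(col("a")).orderBy(col("v"))))

// Partitioning by a variant slips past the current Window type check and only
// fails later in the optimizer; this is the gap #45730 is meant to close.
df.select(rank().over(Window.partitionBy(col("v")).orderBy(col("a"))))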

Author reply (@chenhao-db):

If we can merge #45730 before this PR, we can add the second statement to the tests.
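
For context, a hedged sketch of what that extra case might look like in the test below once #45730 lands; the error class it should assert is determined by that PR, so it is left as a comment here:

    val partitionByVariant = intercept[AnalysisException] {
      spark.sql("with t as (select 1 as a, parse_json('') as v) " +
        "select rank() over (partition by v order by a) from t")
    }
    // assert on the error class introduced by the Window partitionSpec check in #45730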

    var ex = intercept[AnalysisException] {
      spark.sql("select parse_json('') group by 1")
    }
    assert(ex.getErrorClass == "GROUP_EXPRESSION_TYPE_IS_NOT_ORDERABLE")

    ex = intercept[AnalysisException] {
      spark.sql("select parse_json('') order by 1")
    }
    assert(ex.getErrorClass == "DATATYPE_MISMATCH.INVALID_ORDERING_TYPE")

    ex = intercept[AnalysisException] {
      spark.sql("select parse_json('') sort by 1")
    }
    assert(ex.getErrorClass == "DATATYPE_MISMATCH.INVALID_ORDERING_TYPE")

    ex = intercept[AnalysisException] {
      spark.sql("with t as (select 1 as a, parse_json('') as v) " +
        "select rank() over (partition by a order by v) from t")
    }
    assert(ex.getErrorClass == "DATATYPE_MISMATCH.INVALID_ORDERING_TYPE")

    ex = intercept[AnalysisException] {
      spark.sql("with t as (select parse_json('') as v) " +
        "select t1.v from t as t1 join t as t2 on t1.v = t2.v")
    }
    assert(ex.getErrorClass == "DATATYPE_MISMATCH.INVALID_ORDERING_TYPE")
  }
}