Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SPARK-47569][SQL] Disallow comparing variant. #45726

Closed
wants to merge 5 commits into from

Conversation

chenhao-db
Copy link
Contributor

What changes were proposed in this pull request?

It adds type-checking rules to disallow comparing variant values (including group by a variant column). We may support comparing variant values in the future, but since we don't have a proper comparison implementation at this point, they should be disallowed on the user surface.

How was this patch tested?

Unit tests.

@github-actions github-actions bot added the SQL label Mar 26, 2024
Copy link
Contributor

@gene-db gene-db left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@chenhao-db Thanks! I left a question.

@@ -192,4 +192,22 @@ class VariantSuite extends QueryTest with SharedSparkSession {
}
}
}

test("group/order/join variant are disabled") {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the tests!

What about sort by? or window partition by, or window order by? Some examples:

select parse_json('') sort by 1

with t as (select 1 as a, parse_json('') as v)
select rank() over (partition by v order by a)

with t as (select 1 as a, parse_json('') as v)
select rank() over (partition by a order by v)

Copy link
Contributor Author

@chenhao-db chenhao-db Mar 26, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a very good point. Actually, the second statement (partition by v) will not fail as expected, and it is really an error in the type check for Window. Both partitionSpec and orderSpec must be orderable, but the current type check only includes orderSpec. If the partitionSpec contains a map type, the query will fail later in an optimizer rule with INTERNAL_ERROR. I will create another PR to fix the type check for Window.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we can merge #45730 before this PR, we can add the second statement to the tests.

Copy link
Contributor

@gene-db gene-db left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@chenhao-db Thanks for fixing and the test cases!

LGTM

@github-actions github-actions bot added the DOCS label Mar 27, 2024
@chenhao-db
Copy link
Contributor Author

@cloud-fan Could you help review this PR? Thanks!

@cloud-fan
Copy link
Contributor

thanks, merging to master!

@cloud-fan cloud-fan closed this in cf02b1a Apr 1, 2024
sweisdb pushed a commit to sweisdb/spark that referenced this pull request Apr 1, 2024
### What changes were proposed in this pull request?

It adds type-checking rules to disallow comparing variant values (including group by a variant column). We may support comparing variant values in the future, but since we don't have a proper comparison implementation at this point, they should be disallowed on the user surface.

### How was this patch tested?

Unit tests.

Closes apache#45726 from chenhao-db/SPARK-47569.

Authored-by: Chenhao Li <chenhao.li@databricks.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants