Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SPARK-49152][SQL][FOLLOWUP] DelegatingCatalogExtension should also use V1 commands #47995

Closed
wants to merge 4 commits into from

Conversation

amaliujia
Copy link
Contributor

@amaliujia amaliujia commented Sep 5, 2024

What changes were proposed in this pull request?

This is a followup of #47660 . If users override spark_catalog with
DelegatingCatalogExtension, we should still use v1 commands as DelegatingCatalogExtension forwards requests to HMS and there are still behavior differences between v1 and v2 commands targeting HMS.

This PR also forces to use v1 commands for certain commands that do not have a v2 version.

Why are the changes needed?

Avoid introducing behavior changes to Spark plugins that implements DelegatingCatalogExtension to override spark_catalog.

Does this PR introduce any user-facing change?

No

How was this patch tested?

new test case

Was this patch authored or co-authored using generative AI tooling?

No

@github-actions github-actions bot added the SQL label Sep 5, 2024
@amaliujia
Copy link
Contributor Author

@cloud-fan

@@ -284,7 +284,8 @@ class ResolveSessionCatalog(val catalogManager: CatalogManager)
case AnalyzeColumn(ResolvedV1TableOrViewIdentifier(ident), columnNames, allColumns) =>
AnalyzeColumnCommand(ident, columnNames, allColumns)

case RepairTable(ResolvedV1TableIdentifier(ident), addPartitions, dropPartitions) =>
case RepairTable(
ResolvedTableIdentifierInSessionCatalog(ident), addPartitions, dropPartitions) =>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
ResolvedTableIdentifierInSessionCatalog(ident), addPartitions, dropPartitions) =>
ResolvedTableIdentifierInSessionCatalog(ident), addPartitions, dropPartitions) =>

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: fix indentation

@@ -684,7 +693,8 @@ class ResolveSessionCatalog(val catalogManager: CatalogManager)
}

private def supportsV1Command(catalog: CatalogPlugin): Boolean = {
isSessionCatalog(catalog) &&
SQLConf.get.getConf(SQLConf.V2_SESSION_CATALOG_IMPLEMENTATION).isEmpty
(isSessionCatalog(catalog) &&
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think isSessionCatalog(catalog) should always be checked.

@cloud-fan
Copy link
Contributor

also cc @yaooqinn and @itholic (as the release manager)

val v2Catalog = catalog("spark_catalog").asTableCatalog
val table = v2Catalog.loadTable(Identifier.of(Array("default"), "tbl"))
assert(table.properties().get(TableCatalog.PROP_PROVIDER) == classOf[SimpleScanSource].getName)
val e = intercept[AnalysisException] {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@cloud-fan
Copy link
Contributor

thanks, merging to master/3.5!

@cloud-fan cloud-fan closed this in f7cfeb5 Sep 5, 2024
cloud-fan added a commit that referenced this pull request Sep 5, 2024
…se V1 commands

### What changes were proposed in this pull request?

This is a followup of #47660 . If users override `spark_catalog` with
`DelegatingCatalogExtension`, we should still use v1 commands as `DelegatingCatalogExtension` forwards requests to HMS and there are still behavior differences between v1 and v2 commands targeting HMS.

This PR also forces to use v1 commands for certain commands that do not have a v2 version.

### Why are the changes needed?

Avoid introducing behavior changes to Spark plugins that implements `DelegatingCatalogExtension` to override `spark_catalog`.

### Does this PR introduce _any_ user-facing change?

No

### How was this patch tested?

new test case

### Was this patch authored or co-authored using generative AI tooling?

No

Closes #47995 from amaliujia/fix_catalog_v2.

Lead-authored-by: Wenchen Fan <wenchen@databricks.com>
Co-authored-by: Rui Wang <rui.wang@databricks.com>
Co-authored-by: Wenchen Fan <cloud0fan@gmail.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
(cherry picked from commit f7cfeb5)
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
@@ -71,4 +71,12 @@ class DataSourceV2SQLSessionCatalogSuite
sql(s"CREATE EXTERNAL TABLE t (i INT) USING $v2Format TBLPROPERTIES($prop)")
}
}

test("SPARK-49152: partition columns should be put at the end") {
Copy link
Member

@dongjoon-hyun dongjoon-hyun Sep 5, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The test case name sounds a little mismatched. Is this a correct reproducer for DelegatingCatalogExtension issue, @amaliujia ?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the Spark behavior with the built-in catalog. A DelegatingCatalogExtension shouldn't change it.

Strictly speaking, this shouldn't be the only issue, as there might be other subtle differences between v1 and v2 CREATE TABLE command, or other commands.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah as @cloud-fan have mentioned, this verifies a specific case so we make sure the behavior is not changed. And there might be a list to verify though.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Got it. It makes sense. Thank you for the explanation.

cloud-fan added a commit that referenced this pull request Sep 9, 2024
…be changed by falling back to v1 command

### What changes were proposed in this pull request?

This is a followup of #47772 . The behavior of SaveAsTable should not be changed by switching v1 to v2 command. This is similar to #47995. For the case of `DelegatingCatalogExtension` we need it goes to V1 commands to be consistent with previous behavior.

### Why are the changes needed?

Behavior regression.

### Does this PR introduce _any_ user-facing change?

No

### How was this patch tested?

UT

### Was this patch authored or co-authored using generative AI tooling?

No

Closes #48019 from amaliujia/regress_v2.

Lead-authored-by: Wenchen Fan <wenchen@databricks.com>
Co-authored-by: Rui Wang <rui.wang@databricks.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
cloud-fan added a commit that referenced this pull request Sep 9, 2024
…be changed by falling back to v1 command

This is a followup of #47772 . The behavior of SaveAsTable should not be changed by switching v1 to v2 command. This is similar to #47995. For the case of `DelegatingCatalogExtension` we need it goes to V1 commands to be consistent with previous behavior.

Behavior regression.

No

UT

No

Closes #48019 from amaliujia/regress_v2.

Lead-authored-by: Wenchen Fan <wenchen@databricks.com>
Co-authored-by: Rui Wang <rui.wang@databricks.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
(cherry picked from commit 37b39b4)
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants