Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SPARK-33569][SQL] Remove getting partitions by an identifier prefix. #30514

Closed

Conversation

MaxGekk
Copy link
Member

@MaxGekk MaxGekk commented Nov 26, 2020

What changes were proposed in this pull request?

  1. Remove the method listPartitionIdentifiers() from the SupportsPartitionManagement interface. The method lists partitions by ident prefix.
  2. Rename listPartitionByNames() to listPartitionIdentifiers().
  3. Re-implement the default method partitionExists() using new method.

Why are the changes needed?

Getting partitions by ident prefix only is not used, and it can be removed to improve code maintenance. Also this makes the SupportsPartitionManagement interface cleaner.

Does this PR introduce any user-facing change?

Should not.

How was this patch tested?

By running the affected test suites:

$ build/sbt "test:testOnly org.apache.spark.sql.connector.catalog.*"

@github-actions github-actions bot added the SQL label Nov 26, 2020
@MaxGekk MaxGekk changed the title [SPARK-33569][SQL] Remove getting partitions by only identifiers [SPARK-33569][SQL] Remove getting partitions by an identifier prefix. Nov 26, 2020
@SparkQA
Copy link

SparkQA commented Nov 26, 2020

Test build #131838 has finished for PR 30514 at commit 99cd1c2.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

/**
* List the identifiers of all partitions that match to the ident by names.
*
* @param names the names of partition values in the identifier.
* @param ident a partition identifier values.
* @return an array of Identifiers for the partitions
*/
InternalRow[] listPartitionByNames(String[] names, InternalRow ident);
InternalRow[] listPartitionIdentifiers(String[] names, InternalRow ident);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about listPartitionIdentifiers(StructType column, InternalRow ident)?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

StructType requires to set additional fields like nullable, metadata and etc. that could be not necessary to list partitions. For example, I use this method in https://github.com/apache/spark/pull/30398/files#diff-b8fcff1b4de2d6e3641609dc02db791af8c56c2660123fa2b4afd408597fb401R44 where names are coming from ResolvedPartitionSpec. With our proposal, I will have to extract StructType from the partition schema before calling listPartitionIdentifiers which is not convenient ( and necessary) from my point of view.

@@ -79,7 +80,9 @@ void createPartition(
* @return true if the partition exists, false otherwise
*/
default boolean partitionExists(InternalRow ident) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This PR also changed this API. In the past, we coud use this API to check wether table had partitions

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After the changes, you can do the same, right?

@MaxGekk
Copy link
Member Author

MaxGekk commented Nov 27, 2020

@cloud-fan @HyukjinKwon @rdblue Could you review this PR, please.

@@ -79,7 +80,9 @@ void createPartition(
* @return true if the partition exists, false otherwise
*/
default boolean partitionExists(InternalRow ident) {
return listPartitionIdentifiers(ident).length > 0;
String[] partitionNames = partitionSchema().names();
String[] requiredNames = Arrays.copyOfRange(partitionNames, 0, ident.numFields());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

shall we fail if the given ident is only a prefix? partitionExists sounds like it should be applied to a certain partition.

BTW which command needs it?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It uses in tests a lot, and in:

I guess, we can rewrite the cases.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems like we always call it with a complete partition spec.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

anyway it's not related to this PR.

@cloud-fan
Copy link
Contributor

thanks, merging to master!

@cloud-fan cloud-fan closed this in 6fd148f Nov 30, 2020
@LuciferYang
Copy link
Contributor

[error] /home/runner/work/spark/spark/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/ShowPartitionsExec.scala:44:38: value listPartitionByNames is not a member of org.apache.spark.sql.connector.catalog.SupportsPartitionManagement
[error]     val partitionIdentifiers = table.listPartitionByNames(names.toArray, ident)
[error]  

Are there missing codes? It seems that the latest code can't be compiled @cloud-fan @MaxGekk

@MaxGekk
Copy link
Member Author

MaxGekk commented Nov 30, 2020

Thank you, @LuciferYang . I am about to open a PR with a fix.

@MaxGekk
Copy link
Member Author

MaxGekk commented Nov 30, 2020

Here is the fix for the build error: #30553

cloud-fan pushed a commit that referenced this pull request Nov 30, 2020
…artitionsExec`

### What changes were proposed in this pull request?
Use `listPartitionIdentifiers ` instead of `listPartitionByNames` in `ShowPartitionsExec`. The `listPartitionByNames` was renamed by #30514.

### Why are the changes needed?
To fix build error.

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
By running tests for the `SHOW PARTITIONS` command:
```
$ build/sbt -Phive-2.3 -Phive-thriftserver "test:testOnly *ShowPartitionsSuite"
```

Closes #30553 from MaxGekk/fix-build-show-partitions-exec.

Authored-by: Max Gekk <max.gekk@gmail.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
a0x8o added a commit to a0x8o/spark that referenced this pull request Nov 30, 2020
…artitionsExec`

### What changes were proposed in this pull request?
Use `listPartitionIdentifiers ` instead of `listPartitionByNames` in `ShowPartitionsExec`. The `listPartitionByNames` was renamed by apache/spark#30514.

### Why are the changes needed?
To fix build error.

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
By running tests for the `SHOW PARTITIONS` command:
```
$ build/sbt -Phive-2.3 -Phive-thriftserver "test:testOnly *ShowPartitionsSuite"
```

Closes #30553 from MaxGekk/fix-build-show-partitions-exec.

Authored-by: Max Gekk <max.gekk@gmail.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
@MaxGekk MaxGekk deleted the remove-listPartitionIdentifiers branch February 19, 2021 15:03
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants