-
Notifications
You must be signed in to change notification settings - Fork 28.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[SPARK-33569][SQL] Remove getting partitions by an identifier prefix. #30514
Conversation
Test build #131838 has finished for PR 30514 at commit
|
/** | ||
* List the identifiers of all partitions that match to the ident by names. | ||
* | ||
* @param names the names of partition values in the identifier. | ||
* @param ident a partition identifier values. | ||
* @return an array of Identifiers for the partitions | ||
*/ | ||
InternalRow[] listPartitionByNames(String[] names, InternalRow ident); | ||
InternalRow[] listPartitionIdentifiers(String[] names, InternalRow ident); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How about listPartitionIdentifiers(StructType column, InternalRow ident)
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
StructType
requires to set additional fields like nullable
, metadata
and etc. that could be not necessary to list partitions. For example, I use this method in https://github.com/apache/spark/pull/30398/files#diff-b8fcff1b4de2d6e3641609dc02db791af8c56c2660123fa2b4afd408597fb401R44 where names are coming from ResolvedPartitionSpec
. With our proposal, I will have to extract StructType
from the partition schema before calling listPartitionIdentifiers
which is not convenient ( and necessary) from my point of view.
@@ -79,7 +80,9 @@ void createPartition( | |||
* @return true if the partition exists, false otherwise | |||
*/ | |||
default boolean partitionExists(InternalRow ident) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This PR also changed this API. In the past, we coud use this API to check wether table had partitions
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
After the changes, you can do the same, right?
@cloud-fan @HyukjinKwon @rdblue Could you review this PR, please. |
@@ -79,7 +80,9 @@ void createPartition( | |||
* @return true if the partition exists, false otherwise | |||
*/ | |||
default boolean partitionExists(InternalRow ident) { | |||
return listPartitionIdentifiers(ident).length > 0; | |||
String[] partitionNames = partitionSchema().names(); | |||
String[] requiredNames = Arrays.copyOfRange(partitionNames, 0, ident.numFields()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
shall we fail if the given ident
is only a prefix? partitionExists
sounds like it should be applied to a certain partition.
BTW which command needs it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It uses in tests a lot, and in:
spark/sql/catalyst/src/test/scala/org/apache/spark/sql/connector/InMemoryAtomicPartitionTable.scala
Line 61 in 60fa8e3
if (idents.exists(partitionExists)) { spark/sql/catalyst/src/test/scala/org/apache/spark/sql/connector/InMemoryAtomicPartitionTable.scala
Line 71 in 60fa8e3
if (!idents.forall(partitionExists)) { Line 40 in 1eb236b
partSpecs.partition(p => table.partitionExists(p.spec)) Line 38 in 1eb236b
partSpecs.map(_.spec).partition(table.partitionExists)
I guess, we can rewrite the cases.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems like we always call it with a complete partition spec.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
anyway it's not related to this PR.
thanks, merging to master! |
Are there missing codes? It seems that the latest code can't be compiled @cloud-fan @MaxGekk |
Thank you, @LuciferYang . I am about to open a PR with a fix. |
Here is the fix for the build error: #30553 |
…artitionsExec` ### What changes were proposed in this pull request? Use `listPartitionIdentifiers ` instead of `listPartitionByNames` in `ShowPartitionsExec`. The `listPartitionByNames` was renamed by #30514. ### Why are the changes needed? To fix build error. ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? By running tests for the `SHOW PARTITIONS` command: ``` $ build/sbt -Phive-2.3 -Phive-thriftserver "test:testOnly *ShowPartitionsSuite" ``` Closes #30553 from MaxGekk/fix-build-show-partitions-exec. Authored-by: Max Gekk <max.gekk@gmail.com> Signed-off-by: Wenchen Fan <wenchen@databricks.com>
…artitionsExec` ### What changes were proposed in this pull request? Use `listPartitionIdentifiers ` instead of `listPartitionByNames` in `ShowPartitionsExec`. The `listPartitionByNames` was renamed by apache/spark#30514. ### Why are the changes needed? To fix build error. ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? By running tests for the `SHOW PARTITIONS` command: ``` $ build/sbt -Phive-2.3 -Phive-thriftserver "test:testOnly *ShowPartitionsSuite" ``` Closes #30553 from MaxGekk/fix-build-show-partitions-exec. Authored-by: Max Gekk <max.gekk@gmail.com> Signed-off-by: Wenchen Fan <wenchen@databricks.com>
What changes were proposed in this pull request?
listPartitionIdentifiers()
from theSupportsPartitionManagement
interface. The method lists partitions by ident prefix.listPartitionByNames()
tolistPartitionIdentifiers()
.partitionExists()
using new method.Why are the changes needed?
Getting partitions by ident prefix only is not used, and it can be removed to improve code maintenance. Also this makes the
SupportsPartitionManagement
interface cleaner.Does this PR introduce any user-facing change?
Should not.
How was this patch tested?
By running the affected test suites: