[SPARK-24923][SQL][WIP] Add unpartitioned CTAS and RTAS support for DataSourceV2 #21877
@cloud-fan, @gatorsmile, @marmbrus, this PR demonstrates how plans would use the catalog changes introduced in #21306. To see the changes, you may want to look at just the last commit because this includes changes from other PRs.
Expression is internal and should not be used in public APIs. To avoid using Expression in the TableCatalog API, this commit adds a small set of transformations that are used to communicate partitioning to catalog implementations. This also adds an apply transformation that passes the name of a transform instead of a Transform class. This can be used to pass transforms that are unknown to Spark to the underlying catalog implementation.
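As a rough illustration of the idea in this commit message, the sketch below models a small public transform type plus a name-based `Apply` transform. All names here (`PartitionTransform`, `Identity`, `Bucket`, `Apply`, `ExampleCatalog`, and the `zorder` transform name) are hypothetical stand-ins chosen for the example. They are not the classes added by this PR.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical public transform type: exposes only a name and column names,
// so no internal Expression leaks into the catalog-facing API.
interface PartitionTransform {
    String name();
    List<String> columns();
}

// Partition by the unmodified value of a single column.
final class Identity implements PartitionTransform {
    private final String column;
    Identity(String column) { this.column = column; }
    public String name() { return "identity"; }
    public List<String> columns() { return Collections.singletonList(column); }
}

// Hash the given columns into a fixed number of buckets.
final class Bucket implements PartitionTransform {
    private final int numBuckets;
    private final List<String> columns;
    Bucket(int numBuckets, String... columns) {
        this.numBuckets = numBuckets;
        this.columns = Collections.unmodifiableList(Arrays.asList(columns));
    }
    int numBuckets() { return numBuckets; }
    public String name() { return "bucket"; }
    public List<String> columns() { return columns; }
}

// Carries only a transform name, so a transform the engine does not know
// about can still be handed to the catalog implementation.
final class Apply implements PartitionTransform {
    private final String name;
    private final List<String> columns;
    Apply(String name, String... columns) {
        this.name = name;
        this.columns = Collections.unmodifiableList(Arrays.asList(columns));
    }
    public String name() { return name; }
    public List<String> columns() { return columns; }
}

// Hypothetical catalog implementation that decides which names it supports.
final class ExampleCatalog {
    private static final Set<String> SUPPORTED =
        new HashSet<>(Arrays.asList("identity", "bucket", "zorder"));

    static String describe(PartitionTransform transform) {
        if (!SUPPORTED.contains(transform.name())) {
            throw new IllegalArgumentException(
                "Unsupported transform: " + transform.name());
        }
        return transform.name() + "(" + String.join(", ", transform.columns()) + ")";
    }
}
```

In this sketch, `new Apply("zorder", "x", "y")` reaches the catalog even though the engine has no dedicated class for it, and the catalog alone decides whether to accept or reject the name.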
This uses the catalog API introduced in SPARK-24252 to implement CTAS and RTAS plans.
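For readers unfamiliar with the two operations, here is a minimal sketch of how CTAS and RTAS could sit on top of a catalog. It assumes that CTAS fails when the table already exists, that RTAS drops any existing table before creating the new one, and that a failed write drops the newly created table. `InMemoryCatalog` and `TableAsSelect` are hypothetical names for this example and do not reflect the plans or the catalog interface in this PR.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical in-memory catalog: maps a table name to its rows.
final class InMemoryCatalog {
    private final Map<String, List<String>> tables = new HashMap<>();

    boolean tableExists(String name) {
        return tables.containsKey(name);
    }

    List<String> createTable(String name) {
        if (tables.containsKey(name)) {
            throw new IllegalStateException("Table already exists: " + name);
        }
        List<String> rows = new ArrayList<>();
        tables.put(name, rows);
        return rows;
    }

    boolean dropTable(String name) {
        return tables.remove(name) != null;
    }

    List<String> loadTable(String name) {
        List<String> rows = tables.get(name);
        if (rows == null) {
            throw new IllegalStateException("Table not found: " + name);
        }
        return rows;
    }
}

final class TableAsSelect {
    // CTAS (assumed semantics): create the table, then write the query result.
    // If the write fails, drop the table so no partial table is left behind.
    static void createTableAsSelect(
            InMemoryCatalog catalog, String name, List<String> queryResult) {
        List<String> table = catalog.createTable(name); // throws if it exists
        try {
            table.addAll(queryResult);
        } catch (RuntimeException e) {
            catalog.dropTable(name);
            throw e;
        }
    }

    // RTAS (assumed semantics): drop any existing table, then run CTAS.
    static void replaceTableAsSelect(
            InMemoryCatalog catalog, String name, List<String> queryResult) {
        catalog.dropTable(name);
        createTableAsSelect(catalog, name, queryResult);
    }
}
```

Note that the drop-then-create sequence in this sketch is not atomic: if the write fails during RTAS, the original table is already gone.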
We're closing this PR because it hasn't been updated in a while. This isn't a judgement on the merit of the PR in any way. It's just a way of keeping the PR queue manageable.
What changes were proposed in this pull request?
- ReadSupport and WriteSupport classes for use with Table
- TableCatalog to DataFrameReader and DataFrameWriter
- TableV2Relation for tables that are loaded by TableCatalog and have no DataSource instance
- DataSourceV2Implicits to avoid future churn

Note that this doesn't handle partitionBy in DataFrameWriter. Adding support for partitioned tables will require validation rules.

This is based on unmerged work and includes the commits from #21306 and #21305.
How was this patch tested?
Adding unit tests for CTAS and RTAS.