[SPARK-24923][SQL][WIP] Add unpartitioned CTAS and RTAS support for DataSourceV2 #21877

Closed
wants to merge 3 commits into from

rdblue (Contributor) commented Jul 25, 2018

What changes were proposed in this pull request?

  • Remove extends from ReadSupport and WriteSupport classes for use with Table
  • Add CTAS and RTAS logical plans
  • Refactor physical write plans so AppendData, CTAS, and RTAS use the same base class
  • Add support for TableCatalog to DataFrameReader and DataFrameWriter
  • Add TableV2Relation for tables that are loaded by TableCatalog and have no DataSource instance
  • Move implicit helpers into DataSourceV2Implicits to avoid future churn

Note that this doesn't handle partitionBy in DataFrameWriter. Adding support for partitioned tables will require validation rules.
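
The relationship between the three write plans can be sketched with a toy model. This is only an illustration under stated assumptions: `ToyCatalog`, `ToyTable`, `ToyTableWrite`, and the other names below are hypothetical stand-ins, not the classes in this PR, and the replace behavior shown (drop the table if present, then create it) is an assumption rather than the PR's confirmed semantics.

```scala
import scala.collection.mutable

// Hypothetical, simplified stand-ins; none of these are Spark's real classes.
final case class Row(values: Seq[Any])
final case class ToyTable(name: String, rows: mutable.Buffer[Row])

final class ToyCatalog {
  private val tables = mutable.Map.empty[String, ToyTable]

  def exists(name: String): Boolean = tables.contains(name)

  // Throws NoSuchElementException when the table is missing.
  def load(name: String): ToyTable = tables(name)

  // Throws IllegalArgumentException when the table already exists.
  def create(name: String): ToyTable = {
    require(!exists(name), s"Table already exists: $name")
    val table = ToyTable(name, mutable.Buffer.empty[Row])
    tables(name) = table
    table
  }

  def drop(name: String): Boolean = tables.remove(name).isDefined
}

// Shared base: each plan only decides how the target table is obtained;
// writing the query output is common to all of them.
abstract class ToyTableWrite(query: Seq[Row]) {
  protected def targetTable(): ToyTable
  final def run(): Unit = targetTable().rows ++= query
}

// Append: the table must already exist.
final class ToyAppend(catalog: ToyCatalog, name: String, query: Seq[Row])
    extends ToyTableWrite(query) {
  protected def targetTable(): ToyTable = catalog.load(name)
}

// CTAS: create the table, failing if it already exists.
final class ToyCreateTableAsSelect(catalog: ToyCatalog, name: String, query: Seq[Row])
    extends ToyTableWrite(query) {
  protected def targetTable(): ToyTable = catalog.create(name)
}

// RTAS (assumed semantics): drop the table if present, then create it again.
final class ToyReplaceTableAsSelect(catalog: ToyCatalog, name: String, query: Seq[Row])
    extends ToyTableWrite(query) {
  protected def targetTable(): ToyTable = {
    catalog.drop(name)
    catalog.create(name)
  }
}
```

In this sketch the only difference between the plans is how `targetTable()` resolves the destination, which is the kind of sharing a common base class for the physical write plans makes possible.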

This is based on unmerged work and includes the commits from #21306 and #21305.

How was this patch tested?

Adding unit tests for CTAS and RTAS.

rdblue (Contributor, Author) commented Jul 25, 2018

@cloud-fan, @gatorsmile, @marmbrus, this PR demonstrates how plans would use the catalog changes introduced in #21306. To see the changes, you may want to look at just the last commit, because this PR also includes changes from other PRs.

SparkQA commented Jul 25, 2018

Test build #93572 has finished for PR 21877 at commit d308d3c.

  • This patch fails Scala style tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
      • case class CreateTableAsSelect(
      • case class ReplaceTableAsSelect(
      • case class TableV2Relation(
      • case class AppendDataExec(
      • case class CreateTableAsSelectExec(
      • case class ReplaceTableAsSelectExec(
      • case class WriteToDataSourceV2Exec(
      • abstract case class V2TableWriteExec(
      • implicit class CatalogHelper(catalog: CatalogProvider)
      • implicit class TableHelper(table: Table)
      • implicit class SourceHelper(source: DataSourceV2)
      • implicit class OptionsHelper(options: Map[String, String])

SparkQA commented Jul 26, 2018

Test build #93613 has finished for PR 21877 at commit 5dcf159.

  • This patch fails to build.
  • This patch does not merge cleanly.
  • This patch adds no public classes.

rdblue force-pushed the add-ctas-rtas-v2-plans branch 2 times, most recently from 48c9998 to 8709957 (July 26, 2018 18:28)

SparkQA commented Jul 26, 2018

Test build #93614 has finished for PR 21877 at commit 323479c.

  • This patch fails to build.
  • This patch merges cleanly.
  • This patch adds no public classes.

SparkQA commented Jul 26, 2018

Test build #93615 has finished for PR 21877 at commit 48c9998.

  • This patch fails to build.
  • This patch merges cleanly.
  • This patch adds no public classes.

SparkQA commented Jul 26, 2018

Test build #93616 has finished for PR 21877 at commit 8709957.

  • This patch fails to build.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
      • case class CreateTableAsSelect(
      • case class ReplaceTableAsSelect(
      • case class TableV2Relation(
      • case class AppendDataExec(
      • case class CreateTableAsSelectExec(
      • case class ReplaceTableAsSelectExec(
      • case class WriteToDataSourceV2Exec(
      • abstract case class V2TableWriteExec(
      • implicit class CatalogHelper(catalog: CatalogProvider)
      • implicit class TableHelper(table: Table)
      • implicit class SourceHelper(source: DataSourceV2)
      • implicit class OptionsHelper(options: Map[String, String])

SparkQA commented Jul 26, 2018

Test build #93618 has finished for PR 21877 at commit 65e42b9.

  • This patch fails to build.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
      • case class CreateTableAsSelect(
      • case class ReplaceTableAsSelect(
      • case class TableV2Relation(
      • case class AppendDataExec(
      • case class CreateTableAsSelectExec(
      • case class ReplaceTableAsSelectExec(
      • case class WriteToDataSourceV2Exec(
      • abstract case class V2TableWriteExec(
      • implicit class CatalogHelper(catalog: CatalogProvider)
      • implicit class TableHelper(table: Table)
      • implicit class SourceHelper(source: DataSourceV2)
      • implicit class OptionsHelper(options: Map[String, String])

SparkQA commented Jul 26, 2018

Test build #93620 has finished for PR 21877 at commit 37b981b.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
      • case class CreateTableAsSelect(
      • case class ReplaceTableAsSelect(
      • case class TableV2Relation(
      • case class AppendDataExec(
      • case class CreateTableAsSelectExec(
      • case class ReplaceTableAsSelectExec(
      • case class WriteToDataSourceV2Exec(
      • abstract class V2TableWriteExec(
      • implicit class CatalogHelper(catalog: CatalogProvider)
      • implicit class TableHelper(table: Table)
      • implicit class SourceHelper(source: DataSourceV2)
      • implicit class OptionsHelper(options: Map[String, String])

SparkQA commented Jul 27, 2018

Test build #93638 has finished for PR 21877 at commit b6b29d8.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
      • case class CreateTableAsSelect(
      • case class ReplaceTableAsSelect(
      • case class TableV2Relation(
      • case class AppendDataExec(
      • case class CreateTableAsSelectExec(
      • case class ReplaceTableAsSelectExec(
      • case class WriteToDataSourceV2Exec(
      • abstract class V2TableWriteExec(
      • implicit class CatalogHelper(catalog: CatalogProvider)
      • implicit class TableHelper(table: Table)
      • implicit class SourceHelper(source: DataSourceV2)
      • implicit class OptionsHelper(options: Map[String, String])

Expression is internal and should not be used in public APIs. To avoid using Expression in the TableCatalog API, this commit adds a small set of transformations that are used to communicate partitioning to catalog implementations.

This also adds an apply transformation that passes the name of a transform instead of a Transform class. This can be used to pass transforms that are unknown to Spark to the underlying catalog implementation.

This uses the catalog API introduced in SPARK-24252 to implement CTAS and RTAS plans.
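
One way to picture the partitioning transforms described in the commit message above is a small closed set of known transforms plus a by-name escape hatch. The sketch below is a guess at the shape of the idea, not the API added by the commit: `PartitionTransform`, `Identity`, `Bucket`, `NamedTransform`, and `PartitionTransforms` are all hypothetical names, and the choice of identity and bucket as the "known" transforms is an assumption made for illustration.

```scala
// Hypothetical names throughout; this is not the API added by the commit.
sealed trait PartitionTransform {
  def name: String
  def columns: Seq[String]
}

// Transforms that the engine itself understands (assumed examples).
final case class Identity(column: String) extends PartitionTransform {
  val name: String = "identity"
  val columns: Seq[String] = Seq(column)
}

final case class Bucket(numBuckets: Int, column: String) extends PartitionTransform {
  val name: String = "bucket"
  val columns: Seq[String] = Seq(column)
}

// Escape hatch: carries only a transform name and its input columns, so a
// transform unknown to the engine can still be handed to the catalog.
final case class NamedTransform(name: String, columns: Seq[String])
    extends PartitionTransform

object PartitionTransforms {
  def identity(column: String): PartitionTransform = Identity(column)

  def bucket(numBuckets: Int, column: String): PartitionTransform =
    Bucket(numBuckets, column)

  // Pass a transform by name instead of by class.
  def apply(name: String, columns: String*): PartitionTransform =
    NamedTransform(name, columns.toList)

  // What a catalog implementation might do with the transforms it receives.
  def describe(transform: PartitionTransform): String = transform match {
    case Identity(column)            => column
    case Bucket(numBuckets, column)  => s"bucket($numBuckets, $column)"
    case NamedTransform(name, cols)  => s"$name(${cols.mkString(", ")})"
  }
}
```

Because none of these types mention Expression, a catalog implementation could pattern match on them without depending on internal classes, and a call such as `PartitionTransforms("zorder", "x", "y")` (with `zorder` as a made-up transform name) would pass through untouched.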

SparkQA commented Aug 15, 2018

Test build #94828 has finished for PR 21877 at commit e50d94b.

  • This patch fails to build.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
      • case class CreateTableAsSelect(
      • case class ReplaceTableAsSelect(
      • case class TableV2Relation(
      • case class AppendDataExec(
      • case class CreateTableAsSelectExec(
      • case class ReplaceTableAsSelectExec(
      • case class WriteToDataSourceV2Exec(
      • abstract class V2TableWriteExec(
      • implicit class CatalogHelper(catalog: CatalogProvider)
      • implicit class TableHelper(table: Table)
      • implicit class SourceHelper(source: DataSourceV2)
      • implicit class OptionsHelper(options: Map[String, String])

SparkQA commented Dec 10, 2018

Test build #99893 has finished for PR 21877 at commit e50d94b.

  • This patch fails to build.
  • This patch does not merge cleanly.
  • This patch adds the following public classes (experimental):
      • case class CreateTableAsSelect(
      • case class ReplaceTableAsSelect(
      • case class TableV2Relation(
      • case class AppendDataExec(
      • case class CreateTableAsSelectExec(
      • case class ReplaceTableAsSelectExec(
      • case class WriteToDataSourceV2Exec(
      • abstract class V2TableWriteExec(
      • implicit class CatalogHelper(catalog: CatalogProvider)
      • implicit class TableHelper(table: Table)
      • implicit class SourceHelper(source: DataSourceV2)
      • implicit class OptionsHelper(options: Map[String, String])

SparkQA commented Jan 11, 2019

Test build #101100 has finished for PR 21877 at commit e50d94b.

  • This patch fails to build.
  • This patch does not merge cleanly.
  • This patch adds the following public classes (experimental):
      • case class CreateTableAsSelect(
      • case class ReplaceTableAsSelect(
      • case class TableV2Relation(
      • case class AppendDataExec(
      • case class CreateTableAsSelectExec(
      • case class ReplaceTableAsSelectExec(
      • case class WriteToDataSourceV2Exec(
      • abstract class V2TableWriteExec(
      • implicit class CatalogHelper(catalog: CatalogProvider)
      • implicit class TableHelper(table: Table)
      • implicit class SourceHelper(source: DataSourceV2)
      • implicit class OptionsHelper(options: Map[String, String])

github-actions bot commented Jan 9, 2020

We're closing this PR because it hasn't been updated in a while. This isn't a judgement on the merit of the PR in any way. It's just a way of keeping the PR queue manageable.
If you'd like to revive this PR, please reopen it and ask a committer to remove the Stale tag!

github-actions bot added the Stale label Jan 9, 2020
github-actions bot closed this Jan 10, 2020