Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SPARK-16397][SQL] make CatalogTable more general and less hive specific #14071

Closed
wants to merge 3 commits into from

Conversation

cloud-fan
Copy link
Contributor

What changes were proposed in this pull request?

CatalogTable is an internal interface to represent table metadata in Spark SQL, and should be able to express both hive serde table and data source table. Due to some hive hacks, data source table only use table properties to store its information, and the CatalogStorageFormat is very hive specific as it's only used by hive serde table currently.

However this is not a good design, the CatalogTable is not general enough to express data source table if we remove hive hacks in the future.

This PR is trying to make CatalogStorageFormat general enough to handle all types of tables in Spark SQL.

How was this patch tested?

existing tests.

@SparkQA
Copy link

SparkQA commented Jul 6, 2016

Test build #61858 has finished for PR 14071 at commit 4ef7dd0.

  • This patch fails to build.
  • This patch merges cleanly.
  • This patch adds no public classes.

(ctx.fileFormat, ctx.storageHandler) match {
// Expected format: INPUTFORMAT input_format OUTPUTFORMAT output_format
case (c: TableFileFormatContext, null) =>
visitTableFileFormat(c)
Option(string(c.inFmt)).foreach(inFmt => storageProps += "inputFormat" -> inFmt)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

storageProps += "inputFormat" -> string(c.inFmt)? string(c.inFmt) throws an NPE if it is null; so option is redundant...

@cloud-fan cloud-fan changed the title [SPARK-16397][SQL][WIP] make CatalogTable more general and less hive specific [SPARK-16397][SQL] make CatalogTable more general and less hive specific Jul 7, 2016
@cloud-fan
Copy link
Contributor Author

@SparkQA
Copy link

SparkQA commented Jul 7, 2016

Test build #61914 has finished for PR 14071 at commit 4d65609.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@@ -403,17 +400,18 @@ object CreateDataSourceTableUtils extends Logging {
assert(partitionColumns.isEmpty)
assert(relation.partitionSchema.isEmpty)

var storage = CatalogStorageFormat(
locationUri = None,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Any reason why this locationUri is set to None? It sounds like the original value is Some(relation.location.paths.map(_.toUri.toString).head

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

oh it's a mistake...

@SparkQA
Copy link

SparkQA commented Jul 8, 2016

Test build #61951 has finished for PR 14071 at commit 9efe46e.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Jul 8, 2016

Test build #61959 has finished for PR 14071 at commit b218bb7.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

inputFormat: Option[String],
outputFormat: Option[String],
serde: Option[String],
compressed: Boolean,
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We never persist this flag into external catalog.

@SparkQA
Copy link

SparkQA commented Jul 13, 2016

Test build #62247 has finished for PR 14071 at commit 632d9a7.

  • This patch fails to build.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Jul 13, 2016

Test build #62249 has finished for PR 14071 at commit d18ac59.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@cloud-fan
Copy link
Contributor Author

I'm closing it as it turns out that we still need these hive specific stuff in CatalogStorageFormat. Hiding them in properties seems don't have much benefit, what we need is just adding a 'provider' field in CatalogTable, which is done in #14155

@cloud-fan cloud-fan closed this Jul 13, 2016
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants