-
Notifications
You must be signed in to change notification settings - Fork 28.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[SPARK-16397][SQL] make CatalogTable more general and less hive specific #14071
Conversation
Test build #61858 has finished for PR 14071 at commit
|
(ctx.fileFormat, ctx.storageHandler) match { | ||
// Expected format: INPUTFORMAT input_format OUTPUTFORMAT output_format | ||
case (c: TableFileFormatContext, null) => | ||
visitTableFileFormat(c) | ||
Option(string(c.inFmt)).foreach(inFmt => storageProps += "inputFormat" -> inFmt) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
storageProps += "inputFormat" -> string(c.inFmt)
? string(c.inFmt)
throws an NPE if it is null; so option is redundant...
Test build #61914 has finished for PR 14071 at commit
|
@@ -403,17 +400,18 @@ object CreateDataSourceTableUtils extends Logging { | |||
assert(partitionColumns.isEmpty) | |||
assert(relation.partitionSchema.isEmpty) | |||
|
|||
var storage = CatalogStorageFormat( | |||
locationUri = None, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Any reason why this locationUri
is set to None
? It sounds like the original value is Some(relation.location.paths.map(_.toUri.toString).head
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
oh it's a mistake...
Test build #61951 has finished for PR 14071 at commit
|
Test build #61959 has finished for PR 14071 at commit
|
inputFormat: Option[String], | ||
outputFormat: Option[String], | ||
serde: Option[String], | ||
compressed: Boolean, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We never persist this flag into external catalog.
Test build #62247 has finished for PR 14071 at commit
|
Test build #62249 has finished for PR 14071 at commit
|
I'm closing it as it turns out that we still need these hive specific stuff in CatalogStorageFormat. Hiding them in properties seems don't have much benefit, what we need is just adding a 'provider' field in CatalogTable, which is done in #14155 |
What changes were proposed in this pull request?
CatalogTable
is an internal interface to represent table metadata in Spark SQL, and should be able to express both hive serde table and data source table. Due to some hive hacks, data source table only use table properties to store its information, and theCatalogStorageFormat
is very hive specific as it's only used by hive serde table currently.However this is not a good design, the
CatalogTable
is not general enough to express data source table if we remove hive hacks in the future.This PR is trying to make
CatalogStorageFormat
general enough to handle all types of tables in Spark SQL.How was this patch tested?
existing tests.