[SPARK-17166] [SQL] Store Table Properties in CTAS that is Converted to Data Source Tables #14727

Closed
wants to merge 1 commit into from

gatorsmile
Member

What changes were proposed in this pull request?

CTAS loses the user-specified table properties when the statement is converted to create a data source table. For example,

CREATE TABLE t TBLPROPERTIES('prop1' = 'c', 'prop2' = 'd') AS SELECT 1 as a, 1 as b

The output of DESC FORMATTED t does not include prop1 or prop2:

|Table Parameters:           |                                                                                                              |       |
|  rawDataSize               |-1                                                                                                            |       |
|  numFiles                  |1                                                                                                             |       |
|  transient_lastDdlTime     |1471670983                                                                                                    |       |
|  totalSize                 |496                                                                                                           |       |
|  spark.sql.sources.provider|parquet                                                                                                       |       |
|  EXTERNAL                  |FALSE                                                                                                         |       |
|  COLUMN_STATS_ACCURATE     |false                                                                                                         |       |
|  numRows                   |-1                                                                                                            |       |
|                            |                                                                                                              |       |
|# Storage Information       |                                                                                                              |       |
|SerDe Library:              |org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe                                                   |       |
|InputFormat:                |org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat                                                 |       |
|OutputFormat:               |org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat                                                |       |
|Compressed:                 |No                                                                                                            |       |
|Storage Desc Parameters:    |                                                                                                              |       |
|  serialization.format      |1                                                                                                             |       |
|  path                      |file:/private/var/folders/4b/sgmfldk15js406vk7lw5llzw0000gn/T/warehouse-f3aa2927-6464-4a35-a715-1300dde6c614/t|       |

After the fix, the user-specified properties are stored as serde (storage) properties, because the table properties of a data source table are already used to store the table schema and system-generated properties.

|Table Parameters:           |                                                                                                              |       |
|  rawDataSize               |-1                                                                                                            |       |
|  numFiles                  |1                                                                                                             |       |
|  transient_lastDdlTime     |1471672182                                                                                                    |       |
|  totalSize                 |496                                                                                                           |       |
|  spark.sql.sources.provider|parquet                                                                                                       |       |
|  EXTERNAL                  |FALSE                                                                                                         |       |
|  COLUMN_STATS_ACCURATE     |false                                                                                                         |       |
|  numRows                   |-1                                                                                                            |       |
|                            |                                                                                                              |       |
|# Storage Information       |                                                                                                              |       |
|SerDe Library:              |org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe                                                   |       |
|InputFormat:                |org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat                                                 |       |
|OutputFormat:               |org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat                                                |       |
|Compressed:                 |No                                                                                                            |       |
|Storage Desc Parameters:    |                                                                                                              |       |
|  prop2                     |d                                                                                                             |       |
|  prop1                     |c                                                                                                             |       |
|  serialization.format      |1                                                                                                             |       |
|  path                      |file:/private/var/folders/4b/sgmfldk15js406vk7lw5llzw0000gn/T/warehouse-78c38cea-02c9-40aa-9b20-9803686069ae/t|       |
+----------------------------+--------------------------------------------------------------------------------------------------------------+-------+
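
As a rough illustration of the metadata layout described above (not the actual patch), the sketch below models a table description in plain Scala. `StorageFormat`, `TableDesc`, and `moveUserPropsToStorage` are hypothetical names chosen for this example; they only mimic the shape of the `properties` and `storage.properties` fields and are not Spark's real classes.

```scala
// Simplified, hypothetical stand-ins for catalog metadata; not Spark's real classes.
final case class StorageFormat(properties: Map[String, String])
final case class TableDesc(properties: Map[String, String], storage: StorageFormat)

object CtasPropsSketch {
  // Models the behavior described above: user-specified TBLPROPERTIES are added to
  // the storage (serde) properties, and the table properties keep only the
  // system-generated entries.
  def moveUserPropsToStorage(
      systemProps: Map[String, String],
      userProps: Map[String, String],
      storage: StorageFormat): TableDesc =
    TableDesc(
      properties = systemProps,
      storage = storage.copy(properties = storage.properties ++ userProps))

  def main(args: Array[String]): Unit = {
    val desc = moveUserPropsToStorage(
      systemProps = Map("spark.sql.sources.provider" -> "parquet"),
      userProps = Map("prop1" -> "c", "prop2" -> "d"),
      storage = StorageFormat(Map("serialization.format" -> "1")))

    // The user properties are absent from the table properties...
    assert(desc.properties.get("prop1").isEmpty)
    assert(desc.properties.get("prop2").isEmpty)
    // ...and present in the storage properties.
    assert(desc.storage.properties.get("prop1") == Option("c"))
    assert(desc.storage.properties.get("prop2") == Option("d"))
  }
}
```

In this toy model the existing storage entry (`serialization.format`) is preserved, matching the second DESC FORMATTED output, where `prop1` and `prop2` appear next to it under Storage Desc Parameters.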

How was this patch tested?

Added a test case.

@gatorsmile changed the title from "[SPARK-17166] [SQL] Store Table Properties Specified in CTAS after Conversion to Data Source Tables" to "[SPARK-17166] [SQL] Store Table Properties in CTAS that is Converted to Data Source Tables" on Aug 20, 2016
@gatorsmile
Member Author

cc @cloud-fan @yhuai This is what we discussed in another PR. Could you please review whether this is the right fix? Thanks!

@SparkQA
SparkQA commented Aug 20, 2016

Test build #64125 has finished for PR 14727 at commit bffc412.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

assert(tableDesc.properties.get("prop1").isEmpty)
assert(tableDesc.properties.get("prop2").isEmpty)
assert(tableDesc.storage.properties.get("prop1") == Option("c"))
assert(tableDesc.storage.properties.get("prop2") == Option("d"))
Contributor

@cloud-fan cloud-fan Aug 20, 2016

Is this what we want? Why should the table properties of a Hive serde table go to the storage properties of a data source table?

Ideally, a data source table should have both data source options (storage properties) and table properties. Currently we don't support specifying table properties for data source tables, but that doesn't mean we never will. I think we can do it when we unify the CREATE TABLE syntax.
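
To make the distinction concrete, here is a toy model in plain Scala, assuming a future unified CREATE TABLE that accepts both kinds of metadata. `DataSourceTable` and `create` are hypothetical names for this sketch only, not Spark's API, and the `compression` option is just an example value.

```scala
// Toy model, not Spark's API: a data source table that carries both kinds of metadata.
final case class DataSourceTable(
    provider: String,
    options: Map[String, String],         // data source options (storage properties)
    tableProperties: Map[String, String]) // user-specified table properties

object UnifiedCreateTableSketch {
  // Hypothetical builder: options and table properties are kept in separate maps
  // instead of folding the table properties into the storage properties.
  def create(
      provider: String,
      options: Map[String, String],
      tblProperties: Map[String, String]): DataSourceTable =
    DataSourceTable(provider, options, tblProperties)

  def main(args: Array[String]): Unit = {
    val t = create(
      provider = "parquet",
      options = Map("compression" -> "snappy"),
      tblProperties = Map("prop1" -> "c", "prop2" -> "d"))

    // User properties stay in the table properties and do not leak into the options.
    assert(t.tableProperties.get("prop1") == Option("c"))
    assert(t.options.get("prop1").isEmpty)
  }
}
```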

Member Author

uh, agree! Let me close this PR. Thanks!
