[SPARK-6024][SQL] When a data source table has too many columns, its schema cannot be stored in metastore. #4795
Conversation
Test build #28020 has started for PR 4795 at commit
Test build #28022 has started for PR 4795 at commit
Test build #28020 has finished for PR 4795 at commit
Test PASSed.
tbl.setProperty("spark.sql.sources.schema.numOfParts", "1") | ||
// We use spark.sql.sources.schema instead of using spark.sql.sources.schema.part.0 | ||
// because users may have already created data source tables in metastore. | ||
tbl.setProperty("spark.sql.sources.schema", schemaJsonString) |
why don't we just always use schema.part.0? Seems easier to consolidate the two code paths.
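For reference, a minimal sketch of the consolidated write path this comment suggests: always chunk the schema JSON into spark.sql.sources.schema.part.N properties, even when a single part would fit. The 4000-character chunk size and the schemaJsonString/tbl names are assumptions for illustration, not the PR's exact code.

```scala
// Sketch only: a single write path that always stores the schema as
// numbered parts, so the read side never needs a special case.
// `schemaJsonString` is the schema serialized to JSON and `tbl` is a Hive
// table exposing setProperty, as in the diff above; the 4000-character
// chunk size stands in for whatever length limit the metastore enforces.
val threshold = 4000
val parts = schemaJsonString.grouped(threshold).toSeq
tbl.setProperty("spark.sql.sources.schema.numParts", parts.size.toString)
parts.zipWithIndex.foreach { case (part, index) =>
  tbl.setProperty(s"spark.sql.sources.schema.part.$index", part)
}
```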
Test build #28022 has finished for PR 4795 at commit
Test PASSed.
Test build #28025 has started for PR 4795 at commit
Test build #28025 has finished for PR 4795 at commit
Test PASSed.
@@ -69,13 +69,19 @@ private[hive] class HiveMetastoreCatalog(hive: HiveContext) extends Catalog with
   val table = synchronized {
     client.getTable(in.database, in.name)
   }
-  val schemaString = table.getProperty("spark.sql.sources.schema")
+  val schemaString = Option(table.getProperty("spark.sql.sources.schema.numOfParts")) match {
I think it is more conventional to use numParts instead of numOfParts. Also, you can remove the pattern matching by just applying a map:

Option(table.getProperty("spark.sql.sources.schema.numParts")).map { numParts =>
  ...
}
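A minimal sketch of what that map-based read path could look like, assuming the property layout from the diff above; the mkString reassembly and the val name are illustrative, not the PR's final code:

```scala
// Sketch only: rebuild the schema JSON string by concatenating the
// numbered part properties in order. Error handling is elided here;
// see the corrupted-schema check discussed below.
val schemaString: Option[String] =
  Option(table.getProperty("spark.sql.sources.schema.numParts")).map { numParts =>
    (0 until numParts.toInt).map { index =>
      table.getProperty(s"spark.sql.sources.schema.part.$index")
    }.mkString
  }
```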
Test build #28031 has started for PR 4795 at commit
val part = table.getProperty(s"spark.sql.sources.schema.part.${index}")
if (part == null) {
  throw new AnalysisException(
    "Could not read schema from the metastore because it is corrupted.")
Sorry for being picky, but it would be great to include the reason why it is corrupted (i.e. "missing part x").
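A sketch of how the message might carry that detail, under the same assumptions as the snippet above (exact wording illustrative):

```scala
// Sketch only: name the missing part so a corrupted table is diagnosable.
if (part == null) {
  throw new AnalysisException(
    "Could not read schema from the metastore because it is corrupted " +
      s"(missing part $index of the schema).")
}
```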
Test build #28031 has finished for PR 4795 at commit
Test PASSed.
Test build #28043 has started for PR 4795 at commit
lgtm
Test build #28043 has finished for PR 4795 at commit
Test PASSed.
Merging in!
… schema cannot be stored in metastore.

JIRA: https://issues.apache.org/jira/browse/SPARK-6024

Author: Yin Huai <yhuai@databricks.com>

Closes #4795 from yhuai/wideSchema and squashes the following commits:

4882e6f [Yin Huai] Address comments.
73e71b4 [Yin Huai] Address comments.
143927a [Yin Huai] Simplify code.
cc1d472 [Yin Huai] Make the schema wider.
12bacae [Yin Huai] If the JSON string of a schema is too large, split it before storing it in metastore.
e9b4f70 [Yin Huai] Failed test.

(cherry picked from commit 5e5ad65)
Signed-off-by: Reynold Xin <rxin@databricks.com>