Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Presto should not rely on Hive metastore types for custom SerDes. #9116

Closed
sershe opened this issue Oct 6, 2017 · 2 comments
Closed

Presto should not rely on Hive metastore types for custom SerDes. #9116

sershe opened this issue Oct 6, 2017 · 2 comments

Comments

@sershe
Copy link

sershe commented Oct 6, 2017

Hive tables by default don't use column information from metastore to get types; they use column information provided by the deserializer. Many popular and built-in SerDes are exceptions; they are enumerated explicitly, see HiveConf.java, SERDESUSINGMETASTOREFORSCHEMA. Hive used to replicate the types that SerDe provides into metastore, however there can be consistency problems with these; until recently, for large type strings (e.g. for Avro), the metastore columns could be truncated from the original schema; it's hypothetically possible for users to change the schema by changing the contents of the external schema file for some serdes; etc., so Hive was recently changed to not store the type for such SerDes that it is itself not using.
Only the SerDes that rely on metastore for schema store the schema in metastore.

We are seeing errors from Presto that seem to indicate Presto is using the column type strings from metastore directly, at least for describe table. Given that types may be inconsistent with SerDe-provided types, they should not be used in such manner.

@sershe
Copy link
Author

sershe commented Oct 6, 2017

For reference, here's what Hive does (has.. method just checks the config setting)
if (hasMetastoreBasedSchema(...)) {
return tTable.getSd().getCols();
...
} else {
return MetaStoreUtils.getFieldsFromDeserializer(getTableName(), getDeserializer());
}

@findepi
Copy link
Contributor

findepi commented Sep 24, 2018

This should be fixed by #11289 (Use getFields to get the schema of tables commit). Please correct me if i'm wrong

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants