-
Notifications
You must be signed in to change notification settings - Fork 3.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
avro.schema.literal support and some related patches #5101
Conversation
@lxynov, you have anything to add to this PR? Thanks. |
eabdc5e
to
75400d3
Compare
@findepi Could you help review this? Some commits seems hacky (due to Hive's lack of spec and doc). Feel free to slack me if you'd like to discuss in DMs. |
if (isAvroTableWithSchemaSet(table)) { | ||
// If a partition's Serde doesn't use metastore for schema (like Avro), we use table-level schema. Data publishers | ||
// are responsible for making sure that table-level schema is workable for all partitions. | ||
if (!isPartitionSerdesUsingMetastoreForSchema(partition)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you please add a test (product test?) for partitioned tables where table and partition schema mismatches?
for example
- table is ORC, partition is Parquet
- table is ORC, partition is Avro with schema url
- table is ORC, partition is Avro with schema literal
- table is ORC, partition is CSV
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was trying to add such a product test (using HQLs) which fails before the commit and succeeds after the commit, but failed to do so. But I think this commit logically makes sense and it has been deployed at our company for a long time. Please let me know if you have concerns.
presto-hive/src/main/java/io/prestosql/plugin/hive/HiveMetadata.java
Outdated
Show resolved
Hide resolved
new Schema.Parser().parse(avroSchemaLiteral); | ||
return avroSchemaLiteral; | ||
} | ||
catch (SchemaParseException e) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can there be any other exception flying here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
By looking at the implementation of Schema.Parser::parse()
, I think SchemaParseException
is sufficient here
if (getAvroSchemaUrl(tableMetadata.getProperties()) != null) { | ||
throw new PrestoException(NOT_SUPPORTED, "CREATE TABLE AS not supported when Avro schema url is set"); | ||
if (getAvroSchemaUrl(tableMetadata.getProperties()) != null || getAvroSchemaLiteral(tableMetadata.getProperties()) != null) { | ||
throw new PrestoException(NOT_SUPPORTED, "CREATE TABLE AS not supported when Avro schema url or literal is set"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the reason not to support CTAS with schema url is because we need the table to be in metastore in order to get the information what the schema actually (at least this is the current implementation).
For schema literal we could obtain this information locally. Even if we do not do this (which I am fine with), we should keep those cases separate here -- as separate if
s.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Separated if
s
the table/partition schema handliung changes require some improvement |
c68ea1c
to
96ce3ef
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you add test cases about schema evolution (maybe in TestAvroSchemaEvolution)? This PR is going to close #5001, but the test looks not covered.
@Test(groups = AVRO) | ||
public void testHiveCreatedTable() | ||
{ | ||
onHive().executeQuery("DROP TABLE IF EXISTS test_avro_schema_literal_hive"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It would be better to limit onHive()
usage for Hive specific operation because it's slow. Other places are also the same.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This test is meant to test Hive created tables, so onHive()
is used here.
presto-product-tests/src/main/java/io/prestosql/tests/hive/TestAvroSchemaLiteral.java
Outdated
Show resolved
Hide resolved
presto-hive/src/main/java/io/prestosql/plugin/hive/metastore/MetastoreUtil.java
Outdated
Show resolved
Hide resolved
c175748
to
7c82037
Compare
@ebyhr All tests have passed. This is ready for review |
@ebyhr @findepi Could you please take a look? Thanks in advance!
@findepi I thought about this before but found that commit |
👋 @lxynov - this PR has become inactive. If you're still interested in working on it, please let us know, and we can try to get reviewers to help with that. We're working on closing out old and inactive PRs, so if you're too busy or this has too many merge conflicts to be worth picking back up, we'll be making another pass to close it out in a few weeks. |
Closes #5001
A related issue: prestodb/presto#9116