-
Notifications
You must be signed in to change notification settings - Fork 3.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix correctness issue with double slash and add test for S3 and Glue metastore #17804
Conversation
a4c6287
to
39ece22
Compare
39ece22
to
2da154d
Compare
1a9eab5
to
a317fd8
Compare
a317fd8
to
76a7baf
Compare
@@ -562,7 +563,7 @@ public synchronized void createTable( | |||
ConnectorSession session, | |||
Table table, | |||
PrincipalPrivileges principalPrivileges, | |||
Optional<Path> currentPath, | |||
Optional<Location> currentPath, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have a concern about other usage of org.apache.hadoop.fs.Path
, but hesitated to migrate everything in a PR. Let me know if I should migrate more in this PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we probably cannot solve all these problems at once.
we should revert 8bd9f75 to unblock the 420 release and then re-commit it behind a config toggle
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if we want to maintain compatibility with Hive HMS (or Athena), then per @hashhar 's comment #17923 (review) we may need to keep normalization for ever. May require further consideration though
@@ -630,10 +639,10 @@ private static boolean isManagedTable(Table table) | |||
return table.getTableType().equals(MANAGED_TABLE.name()); | |||
} | |||
|
|||
private static void deleteDir(HdfsContext context, HdfsEnvironment hdfsEnvironment, Path path, boolean recursive) | |||
private void deleteDir(HdfsContext context, Location path) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This causes minor behavior change. For instance, we can't delete the directory having #
anymore. I suppose this is acceptable and we should deny creating such a database or table.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I assume this has no real impact since for example if partition columns include #
(in their name or value) the directory name contains it in URL encoded form and not raw #
.
76a7baf
to
95e7f04
Compare
95e7f04
to
60ecb5a
Compare
Removed RELEASE-BLOCKER tag since issue is already tagged as such. |
60ecb5a
to
0f7bc2e
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Assuming that #17848 goes 1st, this submission LGTM.
plugin/trino-hive/src/test/java/io/trino/plugin/hive/TestHiveCreateExternalTable.java
Show resolved
Hide resolved
import static org.assertj.core.api.Assertions.assertThat; | ||
import static org.assertj.core.api.Assertions.assertThatThrownBy; | ||
|
||
public class TestHiveS3AndGlueMetastoreTest |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not all these tests pass without your following commit - specifically the tests dealing with double slash within the schema location for Hive fails.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The 2nd commit doesn't run double slash test because the location is filtered in TestHiveS3AndGlueMetastoreTest.locationPatterns
. Am I missing something?
I'm looking through the productive code of Trino OSS and see that the constructor One such usage on which I've stumbled is:
This should probably be modified. In the test (even though currently 🟢 )
There are as @ebyhr pointed out a lot of other usages which, in special constellations, will probably turn out to be other bugs related to exotic paths. |
49cac1e
to
9d79c36
Compare
@ebyhr please see test failures |
CI failures are unrelated to this change in my opinion. I will file issues.
|
d521c36
to
488e5c1
Compare
Updated |
Extracted testing commit into #17930 |
488e5c1
to
fececf7
Compare
Rebased on master to resolve conflicts. |
https://github.com/trinodb/trino/actions/runs/5308675667/jobs/9608455959?pr=17804 |
fececf7
to
8cb4bd1
Compare
The failure should be fixed by the new push. |
@Test | ||
public void testCreateTableWithDoubleSlash() | ||
{ | ||
String schemaName = "test_create_table_with_double_slash_" + randomNameSuffix(); | ||
String schemaLocation = "s3://%s/%s/double_slash//test_schema".formatted(bucketName, schemaName); | ||
String tableName = "test_create_table_with_double_slash_" + randomNameSuffix(); | ||
String schemaTableName = schemaName + "." + tableName; | ||
|
||
// Previously, HiveLocationService replaced double slash with single slash | ||
assertUpdate("CREATE SCHEMA " + schemaName + " WITH (location = '" + schemaLocation + "')"); | ||
String existingKey = "%s/double_slash/test_schema/%s".formatted(schemaName, tableName); | ||
s3.putObject(bucketName, existingKey, "test content"); | ||
|
||
assertUpdate("CREATE TABLE " + schemaTableName + "(col_int int)"); | ||
assertUpdate("INSERT INTO " + schemaTableName + " VALUES 1", 1); | ||
assertQuery("SELECT * FROM " + schemaTableName, "VALUES 1"); | ||
assertUpdate("DROP TABLE " + schemaTableName); | ||
s3.deleteObject(bucketName, existingKey); | ||
} | ||
|
||
@Test | ||
public void testCtasWithDoubleSlash() | ||
{ | ||
String schemaName = "test_ctas_with_double_slash_" + randomNameSuffix(); | ||
String schemaLocation = "s3://%s/%s/double_slash//test_schema".formatted(bucketName, schemaName); | ||
String tableName = "test_create_table_with_double_slash_" + randomNameSuffix(); | ||
String schemaTableName = schemaName + "." + tableName; | ||
|
||
// Previously, HiveLocationService replaced double slash with single slash | ||
assertUpdate("CREATE SCHEMA " + schemaName + " WITH (location = '" + schemaLocation + "')"); | ||
String existingKey = "%s/double_slash/test_schema/%s".formatted(schemaName, tableName); | ||
s3.putObject(bucketName, existingKey, "test content"); | ||
|
||
assertUpdate("CREATE TABLE " + schemaTableName + " AS SELECT 1 AS col_int", 1); | ||
assertQuery("SELECT * FROM " + schemaTableName, "VALUES 1"); | ||
assertUpdate("DROP TABLE " + schemaTableName); | ||
s3.deleteObject(bucketName, existingKey); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are these new tests needed?
When working on #17958 i was under impression that other existing tests (those with data provider) cover these cases.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These tests are needed for changes in HiveLocationService
. Existing tests don't cover this scenario.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
testBasicOperationsWithProvidedSchemaLocation
exercises HiveLocationService
is it missing more coverage?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
testBasicOperationsWithProvidedSchemaLocation
doesn't cover the test when the normalized table location already exists in S3. This test puts an object on single-slash table location and then creates a table on double-slash location. HiveLocationService
threw an incorrect already-exists error because the class internally replaced double-slash with single-slash.
It wasn't a silent correctness issue, so no need to fix it in this release, but it would be nice to add the test and improve it later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
agreed.
let's have explanatory comment in the test. it seems at least for me understanding its importance wasn't easy
given that issue is closed, can you please remind me which portions of this PR we still want to go ahead with? |
Superseded by #17958 |
Description
Fixes #17803
Release notes