-
Notifications
You must be signed in to change notification settings - Fork 3.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Avoid creating Hive tables with double slashed location #17958
Conversation
69692fc
to
d0e7c99
Compare
plugin/trino-hive/src/main/java/io/trino/plugin/hive/HiveMetadata.java
Outdated
Show resolved
Hide resolved
plugin/trino-hive/src/main/java/io/trino/plugin/hive/util/HiveWriteUtils.java
Show resolved
Hide resolved
plugin/trino-hive/src/test/java/io/trino/plugin/hive/TestHiveS3AndGlueMetastoreTest.java
Outdated
Show resolved
Hide resolved
plugin/trino-hive/src/test/java/io/trino/plugin/hive/TestHiveS3AndGlueMetastoreTest.java
Outdated
Show resolved
Hide resolved
plugin/trino-hive/src/test/java/io/trino/plugin/hive/TestHiveS3AndGlueMetastoreTest.java
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. @findinpath @alexjo2144 PTAL.
return Location.of(location).appendPath(escapeTableName(tableName)); | ||
// Note: this results in a "normalized location", e.g. not containing double slashes. | ||
// TODO (https://github.com/trinodb/trino/issues/17803): We need to use normalized location until all relevant Hive connector components are migrated off Hadoop Path. | ||
return Location.of(databasePath.toString()).appendPath(escapeTableName(tableName)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
trino> create schema hive.doubleslashes with (location='s3://trino-test/double_slash//location');
CREATE SCHEMA
trino> show create schema hive.doubleslashes;
Create Schema
----------------------------------------------------------------
CREATE SCHEMA hive.doubleslashes
WITH (
location = 's3://trino-test/double_slash//location'
)
(1 row)
trino> create table hive.doubleslashes.test1 (c integer);
CREATE TABLE
trino> insert into hive.doubleslashes.test1 values 1,2,3;
INSERT: 3 rows
sele
Query 20230620_060740_00002_8c4bz, FINISHED, 1 node
http://localhost:8080/ui/query.html?20230620_060740_00002_8c4bz
Splits: 14 total, 14 done (100.00%)
CPU Time: 0.3s total, 0 rows/s, 0B/s, 28% active
Per Node: 0.1 parallelism, 0 rows/s, 0B/s
Parallelism: 0.1
Peak Memory: 38.2KB
3.11 [0 rows, 0B] [0 rows/s, 0B/s]
trino> select "$path" from hive.doubleslashes.test1;
$path
----------------------------------------------------------------------------------------------------------------------
s3://trino-test/double_slash/location/test1/20230620_060740_00002_8c4bz_e91a2ac5-4bc1-40ba-ab3f-2145eaa6bf97
s3://trino-test/double_slash/location/test1/20230620_060740_00002_8c4bz_e91a2ac5-4bc1-40ba-ab3f-2145eaa6bf97
s3://trino-test/double_slash/location/test1/20230620_060740_00002_8c4bz_e91a2ac5-4bc1-40ba-ab3f-2145eaa6bf97
(3 rows)
Note that this strategy may eventually cause issues with managed tables which get created into schemas having a double-slashed location.
schema location: s3://trino-test/double_slash//location
table location: s3://trino-test/double_slash/location/test1
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
schema location:
s3://trino-test/double_slash//location
table location:s3://trino-test/double_slash/location/test1
well, that's the whole point of this PR
thanks for calling this out!
d0e7c99
to
b509013
Compare
// TODO (https://github.com/trinodb/trino/issues/17803) We cannot accept locations with double slash until all relevant Hive connector components are migrated off Hadoop Path. | ||
// Hadoop Path "normalizes location", e.g.: | ||
// - removes double slashes (such locations are rejected), | ||
// - removes trailing slash (here ignored; foo/bar and foo/bar/ are treated as equivalent, and rejecting locations with trailing slash could pose UX issues) | ||
// - replaces file:///<local-path> with file:/<local-path> (here ignored). | ||
if (validated.path().replaceFirst("/+$", "").contains("//")) { | ||
throw new TrinoException(INVALID_TABLE_PROPERTY, "Unsupported location that cannot be internally represented: " + location); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Rewrote the logic to accept locations such as file:///tmp/6383092576615971655
, file:///tmp/6383092576615971655/
, s3://bucket/foo/
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
as found in #17964 , two trailing slashes are not OK either.
i will not add a test case for that, but will block such locations.
test case coming in the other PR
b509013
to
38a2eda
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good except for CI failure:
Error: Failures:
Error: TestHiveS3AndGlueMetastoreTest.testBasicOperationsWithProvidedTableLocation:137
expected: "s3://trino-ci-test/test_glue_s3_icyjziw7ri/trailing_slash/test_basic_operations_kjxzwqj9yr"
but was: "s3://trino-ci-test/test_glue_s3_icyjziw7ri/trailing_slash/test_basic_operations_kjxzwqj9yr/"
Error: TestHiveS3AndGlueMetastoreTest.testBasicOperationsWithProvidedTableLocation:137
expected: "s3://trino-ci-test/test_glue_s3_icyjziw7ri/trailing_slash/test_basic_operations_tuoqpbqkww"
but was: "s3://trino-ci-test/test_glue_s3_icyjziw7ri/trailing_slash/test_basic_operations_tuoqpbqkww/"
Hive connector does not support table locations containing double slashes. On S3 this leads to correctness issues (e.g. INSERT works, but SELECT does not find any data). This commit - restores normalization of implicit table location during CREATE TABLE. There used to be such normalization until 8bd9f75. - rejects explicit table locations containing double slash during `CREATE TABLE .. WITH (external_location = ...)`. Before 8bd9f75 there used to be normalization also during this flow, but rejecting such unsupported locations is deemed more correct.
38a2eda
to
9151dc8
Compare
Does this need release notes? @findepi |
yes. it fixes #17803 |
Hive connector does not support table locations containing double slashes. On S3 this leads to correctness issues (e.g. INSERT works, but SELECT does not find any data).
This commit
CREATE TABLE .. WITH (external_location = ...)
. Before 8bd9f75 there used to be normalization also during this flow, but rejecting such unsupported locations is deemed more correct.Alternative to #17947
Stop gap solution until we fix all problems related to double slashes in Hive connector (#17803).
After merging this, #17803 is no longer a
release-blocker
.