Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Avoid creating Hive tables with double slashed location #17958

Merged
merged 1 commit into from
Jun 20, 2023

Conversation

findepi
Copy link
Member

@findepi findepi commented Jun 19, 2023

Hive connector does not support table locations containing double slashes. On S3 this leads to correctness issues (e.g. INSERT works, but SELECT does not find any data).

This commit

  • restores normalization of implicit table location during CREATE TABLE. There used to be such normalization until 8bd9f75.
  • rejects explicit table locations containing double slash during CREATE TABLE .. WITH (external_location = ...). Before 8bd9f75 there used to be normalization also during this flow, but rejecting such unsupported locations is deemed more correct.

Alternative to #17947
Stop gap solution until we fix all problems related to double slashes in Hive connector (#17803).

After merging this, #17803 is no longer a release-blocker.

@cla-bot cla-bot bot added the cla-signed label Jun 19, 2023
@findepi findepi force-pushed the findepi/hive-restore-normalization branch from 69692fc to d0e7c99 Compare June 19, 2023 15:30
Copy link
Member

@losipiuk losipiuk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. @findinpath @alexjo2144 PTAL.

@github-actions github-actions bot added hive Hive connector tests:hive labels Jun 19, 2023
return Location.of(location).appendPath(escapeTableName(tableName));
// Note: this results in a "normalized location", e.g. not containing double slashes.
// TODO (https://github.com/trinodb/trino/issues/17803): We need to use normalized location until all relevant Hive connector components are migrated off Hadoop Path.
return Location.of(databasePath.toString()).appendPath(escapeTableName(tableName));
Copy link
Contributor

@findinpath findinpath Jun 20, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

trino> create schema hive.doubleslashes with (location='s3://trino-test/double_slash//location');
CREATE SCHEMA
trino> show create schema hive.doubleslashes;
                         Create Schema                          
----------------------------------------------------------------
 CREATE SCHEMA hive.doubleslashes                               
 WITH (                                                         
    location = 's3://trino-test/double_slash//location' 
 )                                                              
(1 row)

trino> create table hive.doubleslashes.test1 (c integer);
CREATE TABLE
trino> insert into hive.doubleslashes.test1 values 1,2,3;
INSERT: 3 rows
sele
Query 20230620_060740_00002_8c4bz, FINISHED, 1 node
http://localhost:8080/ui/query.html?20230620_060740_00002_8c4bz
Splits: 14 total, 14 done (100.00%)
CPU Time: 0.3s total,     0 rows/s,     0B/s, 28% active
Per Node: 0.1 parallelism,     0 rows/s,     0B/s
Parallelism: 0.1
Peak Memory: 38.2KB
3.11 [0 rows, 0B] [0 rows/s, 0B/s]

trino> select "$path"  from hive.doubleslashes.test1;
                                                        $path                                                         
----------------------------------------------------------------------------------------------------------------------
 s3://trino-test/double_slash/location/test1/20230620_060740_00002_8c4bz_e91a2ac5-4bc1-40ba-ab3f-2145eaa6bf97 
 s3://trino-test/double_slash/location/test1/20230620_060740_00002_8c4bz_e91a2ac5-4bc1-40ba-ab3f-2145eaa6bf97 
 s3://trino-test/double_slash/location/test1/20230620_060740_00002_8c4bz_e91a2ac5-4bc1-40ba-ab3f-2145eaa6bf97 
(3 rows)

Note that this strategy may eventually cause issues with managed tables which get created into schemas having a double-slashed location.

schema location: s3://trino-test/double_slash//location
table location: s3://trino-test/double_slash/location/test1

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

schema location: s3://trino-test/double_slash//location
table location: s3://trino-test/double_slash/location/test1

well, that's the whole point of this PR

thanks for calling this out!

@findepi findepi force-pushed the findepi/hive-restore-normalization branch from d0e7c99 to b509013 Compare June 20, 2023 08:24
@github-actions github-actions bot added delta-lake Delta Lake connector iceberg Iceberg connector labels Jun 20, 2023
Comment on lines 1286 to 1293
// TODO (https://github.com/trinodb/trino/issues/17803) We cannot accept locations with double slash until all relevant Hive connector components are migrated off Hadoop Path.
// Hadoop Path "normalizes location", e.g.:
// - removes double slashes (such locations are rejected),
// - removes trailing slash (here ignored; foo/bar and foo/bar/ are treated as equivalent, and rejecting locations with trailing slash could pose UX issues)
// - replaces file:///<local-path> with file:/<local-path> (here ignored).
if (validated.path().replaceFirst("/+$", "").contains("//")) {
throw new TrinoException(INVALID_TABLE_PROPERTY, "Unsupported location that cannot be internally represented: " + location);
}
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Rewrote the logic to accept locations such as file:///tmp/6383092576615971655, file:///tmp/6383092576615971655/, s3://bucket/foo/.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

as found in #17964 , two trailing slashes are not OK either.
i will not add a test case for that, but will block such locations.
test case coming in the other PR

Copy link
Member

@ebyhr ebyhr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good except for CI failure:

Error:  Failures: 
Error:    TestHiveS3AndGlueMetastoreTest.testBasicOperationsWithProvidedTableLocation:137 
expected: "s3://trino-ci-test/test_glue_s3_icyjziw7ri/trailing_slash/test_basic_operations_kjxzwqj9yr"
 but was: "s3://trino-ci-test/test_glue_s3_icyjziw7ri/trailing_slash/test_basic_operations_kjxzwqj9yr/"
Error:    TestHiveS3AndGlueMetastoreTest.testBasicOperationsWithProvidedTableLocation:137 
expected: "s3://trino-ci-test/test_glue_s3_icyjziw7ri/trailing_slash/test_basic_operations_tuoqpbqkww"
 but was: "s3://trino-ci-test/test_glue_s3_icyjziw7ri/trailing_slash/test_basic_operations_tuoqpbqkww/"

Hive connector does not support table locations containing double
slashes. On S3 this leads to correctness issues (e.g. INSERT works, but
SELECT does not find any data).

This commit

- restores normalization of implicit table location during CREATE TABLE.
  There used to be such normalization until 8bd9f75.
- rejects explicit table locations containing double slash during
  `CREATE TABLE .. WITH (external_location = ...)`.
  Before 8bd9f75 there used to be
  normalization also during this flow, but rejecting such unsupported
  locations is deemed more correct.
@findepi findepi force-pushed the findepi/hive-restore-normalization branch from 38a2eda to 9151dc8 Compare June 20, 2023 14:09
@findepi findepi merged commit 431f3e0 into master Jun 20, 2023
@findepi findepi deleted the findepi/hive-restore-normalization branch June 20, 2023 21:23
@github-actions github-actions bot added this to the 420 milestone Jun 20, 2023
@colebow
Copy link
Member

colebow commented Jun 21, 2023

Does this need release notes? @findepi

@findepi
Copy link
Member Author

findepi commented Jun 22, 2023

yes. it fixes #17803

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
cla-signed delta-lake Delta Lake connector hive Hive connector iceberg Iceberg connector
Development

Successfully merging this pull request may close these issues.

5 participants