Fix Delta writes when schema location ends with two slashes #17964

Closed

@findepi findepi commented Jun 20, 2023

Extracted from #17958

Fixes #17966

@cla-bot cla-bot bot added the cla-signed label Jun 20, 2023
@findepi findepi marked this pull request as draft June 20, 2023 09:01
@github-actions github-actions bot added the delta-lake (Delta Lake connector), hive (Hive connector), iceberg (Iceberg connector), and tests:hive labels Jun 20, 2023
@findepi findepi force-pushed the findepi/delta-trailing-slashes branch from fc4b1e1 to 5abf6f1 Compare June 20, 2023 14:03
TRAILING_WHITESPACE("s3://%s/%s/trailing_whitespace/%s "),
// PERCENT("s3://%s/%s/a%%percent/%s"),
// WHITESPACE("s3://%s/%s/a whitespace/%s"),
// TRAILING_WHITESPACE("s3://%s/%s/trailing_whitespace/%s "),

drop

findepi added 9 commits June 20, 2023 16:07
Hive connector does not support table locations containing double
slashes. On S3 this leads to correctness issues (e.g. INSERT works, but
SELECT does not find any data).

This commit

- restores normalization of implicit table location during CREATE TABLE.
  There used to be such normalization until 8bd9f75.
- rejects explicit table locations containing a double slash during
  `CREATE TABLE .. WITH (external_location = ...)`.
  Before 8bd9f75 this flow also normalized the location, but rejecting
  such unsupported locations is deemed more correct.
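
The two behaviors described in the commit message above (normalize implicit locations, reject explicit ones) can be illustrated with a minimal sketch. This is not the connector's actual code: the class and method names (`TableLocationSketch`, `normalizeImplicitLocation`, `validateExplicitLocation`) are hypothetical, and the sketch assumes that the `//` following the URI scheme (as in `s3://`) is the only double slash that is allowed.

```java
// Hypothetical illustration only; names and behavior are assumptions, not Trino APIs.
final class TableLocationSketch
{
    private TableLocationSketch() {}

    // Index of the first character after "scheme://", or 0 when there is no scheme.
    static int pathStart(String location)
    {
        int schemeEnd = location.indexOf("://");
        return schemeEnd < 0 ? 0 : schemeEnd + 3;
    }

    // True when the part after the scheme contains two consecutive slashes.
    static boolean containsDoubleSlash(String location)
    {
        return location.indexOf("//", pathStart(location)) >= 0;
    }

    // Implicit location (derived from the schema location): collapse repeated slashes.
    static String normalizeImplicitLocation(String location)
    {
        int start = pathStart(location);
        return location.substring(0, start) + location.substring(start).replaceAll("/{2,}", "/");
    }

    // Explicit location (supplied by the user): reject instead of silently rewriting.
    static String validateExplicitLocation(String location)
    {
        if (containsDoubleSlash(location)) {
            throw new IllegalArgumentException("Unsupported location that contains a double slash: " + location);
        }
        return location;
    }

    public static void main(String[] args)
    {
        System.out.println(normalizeImplicitLocation("s3://bucket/schema//table"));
        System.out.println(validateExplicitLocation("s3://bucket/schema/table"));
    }
}
```

Rejecting the explicit case keeps the stored location identical to what the user typed, which avoids the table silently pointing at a different path than the one requested.
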
Configure the bucket used in `TestIcebergS3AndGlueMetastoreTest` the same way as
in other `BaseS3AndGlueMetastoreTest` subclasses. This makes it easier to run
tests locally.

This also changes `TestIcebergGlueCatalogConnectorSmokeTest` for
consistency.

This may be needed when working with MinIO containers where bucket names
can be reused. This is not needed when working with real S3 where bucket
names are unique. Also, it's unclear whether this is a safe operation
when tests execute in parallel.

The `String.contains`-based inclusions/exclusions don't allow
distinguishing between hypothetical cases like "trailing_slash" and
"two_trailing_slashes".

This also simplifies generated test locations for schemas. Previously
the schema name was formatted twice into the location string.
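
The `String.contains` limitation mentioned above can be shown with a small sketch. The class and method names here are hypothetical and are not the test code from this PR; the sketch only assumes that test cases are selected by name.

```java
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

// Hypothetical illustration only; not the actual test helper from this PR.
final class LocationPatternFilterSketch
{
    private LocationPatternFilterSketch() {}

    // Substring-based exclusion: also drops every name that merely contains the excluded text.
    static List<String> excludeByContains(List<String> names, String excluded)
    {
        return names.stream()
                .filter(name -> !name.contains(excluded))
                .collect(Collectors.toList());
    }

    // Exact-name exclusion: drops only the names listed explicitly.
    static List<String> excludeByExactName(List<String> names, Set<String> excluded)
    {
        return names.stream()
                .filter(name -> !excluded.contains(name))
                .collect(Collectors.toList());
    }

    public static void main(String[] args)
    {
        List<String> names = List.of("normal", "trailing_slash", "two_trailing_slashes");
        System.out.println(excludeByContains(names, "trailing_slash"));
        System.out.println(excludeByExactName(names, Set.of("trailing_slash")));
    }
}
```

Because "two_trailing_slashes" contains the text "trailing_slash", the substring-based variant removes both cases, while the exact-name variant keeps them distinct.
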
Remove the constructor that doesn't construct the table.
Since it randomized the name, it couldn't even be used for dropping
an existing table.

Use try-with-resources to ensure that test tables are dropped even if
an assertion fails.
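
A minimal sketch of the try-with-resources pattern described above, under stated assumptions: `TemporaryTable` here is a hypothetical stand-in that records SQL text in a list instead of running it against a real query runner, so the example stays self-contained.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; the real test helper talks to a query runner.
final class TemporaryTableSketch
{
    private TemporaryTableSketch() {}

    // Creates the table in the constructor and drops it in close().
    static final class TemporaryTable
            implements AutoCloseable
    {
        private final List<String> executedSql;
        private final String name;

        TemporaryTable(List<String> executedSql, String name)
        {
            this.executedSql = executedSql;
            this.name = name;
            executedSql.add("CREATE TABLE " + name + " (x int)");
        }

        String name()
        {
            return name;
        }

        @Override
        public void close()
        {
            executedSql.add("DROP TABLE " + name);
        }
    }

    // Simulates a test whose assertion fails after the table was created.
    static List<String> runFailingTest()
    {
        List<String> executedSql = new ArrayList<>();
        try (TemporaryTable table = new TemporaryTable(executedSql, "test_table")) {
            executedSql.add("INSERT INTO " + table.name() + " VALUES 1");
            throw new AssertionError("simulated assertion failure");
        }
        catch (AssertionError expected) {
            // close() has already run by the time this catch block executes
        }
        return executedSql;
    }

    public static void main(String[] args)
    {
        runFailingTest().forEach(System.out::println);
    }
}
```

The drop statement is recorded even though the body throws, because the resource is closed before control leaves the `try` statement.
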
@findepi findepi force-pushed the findepi/delta-trailing-slashes branch from 5abf6f1 to dcf5783 Compare June 20, 2023 15:39
@findepi findepi closed this Jun 20, 2023
@findepi findepi deleted the findepi/delta-trailing-slashes branch June 20, 2023 20:03

findepi commented Jun 20, 2023

resubmitted as #17980

Labels
cla-signed, delta-lake (Delta Lake connector), hive (Hive connector), iceberg (Iceberg connector)
Development

Successfully merging this pull request may close these issues.

Writes to Delta table fail when location ends with two slashes