-
Notifications
You must be signed in to change notification settings - Fork 3.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[bug](s3) fix S3 file system gets absolute path #44965
Conversation
Thank you for your contribution to Apache Doris. Please clearly describe your PR:
|
@@ -121,7 +121,7 @@ class S3FileSystem final : public RemoteFileSystem { | |||
abs_path = path; | |||
} else { | |||
// path with no schema | |||
abs_path = _root_path / path; | |||
abs_path = _prefix / path; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
_prefix is normalized in constructor, it is removed the first and last '/'
run buildall |
TPC-H: Total hot run time: 40315 ms
|
TPC-DS: Total hot run time: 198303 ms
|
ClickBench: Total hot run time: 33.03 s
|
TeamCity be ut coverage result: |
run cloud_p0 |
PR approved by at least one committer and no changes requested. |
PR approved by anyone and no changes requested. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGMT
### What problem does this PR solve? Issue Number: close #44902 Related PR: #xxx ### Problem Summary If we create a S3 resource for cooldown data storage and the s3.root.path has a leading slash(/), such as: /root, the remote S3 data file will not be deleted if we drop the cooldown tablet from Doris. ### Reason Doris will get the path of all files of a tablet and then delete those files, The AWS `S3Client:ListObjectsV2` uses objects prefix to get the object files, but if the prefix has a leading slash(/), the`S3Client:ListObjectsV2` gets empty result. ### Solution use _prefix instead of _root_path in S3FileSystem for getting absolute path, _prefix is normalized in constructor and it is removed the first and last '/'. ### test case 1. create a s3 resource with a leading '/' and a storage policy base on it: ``` CREATE RESOURCE "test_resource" PROPERTIES ( "type" = "s3", "s3.endpoint" = "xxx", "s3.region" = "xxx", "s3.bucket" = "xxx", "s3.root.path" = "/tmp", "s3.access_key" = "xx", "s3.secret_key" = "xx" ); CREATE STORAGE POLICY test_policy PROPERTIES ( "storage_resource" = "test_resource", "cooldown_ttl" = "1" ) ``` 2. create a table and set the storage_policy to `test_policy` and insert test data: ``` CREATE TABLE `test_table` ( `k` bigint, `v` bigint ) ENGINE=OLAP DUPLICATE KEY(`k`) DISTRIBUTED BY HASH(`k`) BUCKETS 1 PROPERTIES ( "replication_num" = "1", "storage_policy" = "test_policy" ); insert into test_table values (1,2),(2,3); ``` 3. wait for the rowset cooldown to s3. 4. truncate test table ``` truncate table test_table force; ``` 5. If we get a log like ` delete remote rowsets of tablet. root_path="/tmp", tablet_id=xxx` in be.INFO, the remote tablet file should be deleted. ### Release note None ### Check List (For Author) - Test <!-- At least one of them must be included. --> - [x] Regression test - [x] Unit Test - [x] Manual test (add detailed scripts or steps below) - [ ] No need to test or manual test. Explain why: - [ ] This is a refactor/code format and no logic has been changed. - [ ] Previous test can cover this change. - [ ] No code files have been changed. - [ ] Other reason <!-- Add your reason? --> - Behavior changed: - [x] No. - [ ] Yes. <!-- Explain the behavior change --> - Does this need documentation? - [x] No. - [ ] Yes. <!-- Add document PR link here. eg: apache/doris-website#1214 --> ### Check List (For Reviewer who merge this PR) - [ ] Confirm the release note - [ ] Confirm test cases - [ ] Confirm document - [ ] Add branch pick label <!-- Add branch pick label that this PR should merge into -->
What problem does this PR solve?
Issue Number: close #44902
Related PR: #xxx
Problem Summary
If we create a S3 resource for cooldown data storage and the s3.root.path has a leading slash(/), such as: /root, the remote S3 data file will not be deleted if we drop the cooldown tablet from Doris.
Reason
Doris will get the path of all files of a tablet and then delete those files, The AWS
S3Client:ListObjectsV2
uses objects prefix to get the object files, but if the prefix has a leading slash(/), theS3Client:ListObjectsV2
gets empty result.Solution
use _prefix instead of _root_path in S3FileSystem for getting absolute path, _prefix is normalized in constructor and it is removed the first and last '/'.
test case
test_policy
and insert test data:delete remote rowsets of tablet. root_path="/tmp", tablet_id=xxx
in be.INFO, the remote tablet file should be deleted.Release note
None
Check List (For Author)
Test
Behavior changed:
Does this need documentation?
Check List (For Reviewer who merge this PR)