-
Notifications
You must be signed in to change notification settings - Fork 9.1k
HADOOP-19681: Fix S3A failing to initialize S3 buckets having namespace with dot followed by number #7942
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: trunk
Are you sure you want to change the base?
Conversation
Test |
@steveloughran : Could you please review the changes. |
💔 -1 overall
This message was automatically generated. |
🎊 +1 overall
This message was automatically generated. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, this complicates things. I've been happily treating all issues related to buckets with . in them as WONTFIX. Storediag tells people off too.
- make sure that FileSystem.get(URI, config) always returns the same instance from cache (and a different one from that with a a .2 as a suffix)
- could this be moved to a unit test, rather than an ITest?
- you will need to add something in the documentation, especially about path style access -and warn that support is not guaranteed across other versions/applications.
Configuration config = new Configuration(); | ||
Path path = new Path("s3a://test-bucket-v1.1"); | ||
try (FileSystem fs = path.getFileSystem(config)) { | ||
assertThat(fs instanceof S3AFileSystem) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
use whatever assertj assertion is about instanceof, so you get a better error.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ack
@steveloughran, in HADOOP-17241, you referenced this announcement as one of the reasons to not support this buckets with dot in the name: https://aws.amazon.com/blogs/aws/amazon-s3-path-deprecation-plan-the-rest-of-the-story/ However, AWS have since revised their stance. AWS has confirmed they will continue supporting buckets with dots in their names through virtual hosted-style URLs due to customer feedback and compatibility requirements.
So I guess it makes sense to add support for it in S3A. |
…ce with dot followed by number
ea1308a
to
5014c32
Compare
💔 -1 overall
This message was automatically generated. |
💔 -1 overall
This message was automatically generated. |
Thanks @steveloughran for the review. I have addressed your comments. |
OK. main issue is that there may be existing code which cares about hostname and may not look at FQDN for differencing names. I can't think of any right now -bucket names which don't map to valid hostnames are more a pain point |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what happens with per bucket settings here.
for a bucket stevel.1, will I be able to have
fs.s3a.bucket.stevel.1.path.style.access = true
furthermore, will my bucket "stevel" now have the option `1.path.style.access" set?
this was never something we considered when we did that
port = (port == -1 ? defaultPort : port); | ||
if (port == -1) { // no port supplied and default port is not specified | ||
return new URI(supportedScheme, authority, "/", null); | ||
return URI.create(supportedScheme + "://" + authority + "/"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why this change?
// get the host; this is guaranteed to be non-null, non-empty | ||
// get the host; fallback to authority if getHost() returns null | ||
bucket = name.getHost(); | ||
if (bucket == null) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
pull this out, stick it in S3AUtils
, add unit tests that now try to break things. Use everywhere
This release adds support for S3 bucket names containing dots followed by numbers | ||
(e.g., `my-bucket-v1.1`, `data-store.v2.3`). Previous versions of the Hadoop S3A | ||
client failed to initialize such buckets due to URI parsing limitations. | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- highlight that per-bucket settings do not work for dotted buckets (they don't, do they?), so the ability to use them is still very much downgraded.
- Explain that AWS do not recommend dotted buckets for anything other than web site serving
- highlight that path style access is needed to access (correct? never tried)
Description of PR
S3A fails to initialize when S3 bucket namespace is having dot followed by a number.
Specific Problem: URI parsing fails when S3 bucket names contain a dot followed by a number (like bucket-v1.1-us-east-1). Java's
URI.getHost() method incorrectly interprets the dot-number pattern as a port specification, causing it to return null.
How was this patch tested?
Tested in us-east-1 with bucket having namespace with dot followed by a number.
For code changes:
LICENSE
,LICENSE-binary
,NOTICE-binary
files?