-
Notifications
You must be signed in to change notification settings - Fork 25k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Update AWS SDK to 1.11.340 in repository-s3 #30723
Conversation
Pinging @elastic/es-distributed |
@elasticmachine test this please |
@elasticmachine test this please |
0cff666
to
ebd60bf
Compare
cbfbb2b
to
e00fe76
Compare
This commit updates the AWS SDK for the repository-s3 plugin.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It makes me uncomfortable to have this derive-region-from-endpoint logic in our code. It looks like it works as far as I understand it, but this is not documented behaviour so there's a risk that it isn't right now in some obscure cases, and a larger risk that it becomes incorrect in the future. Amazon's SDKs prefer deriving things like endpoints from the region, not the other way round, and I think we should try and move towards that model.
Additionally, when using other S3-compatible services (e.g. Minio), the region can be set to an arbitrary value, and this is required if requests are to be correctly signed. At the moment it works as long as the Minio administrator leaves the region set to the default value, an empty string, but I think we cannot deal with other values here.
I think the right thing to do is to expose the region
parameter again, and possibly also enableForceGlobalBucketAccess
, so that users have more direct control over the SDK. We may need this region-from-endpoint logic for BWC reasons.
final String serviceEndpoint = Strings.hasText(clientSettings.endpoint) ? clientSettings.endpoint : Constants.S3_HOSTNAME; | ||
|
||
String signingRegion = null; | ||
if (isAwsStandardEndpoint(serviceEndpoint)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This makes me uncomfortable - see review summary.
I do agree and brought that sometime ago with #27924 and #27925 (diff) IMO we should indeed try to follow what is recommended by the platform.
👍 on my end. But it's important that @rjernst also shares his thoughts. |
I have no real opinion on this, as I haven't worked on the repository plugins in a while. I will say that the code was unnecessarily complex and lenient when we supported both region and endpoint. There were no tests, and you could set a mix of wrong region/endpoint that would not fail until runtime (when a repository was created). IIRC, at least one was always necessary, when in fact the defaults for s3 will auto find the correct region from the bucket name (it can do this since s3 buckets have a global namespace across all regions). If both are to be supported, I strongly urge there to be strict checking, and the defaults to still work (ie not having to set anything for 99% of the cases other than bucket name). |
Thanks a lot for all your comments and I apologize again for the time it takes me to get this in. According to the SDK documentation and code samples, the expected way to create an The SDK still supports the possibility to not set the region and in this case it uses the default Amazon S3 endpoint This is the current behavior of the repository-s3 plugin since the region setting has been removed. The deprecated constructor I updated the pull request so that it does not use the deprecated ctor anymore but still keep the current behavior without relying on a dirty endpoint-to-region logic like I proposed in a previous change. When no endpoint is defined the On a short term (but outside the scope of this pull request), we will have to reintroduce the With the
I don't think we'll be able to do that if we also support Minio where region/endpoint can be anything.
We should be able to support that as long as the SDK supports it itself. @ywelsch @DaveCTurner @rjernst Gentlemen, can you please have another look to this? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, but I think a comment about why we're setting the endpoint rather than letting the SDK work it out would be useful. See inline comment.
|
||
final String endpoint = Strings.hasLength(clientSettings.endpoint) ? clientSettings.endpoint : Constants.S3_HOSTNAME; | ||
logger.debug(() -> new ParameterizedMessage("using endpoint [{}]", endpoint)); | ||
builder.withEndpointConfiguration(new AwsClientBuilder.EndpointConfiguration(endpoint, null)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It took me some time to see what was happening here.
If the endpoint configuration isn't set on the builder then the default behaviour is to try and work out what region we are in and use an appropriate endpoint - see AwsClientBuilder#setRegion
. In contrast, directly-constructed clients use s3.amazonaws.com
unless otherwise instructed. We currently use a directly-constructed client, and need to keep the existing behaviour to avoid a breaking change, so to move to using the builder we must set it explicitly to keep the existing behaviour.
AIUI we do this because directly constructing the client is deprecated (was already deprecated in 1.1.223 too) so this change removes that usage of a deprecated API.
A short comment to this effect would be useful, I think, to save future readers.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I presume, by the way, that passing null
for the region here is ok. I haven't tried it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the suggested comment, I'll steal it from you.
I presume, by the way, that passing null for the region here is ok. I haven't tried it.
Yes. Similarly to the deprecated setEndpoint()
method that leaves the region to null until it is later resolved from the endpoint to the default us-east-1 region.
builder.withClientConfiguration(buildConfiguration(clientSettings)); | ||
|
||
final String endpoint = Strings.hasLength(clientSettings.endpoint) ? clientSettings.endpoint : Constants.S3_HOSTNAME; | ||
logger.debug(() -> new ParameterizedMessage("using endpoint [{}]", endpoint)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: this could just be logger.debug("using endpoint [{}]", endpoint);
.
Unless we are to have explicit tests for Minio, I don't think we should claim support or relax validation. And even then, if support for minio is desired, then was endpoints can still be validated against the correct region (or better yet, only allow endpoint, or both). For example, when the endpoint is a known s3 endpoint, the region should be verified it matches the endpoint? Also, a problem we used to have is new regions could not be used until we released an updated version with an updated list in our hardcoded list of regions. I think that having the default behavior still use the default endpoint will still work to avoid this problem, but I want to make sure it is explicitly considered in any changes, as that was very annoying for users since it could be months before we have a new release with added regions (and I don't think we should be updating these regions in bug fix versions). |
Thanks @DaveCTurner and everybody involves in this pull request. Thanks @rjernst for sharing your concerns, I agree with you on all points. When we'll reintroduce the region setting we'll make sure to put you into the loop and add the AWS S3 endpoint/region validation you asked for with appropriate tests. |
* master: [HLRC][ML] Add ML put datafeed API to HLRC (elastic#33603) Update AWS SDK to 1.11.406 in repository-s3 (elastic#30723)
* master: (43 commits) [HLRC][ML] Add ML put datafeed API to HLRC (elastic#33603) Update AWS SDK to 1.11.406 in repository-s3 (elastic#30723) Expose CCR stats to monitoring (elastic#33617) [Docs] Update match-query.asciidoc (elastic#33610) TEST: Adjust rollback condition when shard is empty [CCR] Improve shard follow task's retryable error handling (elastic#33371) Forbid negative `weight` in Function Score Query (elastic#33390) Clarify context suggestions filtering and boosting (elastic#33601) Disable CCR REST endpoints if CCR disabled (elastic#33619) Lower version on full cluster restart settings test Upgrade remote cluster settings (elastic#33537) NETWORKING: http.publish_host Should Contain CNAME (elastic#32806) Add test coverage for global checkpoint listeners Reset replica engine to global checkpoint on promotion (elastic#33473) HLRC: ML Delete Forecast API (elastic#33526) Remove debug logging in full cluster restart tests (elastic#33612) Expose CCR to the transport client (elastic#33608) Mute testIndexDeletionWhenNodeRejoins SQL: Make Literal a NamedExpression (elastic#33583) [DOCS] Adds missing built-in user information (elastic#33585) ...
* master: (128 commits) [HLRC][ML] Add ML put datafeed API to HLRC (elastic#33603) Update AWS SDK to 1.11.406 in repository-s3 (elastic#30723) Expose CCR stats to monitoring (elastic#33617) [Docs] Update match-query.asciidoc (elastic#33610) TEST: Adjust rollback condition when shard is empty [CCR] Improve shard follow task's retryable error handling (elastic#33371) Forbid negative `weight` in Function Score Query (elastic#33390) Clarify context suggestions filtering and boosting (elastic#33601) Disable CCR REST endpoints if CCR disabled (elastic#33619) Lower version on full cluster restart settings test Upgrade remote cluster settings (elastic#33537) NETWORKING: http.publish_host Should Contain CNAME (elastic#32806) Add test coverage for global checkpoint listeners Reset replica engine to global checkpoint on promotion (elastic#33473) HLRC: ML Delete Forecast API (elastic#33526) Remove debug logging in full cluster restart tests (elastic#33612) Expose CCR to the transport client (elastic#33608) Mute testIndexDeletionWhenNodeRejoins SQL: Make Literal a NamedExpression (elastic#33583) [DOCS] Adds missing built-in user information (elastic#33585) ...
The `repository-s3` plugin has supported a storage class of `onezone_ia` since the SDK upgrade in elastic#30723, but we do not test or document this fact. This commit adds this storage class to the docs and adds a test to ensure that the documented storage classes are all accepted by S3 too. Fixes elastic#30474
Updates the version of the AWS SDK to 1.11.340 (previously 1.11.225) in the repository-s3 plugin.
I was a bit worried about how the removal of the region setting (#22758) would play with the latest versions of the AWS SDK, because in code samples the region is always set when creating the client.
With this pull request, when no endpoint is defined then the client is initialized with the S3 default endpoint
and region( as provided by the AWS SDK) and with the enableForceGlobalBucketAccess option. With this option, the first HTTP request (doesBucketExist()) will be executed against the default S3 endpoint which returns a HTTP redirection response with the bucket's region set in the response headers. This region is then used by the AWS SDK for the other requests. It also keeps an internal cache that contains the resolved regions to use for a given bucket.
When the endpoint is defined in the settings:
AwsHostNameUtils.parseRegion()
method (this is the method used internally by the SDK to resolve region when the endpoint is overriden on the AmazonWebServiceClient).The method used to check if the endpoint is an AWS service endpoint is similar to what is done internally by the SDK.
The main goal of this pull request was to check how "easy" it is now to update AWS SDK using integration tests, to expose a new storage class (#30474) and other bug fixes, and also to facilitate user contributions like #25552.