-
Notifications
You must be signed in to change notification settings - Fork 9.1k
HADOOP-19050. Add S3 Access Grants Support in S3A #6544
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
HADOOP-19050. Add S3 Access Grants Support in S3A #6544
Conversation
Unit tests and Tests results below, per my understanding all failures are due to various other known reasons:
|
Hi @ahmarsuhail and @steveloughran - I've opened this with the new code that has the S3 Access Grants plugin as part of the AWS Java SDK bundle (this is an evolution of #6507). We should consider #6507 deprecated as of now, as I believe I have addressed all feedback from that PR on here. Please let me know your thoughts! |
WRT ITest failures, this is the Tested on PDX,
|
💔 -1 overall
This message was automatically generated. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
looking good, some minor comments
S3AccessGrantsPlugin accessGrantsPlugin = | ||
S3AccessGrantsPlugin.builder().enableFallback(isFallbackEnabled).build(); | ||
builder.addPlugin(accessGrantsPlugin); | ||
LOG_S3AG_ENABLED.info("S3 Access Grants plugin is added to S3 client with fallback: {}", isFallbackEnabled); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this won't log, cause you have already used the only once on line 421. Cut the log on 421, and just keep this one.
Update text to "S3Access Grants plugin is enabled with IAM fallback set to {} "
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good catch, changed.
hadoop-tools/hadoop-aws/src/site/markdown/tools/hadoop-aws/index.md
Outdated
Show resolved
Hide resolved
## S3 Authorization Using S3 Access Grants | ||
|
||
[S3 Access Grants](https://aws.amazon.com/s3/features/access-grants/) can be used to grant accesses to S3 data using IAM Principals. | ||
In order to enable S3 Access Grants to work with S3A, we enable the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: Cut the "we" and "you", so just "In order to enable S3 Access Grants, S3A uses the Access grant pugin" .. applicable to the rest of the section as well
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good call, done!
return awsClient.serviceClientConfiguration().credentialsProvider().getClass().getName(); | ||
} | ||
|
||
private <BuilderT extends S3BaseClientBuilder<BuilderT, ClientT>, ClientT> AwsClient |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
since you're returning an instance of AwsClient
, i don't think you need generics here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yup, changed.
getCredentialProviderName(s3ExplicitlyDisabledClient)); | ||
} | ||
|
||
private Configuration createConfig(boolean s3agEnabled) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think you'll need to do removeBaseAndBucketOverrides
here before setting the value.
and is there a way to check for if the IAM fallback is set on the client?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think you'll need to do removeBaseAndBucketOverrides here before setting the value.
I'm not sure about this because I'm starting a new Hadoop Configuration object itself rather than the createConfiguration
methods that we use from the S3ATestUtils. In the end, I don't think it matters - because as long as we set the S3 Access Grants properties, that's all that matters to us for the purpose of this test, no?
and is there a way to check for if the IAM fallback is set on the client?
Unfortunately not :( Did a lot of digging but in short, the plugins are "applied" to the client. When we apply the S3 Access Grants plugin on the S3 clients, we get the following identity provider set as the Credential Provider for this client: S3AccessGrantsIdentityProvider
. And in the case of the fallback, the fallback flag is only set on the S3AccessGrantsIdentityProvider
class but as a private variable that we cannot access.
@adnanhemani re test failures, just updating the core-site won't be enough for some of them, you'll also need the code changes in Steve's PR #6515 , that should get merged soon so then you can rebase and retest. |
🎊 +1 overall
This message was automatically generated. |
Hi @ahmarsuhail, I've made all changes requested. Please take a look when you can. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1, LGTM. are you ok with this approach @steveloughran ?
Hi @steveloughran, any thoughts on this? |
really focused purely on 3.4.0 shipping right now, not looking at stuff it doesn't need. sorry |
Test results. thanks for these. you only need failures. (we don't care about successes unless something has got very slow)
For example, for ITestS3ATemporaryCredentials and delegation; set test.fs.s3a.sts.enabled false |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
looks low risk and minimal now.
I'm curious about whether it is possible to test this with an ITest everywhere -and what happens? would it be possible to write a test for this?
now, how does this integrate with the usual auth mechanism? the standard credentials passed in are used to auth with (something? what?) to get the session credentials.
- How are complex ops like rename() coped with?
- When do things fail and how? specifically, are new errors likely to be raised when S3 requests are made, and if so is their translation valid?
* and | ||
* https://github.com/aws/aws-s3-accessgrants-plugin-java-v2/ | ||
*/ | ||
public static final String AWS_S3_ACCESS_GRANTS_ENABLED = "fs.s3a.s3accessgrants.enabled"; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- can you use "fs.s3a.access.grants" as the prefix here and below
- It'd be good have s3afs .hasPathCapability() return the enabled flag for ease of testing
|
||
private static <BuilderT extends S3BaseClientBuilder<BuilderT, ClientT>, ClientT> void | ||
applyS3AccessGrantsConfigurations(BuilderT builder, Configuration conf) { | ||
if (!conf.getBoolean(AWS_S3_ACCESS_GRANTS_ENABLED, false)){ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
define and use a constant AWS_S3_ACCESS_GRANTS_ENABLED
here.
makes it easier to see/change what the default is in future.
hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/DefaultS3ClientFactory.java
Show resolved
Hide resolved
|
||
package org.apache.hadoop.fs.s3a; | ||
|
||
import org.apache.hadoop.conf.Configuration; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: import ordering
...-tools/hadoop-aws/src/test/java/org/apache/hadoop/fs/s3a/TestS3AccessGrantConfiguration.java
Show resolved
Hide resolved
import org.junit.Test; | ||
|
||
import software.amazon.awssdk.awscore.AwsClient; | ||
import software.amazon.awssdk.s3accessgrants.plugin.S3AccessGrantsIdentityProvider; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is there anything to be gained by adding a subclass of this into fs.s3a.auth module?
we've found it useful in the past, including for migrating the underlying implementation, and other bug fixes/scale issues.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not too sure, given that all the code we are testing does not belong as part of the fs.s3a.auth
module. Or did I misunderstand that by "this" you are not referring to the testing class itself?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not really sure, honestly since all the code we are looking to test doesn't actually reside in that module. Or am I misunderstanding what you meant by "this" meaning the test class?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I mean a subclass of the S3AccessGrantsIdentityProvider the way we do for env vars, IAM roles etc. lets us hide some of the implementation details (e.g. how we moved from EC2 to container creds, upgraded to v2 sdk)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure how this would function? Since we don't actually control the usage of this class - it's in the internals of the Access Grants plugin itself. In other words, if there is a version upgrade on the S3 Access Grants plugin and that include a change to this class name/structure, we will have to update this to reflect those changes - it's not in our control which class is used?
|
||
## S3 Authorization Using S3 Access Grants | ||
|
||
[S3 Access Grants](https://aws.amazon.com/s3/features/access-grants/) can be used to grant accesses to S3 data using IAM Principals. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this might be a good time to move everything related into authentication into its own markdown file (authentication.md) and link that off index.
this entry would be its own heading ## S3 Access grants with a reference so index generation will create a quick link
which allows S3A to fallback to using the IAM role (and its permission sets directly) to access S3 data in the case that S3 Access Grants | ||
is unable to authorize the S3 call. | ||
|
||
The following is how this feature can be enabled: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To enable this feature
boolean isFallbackEnabled = | ||
conf.getBoolean(AWS_S3_ACCESS_GRANTS_FALLBACK_TO_IAM_ENABLED, false); | ||
S3AccessGrantsPlugin accessGrantsPlugin = | ||
S3AccessGrantsPlugin.builder().enableFallback(isFallbackEnabled).build(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit. put the enable() and build() on their own lines
</property> | ||
``` | ||
|
||
Please note that S3A only enables the [S3 Access Grants plugin](https://github.com/aws/aws-s3-accessgrants-plugin-java-v2) on the S3 clients |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
add a subsection "notes" as more notes are added.
Hi Steve, thanks for reviewing this as you finished up with the Hadoop release work, I really appreciate you taking the time to look at this. I'm re-running the integration and unit tests after I made the changes you suggested above - I will ping this thread when the code is pushed and ready for a re-review. Responding to your earlier questions:
While it should be technically possible to do so, I'm not sure there's an easy (or acceptable) way to do so. In order to test this functionality with an ITest, we would need to set up an Access Grants instance, locations, and access grants themselves prior to actually running the client-side code, which we are integrating. Creating and tearing down those resources would need a good deal of effort imo - and do not come for free cost-wise either. Going back to the note we've made explicitly in the documentation, if there is a problem with functionality, this would be a general problem with the S3 Access Grants plugin - and not with S3A, as our unit test is capturing all code used for enabling this plugin. I believe the ITests on the S3 Access Grants plugin should then be sufficient for covering the full testing coverage.
Good question - when using S3 Access Grants, the flow would be to use your standard credentials to authenticate yourself to the S3 Access Grants server. Which would then hand you back a new set of credentials once you are authorized - this set of credentials should have access to the S3 objects you are looking to access. The Access Grants on the S3 Access Grants server are set by your account/organization's data admin, who is in charge of determining what sets of permissions each user should be able to receive. (I'm not sure if I'm answering the right question here - please feel free to elaborate on the question if I didn't answer it properly).
By I don't believe there's a S3 API specifically for If there are any other complex/not-so-popular S3 actions that the user attempts to do, the S3 Access Grants plugin will not attempt to do credential vending and instead always instruct the S3 Client to fall back to using the user's provided authentication credentials.
So given that, functionally, the only thing we are changing is the credentials which the user is using to access the S3 data, there should not be any changes to the set of errors that can be returned from S3 while attempting to access the data itself. However, there could be an error thrown while the user is attempting to authorize against the S3 Access Grants server. In this case, I believe the only exception the S3 Access Grants plugin is allowed to throw is a |
3e151bd
to
4247a5e
Compare
@steveloughran - I ran the unit tests and integration tests again after rebasing (ITests against No unit test failures, 2 ITest failures:
I should probably figure out how to skip the ACL test in the future, so really should just be one failure. Please advise if this test is something I should look into. Code is pushed to this PR and ready for another review - thanks! |
🎊 +1 overall
This message was automatically generated. |
You're right there is no rename; copy is all there is. So that is not available yet? Hmmm. This isn't ready for production yet is it? Let us keep it in trunk for now. The other strategy would be to do a feature branch for it, which has mixed benefits. Good: isolated from other work. Bad: isolated from other work. So far the changes are minimal enough it is not a problem. Now that I am working on a bulk delete API targeting Iceberg and similar where the caller congener write a bulk delete call across the bucket; currently in S3AFS we only do bulk deletes down a "directory tree" either in delete or incrementally during rename(). In both of these cases there is already no guarantees that your system will be left in a nice state if you don't have the permissions to do things. Regarding testing, when you think it is ready for others to play with a section in testing.md on how to get set up for this would be good. Well I don't personally have plans for that, maybe I could persuade colleagues. I tried to test a lot of the other corner cases. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nearly ready; commented
|
||
configureEndpointAndRegion(builder, parameters, conf); | ||
|
||
applyS3AccessGrantsConfigurations(builder, conf); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
rename maybeApply... to make clear it isn't guaranteed to happen
.enableFallback(isFallbackEnabled) | ||
.build(); | ||
builder.addPlugin(accessGrantsPlugin); | ||
LOG.info("S3 Access Grants plugin is enabled with IAM fallback set to {}", isFallbackEnabled); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
might be good to add once so that on a large system the logs don't get full of noise. tricky choice
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, that's why we had added the log once earlier but I removed it in the last revision based on your comments (maybe I misunderstood...?)
Personally, I agree with making it a log once so I'll change it back.
private String configuredRegion; | ||
|
||
/** | ||
* Is a S3 Access Grants Enabled? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: "are S3 Access Grants Enabled?"
return fipsEnabled; | ||
|
||
// is S3 Access Grants enabled | ||
case AWS_S3_ACCESS_GRANTS_ENABLED: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
our bucket-info command has a list of capabilities to probe for -add this to the list
hadoop-tools/hadoop-aws/src/site/markdown/tools/hadoop-aws/authentication.md
Outdated
Show resolved
Hide resolved
import org.junit.Test; | ||
|
||
import software.amazon.awssdk.awscore.AwsClient; | ||
import software.amazon.awssdk.s3accessgrants.plugin.S3AccessGrantsIdentityProvider; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I mean a subclass of the S3AccessGrantsIdentityProvider the way we do for env vars, IAM roles etc. lets us hide some of the implementation details (e.g. how we moved from EC2 to container creds, upgraded to v2 sdk)
...-tools/hadoop-aws/src/test/java/org/apache/hadoop/fs/s3a/TestS3AccessGrantConfiguration.java
Outdated
Show resolved
Hide resolved
Configuration configuration, boolean asyncClient, String message, boolean shouldMatch) | ||
throws IOException, URISyntaxException { | ||
AwsClient awsClient = getAwsClient(configuration, asyncClient); | ||
AbstractStringAssert<?> assertion = |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nice trick! always good to see advanced uses of AssertJ
💔 -1 overall
This message was automatically generated. |
Yeah :/ I believe there are still some committer types that are not renaming on every write and so some simple queries will actually succeed. But in general, the read queries are succeeding, which is why this is still going to be useful in the meantime to the community. When the new functionality arrives, it will only require a bump on the SDK version - so I don't think this is a blocker for
I believe the AWS Documentation and supporting blog posts have the right amount of instructions on how to have a sample setup. Then the Hadoop community members can test this code against that setup. Is there a specific reason to include it as part of this repo then? If you think we really should have a section on this, would it be okay to just add a link to the relevant blog posts and documentation in |
3d507d2
to
3cfb18e
Compare
🎊 +1 overall
This message was automatically generated. |
please, please please, no force push once reviews have started unless there's merge problems or its been neglected for too long...makes it harder to seee what's changed between reviews |
Sorry! Yetus was complaining that it could not apply the changes on top of |
hey, if yetus is unhappy, rebase is the right thing to do, so don't worry too much. just trying to remember where i was, that's all |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, merging to trunk, but not yet to branch-3.4. I do expect a 3.4.1 release to come out in a couple of months. If the SDK update is ready then this change should go in with the SDK; I don't want to have the branch-3.4 and trunk diverging if possible
Yup, this makes sense. I will ensure that Hadoop gets the SDK changes as soon as the SDK updates are complete. Thank you again for all your time reviewing this! |
@adnanhemani hadoop branch-3.4 AWS SDK is now at 2.25.53. That has everything you need, doesn't it? If you can do a request of this patch back part branch 3.4 I will merge it and we can hopefully get it into 3.4.1 release |
Unfortunately, I think there are further changes that are still missing from the AWS SDK to unblock all code paths. But I've moved off of this work recently - I'll let the new owner comment @singhpk234. |
@steveloughran thanks for bringing this up, Thanks @adnanhemani for connecting us. |
that or we abandon s3 transfer manager. that has been discussed more than once |
noticed this isn't in branch-3.4 (some merge issues) where are we with this? |
Description of PR
This adds S3 Access Grants support in S3A using the S3 Access Grants plugin, which is a part of the AWS Java SDK bundle starting in v2.23.19. As part of this PR, we are adding new configuration flags that the user can use to enable the usage of the S3 Access Grants plugin on the S3 clients that S3A starts up.
How was this patch tested?
New unit tests written. Unit testing and integration testing patches incoming momentarily.
For code changes:
LICENSE
,LICENSE-binary
,NOTICE-binary
files?