-
Notifications
You must be signed in to change notification settings - Fork 4.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Bump aws-sdk-go #7458
Bump aws-sdk-go #7458
Conversation
Actually, looks like this version of the SDK might have a bug in it. Let's hold off until aws/aws-sdk-go#2828 gets addressed -- if it gets fixed, let's just update to the fixed version. |
aws/aws-sdk-go#2828 doesn't seem to be a bug. Vault code just needs to call session.NewSession instead of New and it should be fixed. |
Hey @joelthompson - how do you feel about this PR right now? aws/aws-sdk-go#2828 seems to have lead to aws/amazon-vpc-cni-k8s#729, which removes the |
This is in support of hashicorp#7450 and hashicorp#7924
5341595
to
d4d5fe0
Compare
@catsby -- OK, I updated this PR with an update to the newer version of the golang sdk which should support IRSA (with a code update) and IMDSv2. Note my comment in #7924 about this potentially being a breaking change: beware that this can actually cause breaking changes for clients that are running in Docker containers. See, e.g., boto/botocore#1892 (comment) |
Awesome! Looks like the rubber hits the road here on this and since the new metadata service will be enabled by default for everyone, that should do it. I'm not sure yet if there will be any backwards compatibility issues for this. I saw this note here:
I'm not sure yet if the client is backwards-compatible to people who actually disable v2 and only want v1. I don't know why you'd do that since it's less secure, though I could see someone doing it if maybe they were using an AWS client outside of Go that only supported v1. However, in that case we could simply advise them to not disable v2 since that's what Vault uses. So I'm not sure if we really need to worry about v1 compatibility. |
I'm adding a 1.4 milestone to this because that's at least a couple of months out, and Joel pointed out that this may break clients running in Docker. Since 1.4 is at least a few months out, this'll give Boto a little time to get the fix in, and us some extra time to test it more fully. |
TBC, boto is really just aws-sdk-python by a different name, and it's the first project that noticed the issue since the awscli is really a wrapper around boto. Anyway, I may have raised a bit of a false alarm. Reading through the documentation and the SDK code a bit more, I think what will happen when this is run from a docker container is that it will wait 5 seconds to fetch the IMDSv2 token, and when that times out, it will fall back to IMDSv1 and mark internally that the instance of the API client shouldn't use IMDSv2 and subsequent calls will fall back to v1. My guess is that boto just failed to fall back to IMDSv1 (though I haven't read through the boto code). Of course, all this would need to be tested. And, of course, this can happen serially, resulting in noticeable slowdowns (see, e.g., aws/aws-sdk-go#2972) and poor user experience. The solution to avoid this 5-second timeout and enable admins to disable IMDSv1 completely is to call ec2:ModifyInstanceMetadataOptions on the instance and specify an Lastly, I don't believe it's possible to disable v2. You can disable v1 by specifying |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks great, thanks for working on it @joelthompson .
I'm thinking we should probably do a second step where we audit the 4 locations in Vault that could potentially be using the instance metadata service:
- The Vault AWS agent
- The Vault AWS CLI handler
- The clients created in the Vault AWS auth engine
- The clients created in the Vault AWS secrets engine
If we go through those and write a test for each that creates a client using only creds from V1 instance metadata, and another test using only creds from V2 instance metadata, that should help us know if we need to do any additional code updates.
Definitely out of scope for this PR, and I like keeping this separate, just noting the TODO's for posterity. :-) Going to wait for one more approval on this before merging.
Hey @tyrannosaurus-becks, We also need to think about the clients created for:
However, I'm not quite sure what you mean by "If we go through those and write a test for each that creates a client using only creds from V1 instance metadata." Currently, unless/until my AWS golang SDK feature request is implemented, the only way to achieve that is to run all the tests inside of a Docker container (or some other similar magic that will cause IMDSv2 to fail), And writing automated tests for these is hard because it really depends on the execution environment the test is being run in -- you really need a way to run these tests on different EC2 instances with different IMDS configurations. That being said, I'm happy to manually run these tests manually on an EC2 instance with IMDSv1 disabled and report the results (though I'm not as familiar with the parts of Vault other than the secrets engine and auth method so might need some help). And I'm happy to continue brainstorming about how we can fully and properly test this to make sure it works properly for Vault users. |
I've had a chance to re-review this and look more. I see that the outward bound calls are the main change, and the responses appear the same. The version of the AWS SDK here should be good as long as we also use |
In support of #7450.
I think that there's an additional step needed to fulfill the goal of #7450 and will do some testing to follow up and reply in that other PR.
I ran manual integration tests for the AWS secrets engine and auth method.