-
Notifications
You must be signed in to change notification settings - Fork 82
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Use --profile instead of AWS_PROFILE for kubeconfig #1484
Conversation
Does the PR have any schema changes?Looking good! No breaking changes found. |
// Use --profile instead of AWS_PROFILE because the latter can be | ||
// overridden by ambient credentials: | ||
// https://docs.aws.amazon.com/cli/latest/topic/config-vars.html#id1 | ||
args = [...args, "--profile", opts.profileName]; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The only potentially user-facing change.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This change sounds good to me!
Tbh, I'm a bit surprised that even aws eks update-kubeconfig
uses the env variable for specifying the profile.
I think it would make sense to extend the aws-profile
test to actually deploy something to the cluster. That way we also cover the change with upgrade tests to ensure this doesn't cause any cascading replacements
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, based on aws/aws-cli#7794 and aws/aws-cli#4337 I'm even more convinced --profile
is the correct behavior here.
I'll look into the upgrade test to make sure this doesn't trigger replacements. I don't think it will since the k8s provider primarily looks at the cluster/context (not user, which this impacts), but if it does we should be able to leverage the new clusterIdentifier
config to pin the provider to the cluster.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it would make sense to extend the aws-profile test to actually deploy something to the cluster. That way we also cover the change with upgrade tests to ensure this doesn't cause any cascading replacements
@flostadler I re-enabled this upgrade test and recorded a baseline from master with a ConfigMap deployed into the cluster. It passes locally for me and I confirmed manually that no replacements were triggered.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice, that's awesome!
.github/workflows/cron.yml
Outdated
@@ -1,7 +1,6 @@ | |||
env: | |||
ALT_AWS_ACCESS_KEY_ID: ${{ secrets.ALT_AWS_ACCESS_KEY_ID }} | |||
ALT_AWS_SECRET_ACCESS_KEY: ${{ secrets.ALT_AWS_SECRET_ACCESS_KEY }} | |||
ALT_AWS_PROFILE: ${{ secrets.ALT_AWS_PROFILE }} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This and "Configure AWS CLI" are now handled directly by the tests.
examples/aws-profile-py/__main__.py
Outdated
if not os.getenv("AWS_REGION"): | ||
raise Exception("AWS_REGION must be set") | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Mirroring the ts test.
examples/aws-profile-py/__main__.py
Outdated
@@ -25,6 +28,7 @@ | |||
# Create the cluster using the AWS provider and credential opts. | |||
cluster = eks.Cluster(project_name, | |||
provider_credential_opts=kubeconfig_opts, | |||
# TODO(#1475): bootstrap_self_managed_addons=false, # To speed up the test. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just a suggestion -- I noticed the coredns addon adds ~900s to the test, and it would be really great to disable!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The culprit here is not aws adding the coredns addon, they do that as a self-managed addon (just a helm chart) and it doesn't wait for coredns to be healthy.
We're now using the managed coredns addon by default and the managed addon for coredns indeed takes a very long time to detect that the deployment has enough replicas (~900s like you mentioned). If you want to disable it for this test, you can do so by setting corednsAddonOptions.enabled
to false. I don't think we need it for this test because we're already testing that with others.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This was a huge time saver, thank you!
pulumi-eks>=3.0.0,<4.0.0 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Makes local debugging easier.
examples/examples_nodejs_test.go
Outdated
profile := "aws-profile-node" | ||
setProfileCredentials(t, profile) | ||
|
||
test := getJSBaseOptions(t). | ||
With(integration.ProgramTestOptions{ | ||
Dir: path.Join(getCwd(t), "aws-profile"), | ||
OrderedConfig: []integration.ConfigValue{ | ||
{Key: "pulumi:disable-default-providers[0]", Value: "aws", Path: true}, | ||
}, | ||
RetryFailedSteps: false, | ||
Env: []string{ | ||
"ALT_AWS_PROFILE=" + profile, | ||
"AWS_PROFILE=", // unset | ||
"AWS_SECRET_ACCESS_KEY=", // unset | ||
"AWS_ACCESS_KEY_ID=", // unset | ||
"AWS_SESSION_TOKEN=", // unset | ||
}, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This unsets the credentials for the pulumi subprocess instead of globally.
LGTM, thanks for this improvement! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM.
For context, we previously wrote the profile name into the generated kubeconfig. This had the unintended consequence of the kubeconfig not being portable if another user has a different AWS profile name. This logic was switched to using ambient env vars to determine the right AWS profile to make the kubeconfig portable. However, env vars are volatile as you've surfaced in this PR, so using the --profile
flag is a more robust solution!
This PR has been shipped in release v3.1.0. |
This PR changes our kubeconfig logic to use a
--profile
arg instead of anAWS_PROFILE
environment variable so it will always use the expected profile. It also parallelizes the relevant tests and simplifies workflows slightly.As a user, if I generate a kubeconfig for a particular profile I would expect that configuration to always use the profile I specified. However, because we rely on
AWS_PROFILE
it is possible for our generated kubeconfig to be inadvertently overridden by the presence ofAWS_ACCESS_KEY_ID
.https://docs.aws.amazon.com/cli/latest/topic/config-vars.html#id1
If I got this wrong and the overriding behavior is a "feature not a bug" please let me know!
I'll note that I discovered this as part of the prep work for pulumi/ci-mgmt#1142. In particular, our tests currently do a few things to exercise profile switching behavior:
default
profile. (This is unnecessary.)alt
profile. This is the profile we expect to use inTestAccAwsProfile*
tests.TestAccAwsProfile*
tests we unsetAWS_SECRET_ACCESS_KEY
,AWS_ACCESS_KEY_ID
, andAWS_SESSION_TOKEN
for our process.Importantly, (3) is currently implemented such that (a) it prevents parallelization, and (b) subsequent queries to the k8s API server also lack ambient credentials.
After I refactored (3) to allow parallelization the tests started failing. Eventually I realized this was because I was unsetting credentials for the
pulumi
subprocess and our test's k8s client now had ambient credentials taking priority over the expected profile.