-
Notifications
You must be signed in to change notification settings - Fork 584
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
✨ Add separate eks kubeconfig secret keys for the cluster-autoscaler #4648
✨ Add separate eks kubeconfig secret keys for the cluster-autoscaler #4648
Conversation
I have deployed this change in Indeed's clusters and in my testing, the official release of v1.27 cluster autoscaler was able to refresh the bearer token file in EKS clusters. |
1533ae0
to
c43f7cc
Compare
c43f7cc
to
c4a814a
Compare
| value | contains a complete kubeconfig with the cluster admin user and token embedded | | ||
| cluster-autoscaler-value | contains a kubeconfig with the cluster admin user, referencing the token file in a relative path - assumes you are mounting all the secret keys in the same dir | | ||
| cluster-autoscaler-token-file | contains the same token embedded in the complete kubeconfig, it is separated into a single file so that existing APIMachinery can reload the token file when the secret is updated | |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since there are more use cases than just cluster-autoscaler, wdyt about using descriptive names like embedded
, relative
and single-file
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These names are good, but changing the name of value
means we have a different Secret shape in EKS clusters than self-managed clusters. I'll update the other two, but I am not sure changing the name of value
is worth it, as it will be surprising to users.
/hold cancel |
2a13c53
to
f57a8ae
Compare
f57a8ae
to
643f8eb
Compare
/lgtm |
/test pull-cluster-api-provider-aws-e2e |
It would be good to get cluster autoscaler added to our e2e tests. Lets create an issue to follow up on this. From my side this look good: /approve When the e2e passes we can unhold and merge: /hold |
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: richardcase The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
643f8eb
to
c50ba26
Compare
/retest-required |
Only a rebase. |
/test pull-cluster-api-provider-aws-e2e |
You will have to fix the test first. |
/test pull-cluster-api-provider-aws-e2e-eks |
@mloiseleur seems like the test failure is a flake? I pushed a commit to simply expose the error being suppressed in cloudformation and now it passed. My new commit should not have fixed any tests. |
The Kubernetes project currently lacks enough contributors to adequately respond to all PRs. This bot triages PRs according to the following rules:
You can:
Please send feedback to sig-contributor-experience at kubernetes/community. /lifecycle stale |
/lgtm |
/remove-lifecycle stale |
/hold cancel |
@cnmcavoy I updated to v2.6.1 this morning and I saw in the logs an error about ownership of the kubeconfig secret in the log for all control plane. E0807 08:02:32.550249 1 controller.go:329] "Reconciler error"
err="failed to reconcile control plane for AWSManagedControlPlane flux-system/xxxxxxxxx-cp:
failed reconciling kubeconfig: updating kubeconfig secret:
EKS kubeconfig flux-system/xxxxxxxxx-kubeconfig missing expected AWSManagedControlPlane ownership"
controller="awsmanagedcontrolplane"
controllerGroup="controlplane.cluster.x-k8s.io"
controllerKind="AWSManagedControlPlane"
AWSManagedControlPlane="flux-system/xxxxxxxxx-cp"
namespace="flux-system"
name="xxxxxxxxx-cp"
reconcileID="c634fd01-5c99-4947-9ff7-3297fcaff97c" Did you encounter this issue when you tested on your side ? EDIT: On my side, I fixed it by deleting the secret. The new secret was created with expected ownership and it get back on its feet. |
The secret was created with an owner reference in the previous releases as well. Previously the shape of the secret was assumed, the new behavior that you encountered is that the controller checks if it owns the secret before operating on it. Deleting and recreating was also going to be my suggestion, but the owner reference should already exist unless the secret was not created by the controller. My suspicion is flux or some other system created the secret and so CAPA didn't set the owner reference. |
What type of PR is this?
/kind feature
What this PR does / why we need it: Cluster Autoscaler can not mount and consume the Cluster API Kubeconfig because the secret contents are refreshed every ten minutes, and no API Machinery exists to reload a kubeconfig safely.
Initially, I attempted solve this in the Cluster Autoscaler: kubernetes/autoscaler#4784 - However meeting with SIG APIMachinery on Nov 1 2023, the SIG cautioned against this approach and advised splitting the token out from the kubeconfig, as there is existing machinery to reload a token file auth. By switching to this approach, no change in the Cluster Autoscaler is needed, users only need to update their Cluster Autoscaler configuration to use the correct secret file from their secret volume mount.
Which issue(s) this PR fixes (optional, in
fixes #<issue number>(, fixes #<issue_number>, ...)
format, will close the issue(s) when PR gets merged):Fixes #4607
Special notes for your reviewer:
Checklist:
Release note: