cluster-autoscaler caches workload cluster kubeconfig #4784

charlie-haley · 2022-04-05T09:41:12Z

Which component are you using?: cluster-autoscaler

What version of the component are you using?: 1.21.2

What k8s version are you using (kubectl version)?:

kubectl version Output

$ kubectl version
Client Version: version.Info{Major:"1", Minor:"23", GitVersion:"v1.23.5", GitCommit:"c285e781331a3785a7f436042c65c5641ce8a9e9", GitTreeState:"archive", BuildDate:"2022-03-17T21:14:47Z", GoVersion:"go1.18", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"21+", GitVersion:"v1.21.5-eks-bc4871b", GitCommit:"5236faf39f1b7a7dabea8df12726f25608131aa9", GitTreeState:"clean", BuildDate:"2021-10-29T23:32:16Z", GoVersion:"go1.16.8", Compiler:"gc", Platform:"linux/amd64"}

What environment is this in?: EKS managed by Cluster API

What did you expect to happen?:

The autoscaler should use the renewed kubeconfig for the workload cluster.

What happened instead?:

After roughly 10 minutes, the token in the workload cluster's kubeconfig secret is renewed and the autoscaler stops working. The pod has to be killed for it to pick up the new token.

How to reproduce it (as minimally and precisely as possible):

Deploy cluster-autoscaler on a CAPI management cluster, pointing at a workload cluster.

Example Helm values:

additionalLabels:
  app: autoscaler
autoDiscovery:
  clusterName: mycluster-name
cloudProvider: clusterapi
clusterAPIConfigMapsNamespace: kube-system
clusterAPIKubeconfigSecret: mycluster-name-kubeconfig
clusterAPIMode: kubeconfig-incluster
extraArgs:
  balance-similar-node-groups: true
  expander: least-waste
  leader-elect: false
  logtostderr: true
  stderrthreshold: error
  v: 1

k8s-triage-robot · 2022-07-04T16:59:17Z

The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

After 90d of inactivity, lifecycle/stale is applied
After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

Mark this issue or PR as fresh with /remove-lifecycle stale
Mark this issue or PR as rotten with /lifecycle rotten
Close this issue or PR with /close
Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

richardcase · 2022-07-05T12:43:54Z

/remove-lifecycle stale

k8s-triage-robot · 2022-08-04T13:08:26Z

/lifecycle rotten

richardcase · 2022-09-02T10:29:16Z

/remove-lifecycle rotten

k8s-triage-robot · 2022-12-01T11:25:15Z

/lifecycle stale

richardcase · 2022-12-01T11:47:27Z

/remove-lifecycle stale

k8s-triage-robot · 2023-03-01T12:35:42Z

/lifecycle stale

Skarlso · 2023-03-20T16:43:36Z

/assign

I'm gonna take a stab at this.

Skarlso · 2023-03-20T17:41:21Z

@charlie-haley Hello 👋
So I'm about to do something about this thing. I see you did some annotation thing in the attached PR. Were you planning on reloading the whole thing every 10 minutes?

Wouldn't it be better to just not cache the config at all, but rather whenever it tries to access something just always fetch the kubeconfig? At least that's what I'm planning on doing.

elmiko · 2023-03-20T20:29:45Z

(copied from slack)

another approach might be to add a flag which would allow a user to specify that it should be reloaded on each transaction. my concern is that the cluster-api provider is quite chatty on the kube client and i wonder if rebuilding the client every 15 seconds is going to negatively affect performance.

i don't think there would be an issue rebuilding it once every scan interval, maybe we could focus on having one client cached per interval or something.

one more thought, rebuilding on failure is another option here. on further reflection the flag is probably not a good option. we should either rebuild every interval, or rebuild on failure, potentially keeping a single cached client for backup.

Skarlso · 2023-03-20T20:34:50Z

Cool, further talk with more ideas:

rebuilding the client on each 15 second loop
if that fails fall back to reloading

k8s-triage-robot · 2023-04-19T21:33:36Z

/lifecycle rotten

Skarlso · 2023-04-20T07:00:05Z

/remove-lifecycle rotten

Actively working on this. We have a way forward. We are going to fetch the token from the kubeconfig, create a client and override the transport, and keep refreshing the token in a goroutine separate from the actual call. This will make sure that we don't have to rebuild the informers all the time and the client should remain authenticated. 🤞

Testing this will be super painful. :D

charlie-haley · 2023-04-24T08:10:17Z

> @charlie-haley Hello 👋
> So I'm about to do something about this thing. I see you did some annotation thing in the attached PR. Were you planning on reloading the whole thing every 10 minutes?
>
> Wouldn't it be better to just not cache the config at all, but rather whenever it tries to access something just always fetch the kubeconfig? At least that's what I'm planning on doing.

Sorry for the slow reply, I was away! That's correct, we'd reload the autoscaler when the secret changed which was roughly every 10 minutes. It's not the cleanest solution but it worked for our use-case so I never delved into it further.

Refreshing the kubeconfig token in the background definitely sounds like a good plan 🎉

k8s-triage-robot · 2024-01-19T01:00:00Z

/lifecycle stale

k8s-triage-robot · 2024-02-18T01:53:13Z

/lifecycle rotten

k8s-triage-robot · 2024-03-19T02:46:00Z

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues according to the following rules:

After 90d of inactivity, lifecycle/stale is applied
After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

Reopen this issue with /reopen
Mark this issue as fresh with /remove-lifecycle rotten
Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/close not-planned

k8s-ci-robot · 2024-03-19T02:46:05Z

@k8s-triage-robot: Closing this issue, marking it as "Not Planned".

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

elmiko · 2024-03-19T14:29:12Z

i'm not sure if we've completely solved this yet, but i want to keep it open until we know as this is important to capi.

/reopen

k8s-ci-robot · 2024-03-19T14:29:18Z

@elmiko: Reopened this issue.

k8s-triage-robot · 2024-04-20T12:50:54Z

/close not-planned

k8s-ci-robot · 2024-04-20T12:50:59Z

@k8s-triage-robot: Closing this issue, marking it as "Not Planned".

charlie-haley added the kind/bug label Apr 5, 2022

jbartosik added the area/cluster-autoscaler label Apr 5, 2022

charlie-haley mentioned this issue Apr 6, 2022: feat: add support for deployment annotations to helm chart #4791 (Merged)

k8s-ci-robot added the lifecycle/stale label Jul 4, 2022

k8s-ci-robot added the lifecycle/rotten label and removed the lifecycle/stale label Aug 4, 2022

k8s-ci-robot removed the lifecycle/rotten label Sep 2, 2022

k8s-ci-robot added the lifecycle/stale label Dec 1, 2022

k8s-ci-robot removed the lifecycle/stale label Dec 1, 2022

dlipovetsky mentioned this issue Dec 12, 2022: "The workload cluster kubeconfig (for use by controllers, not end users) causes unauthorized errors after exactly 15 minutes" kubernetes-sigs/cluster-api-provider-aws#3066 (Open)

k8s-ci-robot added the lifecycle/stale label Mar 1, 2023

k8s-ci-robot assigned Skarlso Mar 20, 2023

k8s-ci-robot added the lifecycle/rotten label and removed the lifecycle/stale label Apr 19, 2023

k8s-ci-robot removed the lifecycle/rotten label Apr 20, 2023

Skarlso mentioned this issue Apr 20, 2023: bug: auto-refresh tokens in a kube config file #5699 (Closed)

cnmcavoy mentioned this issue Jul 13, 2023: clusterapi: refresh kubeconfig bearer tokens for management and workload kubeconfigs dynamically #5951 (Closed)

cnmcavoy mentioned this issue Nov 1, 2023: Refactor the AWS & EKS control-plane controllers to split the kubeconfig secret into two for Cluster Autoscaler kubernetes-sigs/cluster-api-provider-aws#4607 (Closed)

cnmcavoy mentioned this issue Nov 21, 2023: ✨ Add separate eks kubeconfig secret keys for the cluster-autoscaler kubernetes-sigs/cluster-api-provider-aws#4648 (Merged, 4 tasks)

k8s-ci-robot added the lifecycle/stale label Jan 19, 2024

k8s-ci-robot added the lifecycle/rotten label and removed the lifecycle/stale label Feb 18, 2024

k8s-ci-robot closed this as not planned Mar 19, 2024

k8s-ci-robot reopened this Mar 19, 2024

towca added the area/provider/cluster-api label Mar 21, 2024

k8s-ci-robot closed this as not planned Apr 20, 2024
