Scaling up from 0 on EKS #1580
Comments
If Cluster Autoscaler is running in the kube-system namespace and there's no PDB for it, it should never delete the node where it is running. It should also never scale the cluster down to 0 nodes; are you sure this was Cluster Autoscaler's decision? As for "no deployment": if you're deploying it manually, you determine whether it runs as a deployment or not.
With multiple masters, leader election ensures there's only one active instance at any time. If you manage to hack around this, all instances will still try to scale up their respective node groups for the same pods. To avoid surplus scale-ups, you'd have to schedule each pod on a dedicated node group, at which point it may be simpler to just have separate clusters.
Right, I did not tell the whole story :s Sorry. So, the scenario I'm in:
The latter two node groups should be able to scale down to 0 and back up.
Thanks for your reply
I was probably wrong in saying that it was the autoscaler that decided to scale down to 0. I retested. So for my first question, I guess the conclusion is that it's not possible to scale down to 0 and up from 0 in my case.
Before concluding this, can you explain the reason for running multiple instances of Cluster Autoscaler? One is usually enough to scale a cluster, and scaling groups to and from 0 is supported as well.
My idea was to scale one node group with node selector 'cpu_spot' separately from another node group with node selector 'gpu_spot'. Processing jobs come in randomly during the day and can run for up to an hour, and we have the requirement to start them immediately, so we want to scale the node groups on demand. Some jobs require CPU, others need GPU. Hence two cluster autoscalers (was my idea). About supporting scaling to and from 0:
Probably I'm still missing something. Regards,
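A single Cluster Autoscaler instance can in fact watch both groups, each with a minimum of 0. A minimal sketch of the relevant container args, using hypothetical ASG names (`cpu-spot-asg`, `gpu-spot-asg`) and bounds that are not from this thread:

```yaml
# Fragment of a Cluster Autoscaler Deployment spec: one instance manages
# both ASGs via repeated --nodes=<min>:<max>:<asg-name> flags.
# ASG names and max sizes below are illustrative examples.
containers:
- name: cluster-autoscaler
  image: k8s.gcr.io/cluster-autoscaler:v1.3.5   # version mentioned in this thread
  command:
  - ./cluster-autoscaler
  - --cloud-provider=aws
  - --nodes=0:10:cpu-spot-asg   # min 0 allows this group to scale to zero
  - --nodes=0:4:gpu-spot-asg
```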
Why not put Cluster Autoscaler on cpu-always-on group (or another group for infrastructure/kube-system pods)? Then it can scale other groups from there.
Assuming the CPU jobs are banned from GPU nodes (e.g. by the use of taints), Cluster Autoscaler should scale the correct group out-of-the-box. For correct handling of scale from 0 with GPUs, you may need to temporarily add the GKE-specific label to your GPU nodes.
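The taint-based separation suggested here could look roughly like the sketch below. The taint key, label key, job name, and image are hypothetical, not from the thread:

```yaml
# Sketch: GPU nodes carry a taint (e.g. applied with
#   kubectl taint nodes <gpu-node> nvidia.com/gpu=present:NoSchedule
# or via kubelet --register-with-taints), so CPU jobs never land on them.
# GPU jobs select the GPU group and tolerate the taint.
apiVersion: batch/v1
kind: Job
metadata:
  name: gpu-job                        # illustrative name
spec:
  template:
    spec:
      restartPolicy: Never
      nodeSelector:
        node-type: gpu_spot            # hypothetical label key for the 'gpu_spot' group
      tolerations:
      - key: nvidia.com/gpu
        operator: Exists
        effect: NoSchedule
      containers:
      - name: main
        image: my-gpu-image            # illustrative
        resources:
          limits:
            nvidia.com/gpu: 1
```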
Ok, that's a very good idea. Now that I understand the possibilities better, I will handle it like that.
@timv2 Many users follow the approach @aleksandra-malinowska mentioned on EKS. Since you have multiple ASGs, you can deploy CA on the always-running one and use it to scale the Spot Instance group up from and down to 0. BTW, please provide some feedback on CA's spot instance support; we'd also like to improve this part. GPU support is a little bit tricky on EKS: without the label, you will see too many nodes scale up, so please follow that approach. I am on that task and will update the documents for GPU users here: https://github.com/kubernetes/autoscaler/tree/master/cluster-autoscaler/cloudprovider/aws
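When a group is at 0 nodes, the AWS cloud provider builds a template node from the ASG's launch configuration, and labels or taints that would exist on real nodes can be hinted to CA via `k8s.io/cluster-autoscaler/node-template/...` ASG tags. A hedged sketch, with hypothetical label and taint values (the exact GPU label needed is the one discussed in the replies above):

```yaml
# Tags on the GPU Auto Scaling Group (e.g. in a CloudFormation template),
# so CA can predict node labels/taints while the group is at 0.
# The label key 'node-type' and the taint are illustrative examples.
Tags:
- Key: k8s.io/cluster-autoscaler/enabled
  Value: "true"
  PropagateAtLaunch: false
- Key: k8s.io/cluster-autoscaler/node-template/label/node-type
  Value: gpu_spot
  PropagateAtLaunch: false
- Key: k8s.io/cluster-autoscaler/node-template/taint/nvidia.com/gpu
  Value: "present:NoSchedule"
  PropagateAtLaunch: false
```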
@aleksandra-malinowska Could this issue be assigned to me? I will update the CA AWS documentation.
Done. Although we don't really rely on "assignee" field much, feel free to send a PR in the future even if you're not assigned;) |
Thanks. For GPU, scale-up from zero does not seem to work correctly, but I will test in detail and let you know my findings.
Success for GPU as well (indeed needing the node label).
Glad to know it works for you :) |
@Timvissers Hi, I'm trying to achieve a similar setup: multiple node groups, some with spot instances on EKS. Can you share your configuration for CA and CF for the node groups?
@alexei-led It must have been something like this:
I can say that afterwards I switched to using this Terraform module: https://github.com/terraform-aws-modules/terraform-aws-eks and deployed the latest cluster-autoscaler Helm chart. Using the module, you can just mark node groups as 'autoscaling_enabled' and everything comes out of the box; there is no more need to manually configure the cluster-autoscaler.
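For the Helm-based setup mentioned here, the cluster-autoscaler chart is typically configured through its values file with ASG auto-discovery instead of hard-coded group names. A rough sketch, where the cluster name and region are placeholders:

```yaml
# values.yaml sketch for the cluster-autoscaler Helm chart.
# Auto-discovery finds ASGs by the k8s.io/cluster-autoscaler/* tags,
# so newly enabled node groups are picked up without editing flags.
autoDiscovery:
  clusterName: my-eks-cluster   # placeholder
awsRegion: eu-west-1            # placeholder
extraArgs:
  balance-similar-node-groups: true
  skip-nodes-with-system-pods: false
```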
Hi,
I have the following setup
EKS with Kubernetes 1.11.5
Cluster autoscaler 1.3.5 with single ASG
I wonder: since in EKS we have no access to the master nodes, we cannot run the cluster autoscaler as a deployment on a master node. When the cluster has scaled down to 0 and needs to scale up, with no such deployment in place, how will the cluster autoscaler ever know that there is a reason to scale up?
I'm testing it, and it's not working.
A second question, partly related: can we have multiple cluster autoscalers deployed, scaling different node groups up independently?
Regards,
Tim