-
Notifications
You must be signed in to change notification settings - Fork 4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
aws: Don't pile up successive full refreshes during AWS scaledowns #3797
aws: Don't pile up successive full refreshes during AWS scaledowns #3797
Conversation
0f745a5
to
09ab1ea
Compare
/assign @gjtempleton |
Thanks for the PR. My only concern is whether we could somehow make the log messages clearer, in particular: https://github.com/kubernetes/autoscaler/pull/3797/files#diff-22984a3a02b16ff49b2a94a43b49f3aa61c856483c3612b06b69dd51e347746fL269 Though it's already a bit misleading. |
This might be clearer but still not great perhaps (or do you have a suggestion @gjtempleton )?:
|
@bpineau that reads far better to me, one minor nit, maybe?:
|
Force refreshing everything at every DeleteNodes calls causes slow down and throttling on large clusters with many ASGs (and lot of activity). That function might be called several times in a row during scale-down (once for each ASG having a node to be removed). Each time the forced refresh will re-discover all ASGs, all LaunchConfigurations, then re-list all instances from discovered ASGs. That immediate refresh isn't required anyway, as the cache's DeleteInstances concrete implementation will decrement the nodegroup size, and we can schedule a grouped refresh for the next loop iteration.
09ab1ea
to
037dc73
Compare
Thanks, that's better indeed! Updated accordingly. |
/lgtm |
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: bpineau, gjtempleton The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
…piles aws: Don't pile up successive full refreshes during AWS scaledowns
…piles aws: Don't pile up successive full refreshes during AWS scaledowns
…piles aws: Don't pile up successive full refreshes during AWS scaledowns
Force refreshing everything at every
DeleteNodes
calls causes slow downand throttling on large clusters with lots of ASGs and activity.
That function might be called many times in a row during scale-down.
Each time the forced refresh will re-discover all ASGs, all LaunchConfigurations,
then re-list all instances from discovered ASGs.
That immediate refresh isn't required anyway, as the cache's DeleteInstances
concrete implementation will decrement the nodegroup size, and we can
schedule a grouped refresh for the next loop iteration.
As a later step, we can consider splitting the asgCache.generate() function
to support per ASG refreshes (and maybe per ASG caches TTLs + jitter, to
spread API calls). But that should address the current issue for now.