Sync with upstream v1.20.3 #130

Merged
himanshu-kun merged 55 commits on Jun 25, 2022


himanshu-kun

What this PR does / why we need it:

Which issue(s) this PR fixes:
Fixes #

Special notes for your reviewer:

Release note:

Gardener autoscaler now in sync with upstream v1.20.3

k8s-ci-robot and others added 30 commits December 2, 2020 01:06
This commit fixes the sample manifest of the cluster-autoscaler clusterapi
provider.

(cherry picked from commit a5fee21)
…-release-1.20

Backport of kubernetes#3805: Fix cluster-autoscaler clusterapi sample manifest
…lps the load balancer to remove the node from its healthy hosts (ALB supports this).

This won't fix the 502 issue completely, because a node still has to live for some time after cordoning to serve in-flight requests, but the load balancer can be configured to remove cordoned nodes from its healthy host list.
This feature is enabled by the `cordon-node-before-terminating` flag, which defaults to false to retain the existing behavior.
Cherry-pick kubernetes#3649: add functionality to cordon the node before destroying it.
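A minimal sketch of the ordering this flag controls, assuming a simplified model rather than the autoscaler's real types. The `node` struct, `scaleDownNode`, and the `terminate` callback are hypothetical. Only the flag name and its false default come from the commit message, and the help text is illustrative.

```go
package main

import "flag"

// Flag name and default (false) come from the commit message; the help
// text is illustrative, not upstream's wording.
var cordonNodeBeforeTerminating = flag.Bool("cordon-node-before-terminating", false,
	"cordon nodes before terminating them during scale-down")

// node is a hypothetical stand-in for a Kubernetes Node; only the
// unschedulable bit that cordoning sets is modelled.
type node struct {
	name          string
	unschedulable bool
}

// scaleDownNode cordons the node (marks it unschedulable) before calling
// terminate when cordonFirst is true. A load balancer configured to drop
// cordoned nodes from its healthy hosts can then stop routing to the node
// before it disappears. With cordonFirst false, the node is terminated
// directly, which is the previous behaviour.
func scaleDownNode(n *node, cordonFirst bool, terminate func(*node)) {
	if cordonFirst {
		n.unschedulable = true
	}
	terminate(n)
}
```

After `flag.Parse()`, a caller would pass `*cordonNodeBeforeTerminating` as `cordonFirst`. Because the default is false, existing deployments keep terminating nodes without cordoning them first.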
While this was previously effectively limited to 50, `DescribeAutoScalingGroups` now supports
fetching 100 ASGs per call in all regions, matching the documentation:
https://docs.aws.amazon.com/autoscaling/ec2/APIReference/API_DescribeAutoScalingGroups.html
```
     AutoScalingGroupNames.member.N
       The names of the Auto Scaling groups.
       By default, you can only specify up to 50 names.
       You can optionally increase this limit using the MaxRecords parameter.
     MaxRecords
       The maximum number of items to return with this call.
       The default value is 50 and the maximum value is 100.
```

Doubling this halves API calls on large clusters, which should help to prevent throttling.
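The saving comes from simple batching arithmetic. In this sketch, `batchNames` is a hypothetical helper, and each batch stands for one `DescribeAutoScalingGroups` request. The AWS call itself is omitted.

```go
package main

// maxRecordsPerCall is the documented maximum for MaxRecords in
// DescribeAutoScalingGroups (the default is 50).
const maxRecordsPerCall = 100

// batchNames splits names into consecutive groups of at most size
// entries. In this sketch each group stands for one
// DescribeAutoScalingGroups request; the request itself is not shown.
func batchNames(names []string, size int) [][]string {
	if size <= 0 {
		size = 1
	}
	var batches [][]string
	for len(names) > size {
		// The three-index slice caps capacity so that appending to one
		// batch cannot overwrite elements of the next.
		batches = append(batches, names[:size:size])
		names = names[size:]
	}
	if len(names) > 0 {
		batches = append(batches, names)
	}
	return batches
}
```

For example, 250 ASG names need 3 requests at 100 per call, compared with 5 requests at 50 per call.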
Refactor to allow for optimisation
The pricing JSON for us-east-1 is currently 129MB. Fetching it into
memory and parsing it results in a large memory footprint on startup
and can lead to the autoscaler being OOMKilled.

Replace the ReadAll/Unmarshal logic with a stream decoder to significantly
reduce memory use.
Co-authored-by: Guy Templeton <guyjtempleton@googlemail.com>
…pick-of-#3999-kubernetes#4199-upstream-cluster-autoscaler-release-1.20

Automated cherry pick of kubernetes#3999 kubernetes#4127 kubernetes#4199 upstream cluster autoscaler release 1.20
Backport Merge pull request kubernetes#4274 to upstream/cluster-autoscaler-release-1.20
…s unready. Deprecated LongNotStarted

 In cases where a node n1 is:
 1) created at t=0min,
 2) reports a true Ready condition at t=2.5min, and
 3) has its not-ready taint removed at t=3min,
 the ready node was counted as unready.

 Tested cases after the fix:
 1) The case described above.
 2) Nodes that still have not started after 15 minutes are
 treated as unready.
 3) Nodes created long ago that suddenly become unready are
 counted as unready.
Signed-off-by: Sylvain Rabot <sylvain@abstraction.fr>
Cherry-pick kubernetes#4130: don't proactively decrement Azure cache for unregistered nodes
…ick-of-#3924-upstream-cluster-autoscaler-release-1.20

Automated cherry pick of kubernetes#3924: Fix bug where a node that becomes ready after 2 mins can be
…ed to 1.20 in kubernetes#4319

The backport included unit tests that use a function whose signature
changed after 1.20. This was not detected before merging because CI is
not running correctly on 1.20.
Cluster Autoscaler: fix unit tests after kubernetes#3924 was backported to 1.20 in kubernetes#4319
Fix 1.20 build after Azure cherry-picks
ialidzhikov and others added 8 commits December 22, 2021 12:53
…migration enabled)

Signed-off-by: ialidzhikov <i.alidjikov@gmail.com>
…pick-of-#4539-upstream-cluster-autoscaler-release-1.20

[release-1.20] Automated cherry pick of kubernetes#4539: Add `--feature-gates` flag to support scale up on volume
…r-release-1.20-aws-instance-update-02-06-2022

CA - AWS Cloud Provider - 1.20 Static Instance List Update 02-06-2022
…r-release-1.20.3

Cluster Autoscaler - 1.20.3 release
himanshu-kun requested review from hardikdr and a team as code owners June 24, 2022 04:56
gardener-robot added the labels needs/review (Needs review), size/xl (Size of pull request is huge, see gardener-robot robot/bots/size.py), and needs/second-opinion (Needs second review by someone else) Jun 24, 2022
CLAassistant commented Jun 24, 2022

CLA assistant check
Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you all sign our Contributor License Agreement before we can accept your contribution.
2 out of 15 committers have signed the CLA.

✅ ialidzhikov
✅ himanshu-kun
❌ k8s-ci-robot
❌ darkpssngr
❌ hidekazuna
❌ aidy
❌ bpineau
❌ atulaggarwal
❌ marwanad
❌ vivekbagade
❌ sylr
❌ towca
❌ gjtempleton
❌ sturman
❌ MaciekPytel

gardener-robot-ci-3 added the reviewed/ok-to-test label (Has approval for testing; check PR in detail before setting this label because PR is run on CI/CD) Jun 24, 2022
gardener-robot-ci-1 added needs/ok-to-test (Needs approval for testing) and removed reviewed/ok-to-test Jun 24, 2022
gardener-robot-ci-3 added and removed reviewed/ok-to-test Jun 24, 2022
himanshu-kun (Author)

/invite @ialidzhikov
The test-and-verify step is failing because of wrong boilerplate introduced by the sync. Please ignore that error.

ialidzhikov (Member) left a comment


/lgtm

gardener-robot added reviewed/lgtm (Has approval for merging) and removed needs/review and needs/second-opinion Jun 24, 2022
himanshu-kun merged commit e4c8f8d into gardener:rel-v1.20 Jun 25, 2022
gardener-robot added the status/closed label (Issue is closed, either delivered or triaged) Jun 25, 2022
himanshu-kun deleted the rel-v1.20-prep branch June 25, 2022 09:22
Labels
needs/ok-to-test: Needs approval for testing (check PR in detail before setting this label because PR is run on CI/CD)
reviewed/lgtm: Has approval for merging
size/xl: Size of pull request is huge (see gardener-robot robot/bots/size.py)
status/closed: Issue is closed (either delivered or triaged)