Sync with upstream 1.22 #121
Merged: himanshu-kun merged 155 commits into gardener:machine-controller-manager-provider from himanshu-kun:sync-with-upstream-1.22 on Jun 1, 2022
Conversation
This change adds 4 metrics that can be used to monitor the minimum and maximum limits for CPU and memory, as well as the current counts in cores and bytes, respectively. The four metrics added are:
* `cluster_autoscaler_cpu_limits_cores`
* `cluster_autoscaler_cluster_cpu_current_cores`
* `cluster_autoscaler_memory_limits_bytes`
* `cluster_autoscaler_cluster_memory_current_bytes`

This change also adds the `max_cores_total` metric to the metrics proposal doc, as it was previously not recorded there.

User story: As a cluster autoscaler user, I would like to monitor my cluster through metrics to determine when the cluster is nearing its limits for cores and memory usage.
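As a rough illustration of how gauges like these are typically wired up, here is a minimal sketch using the plain Prometheus Go client. The package layout, function name, and the `direction` label are assumptions for illustration, not the PR's exact code:

```go
package metrics

import "github.com/prometheus/client_golang/prometheus"

var (
	// cluster_autoscaler_cpu_limits_cores, split by an assumed
	// "direction" label ("minimum"/"maximum").
	cpuLimitsCores = prometheus.NewGaugeVec(
		prometheus.GaugeOpts{
			Namespace: "cluster_autoscaler",
			Name:      "cpu_limits_cores",
			Help:      "Minimum and maximum number of cores in the cluster.",
		},
		[]string{"direction"},
	)

	// cluster_autoscaler_cluster_cpu_current_cores
	cpuCurrentCores = prometheus.NewGauge(prometheus.GaugeOpts{
		Namespace: "cluster_autoscaler",
		Name:      "cluster_cpu_current_cores",
		Help:      "Current number of cores in the cluster.",
	})
)

func init() {
	prometheus.MustRegister(cpuLimitsCores, cpuCurrentCores)
}

// UpdateClusterCPUMetrics publishes the current core count against the
// configured limits; the memory gauges would follow the same pattern in bytes.
func UpdateClusterCPUMetrics(minCores, maxCores, currentCores int64) {
	cpuLimitsCores.WithLabelValues("minimum").Set(float64(minCores))
	cpuLimitsCores.WithLabelValues("maximum").Set(float64(maxCores))
	cpuCurrentCores.Set(float64(currentCores))
}
```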
Scale to zero is now supported by Magnum: https://review.opendev.org/c/openstack/magnum/+/737580/ If using node group autodiscovery, older versions of Magnum will still forbid scaling to zero or setting the minimum node count to zero.
Force-refreshing everything on every DeleteNodes call causes slowdowns and throttling on large clusters with many ASGs (and a lot of activity). That function might be called several times in a row during scale-down (once for each ASG having a node to be removed), and each time the forced refresh re-discovers all ASGs and all LaunchConfigurations, then re-lists all instances from the discovered ASGs. That immediate refresh isn't required anyway: the cache's DeleteInstances concrete implementation decrements the node group size, and we can schedule a grouped refresh for the next loop iteration.
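A minimal sketch of that pattern, with a hypothetical stand-in for the provider's ASG cache (the real implementation differs, but the idea is the same: mutate the cached size immediately, refresh at most once per loop):

```go
package aws

import "sync"

// asgCache is an illustrative stand-in for the provider's ASG cache.
type asgCache struct {
	mu           sync.Mutex
	sizes        map[string]int64
	needsRefresh bool
}

// DeleteInstances decrements the cached node group size right away and
// marks the cache dirty instead of forcing an immediate full refresh.
func (c *asgCache) DeleteInstances(group string, count int64) {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.sizes[group] -= count
	c.needsRefresh = true
}

// RefreshIfNeeded runs at most one grouped refresh per loop iteration,
// no matter how many DeleteInstances calls happened since the last one.
func (c *asgCache) RefreshIfNeeded(refresh func() error) error {
	c.mu.Lock()
	dirty := c.needsRefresh
	c.needsRefresh = false
	c.mu.Unlock()
	if !dirty {
		return nil
	}
	return refresh()
}
```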
Sets the `kubernetes.io/arch` (and legacy `beta.kubernetes.io/arch`) label to the proper instance architecture. While at it, re-generates the instance types list (adding new instance types that were missing).
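For illustration, a hypothetical helper showing the two label keys involved (only the keys come from the change; the function and its surroundings are made up):

```go
// buildArchLabels returns the node labels for a given instance architecture.
func buildArchLabels(arch string) map[string]string {
	return map[string]string{
		"kubernetes.io/arch":      arch, // stable key, e.g. "amd64" or "arm64"
		"beta.kubernetes.io/arch": arch, // legacy key kept for older selectors
	}
}
```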
The current implementation assumes MIG ids have the "https://content.googleapis.com" prefix, while the canonical id format seems to begin with "https://www.googleapis.com". Both formats work when talking to the GCE API, but the API returns the latter, and other GCP services seem to assume it as well.
Cluster Autoscaler GCE: change the format of MIG id
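A sketch of the kind of normalization this implies; the helper and constants are hypothetical, only the two prefixes come from the description above:

```go
package gce

import "strings"

const (
	legacyDomain    = "https://content.googleapis.com/"
	canonicalDomain = "https://www.googleapis.com/"
)

// canonicalMigID rewrites a MIG self-link from the legacy prefix to the
// canonical one the GCE API actually returns.
func canonicalMigID(id string) string {
	if strings.HasPrefix(id, legacyDomain) {
		return canonicalDomain + strings.TrimPrefix(id, legacyDomain)
	}
	return id
}
```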
…ycloud-provider cloudprovider: add Bizflycloud provider
Remove vivekbagade, add towca as an approver in cluster-autoscaler/OWNERS
Enable magnum provider scale to zero
…piles aws: Don't pile up successive full refreshes during AWS scaledowns
Release leader election lock on shutdown
FetchAllMigs (an unfiltered InstanceGroupManagers.List) is costly, as it isn't bounded to the MIGs attached to the current cluster but lists all MIGs in the project/zone, and therefore equally affects all clusters in that project/zone. Running the calls concurrently over the region's zones (so at most 4 concurrent API calls, about once per minute) contains that impact. findMigsInRegion might be scoped to the current cluster (name pattern), but it benefits from the same improvement, as it's also costly and called at each refreshInterval (1 min).

Also: we're calling the GCE mig.Get() API again for each MIG (at ~300ms per API call, in my tests), sequentially and with the global cache lock held (when updateClusterState -> ... -> GetMigForInstance kicks in). Yet we already get that bit of information (the MIG's basename) from any other mig.Get or mig.List call, like the one fetching target sizes. Leveraging this helps significantly on large fleets (for instance, it shaves 8 min off startup time on the large cluster I tested on).
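A minimal sketch of that per-zone fan-out, assuming a hypothetical single-zone listing helper (the real provider wires this through the GCE client):

```go
package gce

import (
	"sync"

	"golang.org/x/sync/errgroup"
)

// Mig is a minimal stand-in for the provider's MIG type.
type Mig struct{ Name, Zone string }

// fetchAllMigs lists MIGs in every zone of the region concurrently (one
// API call per zone) instead of sequentially; listInZone is a hypothetical
// wrapper around InstanceGroupManagers.List for a single zone.
func fetchAllMigs(zones []string, listInZone func(zone string) ([]Mig, error)) ([]Mig, error) {
	var (
		g    errgroup.Group
		mu   sync.Mutex
		migs []Mig
	)
	for _, zone := range zones {
		zone := zone // capture the loop variable for the goroutine
		g.Go(func() error {
			zoneMigs, err := listInZone(zone)
			if err != nil {
				return err
			}
			mu.Lock()
			migs = append(migs, zoneMigs...)
			mu.Unlock()
			return nil
		})
	}
	if err := g.Wait(); err != nil {
		return nil, err
	}
	return migs, nil
}
```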
…locatables support "/" separators in custom allocatable overrides via vmss tags
add stable zone labels in azure template generation
Document that TLS bootstrapping may be necessary for scale-up
Enable custom k8s fork in update-vendor.sh
Right now the file is breaking `go mod` commands.
gardener-robot-ci-1 added needs/ok-to-test and removed reviewed/ok-to-test labels on May 24, 2022
gardener-robot-ci-3 added the reviewed/ok-to-test label on May 26, 2022
gardener-robot-ci-1 removed the reviewed/ok-to-test label on May 26, 2022
gardener-robot-ci-3 added reviewed/ok-to-test and removed reviewed/ok-to-test labels on May 27, 2022
himanshu-kun force-pushed the sync-with-upstream-1.22 branch from f71030b to e6dc0af on May 27, 2022 08:42
gardener-robot-ci-3 added reviewed/ok-to-test and removed reviewed/ok-to-test labels on May 27, 2022
himanshu-kun added reviewed/ok-to-test and removed needs/ok-to-test labels on May 30, 2022
gardener-robot-ci-1 added needs/ok-to-test and removed reviewed/ok-to-test labels on May 30, 2022
himanshu-kun force-pushed the sync-with-upstream-1.22 branch from 8487624 to 03d4e3d on May 30, 2022 13:05
gardener-robot-ci-2 added the reviewed/ok-to-test label on May 30, 2022
gardener-robot-ci-1 removed the reviewed/ok-to-test label on May 30, 2022
gardener-robot-ci-2 added reviewed/ok-to-test and removed reviewed/ok-to-test labels on May 30, 2022
gardener-robot-ci-2 added the reviewed/ok-to-test label on May 30, 2022
gardener-robot-ci-1 removed the reviewed/ok-to-test label on May 30, 2022
ashwani2k approved these changes on Jun 1, 2022
Thanks @himanshu-kun for the PR.
Looks good to me.
gardener-robot added the status/closed label on Jun 1, 2022
Labels
needs/ok-to-test: Needs approval for testing (check PR in detail before setting this label because PR is run on CI/CD)
needs/review: Needs review
needs/second-opinion: Needs second review by someone else
size/xl: Size of pull request is huge (see gardener-robot robot/bots/size.py)
status/closed: Issue is closed (either delivered or triaged)
What this PR does / why we need it:
Which issue(s) this PR fixes:
Fixes #123
Special notes for your reviewer:
Release note: