Sync with upstream v1.28.0 (#260)
* feat(*): add more metrics

* Added the RBAC Permission to Linode.

* CA - Correct Cloudprovider PR labelling to area/provider/<provider name>

* fix(volcengine): don't build all providers when the volcengine tag exists

* chore: replace `github.com/ghodss/yaml` with `sigs.k8s.io/yaml`

At the time of making this commit, the package `github.com/ghodss/yaml`
is no longer actively maintained.

`sigs.k8s.io/yaml` is a permanent fork of `ghodss/yaml` and is actively
maintained by Kubernetes SIG.

Signed-off-by: Eng Zer Jun <engzerjun@gmail.com>

* Fixed Typo and Trailing-whitespace

* Skip healthiness check for non-existing similar node groups

* BinpackingLimiter interface

* fix comment and list format

* add more logging for balancing similar node groups

this change adds some logging at verbosity levels 2 and 3 to help
diagnose why the cluster-autoscaler does not consider 2 or more node
groups to be similar.

* Update VPA scripts to use v1.

* fix: don't clean `CriticalAddonsOnly` taint from template nodes
- this taint leads to unexpected behavior
- users expect CA to consider the taint when autoscaling

Signed-off-by: vadasambar <suraj.bankar@acquia.com>

* Updated the owners of civo cloudprovider

Signed-off-by: Vishal Anarse <vishalanarse11@gmail.com>

* Bump golang from 1.20.4 to 1.20.5 in /vertical-pod-autoscaler/builder

Bumps golang from 1.20.4 to 1.20.5.

---
updated-dependencies:
- dependency-name: golang
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>

* cluster-autoscaler: support Brightbox image pattern

cluster-autoscaler/cloudprovider/brightbox

Allow scaled workers to be built from an image name pattern as well as
an image id. This deals with long running clusters where the official
image is updated with security changes over time.

* brightbox: set default docker registry

Only set the registry value in the local Makefile if it is missing

* Update oci-ip-cluster-autoscaler-w-config.yaml

* Update oci-ip-cluster-autoscaler-w-principals.yaml

* address comments

* golint fix

* Remove print condition for vpa-beta2-crd.

* Improvement: Modified the VPA content for the helm chart.

* Bump the Chart version to 9.29.1 and CA image to 1.27.2

* make no-op binpacking limiter as default + move mark nodegroups to its method

* Drop projected volumes for init containers

* fix zonal gce outage breaking CA when only some of the zones are failed

* Bump version to 0.14.0 as a preparation for release.

* Update vendor to Kubernetes 1.28.0-alpha.2

* Interface fixes after Kubernetes 1.28.0-alpha.2 vendor update

* Execute git commands to show the state of local clone of the repo.

* Clarify and simplify the "build and stage images" step.

* Mention logs from kubernetes#5862 in release instructions.

* addressed comments

* chore: remove unused func scaleFromZeroAnnotationsEnabled

Signed-off-by: Dinesh B <dineshba@microsoft.com>

* add cluster-autoscaler name and version to the user agent

This makes it easier to distinguish between various users
of the Go SDK.

* Explicitly create and remove buildx builders

* Apply fixes to in place support VPA AEP

Looks like the first PR merged a bit too early, while there were open comments

* Add voelzmo to VPA reviewers

voelzmo meets
[requirements](https://github.com/kubernetes/community/blob/9504ce87ec14cff9455e794fdcbc5088c52f9dd9/community-membership.md#requirements-1):

- K8s org members since 2023-02: kubernetes/org#4015
- Reviewer of 12 merged VPA PRs: https://github.com/kubernetes/autoscaler/pulls?q=is%3Apr+reviewed-by%3Avoelzmo+label%3Avertical-pod-autoscaler+is%3Amerged+
- Sent 10 merged VPA PRs: https://github.com/kubernetes/autoscaler/pulls?q=is%3Apr+label%3Avertical-pod-autoscaler+author%3Avoelzmo+is%3Amerged

* Bump default VPA version to 0.14.0

* Minor tweaks after preparing VPA 0.14.0 release.

* fix: CA on fargate causing log flood
- happens when CA tries to check if the unmanaged fargate node is part of an ASG (it isn't)
- and keeps logging an error
Signed-off-by: vadasambar <surajrbanakar@gmail.com>

* test: fix node names
Signed-off-by: vadasambar <surajrbanakar@gmail.com>

* Sort nodegroups in order of their ID

* Move two util functions from actuator to delete_in_batch, where they are more appropriate

* Add support for atomic scale-down in node group options

* Extract cropNodesToBudgets function out of actuator file

* Support atomic scale-down option for node groups

* Respond to readability-related comments from the review

* Don't pass NodeGroup as a parameter to functions running asynchronously

* Add unit test for group_deletion_scheduler

* Use single AtomicScaling option for scale up and scale down

* address comments

* Address next set of comments

* update agnhost image to pull from registry.k8s.io

* Revert "Add subresource status for vpa"

This reverts commit 1384c8b.

* Bugfix for budget cropping

Previous "CropNodes" function of ScaleDownBudgetProcessor had an
assumption that atomically-scaled node groups should be classified as
"empty" or "drain" as a whole, however Cluster Autoscaler may classify
some of the nodes from a single group as "empty" and other as "drain".

* Remove unneeded node groups regardless of scale down being in cooldown.

* Update VPA vendor

Generated by running:

```
go mod tidy
go mod vendor
```

* Replace `BuildTestContainer` with use of builder

* Quote temp folder name parameter to avoid errors

* Include short unregistered nodes in calculation of incorrect node group
sizes

* Add BigDarkClown to Cluster Autoscaler approvers

* Add support for scaling up ZeroToMaxNodesScaling node groups

* Use appropriate logging levels

* Remove unused field in expander and add comment about estimator

* Merge tests for ZeroToMaxNodesScaling into one table-driven test.

* Merged multiple tests into one single table driven test.
* Fixed some typos.

* Change handling of scale up options for ZeroToMaxNodeScaling in orchestrator

* Started handling scale up options for ZeroToMaxNodeScaling with the existing estimator
* Skip setting similar node groups for the node groups that use
  ZeroToMaxNodeScaling
* Renamed the autoscaling option from "AtomicScaleUp" to "AtomicScaling"
* Merged multiple tests into one single table driven test.
* Fixed some typos.

* Rename the autoscaling option

* Renamed the "AtomicScaling" autoscaling option to
  "ZeroOrMaxNodeScaling" to be more clear about the behavior.

* Record all vpa api versions in recommender metrics

Change the tracking of APIVersion from a boolean
indicating if the VPA is v1beta1
to the version string
and make sure it gets exported
in metrics.

Add tests for the recommender metrics.

* Add subresource status for vpa

Add a status field as a subresource in the CRD YAML and add a new ClusterRole system:vpa-actor to patch the /status subresource.
The `metadata.generation` only increases on VPA spec updates.
Fix e2e tests for patching and creating VPAs.

* Implement threshold interface for use by threshold based limiter
Add EstimationContext to take into account runtime state of the autoscaling for estimations
Implement static threshold
Implement cluster capacity threshold for Estimation Limiter
Implement similar node groups capacity threshold for Estimation Limiter
Set default estimation thresholds

* Fix tests

* Add ClusterStateRegistry to the AutoscalingContext.

Due to the dependency of the MaxNodeProvisionTimeProvider on the context,
the provider was extracted to a dedicated package and injected into the
ClusterStateRegistry after context creation.

* Make signature of GetDurationLimit uniform with GetNodeLimit
For SNG threshold include capacity of the currently estimated node group (as it is not part of SNG itself)
Replaced direct calls with use of getters in cluster capacity threshold
Renamed getters removing the verb Get
Replace EstimationContext struct with interface
Add support for negative threshold value in estimation limiter

* Add support for negative binpacking duration limit in threshold based estimation limiter

* update RBAC to only use verbs that exist for the resources

Signed-off-by: Maximilian Rink <maximilian.rink@telekom.de>

* Move powerState to azure_util, change default to powerStateUnknown

* renames all PowerState* consts to vmPowerState*
* moves vmPowerState* consts and helper functions to azure_util.go
* changes default vmPowerState to vmPowerStateUnknown instead of vmPowerStateStopped when a power state is not set.

* test: fix failing tests
- remove non-relevant comment related to rescheduler
Signed-off-by: vadasambar <surajrbanakar@gmail.com>

* feat: set `IgnoreDaemonSetsUtilization` per nodegroup
Signed-off-by: vadasambar <surajrbanakar@gmail.com>

fix: test cases failing for actuator and scaledown/eligibility
- abstract default values into `config`
Signed-off-by: vadasambar <surajrbanakar@gmail.com>

refactor: rename global `IgnoreDaemonSetsUtilization` -> `GlobalIgnoreDaemonSetsUtilization` in code
- there is no change in the flag name
- rename `thresholdGetter` -> `configGetter` and tweak it to accommodate `GetIgnoreDaemonSetsUtilization`
Signed-off-by: vadasambar <surajrbanakar@gmail.com>

refactor: reset help text for `ignore-daemonsets-utilization` flag
- because per nodegroup override is supported only for AWS ASG tags as of now
Signed-off-by: vadasambar <surajrbanakar@gmail.com>

docs: add info about overriding `--ignore-daemonsets-utilization` per ASG
- in AWS cloud provider README
Signed-off-by: vadasambar <surajrbanakar@gmail.com>

refactor: use a limiting interface in actuator in place of `NodeGroupConfigProcessor` interface
- to limit the functions that can be used
- since we need it only for `GetIgnoreDaemonSetsUtilization`
Signed-off-by: vadasambar <surajrbanakar@gmail.com>

fix: tests failing for actuator
- rename `staticNodeGroupConfigProcessor` -> `MockNodeGroupConfigGetter`
- move `MockNodeGroupConfigGetter` to test/common so that it can be used in different tests
Signed-off-by: vadasambar <surajrbanakar@gmail.com>

fix: go lint errors for `MockNodeGroupConfigGetter`
Signed-off-by: vadasambar <surajrbanakar@gmail.com>

test: add tests for `IgnoreDaemonSetsUtilization` in cloud provider dir
Signed-off-by: vadasambar <surajrbanakar@gmail.com>

test: update node group config processor tests for `IgnoreDaemonSetsUtilization`
Signed-off-by: vadasambar <surajrbanakar@gmail.com>

test: update eligibility test cases for `IgnoreDaemonSetsUtilization`
Signed-off-by: vadasambar <surajrbanakar@gmail.com>

test: run actuation tests for 2 NGS
- one with `IgnoreDaemonSetsUtilization`: `false`
- one with `IgnoreDaemonSetsUtilization`: `true`
Signed-off-by: vadasambar <surajrbanakar@gmail.com>

test: add tests for `IgnoreDaemonSetsUtilization` in actuator
- add helper to generate multiple ds pods dynamically
- get rid of mock config processor because it is not required
Signed-off-by: vadasambar <surajrbanakar@gmail.com>

test: fix failing tests for actuator
Signed-off-by: vadasambar <surajrbanakar@gmail.com>

refactor: remove `GlobalIgnoreDaemonSetUtilization` autoscaling option
- not required
Signed-off-by: vadasambar <surajrbanakar@gmail.com>

fix: warn message `DefaultScaleDownUnreadyTimeKey` -> `DefaultIgnoreDaemonSetsUtilizationKey`
Signed-off-by: vadasambar <surajrbanakar@gmail.com>

refactor: use `generateDsPods` instead of `generateDsPod`
Signed-off-by: vadasambar <surajrbanakar@gmail.com>

refactor: `globaIgnoreDaemonSetsUtilization` -> `ignoreDaemonSetsUtilization`
Signed-off-by: vadasambar <surajrbanakar@gmail.com>

* test: fix merge conflicts in actuator tests
Signed-off-by: vadasambar <surajrbanakar@gmail.com>

* refactor: use `actuatorNodeGroupConfigGetter` param in `NewActuator`
- instead of passing all the processors (we only need `NodeGroupConfigProcessor`)
Signed-off-by: vadasambar <surajrbanakar@gmail.com>

* test: refactor eligibility tests
- add suffix to tests with `IgnoreDaemonSetsUtilization` set to `true` and `IgnoreDaemonSetsUtilization` set to `false`
Signed-off-by: vadasambar <surajrbanakar@gmail.com>

* refactor: remove comment line (not relevant anymore)
Signed-off-by: vadasambar <surajrbanakar@gmail.com>

* fix: dynamic assignment of the scale down threshold flags. Setting maxEmptyBulkDelete and maxScaleDownParallelism to be the larger of the two flags in case both are set

* Refactor autoscaler.go and static_autoscaler.go to move declaration of the NodeDeletion option to main.go

* Fixed go:build tags for ovhcloud

* Update the go:build tag for missing cloud providers.

* Adapt FAQ for Pods without controller

* Use strings instead of NodeGroups as map keys in budgets.go

* Delete dead code from budgets.go

* Re-introduce asynchronous node deletion and clean node deletion logic.

* feat: support custom scheduler config for in-tree scheduler plugins (without extenders)
Signed-off-by: vadasambar <surajrbanakar@gmail.com>

refactor: rename `--scheduler-config` -> `--scheduler-config-file` to avoid confusion
Signed-off-by: vadasambar <surajrbanakar@gmail.com>

fix: `goto` causing infinite loop
- abstract out running extenders in a separate function
Signed-off-by: vadasambar <surajrbanakar@gmail.com>

refactor: remove code around extenders
- we decided not to use scheduler extenders for checking if a pod would fit on a node
Signed-off-by: vadasambar <surajrbanakar@gmail.com>

refactor: move scheduler config to a `utils/scheduler` package`
- use default config as a fallback
Signed-off-by: vadasambar <surajrbanakar@gmail.com>

test: fix static_autoscaler test
Signed-off-by: vadasambar <surajrbanakar@gmail.com>

refactor: `GetSchedulerConfiguration` fn
- remove falling back
- add mechanism to detect if the scheduler config file flag was set
Signed-off-by: vadasambar <surajrbanakar@gmail.com>

test: wip add tests for `GetSchedulerConfig`
- tests are failing now
Signed-off-by: vadasambar <surajrbanakar@gmail.com>

test: add tests for `GetSchedulerConfig`
- abstract error messages so that we can use them in the tests
- set api version explicitly (this is what upstream does as well)
Signed-off-by: vadasambar <surajrbanakar@gmail.com>

refactor: do a round of cleanup to make PR ready for review
- make import names consistent
Signed-off-by: vadasambar <surajrbanakar@gmail.com>

fix: use `pflag` to check if the `--scheduler-config-file` flag was set
Signed-off-by: vadasambar <surajrbanakar@gmail.com>

docs: add comments for exported error constants
Signed-off-by: vadasambar <surajrbanakar@gmail.com>

refactor: don't export error messages
- exporting is not needed
Signed-off-by: vadasambar <surajrbanakar@gmail.com>

fix: add underscore in test file name
Signed-off-by: vadasambar <surajrbanakar@gmail.com>

test: fix test failing because of no comment on exported `SchedulerConfigFileFlag`
Signed-off-by: vadasambar <surajrbanakar@gmail.com>

refactor: change name of flag variable `schedulerConfig` -> `schedulerConfigFile`
- avoids confusion
Signed-off-by: vadasambar <surajrbanakar@gmail.com>

test: add extra test cases for predicate checker
- where the predicate checker uses custom scheduler config
Signed-off-by: vadasambar <surajrbanakar@gmail.com>

refactor: remove `setFlags` variable
- not needed anymore
Signed-off-by: vadasambar <surajrbanakar@gmail.com>

refactor: abstract custom scheduler configs into `config` package
- make them constants
Signed-off-by: vadasambar <surajrbanakar@gmail.com>

test: fix linting error
Signed-off-by: vadasambar <surajrbanakar@gmail.com>

refactor: introduce a new custom test predicate checker
- instead of adding a param to the current one
- this is so that we don't have to pass `nil` to the existing test predicate checker in many places
Signed-off-by: vadasambar <surajrbanakar@gmail.com>

refactor: rename `NewCustomPredicateChecker` -> `NewTestPredicateCheckerWithCustomConfig`
- latter narrows down meaning of the function better than former
Signed-off-by: vadasambar <surajrbanakar@gmail.com>

refactor: rename `GetSchedulerConfig` -> `ConfigFromPath`
- `scheduler.ConfigFromPath` is shorter and feels less vague than `scheduler.GetSchedulerConfig`
- move test config to a new package `test` under `config` package
Signed-off-by: vadasambar <surajrbanakar@gmail.com>

docs: add `TODO` for replacing code to parse scheduler config
- with upstream function
Signed-off-by: vadasambar <surajrbanakar@gmail.com>

* Use fixed version of golang image

* Fix TestBinpackingLimiter flake

* Bump golang from 1.20.5 to 1.20.6 in /vertical-pod-autoscaler/builder

Bumps golang from 1.20.5 to 1.20.6.

---
updated-dependencies:
- dependency-name: golang
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>

* Fix: Do not inject fakeNode for instance which has errors on create

* chore: add script to update vendored hcloud-go

* chore(deps): update vendored hcloud-go to 2.0.0

Generated by:

```
UPSTREAM_REF=v2.0.0 hack/update-vendor.sh
```

* fix: balancer RBAC permission to update balancer status

* CA - AWS Cloudprovider OWNERS Update

* Enable parallel drain by default.

* Add BigDarkClown to patch releases schedule

* Update Cluster Autoscaler vendor to K8s 1.28.0-beta.0

* Add EstimationAnalyserFunc to be run at the end of the estimation logic

* Remove ChangeRequirements with `OrEqual`

* Add EvictionRequirements to types

* Run `generate-crd-yaml.sh`

* Add metrics for improved observability:
* pending_node_deletions
* failed_gpu_scale_ups_total

* Add requirement for Custom Resources to VPA FAQ

* Clarify Eviction Control for Pods with multiple Containers

* Fix broken hyperlink

Co-authored-by: Shubham <shubham.kuchhal@india.nec.com>

* Update vertical-pod-autoscaler/FAQ.md

Co-authored-by: Joachim <jbartosik@google.com>

* Update vertical-pod-autoscaler/FAQ.md

Co-authored-by: Joachim <jbartosik@google.com>

* Reword AND/OR combinations for more clarity

* Fix nil pointer exception for case when node is nil while processing gpuInfo

* feat: add prometheus basic auth

Signed-off-by: AhmedGrati <ahmedgrati1999@gmail.com>

* Add error code for invalid reservations to GCE client

* Bump golang from 1.20.6 to 1.20.7 in /vertical-pod-autoscaler/builder

Bumps golang from 1.20.6 to 1.20.7.

---
updated-dependencies:
- dependency-name: golang
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>

* Support ZeroOrMaxNodeScaling node groups when cleaning up unregistered nodes

* Don't pass nil nodes to GetGpuInfoForMetrics

* Revert "Fix nil pointer exception for case when node is nil while processing …"

* Clean up NodeGroupConfigProcessor interface

* docs: add kep to add fswatcher to nanny for automatic nanny configuration

Signed-off-by: AhmedGrati <ahmedgrati1999@gmail.com>

* Allow using an external secret instead of using the one the Helm chart creates

* Remove the MaxNodeProvisioningTimeProvider interface

* Fixed the hyperlink for Node group auto discovery.

* Update ResourcePolicy description and limit control README

* s390x image support

* Bump golang from 1.20.7 to 1.21.0 in /vertical-pod-autoscaler/builder

Bumps golang from 1.20.7 to 1.21.0.

---
updated-dependencies:
- dependency-name: golang
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>

* test

* Set batch size to target size for atomically scaled groups

* a little extra validation

* test with 2 atomic groups

* don't block draining other groups when one group has some empty nodes

* fix: Broken links to testgrid dashboard

* fix: scale down broken for providers not implementing NodeGroup.GetOptions()

* feat(hetzner): use less requests while waiting for server create

The default is to send a new request every 500ms; this will instead use
an exponential backoff while waiting for the server to be created.

* Update in-place updates AEP adding details to consider

* Fix Doc with External gRPC

Signed-off-by: ZhengSheng0524 <j13tw@yahoo.com.tw>

* Add fetch reservations in specific project

GCE supports shared reservations, where the reservation is in a different
project than the project the cluster is in. Add a GCE client method to get
said reservations so autoscaling can support shared reservations.

* kep: add config file format and structure notes

Signed-off-by: AhmedGrati <ahmedgrati1999@gmail.com>

* CA - 1.28.0 k/k Vendor Update

* Fix duplicate imports in IT

* re-add changes part of FORK-CHANGE

* Re-added a fork change command and updated the sync change notes

* Update cluster-autoscaler/SYNC-CHANGES/SYNC_CHANGES-1.28.md

Co-authored-by: Rishabh Patel <66425093+rishabh-11@users.noreply.github.com>

---------

Signed-off-by: Eng Zer Jun <engzerjun@gmail.com>
Signed-off-by: vadasambar <suraj.bankar@acquia.com>
Signed-off-by: Vishal Anarse <vishalanarse11@gmail.com>
Signed-off-by: dependabot[bot] <support@github.com>
Signed-off-by: Dinesh B <dineshba@microsoft.com>
Signed-off-by: Maximilian Rink <maximilian.rink@telekom.de>
Signed-off-by: vadasambar <surajrbanakar@gmail.com>
Signed-off-by: AhmedGrati <ahmedgrati1999@gmail.com>
Signed-off-by: ZhengSheng0524 <j13tw@yahoo.com.tw>
Co-authored-by: qianlei.qianl <qianlei.qianl@bytedance.com>
Co-authored-by: shubham82 <shubham.kuchhal@india.nec.com>
Co-authored-by: Guy Templeton <guyjtempleton@googlemail.com>
Co-authored-by: Kubernetes Prow Robot <k8s-ci-robot@users.noreply.github.com>
Co-authored-by: Eng Zer Jun <engzerjun@gmail.com>
Co-authored-by: Bartłomiej Wróblewski <bwroblewski@google.com>
Co-authored-by: Kushagra <kushu@google.com>
Co-authored-by: kei-gnu <61653118+kei-gnu@users.noreply.github.com>
Co-authored-by: michael mccune <msm@opbstudios.com>
Co-authored-by: vadasambar <suraj.bankar@acquia.com>
Co-authored-by: Vishal Anarse <vishalanarse11@gmail.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Neil Wilson <neil@aldur.co.uk>
Co-authored-by: Sourabh Gupta <gupta.sourabh385@gmail.com>
Co-authored-by: Artur Żyliński <azylinski@google.com>
Co-authored-by: Damika Gamlath <damika@google.com>
Co-authored-by: Karol Golab <kgolab@google.com>
Co-authored-by: Dinesh B <dineshba@microsoft.com>
Co-authored-by: Todd Neal <tnealt@amazon.com>
Co-authored-by: Marco Voelz <marco.voelz@sap.com>
Co-authored-by: Joachim Bartosik <jbartosik@google.com>
Co-authored-by: vadasambar <surajrbanakar@gmail.com>
Co-authored-by: Karol Wychowaniec <kawych@google.com>
Co-authored-by: Kevin Wiesmueller <kwiesmueller@google.com>
Co-authored-by: Aleksandra Gacek <algacek@google.com>
Co-authored-by: Hakan Bostan <hbostan@google.com>
Co-authored-by: xiaoqing <xiaoqingnb@gmail.com>
Co-authored-by: Yuriy Stryuchkov <43517125+ystryuchkov@users.noreply.github.com>
Co-authored-by: Daniel Gutowski <danielgutowski@google.com>
Co-authored-by: Maximilian Rink <maximilian.rink@telekom.de>
Co-authored-by: dom.bozzuto <dom.bozzuto@datadoghq.com>
Co-authored-by: bsoghigian <bsoghigian@microsoft.com>
Co-authored-by: Krzysztof Siedlecki <ksiedlecki@google.com>
Co-authored-by: Julian Tölle <julian.toelle@hetzner-cloud.de>
Co-authored-by: Amir Alavi <amiralavi7@gmail.com>
Co-authored-by: Daniel Kłobuszewski <danielmk@google.com>
Co-authored-by: droctothorpe <alexander.perlman@capitalone.com>
Co-authored-by: Marco Voelz <voelzmo@users.noreply.github.com>
Co-authored-by: Jayant Jain <jayantj@google.com>
Co-authored-by: AhmedGrati <ahmedgrati1999@gmail.com>
Co-authored-by: Mike Tougeron <tougeron@adobe.com>
Co-authored-by: Sachin Tiptur <sachin.tiptur@ionos.com>
Co-authored-by: Saripalli Lavanya <Saripalli.Lavanya@ibm.com>
Co-authored-by: Aleksandra Malinowska <aleksandram@google.com>
Co-authored-by: Yash Khare <yash2010118@akgec.ac.in>
Co-authored-by: Piotr Betkier <pbetkier@gmail.com>
Co-authored-by: ZhengSheng0524 <j13tw@yahoo.com.tw>
Co-authored-by: Jessica Chen <jesschen@google.com>
Co-authored-by: Rishabh Patel <66425093+rishabh-11@users.noreply.github.com>
Showing 1,842 changed files with 134,451 additions and 47,687 deletions.
@@ -0,0 +1,123 @@
# KEP-5546: Scaling based on container count

<!-- toc -->
- [Summary](#summary)
- [Goals](#goals)
- [Non-Goals](#non-goals)
- [Proposal](#proposal)
- [Notes](#notes)
- [Risks and Mitigations](#risks-and-mitigations)
- [Design Details](#design-details)
- [Test Plan](#test-plan)
<!-- /toc -->

## Summary

Currently, Addon Resizer supports scaling based on the number of nodes. Some workloads use resources proportionally to
the number of containers in the cluster. Since the number of containers per node varies widely between clusters,
it's more resource-efficient to scale such workloads directly on the container count.

### Goals

- Allow scaling workloads based on the number of containers in a cluster.
- Allow this for Addon Resizer 1.8 ([used by metrics server]).

### Non-Goals

- Using both node and container count to scale workloads.
- Bringing this change to the `master` branch of Addon Resizer.

## Proposal

Add a `--scaling-mode` flag to Addon Resizer on the [`addon-resizer-release-1.8`] branch. The flag will
have two valid values:

- `node-proportional` - default, current behavior.
- `container-proportional` - Addon Resizer will set resources using the same algorithm it uses now, but using the number
  of containers where it currently uses the number of nodes (sketched below).
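
As a rough, non-authoritative illustration of how the flag could be wired up, here is a minimal sketch using the
standard `flag` package; the variable name and validation below are assumptions, and the real declaration would live
alongside the existing nanny flags:

```go
package main

import (
    "flag"
    "fmt"
    "log"
)

// Hypothetical sketch; the real flag wiring lives next to the other nanny flags.
var scalingMode = flag.String("scaling-mode", "node-proportional",
    "What cluster size is proportional to: 'node-proportional' (default, current behavior) "+
        "or 'container-proportional' (scale based on the total container count).")

func main() {
    flag.Parse()
    switch *scalingMode {
    case "node-proportional", "container-proportional":
        fmt.Printf("scaling mode: %s\n", *scalingMode)
    default:
        log.Fatalf("unsupported --scaling-mode value: %q", *scalingMode)
    }
}
```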

### Notes

Addon Resizer 1.8 assumes in multiple places that it's scaling based on the number of nodes:

- [Flag descriptions] that directly reference node counts (`--extra-cpu`, `--extra-memory`, `--extra-storage`, and
`--minClusterSize`) will need to be updated to instead refer to cluster size.
- [README] will need to be updated to reference cluster size instead of node count and explain that cluster size refers
to either node count or container count, depending on the value of the `--scaling-mode` flag.
- Many variable names in code which now refer to node count will refer to cluster size and should be renamed accordingly.

In addition to implementing the feature, we should also clean up the code and documentation.

### Risks and Mitigations

One potential risk stems from how Addon Resizer obtains the cluster size (node count or container count):
- from metrics, or
- by querying the Cluster API Server to list all objects of the appropriate type,

depending on the configuration. There can be many times more containers in a cluster than there are nodes, so listing
all containers could result in a higher load on the Cluster API Server. Since Addon Resizer requests very few fields,
I don't expect this effect to be noticeable.

Also, I expect metrics-server to test for this before using the feature, and any other users of Addon Resizer are likely
better off using metrics (which don't have this problem).

## Design Details

- Implement a `kubernetesClient.CountContainers()` function, analogous to the existing
  [`kubernetesClient.CountNodes()`] function.
- If using metrics to determine the number of containers in the cluster:
  - Fetch pod metrics (similar to [fetching node metrics], but use the `/pods` URI instead of `/nodes`).
  - For each pod, obtain the number of containers (the length of the `containers` field).
  - Sum the container counts over all pods.
- If using the API server (sketched below):
  - Fetch the list of pods (similar to [listing nodes]).
  - Fetch only the [`Spec.InitContainers`], [`Spec.Containers`], and [`Spec.EphemeralContainers`] fields.
  - Exclude pods in terminal states ([selector excluding pods in terminal states in VPA]).
  - Sum the container count over all pods.
- Add the `--scaling-mode` flag, with two valid values:
  - `node-proportional` - default, current behavior: scaling based on the cluster's node count.
  - `container-proportional` - new behavior: scaling based on the cluster's container count.
- Pass a value indicating whether to use node count or container count to the [`updateResources()`] function.
- In `updateResources()`, use the node count or the container count, depending on that value.

Remaining open questions: check that listing containers directly works, and consider listing pods while counting
containers only for non-terminal pods.
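
A minimal sketch of the API-server-based counting, assuming a client-go clientset; the function name mirrors the
proposed `kubernetesClient.CountContainers()`, but the exact signature, field selector, and error handling here are
assumptions rather than the final implementation:

```go
package nanny

import (
    "context"

    corev1 "k8s.io/api/core/v1"
    metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
    "k8s.io/client-go/kubernetes"
)

// countContainers lists all pods, skips pods in terminal states, and sums
// init, regular, and ephemeral containers.
func countContainers(ctx context.Context, client kubernetes.Interface) (int, error) {
    // Exclude pods in terminal states, similar to the VPA selector referenced above.
    selector := "status.phase!=" + string(corev1.PodSucceeded) +
        ",status.phase!=" + string(corev1.PodFailed)
    pods, err := client.CoreV1().Pods(metav1.NamespaceAll).List(ctx, metav1.ListOptions{
        FieldSelector: selector,
    })
    if err != nil {
        return 0, err
    }
    total := 0
    for _, pod := range pods.Items {
        total += len(pod.Spec.InitContainers) +
            len(pod.Spec.Containers) +
            len(pod.Spec.EphemeralContainers)
    }
    return total, nil
}
```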

### Test Plan

In addition to unit tests, we will run a manual e2e test:

- Create a config based on [`example.yaml`], but scaling the deployment based on the number of containers in the cluster.
- Create a config starting a deployment with 100 `pause` containers.

Test the feature by:

- Start the deployment scaled by Addon Resizer, based on node count.
- Observe the size of the deployment and check that it's stable.
- Start the deployment with 100 `pause` containers.
- Observe the scaled deployment change its resources appropriately.

Test the node-based scaling:

- Apply [`example.yaml`].
- Observe the amount and stability of the assigned resources.
- Resize the cluster.
- Observe the change in assigned resources.

Both tests should be performed with both metrics-based and API-based scaling.

[used by metrics server]: https://github.com/kubernetes-sigs/metrics-server/blob/0c47555e9b49cfe0719db1a0b7fb6c8dcdff3d38/charts/metrics-server/values.yaml#L121
[`addon-resizer-release-1.8`]: https://github.com/kubernetes/autoscaler/tree/addon-resizer-release-1.8
[Flag descriptions]: https://github.com/kubernetes/autoscaler/blob/da500188188d275a382be578ad3d0a758c3a170f/addon-resizer/nanny/main/pod_nanny.go#L47
[README]: https://github.com/kubernetes/autoscaler/blob/da500188188d275a382be578ad3d0a758c3a170f/addon-resizer/README.md?plain=1#L1
[`kubernetesClient.CountNodes()`]: https://github.com/kubernetes/autoscaler/blob/da500188188d275a382be578ad3d0a758c3a170f/addon-resizer/nanny/kubernetes_client.go#L58
[fetching node metrics]: https://github.com/kubernetes/autoscaler/blob/da500188188d275a382be578ad3d0a758c3a170f/addon-resizer/nanny/kubernetes_client.go#L150
[listing nodes]: https://github.com/kubernetes/autoscaler/blob/da500188188d275a382be578ad3d0a758c3a170f/addon-resizer/nanny/kubernetes_client.go#L71
[`Spec.InitContainers`]: https://github.com/kubernetes/api/blob/1528256abbdf8ff2510112b28a6aacd239789a36/core/v1/types.go#L3143
[`Spec.Containers`]: https://github.com/kubernetes/api/blob/1528256abbdf8ff2510112b28a6aacd239789a36/core/v1/types.go#L3150
[`Spec.EphemeralContainers`]: https://github.com/kubernetes/api/blob/1528256abbdf8ff2510112b28a6aacd239789a36/core/v1/types.go#L3158
[`Status.Phase`]: https://github.com/kubernetes/api/blob/1528256abbdf8ff2510112b28a6aacd239789a36/core/v1/types.go#L4011
[selector excluding pods in terminal states in VPA]: https://github.com/kubernetes/autoscaler/blob/04e5bfc88363b4af9fdeb9dfd06c362ec5831f51/vertical-pod-autoscaler/e2e/v1beta2/common.go#L195
[`updateResources()`]: https://github.com/kubernetes/autoscaler/blob/da500188188d275a382be578ad3d0a758c3a170f/addon-resizer/nanny/nanny_lib.go#L126
[`example.yaml`]: https://github.com/kubernetes/autoscaler/blob/c8d612725c4f186d5de205ed0114f21540a8ed39/addon-resizer/deploy/example.yaml
@@ -0,0 +1,61 @@
# KEP-5546: Automatic reload of nanny configuration when updated

<!-- toc -->
- [Summary](#summary)
- [Goals](#goals)
- [Non-Goals](#non-goals)
- [Proposal](#proposal)
- [Notes](#notes)
- [Risks and Mitigations](#risks-and-mitigations)
- [Design Details](#design-details)
- [Test Plan](#test-plan)
<!-- /toc -->

## Summary
- **Goals:** The goal of this enhancement is to improve the user experience for applying nanny configuration changes in the addon-resizer 1.8 when used with the metrics server. The proposed solution involves automatically reloading the nanny configuration whenever changes occur, eliminating the need for manual intervention and sidecar containers.
- **Non-Goals:** This proposal does not aim to update the functional behavior of the addon-resizer.

## Proposal
The proposed solution involves updating the addon-resizer with the following steps:
- Create a file system watcher using `fsnotify` under `utils/fswatcher` to watch for changes to the nanny configuration. It should run as a goroutine in the background.
- Detect changes to the nanny configuration file using the created `fswatcher` and trigger the reloading process when changes are detected. Events should be sent over a channel.
- Re-execute `loadNannyConfiguration`, the method responsible for building the NannyConfiguration, to apply the updated configuration to the addon-resizer.
- Implement proper error handling to manage scenarios where the configuration file is temporarily inaccessible or contains parsing errors.

### Risks and Mitigations
- There is a potential risk of filesystem-related issues causing the file watcher to malfunction. Proper testing and error handling should be implemented to handle such scenarios gracefully.
- Errors in the configuration file could lead to unexpected behavior or crashes. The addon-resizer should handle parsing errors and fall back to the previous working configuration if necessary.

## Design Details
- Create a new package for the `fswatcher` under `utils/fswatcher`. It would contain the `FsWatcher` struct, its methods, and unit tests.
- The `FsWatcher` struct would look similar to this:
```go
type FsWatcher struct {
    *fsnotify.Watcher

    Events    chan struct{}
    ratelimit time.Duration
    names     []string
    paths     map[string]struct{}
}
```
- Implement the following functions:
  - `CreateFsWatcher`: Instantiates a new `FsWatcher` and starts watching the file system.
  - `initWatcher`: Initializes the `fsnotify` watcher and the `paths` to be watched.
  - `add`: Adds a new file to watch.
  - `reset`: Re-initializes the `FsWatcher`.
  - `watch`: Watches the configured files.
- In the main function, we create a new `FsWatcher` and then wait in an infinite loop to receive events indicating
  filesystem changes. Based on these changes, we re-execute the `loadNannyConfiguration` function (see the sketch below).
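
A minimal sketch of that wiring, under the assumption that a `reload` callback stands in for the real
`loadNannyConfiguration` call; the event filtering and debounce interval below are illustrative choices, not part of
the proposal:

```go
package fswatcher

import (
    "log"
    "time"

    "github.com/fsnotify/fsnotify"
)

// WatchConfig watches the nanny configuration file and invokes reload
// whenever the file changes.
func WatchConfig(path string, reload func() error) error {
    watcher, err := fsnotify.NewWatcher()
    if err != nil {
        return err
    }
    defer watcher.Close()

    if err := watcher.Add(path); err != nil {
        return err
    }

    for {
        select {
        case event, ok := <-watcher.Events:
            if !ok {
                return nil
            }
            // Reload on writes; many tools replace the file, so also handle Create.
            if event.Op&(fsnotify.Write|fsnotify.Create) != 0 {
                // Small debounce so partially written files are less likely to be parsed.
                time.Sleep(100 * time.Millisecond)
                if err := reload(); err != nil {
                    // Keep the previous working configuration on errors.
                    log.Printf("reloading nanny configuration failed: %v", err)
                }
            }
        case err, ok := <-watcher.Errors:
            if !ok {
                return nil
            }
            log.Printf("fswatcher error: %v", err)
        }
    }
}
```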

> **Note:** The expected configuration file format is YAML. It has the same structure as the NannyConfiguration CRD.

### Test Plan
To ensure the proper functioning of the enhanced addon-resizer, the following test plan should be executed:
1. **Unit Tests:** Write unit tests to validate the file watcher's functionality and ensure it triggers events when the configuration file changes.
2. **Manual e2e Tests:** Deploy the addon-resizer with a `BaseMemory` of `300Mi`, then change the `BaseMemory` to `100Mi`. We should observe changes in the behavior of the watched pod.
[fsnotify]: https://github.com/fsnotify/fsnotify
6 changes: 6 additions & 0 deletions balancer/deploy/controller.yaml
@@ -20,6 +20,12 @@ rules:
- watch
- patch
- update
- apiGroups:
- balancer.x-k8s.io
resources:
- balancers/status
verbs:
- update
- apiGroups:
- ""
resources:
4 changes: 2 additions & 2 deletions balancer/proposals/balancer.md
@@ -10,7 +10,7 @@ These domains may include:
* Cloud provider zones inside a single region, to ensure that the application is still up and running, even if one of the zones has issues.
* Different types of Kubernetes nodes. These may involve nodes that are spot/preemptible, or of different machine families.

A single Kuberentes deployment may either leave the placement entirely up to the scheduler
A single Kubernetes deployment may either leave the placement entirely up to the scheduler
(most likely leading to something not entirely desired, like all pods going to a single domain) or
focus on a single domain (thus not achieving the goal of being in two or more domains).

@@ -179,4 +179,4 @@ type BalancerStatus struct {
// +patchStrategy=merge
Conditions []metav1.Condition
}
```
```
2 changes: 1 addition & 1 deletion builder/Dockerfile
@@ -12,7 +12,7 @@
# See the License for the specific language governing permissions and
# limitations under the License.

FROM golang:1.20
FROM golang:1.20.4
LABEL maintainer="Marcin Wielgus <mwielgus@google.com>"

ENV GOPATH /gopath/
4 changes: 2 additions & 2 deletions charts/cluster-autoscaler/Chart.yaml
@@ -1,5 +1,5 @@
apiVersion: v2
appVersion: 1.26.2
appVersion: 1.27.2
description: Scales Kubernetes worker nodes within autoscaling groups.
engine: gotpl
home: https://github.com/kubernetes/autoscaler
@@ -11,4 +11,4 @@ name: cluster-autoscaler
sources:
- https://github.com/kubernetes/autoscaler/tree/master/cluster-autoscaler
type: application
version: 9.28.0
version: 9.29.2
8 changes: 6 additions & 2 deletions charts/cluster-autoscaler/README.md
@@ -320,7 +320,10 @@ Though enough for the majority of installations, the default PodSecurityPolicy _
### VerticalPodAutoscaler
The chart can install a [`VerticalPodAutoscaler`](https://github.com/kubernetes/autoscaler/blob/master/vertical-pod-autoscaler/README.md) for the Deployment if needed. A VPA can help minimize wasted resources when usage spikes periodically or remediate containers that are being OOMKilled.
The CA Helm Chart can install a [`VerticalPodAutoscaler`](https://github.com/kubernetes/autoscaler/blob/master/vertical-pod-autoscaler/README.md) object from Chart version `9.27.0`
onwards for the Cluster Autoscaler Deployment to scale the CA as appropriate, but for that, we
need to install the VPA to the cluster separately. A VPA can help minimize wasted resources
when usage spikes periodically or remediate containers that are being OOMKilled.
The following example snippet can be used to install VPA that allows scaling down from the default recommendations of the deployment template:
@@ -383,7 +386,7 @@ vpa:
| image.pullPolicy | string | `"IfNotPresent"` | Image pull policy |
| image.pullSecrets | list | `[]` | Image pull secrets |
| image.repository | string | `"registry.k8s.io/autoscaling/cluster-autoscaler"` | Image repository |
| image.tag | string | `"v1.26.2"` | Image tag |
| image.tag | string | `"v1.27.2"` | Image tag |
| kubeTargetVersionOverride | string | `""` | Allow overriding the `.Capabilities.KubeVersion.GitVersion` check. Useful for `helm template` commands. |
| magnumCABundlePath | string | `"/etc/kubernetes/ca-bundle.crt"` | Path to the host's CA bundle, from `ca-file` in the cloud-config file. |
| magnumClusterName | string | `""` | Cluster name or ID in Magnum. Required if `cloudProvider=magnum` and not setting `autoDiscovery.clusterName`. |
@@ -408,6 +411,7 @@ vpa:
| rbac.serviceAccount.name | string | `""` | The name of the ServiceAccount to use. If not set and create is `true`, a name is generated using the fullname template. |
| replicaCount | int | `1` | Desired number of pods |
| resources | object | `{}` | Pod resource requests and limits. |
| secretKeyRefNameOverride | string | `""` | Overrides the name of the Secret to use when loading the secretKeyRef for AWS and Azure env variables |
| securityContext | object | `{}` | [Security context for pod](https://kubernetes.io/docs/tasks/configure-pod-container/security-context/) |
| service.annotations | object | `{}` | Annotations to add to service |
| service.create | bool | `true` | If `true`, a Service will be created. |
5 changes: 4 additions & 1 deletion charts/cluster-autoscaler/README.md.gotmpl
@@ -320,7 +320,10 @@ Though enough for the majority of installations, the default PodSecurityPolicy _

### VerticalPodAutoscaler

The chart can install a [`VerticalPodAutoscaler`](https://github.com/kubernetes/autoscaler/blob/master/vertical-pod-autoscaler/README.md) for the Deployment if needed. A VPA can help minimize wasted resources when usage spikes periodically or remediate containers that are being OOMKilled.
The CA Helm Chart can install a [`VerticalPodAutoscaler`](https://github.com/kubernetes/autoscaler/blob/master/vertical-pod-autoscaler/README.md) object from Chart version `9.27.0`
onwards for the Cluster Autoscaler Deployment to scale the CA as appropriate, but for that, we
need to install the VPA to the cluster separately. A VPA can help minimize wasted resources
when usage spikes periodically or remediate containers that are being OOMKilled.

The following example snippet can be used to install VPA that allows scaling down from the default recommendations of the deployment template:

11 changes: 10 additions & 1 deletion charts/cluster-autoscaler/templates/clusterrole.yaml
@@ -151,13 +151,22 @@ rules:
- cluster.x-k8s.io
resources:
- machinedeployments
- machinedeployments/scale
- machinepools
- machines
- machinesets
verbs:
- get
- list
- update
- watch
- apiGroups:
- cluster.x-k8s.io
resources:
- machinedeployments/scale
- machinepools/scale
verbs:
- get
- patch
- update
{{- end }}
{{- end -}}
20 changes: 10 additions & 10 deletions charts/cluster-autoscaler/templates/deployment.yaml
@@ -132,36 +132,36 @@ spec:
valueFrom:
secretKeyRef:
key: AwsAccessKeyId
name: {{ template "cluster-autoscaler.fullname" . }}
name: {{ default (include "cluster-autoscaler.fullname" .) .Values.secretKeyRefNameOverride }}
{{- end }}
{{- if .Values.awsSecretAccessKey }}
- name: AWS_SECRET_ACCESS_KEY
valueFrom:
secretKeyRef:
key: AwsSecretAccessKey
name: {{ template "cluster-autoscaler.fullname" . }}
name: {{ default (include "cluster-autoscaler.fullname" .) .Values.secretKeyRefNameOverride }}
{{- end }}
{{- else if eq .Values.cloudProvider "azure" }}
- name: ARM_SUBSCRIPTION_ID
valueFrom:
secretKeyRef:
key: SubscriptionID
name: {{ template "cluster-autoscaler.fullname" . }}
name: {{ default (include "cluster-autoscaler.fullname" .) .Values.secretKeyRefNameOverride }}
- name: ARM_RESOURCE_GROUP
valueFrom:
secretKeyRef:
key: ResourceGroup
name: {{ template "cluster-autoscaler.fullname" . }}
name: {{ default (include "cluster-autoscaler.fullname" .) .Values.secretKeyRefNameOverride }}
- name: ARM_VM_TYPE
valueFrom:
secretKeyRef:
key: VMType
name: {{ template "cluster-autoscaler.fullname" . }}
name: {{ default (include "cluster-autoscaler.fullname" .) .Values.secretKeyRefNameOverride }}
- name: AZURE_CLUSTER_NAME
valueFrom:
secretKeyRef:
key: ClusterName
name: {{ template "cluster-autoscaler.fullname" . }}
name: {{ default (include "cluster-autoscaler.fullname" .) .Values.secretKeyRefNameOverride }}
{{- if .Values.azureUseWorkloadIdentityExtension }}
- name: ARM_USE_WORKLOAD_IDENTITY_EXTENSION
value: "true"
@@ -173,22 +173,22 @@ spec:
valueFrom:
secretKeyRef:
key: TenantID
name: {{ template "cluster-autoscaler.fullname" . }}
name: {{ default (include "cluster-autoscaler.fullname" .) .Values.secretKeyRefNameOverride }}
- name: ARM_CLIENT_ID
valueFrom:
secretKeyRef:
key: ClientID
name: {{ template "cluster-autoscaler.fullname" . }}
name: {{ default (include "cluster-autoscaler.fullname" .) .Values.secretKeyRefNameOverride }}
- name: ARM_CLIENT_SECRET
valueFrom:
secretKeyRef:
key: ClientSecret
name: {{ template "cluster-autoscaler.fullname" . }}
name: {{ default (include "cluster-autoscaler.fullname" .) .Values.secretKeyRefNameOverride }}
- name: AZURE_NODE_RESOURCE_GROUP
valueFrom:
secretKeyRef:
key: NodeResourceGroup
name: {{ template "cluster-autoscaler.fullname" . }}
name: {{ default (include "cluster-autoscaler.fullname" .) .Values.secretKeyRefNameOverride }}
{{- end }}
{{- end }}
{{- range $key, $value := .Values.extraEnv }}