remove safeToStartCycle check #91

MinyiZ · 2025-01-20T05:22:49Z

This section of code does not behave as intended. It was introduced in 2020 as an attempt to fix a race condition between the observer and cluster autoscaler. Recent improvements have resolved that race condition, so this code can be removed.

Why does this code not work?
From https://github.com/kubernetes/autoscaler/blob/master/cluster-autoscaler/proposals/metrics.md#cluster-autoscaler-execution

last_activity records last time certain part of cluster autoscaler logic executed. Represented with unix timestamp. autoscaler-activity values are:

main - main loop iteration started.

autoscaling - current state of the cluster has been updated, started autoscaling logic.

scaleUp - autoscaler will check if scale up is necessary.

scaleDown - autoscaler will try to scale down some nodes.

The last_activity metric reflects whether cluster autoscaler has attempted to scale up or down. However, it does not necessarily indicate actual changes in the ASG, resulting in numerous false positives. This behaviour can unnecessarily block the creation of CNRs, even when it is safe to proceed with cycling.

mwhittington21 · 2025-01-20T08:19:38Z

cmd/observer/main.go

-		nodeStartupTime:          rootCmd.PersistentFlags().Duration("node-startup-time", 2*time.Minute, "duration to wait after a cluster-autoscaler scaleUp event is detected"),
-		runImmediately:           rootCmd.PersistentFlags().Bool("now", false, "makes the check loop run straight away on program start rather than wait for the check interval to elapse"),
-		runOnce:                  rootCmd.PersistentFlags().Bool("once", false, "run the check loop once then exit. also works with --now"),
-		prometheusAddress:        rootCmd.PersistentFlags().String("prometheus-address", "prometheus", "Prometheus service address used to query cluster-autoscaler metrics"),


discuss: is it worth keeping --prometheus-address as an accepted but unused flag, so that installations which automatically bump the version don't error out? or is it better for the version bump to fail at startup with this error message, making it immediately clear the feature isn't used any more?

docker run --rm -it ghcr.io/atlassian-labs/cyclops:v1.10.1 cyclops --prometheus-address 127.0.0.1:8080
cyclops: error: unknown long flag '--prometheus-address', try --help

i think i prefer to remove the flag immediately because having a flag that does nothing can be confusing. leaving it would also require us to remember to remove it later. since this is already a breaking change, i think it's better to alert users to the change in behaviour right away with the flag removal, rather than silently ignoring it.
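for reference, the alternative that was not chosen (keep the flag, ignore its value, warn) could look roughly like the sketch below. it uses the standard library flag package as a stand-in for the flag library in cmd/observer/main.go, and parseObserverFlags is a hypothetical helper, not something in this repo.

```go
package main

import (
	"flag"
	"fmt"
	"io"
	"os"
)

// parseObserverFlags is a hypothetical helper. It still accepts
// --prometheus-address but never reads its value, and returns the names
// of any deprecated flags the caller explicitly set.
func parseObserverFlags(args []string) (deprecated []string, err error) {
	fs := flag.NewFlagSet("observer", flag.ContinueOnError)
	fs.SetOutput(io.Discard) // keep the sketch quiet on parse errors

	// Accepted for compatibility only; the value is discarded.
	_ = fs.String("prometheus-address", "prometheus", "DEPRECATED: ignored")

	if err := fs.Parse(args); err != nil {
		return nil, err
	}

	// Visit only walks flags that were set on the command line.
	fs.Visit(func(f *flag.Flag) {
		if f.Name == "prometheus-address" {
			deprecated = append(deprecated, f.Name)
		}
	})
	return deprecated, nil
}

func main() {
	deprecated, err := parseObserverFlags(os.Args[1:])
	if err != nil {
		fmt.Fprintln(os.Stderr, "error:", err)
		os.Exit(2)
	}
	for _, name := range deprecated {
		fmt.Fprintf(os.Stderr, "warning: --%s is deprecated and has no effect\n", name)
	}
}
```

this keeps old installations starting, but it is exactly the "flag that does nothing" plus the follow-up removal described above, which is why removing the flag outright was preferred.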

vincentportella

lgtm

MinyiZ requested review from awprice, vincentportella and mwhittington21 January 20, 2025 05:23

MinyiZ changed the title ~~KUBE-9199: remove safeToStartCycle check~~ remove safeToStartCycle check Jan 20, 2025

MinyiZ force-pushed the mzhong/KUBE-9199-remove-prom-metrics-check-for-safetocycle branch 4 times, most recently from c95603b to 92b795b Compare January 20, 2025 05:37

Minyi Zhong added 2 commits January 20, 2025 16:56

remove safeToStartCycle check

00efeaa

fix linter error

2d93170

MinyiZ force-pushed the mzhong/KUBE-9199-remove-prom-metrics-check-for-safetocycle branch from 424f027 to 2d93170 Compare January 20, 2025 05:56

update golangci-lint.yml

d11fd2b

mwhittington21 reviewed Jan 20, 2025

mwhittington21 approved these changes Jan 20, 2025

vincentportella approved these changes Jan 20, 2025

MinyiZ merged commit 913fc99 into master Jan 20, 2025
3 checks passed

MinyiZ mentioned this pull request Jan 22, 2025

remove unused commandline flags from observer #92

Merged
