
🌱 Add separate concurrency flag for cluster cache tracker #9116

Conversation

@chrischdi (Member)

What this PR does / why we need it:

Adds a separate flag --clustercachetracker-concurrency to all relevant executables (CAPI, CAPBK, KCP, CAPD), so the concurrency of the created cluster cache tracker can be adjusted independently.

This came up while adding the clustercache tracker to CAPV in this discussion.

Before this PR, the existing concurrency flags were re-used to also set the concurrency of the cluster cache tracker.

Example from @sbueringer of why this change makes sense:

Let's take an example. You have a mgmt cluster with 2k+ workload clusters. The cluster controller is doing some heavy lifting, so you would want to use more workers (for 2k, 10 should still be fine, but it's just an example :D).

So if you want to increase the concurrency of the cluster controller, you don't want the side effect that the cluster cache tracker's controller also gets more workers.

Which issue(s) this PR fixes (optional, in fixes #<issue number>(, fixes #<issue_number>, ...) format, will close the issue(s) when PR gets merged):
Fixes #
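For context, the change boils down to giving the cluster cache tracker's controller its own, independently tunable worker count instead of reusing one of the existing concurrency flags. A minimal controller-runtime sketch of that pattern (illustrative only: the reconciler, the watched object, and the flag wiring are placeholders, not the actual cluster-api code):

```go
package main

import (
	"context"
	"flag"
	"os"

	corev1 "k8s.io/api/core/v1"
	ctrl "sigs.k8s.io/controller-runtime"
	"sigs.k8s.io/controller-runtime/pkg/controller"
)

// cacheReconciler is a stand-in for the cluster cache tracker's reconciler.
type cacheReconciler struct{}

func (r *cacheReconciler) Reconcile(ctx context.Context, req ctrl.Request) (ctrl.Result, error) {
	return ctrl.Result{}, nil
}

func main() {
	var clusterConcurrency, clusterCacheTrackerConcurrency int
	flag.IntVar(&clusterConcurrency, "cluster-concurrency", 10,
		"Number of clusters to process simultaneously")
	flag.IntVar(&clusterCacheTrackerConcurrency, "clustercachetracker-concurrency", 10,
		"Number of clusters to process simultaneously in the cluster cache tracker")
	flag.Parse()

	mgr, err := ctrl.NewManager(ctrl.GetConfigOrDie(), ctrl.Options{})
	if err != nil {
		os.Exit(1)
	}

	// The cluster controller would keep using clusterConcurrency for its own
	// workers (elided here). Before this PR the same value was also reused for
	// the cache tracker's controller; now that controller gets its own flag.
	if err := ctrl.NewControllerManagedBy(mgr).
		For(&corev1.ConfigMap{}). // placeholder object; the real reconciler watches Cluster objects
		WithOptions(controller.Options{MaxConcurrentReconciles: clusterCacheTrackerConcurrency}).
		Complete(&cacheReconciler{}); err != nil {
		os.Exit(1)
	}

	if err := mgr.Start(ctrl.SetupSignalHandler()); err != nil {
		os.Exit(1)
	}
}
```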

@k8s-ci-robot k8s-ci-robot added size/L Denotes a PR that changes 100-499 lines, ignoring generated files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. labels Aug 3, 2023
@sbueringer (Member)

cc @vincepri @killianmuldoon

@chrischdi chrischdi force-pushed the pr-separate-concurrency-cc-tracker branch from 022f1fb to 6700138 Compare August 3, 2023 15:37
@chrischdi (Member, Author)

Marking it as deprecated means the old flag is no longer shown in -h output. Setting it does not prevent the controller from running; instead, the following is printed at startup:

❯ go run ./bootstrap/kubeadm --cluster-concurrency 1
Flag --cluster-concurrency has been deprecated, This flag has no function anymore and is going to be removed in a next release. Use "--clustercachetracker-concurrency" insteaed.
I0804 08:32:11.242317   50353 listener.go:44] "controller-runtime/metrics: Metrics server is starting to listen" addr="localhost:8080"
I0804 08:32:11.248101   50353 webhook.go:158] "controller-runtime/builder: Registering a mutating webhook" GVK="bootstrap.cluster.x-k8s.io/v1beta1, Kind=KubeadmConfig" path="/mutate-bootstrap-cluster-x-k8s-io-v1beta1-kubeadmconfig"
...
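The deprecation behaviour shown above is what pflag provides out of the box. A minimal sketch (flag names taken from the log above; the flag-set name and descriptions are illustrative):

```go
package main

import (
	"fmt"

	"github.com/spf13/pflag"
)

func main() {
	fs := pflag.NewFlagSet("manager", pflag.ExitOnError)

	var oldConcurrency, clusterCacheTrackerConcurrency int
	fs.IntVar(&oldConcurrency, "cluster-concurrency", 10,
		"Number of clusters to process simultaneously")
	fs.IntVar(&clusterCacheTrackerConcurrency, "clustercachetracker-concurrency", 10,
		"Number of clusters to process simultaneously in the cluster cache tracker")

	// MarkDeprecated hides the flag from -h and makes pflag print
	// "Flag --cluster-concurrency has been deprecated, <message>" whenever the
	// flag is used, without stopping the program.
	_ = fs.MarkDeprecated("cluster-concurrency",
		`This flag has no function anymore and is going to be removed in a next release. Use "--clustercachetracker-concurrency" instead.`)

	_ = fs.Parse([]string{"--cluster-concurrency", "1"})
	fmt.Println(clusterCacheTrackerConcurrency) // still 10; the old flag no longer has any effect here
}
```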

@furkatgofurov7 (Member) left a comment:

One nit/question, otherwise looks good to me, thanks

Review threads: bootstrap/kubeadm/main.go (outdated, resolved), main.go (resolved)
@chrischdi chrischdi force-pushed the pr-separate-concurrency-cc-tracker branch from 6700138 to cbf78d4 Compare August 4, 2023 07:22
@chrischdi (Member, Author) commented Aug 4, 2023

One nit/question, otherwise looks good to me, thanks

Sorry about forgetting to push my commit 🤦 ☕

@chrischdi chrischdi force-pushed the pr-separate-concurrency-cc-tracker branch from cbf78d4 to fb958c0 Compare August 4, 2023 07:24
@sbueringer (Member)

Restarted the linter

@chrischdi chrischdi force-pushed the pr-separate-concurrency-cc-tracker branch from fb958c0 to 807e0eb Compare August 4, 2023 15:43
@sbueringer (Member) commented Aug 4, 2023

/lgtm
/approve

/hold
Let's give folks a bit more time to review. I would merge on Monday as written in Slack

@k8s-ci-robot k8s-ci-robot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Aug 4, 2023
@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Aug 4, 2023
@k8s-ci-robot (Contributor)

LGTM label has been added.

Git tree hash: 5d527676520673ea1c4a4f289c53fd2cdfe6108d

@k8s-ci-robot (Contributor)

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: sbueringer

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Aug 4, 2023
@killianmuldoon (Contributor) left a comment:

/lgtm

@killianmuldoon (Contributor)

/area clustercachetracker

@k8s-ci-robot k8s-ci-robot added the area/clustercachetracker Issues or PRs related to the clustercachetracker label Aug 4, 2023
@furkatgofurov7 (Member) left a comment:

/lgtm

@sbueringer (Member)

/hold cancel

@k8s-ci-robot k8s-ci-robot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Aug 7, 2023
@k8s-ci-robot k8s-ci-robot merged commit 3c5b047 into kubernetes-sigs:main Aug 7, 2023
@k8s-ci-robot k8s-ci-robot added this to the v1.6 milestone Aug 7, 2023
@@ -134,6 +135,9 @@ func InitFlags(fs *pflag.FlagSet) {
fs.IntVar(&kubeadmControlPlaneConcurrency, "kubeadmcontrolplane-concurrency", 10,
"Number of kubeadm control planes to process simultaneously")

fs.IntVar(&clusterCacheTrackerConcurrency, "clustercachetracker-concurrency", 10,
A Member commented:

clustercachetracker is an implementation detail that's now being exposed to users, could we find a name like child-cluster-concurrency or workload-cluster-concurrency?

@sbueringer (Member) commented Aug 8, 2023:

I couldn't think of a better name that expresses what we do :). So I thought it's a good idea to use something that expresses ~ the name of the controller.

We can use something like workload-cluster-concurrency but it seems misleading to be honest. It's not actually the concurrency of how many workload clusters we can reconcile. Also all our clusters are workload clusters.

Is something like clustercache-concurrency better? It doesn't point directly to CCT but it still expresses somewhat that it's the concurrency of the cache we use for clusters.

A Member commented:

@vincepri wdyt about dropping the flag and just hard-coding to 10? 10 should be enough anyway so there is no need for a flag.

I just wanted to avoid the CCT using a lot of workers when someone needs more workers for the regular cluster reconcilers.

@@ -135,6 +136,9 @@ func initFlags(fs *pflag.FlagSet) {
fs.IntVar(&concurrency, "concurrency", 10,
"The number of docker machines to process simultaneously")

fs.IntVar(&clusterCacheTrackerConcurrency, "clustercachetracker-concurrency", 10,
A Member commented:

If this isn't set, could we actually default to concurrency? It makes some sense to keep these in sync
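A minimal pflag sketch of the fallback being suggested here (illustrative only; the merged PR instead keeps a plain separate default of 10):

```go
package main

import (
	"fmt"

	"github.com/spf13/pflag"
)

func main() {
	fs := pflag.NewFlagSet("manager", pflag.ExitOnError)

	var concurrency, clusterCacheTrackerConcurrency int
	fs.IntVar(&concurrency, "concurrency", 10,
		"The number of docker machines to process simultaneously")
	fs.IntVar(&clusterCacheTrackerConcurrency, "clustercachetracker-concurrency", 10,
		"Number of clusters to process simultaneously")

	_ = fs.Parse([]string{"--concurrency", "50"})

	// Suggested fallback: if the tracker flag was not set explicitly,
	// inherit the value of --concurrency so the two stay in sync.
	if !fs.Changed("clustercachetracker-concurrency") {
		clusterCacheTrackerConcurrency = concurrency
	}

	fmt.Println(clusterCacheTrackerConcurrency) // 50 in this example
}
```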

@sbueringer (Member) commented Aug 8, 2023:

I think usually you wouldn't use the same number of workers. Usually you need more workers for a controller when they otherwise can't keep up with the load. This mainly depends on reconcile duration.

Looking at the Reconcile func of ClusterCacheReconciler, it will always return almost instantly. So even with >10k Clusters I don't think we need more than 10 workers. (With 1 ms reconcile duration, 10 workers can reconcile 10k clusters in 1s)

In fact the only reconciler that I had to run with more than 10 workers with 2k clusters was KubeadmControlPlane (because it had reconcile durations of ~ 1-2 seconds). In that case I used 50 workers.

So while we would have to increase the concurrency of the Cluster controllers above 10 at some point, I think the ClusterCacheReconciler would still be fine with 10.
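A back-of-envelope version of that argument, treating the per-reconcile durations as rough assumptions rather than measurements:

```go
package main

import (
	"fmt"
	"time"
)

// workersNeeded estimates how many workers are required to reconcile every
// object once within the given window, assuming a fixed per-reconcile duration.
func workersNeeded(objects int, perReconcile, window time.Duration) int {
	perWorker := int(window / perReconcile) // reconciles one worker finishes within the window
	return (objects + perWorker - 1) / perWorker
}

func main() {
	// ClusterCacheReconciler returns almost instantly (~1ms assumed):
	// 10 workers get through 10k clusters in about a second.
	fmt.Println(workersNeeded(10000, time.Millisecond, time.Second)) // 10

	// A slow reconciler (~2s assumed, as reported for KubeadmControlPlane) with
	// 2k objects needs far more workers to cycle through everything within, say, a minute.
	fmt.Println(workersNeeded(2000, 2*time.Second, time.Minute)) // 67
}
```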

@chrischdi chrischdi deleted the pr-separate-concurrency-cc-tracker branch August 8, 2023 06:06