✨ Add metrics for total workers and active workers #1125

alvaroaleman · 2020-08-11T21:45:45Z

Helps with debugging issues around reconciliation duration. Right now it's impossible to find out from metrics either the total number of workers or the number of active workers. The only thing we have is workqueue_unfinished_work_seconds: a large value is probably not good, but it doesn't clearly indicate whether one worker is taking longer, whether all workers are blocked, etc.

k8s-ci-robot · 2020-08-11T21:45:52Z

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: alvaroaleman

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

~~OWNERS~~ [alvaroaleman]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

alvaroaleman · 2020-08-11T21:46:08Z

/assign @estroz

alvaroaleman · 2020-08-11T22:00:03Z

/retest

alvaroaleman · 2020-08-11T22:11:14Z

/retest

estroz · 2020-08-11T22:52:55Z

pkg/internal/controller/controller.go

@@ -169,6 +169,7 @@ func (c *Controller) Start(stop <-chan struct{}) error {

 		// Launch workers to process resources
 		c.Log.Info("Starting workers", "worker count", c.MaxConcurrentReconciles)
+		ctrlmetrics.WorkerCount.WithLabelValues(c.Name).Set(float64(c.MaxConcurrentReconciles))


Can you explain the logic behind collecting this metric?

It allows correlating other metrics to the number of workers, for example workqueue_depth

It allows calculating the available_workers
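To illustrate the second point, here is a minimal sketch of how a total-workers gauge and an active-workers gauge combine into an available-workers figure. It uses only the standard library: the `gauge` type and the `availableWorkers` helper are hypothetical stand-ins for illustration, not the Prometheus client API or the code in this PR.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// gauge is a hypothetical stand-in for a metrics gauge: it holds only the current value.
type gauge struct{ v int64 }

func (g *gauge) Set(n int64) { atomic.StoreInt64(&g.v, n) }
func (g *gauge) Inc()        { atomic.AddInt64(&g.v, 1) }
func (g *gauge) Dec()        { atomic.AddInt64(&g.v, -1) }
func (g *gauge) Get() int64  { return atomic.LoadInt64(&g.v) }

// availableWorkers derives the number of idle workers from the two gauges.
func availableWorkers(total, active *gauge) int64 {
	return total.Get() - active.Get()
}

func main() {
	var total, active gauge
	total.Set(4) // set once at startup, like the worker count in the diff above

	var started, done sync.WaitGroup
	release := make(chan struct{})
	for i := 0; i < 3; i++ {
		started.Add(1)
		done.Add(1)
		go func() {
			defer done.Done()
			active.Inc()       // the worker picked up an item
			defer active.Dec() // the reconcile finished
			started.Done()
			<-release // simulate a slow reconcile
		}()
	}

	started.Wait()
	fmt.Println("available while busy:", availableWorkers(&total, &active))

	close(release)
	done.Wait()
	fmt.Println("available when idle:", availableWorkers(&total, &active))
}
```

With real metrics the subtraction would typically happen at query time rather than in the controller, which is why exposing the two raw values is enough.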

alvaroaleman · 2020-08-17T22:23:06Z

@estroz PTAL

estroz · 2020-08-18T00:58:00Z

Without knowing too much about how metrics are collected this looks fine to me. However there are now 7 metrics collected for a controller. Will an increase in the number of metrics impact cluster resources significantly?

/lgtm

estroz · 2020-08-18T00:58:25Z

Cancel whenever

/hold

alvaroaleman · 2020-08-18T13:04:44Z

> Without knowing too much about how metrics are collected this looks fine to me. However there are now 7 metrics collected for a controller. Will an increase in the number of metrics impact cluster resources significantly?

No, the way this works is that only the current value of a metric is stored and Prometheus is expected to collect it. The thing to keep in mind is not to expose too many metrics, which usually happens when using labels that have a yet-to-be-determined set of values, because every metric+label combination actually ends up being a unique metric in Prometheus: https://www.robustperception.io/cardinality-is-key

/hold cancel
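The cardinality point can be sketched with a small standard-library model, assuming a series is identified by its metric name plus its full set of label pairs. The `sample` type, the `seriesKey` and `countSeries` helpers, and the metric and label names are all hypothetical and only serve the illustration.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// sample is one observation of a metric with a given set of labels.
type sample struct {
	name   string
	labels map[string]string
}

// seriesKey builds the identity of a series: the metric name plus its sorted label pairs.
func seriesKey(name string, labels map[string]string) string {
	pairs := make([]string, 0, len(labels))
	for k, v := range labels {
		pairs = append(pairs, fmt.Sprintf("%s=%q", k, v))
	}
	sort.Strings(pairs)
	return name + "{" + strings.Join(pairs, ",") + "}"
}

// countSeries returns how many distinct series a set of samples produces.
func countSeries(samples []sample) int {
	seen := map[string]struct{}{}
	for _, s := range samples {
		seen[seriesKey(s.name, s.labels)] = struct{}{}
	}
	return len(seen)
}

func main() {
	// Bounded label: repeated updates for the same controller reuse one series.
	bounded := []sample{
		{"worker_count", map[string]string{"controller": "a"}},
		{"worker_count", map[string]string{"controller": "a"}},
		{"worker_count", map[string]string{"controller": "b"}},
	}
	fmt.Println("bounded label series:", countSeries(bounded))

	// Unbounded label: every new value creates a new series.
	var unbounded []sample
	for i := 0; i < 1000; i++ {
		unbounded = append(unbounded, sample{
			"worker_count",
			map[string]string{"controller": "a", "request_id": fmt.Sprint(i)},
		})
	}
	fmt.Println("unbounded label series:", countSeries(unbounded))
}
```

A label whose values are limited to the controller names keeps the series count proportional to the number of controllers, while a per-request label grows without bound.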

alvaroaleman · 2020-08-18T13:48:35Z

/retest

alvaroaleman · 2020-08-18T14:29:30Z

/retest

k8s-ci-robot added the cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. label Aug 11, 2020

k8s-ci-robot requested review from gerred and pwittrock August 11, 2020 21:45

k8s-ci-robot added approved Indicates a PR has been approved by an approver from all required OWNERS files. size/S Denotes a PR that changes 10-29 lines, ignoring generated files. labels Aug 11, 2020

k8s-ci-robot assigned estroz Aug 11, 2020

✨ Add metrics for total workers and active workers

5cefa42

alvaroaleman force-pushed the moar-metrics branch from 33ce5b6 to 5cefa42 Compare August 11, 2020 21:48

alvaroaleman changed the title ~~✨ Add metrics for total workers and available workers~~ ✨ Add metrics for total workers and active workers Aug 11, 2020

estroz reviewed Aug 11, 2020

k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Aug 18, 2020

k8s-ci-robot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Aug 18, 2020

k8s-ci-robot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Aug 18, 2020

k8s-ci-robot merged commit af7f192 into kubernetes-sigs:master Aug 18, 2020
