Change scheduler metrics to conform metrics guidelines #72332
/kind feature
@@ -182,14 +182,16 @@ func (g *genericScheduler) Schedule(pod *v1.Pod, nodeLister algorithm.NodeLister
 			FailedPredicates: failedPredicateMap,
 		}
 	}
-	metrics.SchedulingAlgorithmPredicateEvaluationDuration.Observe(metrics.SinceInMicroseconds(startPredicateEvalTime))
+	metrics.SchedulingAlgorithmPredicateEvaluationDuration.Observe(metrics.SinceInSeconds(startPredicateEvalTime))
Are you saying that replacing "microseconds" with "seconds" is a common pattern? If so, can you show where that is documented?
Correct.
According to the KEP kubernetes-metrics-overhaul, Kubernetes metrics should follow Prometheus best practices.
`seconds` is the suggested base unit for time metrics, and `seconds` is already widely used in other Kubernetes metrics.
So we should change these metrics to make them more consistent with the others.
A nit: please add an explicit "DEPRECATED" notice (maybe in the release-note section) to notify users that the original "microseconds" metrics will be deprecated, so that we can remove them in future releases.
Otherwise
/lgtm
/retest
/priority important-longterm
@Huang-Wei thanks, the release note is updated.
pkg/scheduler/metrics/metrics.go
Outdated
 prometheus.HistogramOpts{
 	Subsystem: SchedulerSubsystem,
 	Name: "e2e_scheduling_latency_microseconds",
-	Help: "E2e scheduling latency (scheduling algorithm + binding)",
+	Help: "E2e scheduling latency in microseconds (scheduling algorithm + binding)",
I think adding `deprecated` in the help text might be useful.
Agreed with @ravisantoshgudimetla. Could you start each deprecated metric's description with "(deprecated)"? Thanks! Otherwise this lgtm!
@brancz @Huang-Wei
/lgtm
@brancz: GitHub didn't allow me to assign the following users: sig-scheduling-maintainers. Note that only kubernetes members and repo collaborators can be assigned and that issues/PRs can only have 10 assignees at the same time. In response to this:
cc @kubernetes/sig-scheduling-pr-reviews
@ravisantoshgudimetla can you help review this? Thanks
Thank you @danielqsj for working on this PR.
/lgtm
[APPROVALNOTIFIER] This PR is APPROVED. This pull request has been approved by: brancz, danielqsj, ravisantoshgudimetla
What type of PR is this?
/sig instrumentation
What this PR does / why we need it:
As part of the Kubernetes metrics overhaul, change scheduler metrics to conform to the Kubernetes metrics instrumentation guidelines.
Which issue(s) this PR fixes:
Fixes #
Special notes for your reviewer:
This patch does not remove the existing metrics but marks them as deprecated.
Users need 2 releases to convert their monitoring configuration.
Does this PR introduce a user-facing change?: