-
Notifications
You must be signed in to change notification settings - Fork 2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix: add readyz
endpoint
#2442
fix: add readyz
endpoint
#2442
Conversation
Discourage the usage of `/metrics` for any probe, and use `/readyz` in place of the earlier telemetry metrics endpoint to secure the exposition data. Signed-off-by: Pranshu Srivastava <rexagod@gmail.com>
This issue is currently awaiting triage. If kube-state-metrics contributors determine this is a relevant issue, they will accept it by applying the The Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. |
Is this passing through /readyz from the apiserver to ksm? I think that's not an ideal approach. It should only be ready if metrics are available. |
Right, an oversight on my part. Patched. |
pkg/app/server.go
Outdated
@@ -376,6 +378,18 @@ func buildTelemetryServer(registry prometheus.Gatherer) *http.ServeMux { | |||
// Add metricsPath | |||
mux.Handle(metricsPath, promhttp.HandlerFor(registry, promhttp.HandlerOpts{ErrorLog: promLogger{}})) | |||
|
|||
// Add readyzPath | |||
mux.Handle(readyzPath, http.HandlerFunc(func(w http.ResponseWriter, _ *http.Request) { | |||
count, err := testutil.GatherAndCount(registry) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like the approach, I am a bit concerned that we're getting a testutil into a production path.
Is this considered stable?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"Package testutil provides helpers to test code using the prometheus package of client_golang."
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've extracted the function in a manner that makes it only dependent on the official library, PTAL.
README.md
Outdated
* `/healthz`: Returns a 200 status code if the application is running. We recommend to use this as a startup probe. | ||
* `/healthz` (exposed on `main`): Returns a 200 status code if the application is running. We recommend to use this for the startup probe. | ||
* `/livez` (exposed on `main`): Returns a 200 status code if the application is not affected by an outage of the Kubernetes API Server. We recommend to using this for the liveness probe. | ||
* `/readyz` (exposed on `self`): Returns a 200 status code if the application is ready to accept traffic. We recommend using this for the readiness probe. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
* `/readyz` (exposed on `self`): Returns a 200 status code if the application is ready to accept traffic. We recommend using this for the readiness probe. | |
* `/readyz` (exposed on `self`): Returns a 200 status code if the application is ready to accept requests and expose metrics. We recommend using this for the readiness probe. |
pkg/app/server.go
Outdated
w.WriteHeader(http.StatusOK) | ||
w.Write([]byte(http.StatusText(http.StatusOK))) | ||
}) | ||
mux.Handle(healthzPath, handleClusterDelegationForProber(client, healthzPath)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is healthz depending on the health of the apiserver?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We're making /healthz
depend on the cluster's health in addition to the binary's. As is visible in the list below, there are multiple components in the cluster that are critical to KSM's health (for eg., crd-informer-synced
and start-apiextensions-informers
).
┌[rexagod@nebuchadnezzar] [/dev/ttys002] [readyz]
└[~/repositories/oss/kube-state-metrics]> curl "127.0.0.1:8001/healthz?verbose"
[+]ping ok
[+]log ok
[+]etcd ok
[+]poststarthook/start-kube-apiserver-admission-initializer ok
[+]poststarthook/generic-apiserver-start-informers ok
[+]poststarthook/priority-and-fairness-config-consumer ok
[+]poststarthook/priority-and-fairness-filter ok
[+]poststarthook/storage-object-count-tracker-hook ok
[+]poststarthook/start-apiextensions-informers ok
[+]poststarthook/start-apiextensions-controllers ok
[+]poststarthook/crd-informer-synced ok
[+]poststarthook/start-system-namespaces-controller ok
[+]poststarthook/bootstrap-controller ok
[+]poststarthook/rbac/bootstrap-roles ok
[+]poststarthook/scheduling/bootstrap-system-priority-classes ok
[+]poststarthook/priority-and-fairness-config-producer ok
[+]poststarthook/start-cluster-authentication-info-controller ok
[+]poststarthook/start-kube-apiserver-identity-lease-controller ok
[+]poststarthook/start-deprecated-kube-apiserver-identity-lease-garbage-collector ok
[+]poststarthook/start-kube-apiserver-identity-lease-garbage-collector ok
[+]poststarthook/start-legacy-token-tracking-controller ok
[+]poststarthook/aggregator-reload-proxy-client-cert ok
[+]poststarthook/start-kube-aggregator-informers ok
[+]poststarthook/apiservice-registration-controller ok
[+]poststarthook/apiservice-status-available-controller ok
[+]poststarthook/kube-apiserver-autoregistration ok
[+]autoregister-completion ok
[+]poststarthook/apiservice-openapi-controller ok
[+]poststarthook/apiservice-openapiv3-controller ok
[+]poststarthook/apiservice-discovery-controller ok
healthz check passed
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That doesn't make so much sense to me that /livez is on /livez and /healthz is on /healthz of the API Server.
In particular since /healthz is a deprecated endpoint.
The healthz endpoint is deprecated (since Kubernetes v1.16), and you should use the more specific livez and readyz endpoints instead.
https://kubernetes.io/docs/reference/using-api/health-checks/#api-endpoints-for-health
As we're suggesting healthz to be for the startup probe, I think it's fine to simply return 200 on /healthz once the application is running and not change the semantics there.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I didn't realize this was deprecated. Fair enough. 👍🏼
/lgtm |
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: mrueg, rexagod The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
Discourage the usage of
/metrics
for any probe, and use/readyz
in place of the earlier telemetry metrics endpoint to secure the exposition data./cc @mrueg