Secure diagnostics (metrics, pprof, log level changes) #9289
Labels: area/metrics, kind/feature, triage/accepted
What would you like to be added (User Story)?
As an operator I would like to be able to safely scrape metrics from Cluster API controllers with minimal effort.
Detailed Description
Today Cluster API only provides a metrics-bind-addr flag to configure the metrics endpoint. The metrics endpoint always uses HTTP and has no authorization. Because of security concerns, Cluster API currently defaults this flag to localhost:8080. This means that the metrics are only available on localhost, which makes it hard to scrape them, e.g. via Prometheus. Folks can set the flag to 0.0.0.0:8080, but then everyone can access the metrics. We don't have any secrets in our metrics, but it was still considered too unsafe to make it the default.
Controller-runtime implemented a new feature in v0.16 that makes it easy to provide a secure metrics endpoint which uses HTTPS and provides authentication and authorization (kubernetes-sigs/controller-runtime#2407).
At a high level, we can now expose metrics the same way as core Kubernetes controllers (https://kubernetes.io/docs/concepts/cluster-administration/system-metrics/#metrics-in-kubernetes).
To scrape metrics from the secured endpoint, operators would now need a ClusterRole like the following:
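The manifest itself did not survive in this copy of the issue. As a minimal sketch, modeled on the example in the Kubernetes system metrics documentation linked above, such a ClusterRole could look like this. The name is a placeholder, and the role only takes effect once it is bound to the ServiceAccount that the scraper runs as.

```yaml
# Sketch only: grants read access to the non-resource /metrics URL.
# The metadata.name below is a hypothetical placeholder.
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: metrics-reader
rules:
  - nonResourceURLs:
      - "/metrics"
    verbs:
      - get
```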
Note: A ClusterRole like this is already deployed by default in the Prometheus Helm chart, so that Prometheus is able to scrape metrics from core Kubernetes components. The only thing folks should need on their side when scraping metrics from Cluster API controllers is this config: https://github.com/sbueringer/cluster-api/blob/8a2de8c0060d2dc5169d3ebb86dc5605bc856492/hack/observability/prometheus/values.yaml#L31-L33. Everything else should just work out of the box.
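For orientation, a Prometheus scrape job for an HTTPS endpoint with token-based authentication might look roughly like the sketch below. This is not a copy of the linked values.yaml. The job name, the pod-based discovery, and the decision to skip certificate verification (on the assumption that the controllers serve a self-signed certificate) are all illustrative and would need to be adapted to the actual deployment.

```yaml
# Sketch only: hypothetical scrape job, not taken from the linked values.yaml.
scrape_configs:
  - job_name: capi-controllers          # placeholder name
    scheme: https                       # the secured endpoint serves HTTPS
    kubernetes_sd_configs:
      - role: pod                       # assumed discovery mode
    authorization:
      type: Bearer
      # ServiceAccount token mounted into the Prometheus pod
      credentials_file: /var/run/secrets/kubernetes.io/serviceaccount/token
    tls_config:
      # Assumption: the serving certificate is self-signed
      insecure_skip_verify: true
```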
For folks who still remember, this is basically a subset of the functionality of kube-rbac-proxy that we used in the past.
Notes:
Anything else you would like to add?
No response
Label(s) to be applied
/kind feature
One or more /area label. See https://github.com/kubernetes-sigs/cluster-api/labels?q=area for the list of labels.