fix: support empty groups again, e.g. Node #2435
Conversation
This issue is currently awaiting triage. If kube-state-metrics contributors determine this is a relevant issue, they will accept it by applying the `triage/accepted` label and provide further guidance. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. |
Welcome @robbat2! |
[APPROVALNOTIFIER] This PR is NOT APPROVED. This pull-request has been approved by: robbat2. The full list of commands accepted by this bot can be found here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing `/approve` in a comment. |
The branch was force-pushed from c483110 to 286c043, then from 286c043 to 55eb030.
PR#1851 introduced accidental breakage for custom metrics on resources with an empty group, e.g. Node. Fix it, and add tests to catch the next breakage.
```
demo_timestamp{customresource_group="",customresource_kind="Node",customresource_version="v1",node="kind-control-plane"} 1.719618433e+09
```
Reference: kubernetes#1851
Reference: kubernetes#2434
Signed-off-by: Robin H. Johnson <rjohnson@coreweave.com>
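For readers unfamiliar with the feature, the sketch below shows roughly the kind of CustomResourceState configuration this commit exercises: an empty `group: ""` with `kind: Node`, which is the combination PR #1851 broke. The annotation path, metric name, and label mapping here are illustrative assumptions, not the exact config used in this PR's tests.
```yaml
kind: CustomResourceStateMetrics
spec:
  resources:
    - groupVersionKind:
        group: ""          # empty group: core/v1 resources such as Node
        version: "v1"
        kind: "Node"
      metricNamePrefix: "demo"
      labelsFromPath:
        node: [metadata, name]
      metrics:
        - name: "timestamp"
          help: "Timestamp read from a Node annotation (illustrative)"
          each:
            type: Gauge
            gauge:
              # hypothetical annotation assumed to hold a numeric epoch value
              path: [metadata, annotations, "example.com/provisioned-at"]
```
Loaded via a flag such as --custom-resource-state-config-file, a config along these lines would produce series of the demo_timestamp form shown in the commit message above.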
The branch was force-pushed from 55eb030 to 544f72c.
All custom resources have an API group. Resources that don't are native Kubernetes resources, for which kube-state-metrics already provides a safe set of metrics that is frequently updated to support new user requests. The ability to extend the metrics exposed by kube-state-metrics was never meant for native resources, and that is not something we want to support going forward. Some additional context about that decision was added in #2044. |
Exactly what Damien said. Additionally, PTAL at the docs, https://github.com/kubernetes/kube-state-metrics/pull/1851/files#diff-1e83b132955ba4b76949317660faf1fa222f0ee999d4fb3b4fb9dd318b7ac2beR521. |
Could you point me in the direction of docs showing functionality such as this, then?
This would turn a label from a Node resource into a timestamp gauge value in a new metric, which is extremely useful, in addition to it being both possible AND encouraged for custom resources. It seems like a bit of a shame to not allow users to do this sort of thing with core/builtin resources? |
Native resources are supported in-tree only, by design, and thus cannot leak into the CRS (Custom Resource State) machinery. If you believe something needs to be added for them, please open a PR, just as the docs say in the aforementioned link. I'd be happy to take a look. |
Just so I'm clear: this functionality isn't available for native resources somewhere else within KSM, and we've just been trying to shoehorn the custom resource version of it into native resources? Is there any plan to replicate this functionality for native resources, or are we limited to the metrics and their dimensions as provided by this project for native resources? Again, it seems a bit of a waste of great available functionality, "restricting" this sort of metrics manipulation to only custom resources. |
As @daveoy notes, one of our use cases is exporting a timestamp value from a Node annotation. This has not been possible via KSM since v2.9.0. It was previously available and worked in v2.8.2 - from that perspective you broke existing functionality. Another use case we have is splitting out a very large set of Node labels. We have Nodes with 100+ labels, over multiple label namespaces, and even with the metric-labels-allowlist mechanism this causes cardinality problems in Prometheus. We want to export each of those namespaces of labels as a distinct metric - to reduce the complexity of |
The custom resource metrics feature is still experimental. Please do not rely on an experimental feature in production systems. |
Here's an example of what we're doing to export the labels in something beyond the built-in metrics: |
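(The example that originally accompanied this comment is not preserved above. As a rough reconstruction of the label-splitting idea, an Info-type CustomResourceState metric for a subset of Node labels might look like the following; the metric name and the chosen label keys are assumptions for illustration.)
```yaml
kind: CustomResourceStateMetrics
spec:
  resources:
    - groupVersionKind:
        group: ""          # again the empty group, for the core Node resource
        version: "v1"
        kind: "Node"
      metricNamePrefix: "demo"
      labelsFromPath:
        node: [metadata, name]
      metrics:
        - name: "node_topology_labels"
          help: "Selected topology labels exported as an Info metric (illustrative)"
          each:
            type: Info
            info:
              # each entry becomes a label on the resulting Info metric
              labelsFromPath:
                zone: [metadata, labels, "topology.kubernetes.io/zone"]
                region: [metadata, labels, "topology.kubernetes.io/region"]
```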
@mrueg the text says that the custom resource metrics feature is experimental. These are core resources, and therefore not included in that definition. If there's a better way to export the metrics for our use cases, please do share, because right now the v2.8.2 functionality looks to be the most fitting solution. To recap: |
What else does the PR need, in terms of supporting code and tests, to show this is valid & safe functionality? I did already include unit testing & E2E testing to show it works within the scope of the existing test suites. |
I opened the initial discussion asking how to get metrics for the core ResourceQuota resources. That resource specifically is designed to be extendable to all kinds of resource types (not just the standard cpu, memory and ephemeralStorage, but for example GPU cores). In that discussion I also mentioned that supporting all those different options would likely be a significant maintenance burden on the project, aside from creating a variable number of metric names. I understand the logic of not allowing the customresource setup to use core resources, but this use case absolutely needs support. We frequently run into problems with ResourceQuotas blocking pod creation as our developers migrate more workloads to Kubernetes. We would like monitoring to alert our IT department whenever resource usage goes above 80% of the quota, but this is currently not possible with KSM. |
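For what it's worth, kube-state-metrics does expose a kube_resourcequota metric with `type="hard"` and `type="used"` series for the resources a quota tracks; whether that covers the extended resource types discussed in the linked issue is a separate question, but the 80% alert described above might be sketched as a Prometheus rule like this (alert name, threshold, and `for` duration are illustrative assumptions):
```yaml
groups:
  - name: resourcequota-usage
    rules:
      - alert: ResourceQuotaNearLimit
        # fires when used/hard exceeds 80% for any resource tracked by a quota
        expr: |
          kube_resourcequota{type="used"}
            / ignoring(type)
          kube_resourcequota{type="hard"} > 0.8
        for: 15m
        labels:
          severity: warning
        annotations:
          summary: 'ResourceQuota {{ $labels.resourcequota }} in {{ $labels.namespace }} is above 80% for {{ $labels.resource }}'
```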
There are SIG-Instrumentation office hours that you can join if you want to discuss this further. I am unable to attend those though as they conflict with my day job. |
Core resources are not experimental, but the whole custom resource metrics feature is marked experimental.
As the Kubernetes API exposes a JSON endpoint, an implementation with https://github.com/prometheus-community/json_exporter might be possible. |
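To sketch what that suggestion could look like: a json_exporter module scraping a Kubernetes list endpoint such as /api/v1/resourcequotas might resemble the config below. The module name, JSONPath expressions, and metric name are assumptions, and authenticating to the API server (bearer token, TLS) would need additional HTTP client configuration that is omitted here.
```yaml
modules:
  k8s_resourcequota:
    metrics:
      - name: k8s_resourcequota_pods
        type: object
        help: Pod quota values read from /api/v1/resourcequotas (illustrative)
        path: '{ .items[*] }'
        labels:
          namespace: '{ .metadata.namespace }'
          resourcequota: '{ .metadata.name }'
        values:
          hard: '{ .status.hard.pods }'
          used: '{ .status.used.pods }'
```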
I started a thread here https://kubernetes.slack.com/archives/C20HH14P7/p1720615100646059 |
Based on the discussion we had on slack, are we ok with closing this? |
/close |
@logicalhan: Closed this PR. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. |
PR#1851 introduced accidental breakage for custom metrics on resources with an empty group, e.g. Node.
Fix it, and add tests to catch the next breakage.
Reference: #1851
Reference: #2434
Signed-off-by: Robin H. Johnson <rjohnson@coreweave.com>
What this PR does / why we need it:
Fixes breakage introduced in v2.9.0
How does this change affect the cardinality of KSM: (increases, decreases or does not change cardinality)
Does not change cardinality.
Which issue(s) this PR fixes (optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged):
Fixes #2434