Cluster API State Metrics implementation #6458

chrischdi · 2022-04-28T09:28:20Z

Follow-up issue to track implementation of the merged proposal

📖 Cluster API State Metrics proposal #6404

as follow-up to

Metrics #1477

Current state: the PR contributing https://github.com/mercedes-benz/cluster-api-state-metrics to the exp/state-metrics directory still needs to be opened. It is WIP on the branch https://github.com/mercedes-benz/cluster-api/tree/exp/casm-introduce .

User Story
xref proposal user stories

TODOs:

- Proposal: 📖 Cluster API State Metrics proposal #6404
- ~~Contribution: ✨ Contribute experimental cluster-api-state-metrics #6570~~ not necessary; we can directly use kube-state-metrics and the new CustomResource configuration
- Integrate into Tilt: ✨ Integrate kube-state-metrics and CR config into tilt. #7095
- Implement metrics mentioned in the proposal but missing in the current implementation
  - exception: _labels metrics
- Implement cluster_info, including .spec.controlPlaneEndpoint.host as a label, to solve "Raise a metric whenever CAPI cannot see a remote cluster client" #5510
  - ✨ Integrate kube-state-metrics and CR config into tilt. #7095
- ~~clarify and implement steps for image publishing~~
  - not required because we can soon use upstream kube-state-metrics images
- Create issue to clarify and implement release manifest
- Resync the proposal to the current state: 📖 Sync cluster-api-state-metrics proposal to match implementation state. #7183
- Review the *status_replica metrics again for consistency: ✨ Add missing status_replicas_ready metric for MachineDeployments at kube-state-metrics #7166

/kind feature

fabriziopandini · 2022-04-28T10:00:18Z

/milestone v1.2

chrischdi · 2022-06-03T12:25:35Z

Looks like the kube-state-metrics project made the implementation obsolete.

https://github.com/kubernetes/kube-state-metrics/releases/tag/v2.5.0 is now available with a lot of great contributions from the community! It comes with more metrics for standard components and an experimental feature to create your own metrics for CRDs! 🎉 Isn't Friday the perfect opportunity to deploy this to your cluster? 😉

Source: https://kubernetes.slack.com/archives/CJJ529RUY/p1654257314683059

Looks like we should adapt and check whether we could achieve the same with configuration only, deploying kube-state-metrics instead :-)

sbueringer · 2022-06-03T12:33:02Z

Oh, interesting turn of events. Yeah let's figure out if that covers our use case.

bavarianbidi · 2022-06-07T07:24:41Z

As I am currently preparing our internal repositories to use cluster-api-state-metrics, and trying to propose a generic description for adopting different infrastructure providers as well, I gave kube-state-metrics a quick try with the following config:

kind: CustomResourceStateMetrics
spec: 
  resources: 
    - groupVersionKind: 
        group: cluster.x-k8s.io
        kind: Machine
        version: v1beta1
      metrics: 
        - each: 
            path: 
              - status
              - phase
          help: "machine phase"
          name: phase

As kube-state-metrics expects a float as the value (https://github.com/kubernetes/kube-state-metrics/blob/7f09ca71ba25af4d2cbb1637e13b28c1d0028159/pkg/customresourcestate/registry_factory.go#L490-L525), the above config fails with:

E0607 09:12:00.168168  126676 registry_factory.go:469] "kube_cluster_x-k8s_io_v1beta1_Machine_phase" err="[status,phase]: []: strconv.ParseFloat: parsing \"Running\": invalid syntax"

I tried to work around the float requirement with the following config, where I pointed valueFrom at the only numeric field in the Machine:

kind: CustomResourceStateMetrics
spec: 
  resources: 
    - groupVersionKind: 
        group: cluster.x-k8s.io
        kind: Machine
        version: v1beta1
      metrics: 
        - each: 
            valueFrom:
              - metadata
              - generation
            path: 
              - status
              - phase
          help: "machine phase"
          name: phase

But still get an error:

E0607 09:12:00.168234  126676 registry_factory.go:469] "kube_cluster_x-k8s_io_v1beta1_Machine_phase_annotation" err="[status,phase]: [metadata,generation]: expected number but found nil value"

It seems to me that a well-scoped metrics server for Cluster API (+ infrastructure implementations) still makes sense.

sbueringer · 2022-06-07T07:44:31Z

Just for my understanding, they only allow exposing floats directly as metrics? So something like the condition metrics just doesn't work?

bavarianbidi · 2022-06-07T07:54:56Z

> Just for my understanding, they only allow exposing floats directly as metrics? So something like the condition metrics just doesn't work?

Exactly. Exposing the condition as a metric might only work if at least one numeric field is in the spec (my hack via metadata.generation does not work, so I guess the field must be in the spec or status).

If the KSM folks accept a patch for this and plan to support this experimental feature long-term, this could be a very promising solution for metrics.

sbueringer · 2022-06-07T07:56:20Z

Yup, that was my thinking. Essentially: what is the gap, and can we upstream enough to avoid maintaining our own state metrics?

chrischdi · 2022-06-07T13:01:15Z

I started a discussion in the kube-state-metrics Slack: https://kubernetes.slack.com/archives/CJJ529RUY/p1654593034854759 and we will move the discussion to an issue.

For the CAPI project I think we have some options here to continue:

1. wait (and maybe help to reach that point) for kube-state-metrics to enable all kinds of metrics we want via config
2. implement custom binary for now:
   a. and migrate to configuration-only kube-state-metrics as soon as all metrics can be defined via configuration (metrics that are already possible via config could be defined that way right away)
   b. ~~and keep custom binary/implementation~~

schrej · 2022-06-08T14:22:58Z

> implement custom binary for now

Assuming kube-state-metrics can cover all requirements and we'll migrate to it at some point, does it make sense to even migrate cluster-api-state-metrics into this repository? I think it would be fine to leave it where it is, and then just add the kube-state-metrics configuration to this repository (as well as tilt integration etc.).

The time spent moving the code here is probably better invested into kube-state-metrics to get the custom metrics into a suitable replacement.

sbueringer · 2022-06-10T04:10:56Z

> The time spent moving the code here is probably better invested into kube-state-metrics to get the custom metrics into a suitable replacement.

In general I agree. Depending on how fast we can get the necessary features into kube-state-metrics we could also merge it and then use more and more from the upstream kube-state-metrics until cluster-api-state-metrics is not necessary anymore (as cluster-api-state-metrics imports/extends kube-state-metrics).

But I think for now the best approach is to wait a bit with cluster-api-state-metrics to get a feeling for how fast we can make progress in extending kube-state-metrics (and thus how long it would take to get everything we need supported there).

chrischdi · 2022-07-05T09:51:21Z

Some progress on the topic:

I invested some time in kubernetes/kube-state-metrics#1755, which resulted in the PR kubernetes/kube-state-metrics#1777 to gather some early feedback.

Running kube-state-metrics from the linked branch and using the config in kubernetes/kube-state-metrics#1777 (comment) would already cover nearly all metrics of the proposal, except the labels metrics which I did not yet take a look at.

As of now I only see one small disadvantage for using the configuration style kube-state-metrics: we may have to split the paused metric to annotation_paused and (if matches CR) spec_paused because the config (currently) does not provide the feature of combining values from different paths in the CR.

All in all I still think going forward with upstream kube-state-metrics makes way more sense than implementing a custom binary.

sbueringer · 2022-07-05T10:13:05Z

Sounds great!

> As of now I only see one small disadvantage for using the configuration style kube-state-metrics: we may have to split the paused metric to annotation_paused and (if matches CR) spec_paused because the config (currently) does not provide the feature of combining values from different paths in the CR.

I think that's fine, and it's rather in the spirit of kube-state-metrics to expose the current state of resources in a straightforward way, without doing additional calculations. If desired, the combination can always be done later (at least in Prometheus), either in the alerts or in recording rules.
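
As a sketch of the recording-rule approach, a Prometheus rule file along these lines could recombine the two split metrics into a single paused series per cluster. The metric names `capi_cluster_annotation_paused` and `capi_cluster_spec_paused`, the record name, and the `namespace`/`name` labels are assumptions for illustration, not names confirmed in this thread; the rule also assumes both metrics use 1 for paused and 0 otherwise.

```yaml
groups:
  - name: capi-paused
    rules:
      # Hypothetical combined metric: 1 if either source reports the cluster as paused.
      - record: capi_cluster_paused
        expr: |
          max by (namespace, name) (
            {__name__=~"capi_cluster_annotation_paused|capi_cluster_spec_paused"}
          )
```

Taking the maximum over both metric names means the rule should still produce a result when only one of the two series exists for a cluster.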

chrischdi · 2022-08-18T19:28:57Z

pkg/customresourcestate implement info and stateSet metric type and refactor configuration file kubernetes/kube-state-metrics#1777 finally got merged today 🎉

I will follow up in the next few days with a WIP PR which already integrates the kube-state-metrics Helm chart into Tilt, similar to prometheus, grafana, ...

A local version is already running and seems to integrate nicely 🎉
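
With the StateSet type from kubernetes/kube-state-metrics#1777, a string field such as the Machine's .status.phase, which could not be parsed as a float in the earlier attempt, could be exposed along these lines. This is a sketch under assumptions: the `capi_machine` prefix, the metric name, and the help text are illustrative choices.

```yaml
kind: CustomResourceStateMetrics
spec:
  resources:
    - groupVersionKind:
        group: cluster.x-k8s.io
        kind: Machine
        version: v1beta1
      # Assumed prefix; combined with the metric name this would give capi_machine_status_phase.
      metricNamePrefix: capi_machine
      labelsFromPath:
        name: [metadata, name]
        namespace: [metadata, namespace]
      metrics:
        - name: status_phase
          help: "The machine's current phase."
          each:
            type: StateSet
            stateSet:
              # Name of the label that carries the state value.
              labelName: phase
              path: [status, phase]
              # Known Machine phases; one series per entry is expected.
              list:
                - Pending
                - Provisioning
                - Provisioned
                - Running
                - Deleting
                - Deleted
                - Failed
                - Unknown
```

Each listed phase should appear as its own series, with the series matching the current phase set to 1 and the others to 0, so no numeric field in the resource is needed.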

fabriziopandini · 2022-09-02T12:43:57Z

@chrischdi can we close this issue now and move pending items to separate issues? (I have already opened #7158)

chrischdi · 2022-09-03T10:53:24Z

> @chrischdi can we close this issue now and move pending items to separate issues? (I have already opened #7158)

I've got two action items I want to finish, and would have closed this issue afterwards:

- resync the proposal to the current state
- review again the *status_*replica* metrics for consistency purposes

The issue also mentions the point "clarify and implement release manifest"; I am ok with not doing this for now and/or creating a separate issue for it.

sbueringer · 2022-09-05T09:54:02Z

I think finishing the two sub-tasks that you mentioned as part of this issue + a separate issue for release manifests would be good.

I'm not sure if we want to release the manifests as long as we are writing / maintaining them manually. So this could depend on the gen tool.

chrischdi · 2022-09-05T11:37:48Z

> I'm not sure if we want to release the manifests as long as we are writing / maintaining them manually. So this could depend on the gen tool.

Fair point to get discussed in the follow-up issue then 👍

Ok, I will do so and also create the issue.

chrischdi · 2022-09-13T12:37:27Z

xref: issue in kube-state-metrics for *_labels metrics, including an example configuration as a workaround: kubernetes/kube-state-metrics#1832

chrischdi · 2022-09-16T09:42:14Z

Closing here by marking the last step as done 🎉 / via creation of:

Add a Custom Resource metrics configuration file to the release artifacts #7229

/close

k8s-ci-robot · 2022-09-16T09:42:19Z

@chrischdi: Closing this issue.

k8s-ci-robot added the kind/feature Categorizes issue or PR as related to a new feature. label Apr 28, 2022

k8s-ci-robot added this to the v1.2 milestone Apr 28, 2022

fabriziopandini added the area/health label Apr 28, 2022

tobiasgiese mentioned this issue May 30, 2022

✨ Contribute experimental cluster-api-state-metrics #6570

Closed

chrischdi mentioned this issue Jun 8, 2022

Adjust Custom Resource configuration to support open metric types StateSet and Info kubernetes/kube-state-metrics#1755

Closed

killianmuldoon mentioned this issue Jul 25, 2022

Raise a metric whenever CAPI cannot see a remote cluster client #5510

Closed

fabriziopandini added the triage/accepted Indicates an issue or PR is ready to be actively worked on. label Jul 29, 2022

fabriziopandini removed this from the v1.2 milestone Jul 29, 2022

fabriziopandini assigned chrischdi Jul 29, 2022

fabriziopandini added this to the v1.3 milestone Jul 29, 2022

chrischdi mentioned this issue Aug 19, 2022

✨ Integrate kube-state-metrics and CR config into tilt. #7095

Merged

6 tasks

This was referenced Sep 5, 2022

✨ Add missing status_replicas_ready metric for MachineDeployments at kube-state-metrics #7166

Merged

📖 Sync cluster-api-state-metrics proposal to match implementation state. #7183

Merged

chrischdi mentioned this issue Sep 16, 2022

Add a Custom Resource metrics configuration file to the release artifacts #7229

Open

k8s-ci-robot closed this as completed Sep 16, 2022
