repeated adding and deleting CustomResourceDefinitions causes duplicate metric entries #2223

Open
k15r opened this issue Oct 20, 2023 · 11 comments
Assignees
Labels
kind/bug Categorizes issue or PR as related to a bug. triage/accepted Indicates an issue or PR is ready to be actively worked on.

Comments

@k15r

k15r commented Oct 20, 2023

What happened:

This is part of our kube-state-metrics custom resource state configuration:

       - errorLogV: 0
         groupVersionKind:
           group: operator.kyma-project.io
           kind: Keda
           version: '*'
         metrics:
         - each:
             stateSet:
               labelName: state
               list:
               - Ready
               - Processing
               - Error
               - Deleting
               - Warning
               path:
               - status
               - state
             type: StateSet
           errorLogV: 0
           help: status of Keda CR
           labelsFromPath:
             name:
             - metadata
             - name
             namespace:
             - metadata
             - namespace
           name: keda_status

After adding and deleting the corresponding CRD and one CR of its kind, this is part of the /metrics response of kube-state-metrics:

# HELP kube_customresource_keda_status status of Keda CR
# TYPE kube_customresource_keda_status stateset
kube_customresource_keda_status{customresource_group="operator.kyma-project.io",customresource_kind="Keda",customresource_version="v1alpha1",name="default",namespace="kyma-system",state="Deleting"} 0
kube_customresource_keda_status{customresource_group="operator.kyma-project.io",customresource_kind="Keda",customresource_version="v1alpha1",name="default",namespace="kyma-system",state="Error"} 0
kube_customresource_keda_status{customresource_group="operator.kyma-project.io",customresource_kind="Keda",customresource_version="v1alpha1",name="default",namespace="kyma-system",state="Processing"} 1
kube_customresource_keda_status{customresource_group="operator.kyma-project.io",customresource_kind="Keda",customresource_version="v1alpha1",name="default",namespace="kyma-system",state="Ready"} 0
kube_customresource_keda_status{customresource_group="operator.kyma-project.io",customresource_kind="Keda",customresource_version="v1alpha1",name="default",namespace="kyma-system",state="Warning"} 0
# HELP kube_customresource_keda_status status of Keda CR
# TYPE kube_customresource_keda_status stateset
kube_customresource_keda_status{customresource_group="operator.kyma-project.io",customresource_kind="Keda",customresource_version="v1alpha1",name="default",namespace="kyma-system",state="Deleting"} 0
kube_customresource_keda_status{customresource_group="operator.kyma-project.io",customresource_kind="Keda",customresource_version="v1alpha1",name="default",namespace="kyma-system",state="Error"} 0
kube_customresource_keda_status{customresource_group="operator.kyma-project.io",customresource_kind="Keda",customresource_version="v1alpha1",name="default",namespace="kyma-system",state="Processing"} 1
kube_customresource_keda_status{customresource_group="operator.kyma-project.io",customresource_kind="Keda",customresource_version="v1alpha1",name="default",namespace="kyma-system",state="Ready"} 0
kube_customresource_keda_status{customresource_group="operator.kyma-project.io",customresource_kind="Keda",customresource_version="v1alpha1",name="default",namespace="kyma-system",state="Warning"} 0
kube_customresource_keda_status{customresource_group="operator.kyma-project.io",customresource_kind="Keda",customresource_version="v1alpha1",name="default",namespace="kyma-system",state="Deleting"} 0
kube_customresource_keda_status{customresource_group="operator.kyma-project.io",customresource_kind="Keda",customresource_version="v1alpha1",name="default",namespace="kyma-system",state="Error"} 0
kube_customresource_keda_status{customresource_group="operator.kyma-project.io",customresource_kind="Keda",customresource_version="v1alpha1",name="default",namespace="kyma-system",state="Processing"} 1
kube_customresource_keda_status{customresource_group="operator.kyma-project.io",customresource_kind="Keda",customresource_version="v1alpha1",name="default",namespace="kyma-system",state="Ready"} 0
kube_customresource_keda_status{customresource_group="operator.kyma-project.io",customresource_kind="Keda",customresource_version="v1alpha1",name="default",namespace="kyma-system",state="Warning"} 0
# HELP kube_customresource_keda_status status of Keda CR
# TYPE kube_customresource_keda_status stateset
kube_customresource_keda_status{customresource_group="operator.kyma-project.io",customresource_kind="Keda",customresource_version="v1alpha1",name="default",namespace="kyma-system",state="Deleting"} 0
kube_customresource_keda_status{customresource_group="operator.kyma-project.io",customresource_kind="Keda",customresource_version="v1alpha1",name="default",namespace="kyma-system",state="Error"} 0
kube_customresource_keda_status{customresource_group="operator.kyma-project.io",customresource_kind="Keda",customresource_version="v1alpha1",name="default",namespace="kyma-system",state="Processing"} 1
kube_customresource_keda_status{customresource_group="operator.kyma-project.io",customresource_kind="Keda",customresource_version="v1alpha1",name="default",namespace="kyma-system",state="Ready"} 0
kube_customresource_keda_status{customresource_group="operator.kyma-project.io",customresource_kind="Keda",customresource_version="v1alpha1",name="default",namespace="kyma-system",state="Warning"} 0

As you can see, there are multiple entries for the same metric (# HELP and # TYPE appear 3 times). Within a single metric block, lines are duplicated.

What you expected to happen:

  • The metric exists only once in the output.
  • Lines are unique.
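
Both expectations can be checked mechanically against a saved copy of the /metrics response. The following is a minimal sketch, not part of kube-state-metrics; the function name `summarize_exposition` and the shortened `EXAMPLE` text (labels trimmed for readability) are assumptions made for illustration.

```python
from collections import Counter


def summarize_exposition(text):
    """Return (metric names with more than one TYPE line, repeated sample lines)."""
    type_counts = Counter()
    sample_counts = Counter()
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith("# TYPE "):
            # Header shape: "# TYPE <metric name> <type>"
            type_counts[line.split()[2]] += 1
        elif not line.startswith("#"):
            # Anything that is not a comment is treated as a sample line.
            sample_counts[line] += 1
    repeated_types = {name: n for name, n in type_counts.items() if n > 1}
    repeated_samples = {line: n for line, n in sample_counts.items() if n > 1}
    return repeated_types, repeated_samples


# Shortened stand-in for the response above (most labels removed).
EXAMPLE = """\
# HELP kube_customresource_keda_status status of Keda CR
# TYPE kube_customresource_keda_status stateset
kube_customresource_keda_status{name="default",state="Ready"} 0
# HELP kube_customresource_keda_status status of Keda CR
# TYPE kube_customresource_keda_status stateset
kube_customresource_keda_status{name="default",state="Ready"} 0
kube_customresource_keda_status{name="default",state="Ready"} 0
"""

if __name__ == "__main__":
    repeated_types, repeated_samples = summarize_exposition(EXAMPLE)
    print("metrics with repeated TYPE lines:", repeated_types)
    print("repeated sample lines:", repeated_samples)
```

Both returned dictionaries are empty when the output meets the two expectations.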

How to reproduce it (as minimally and precisely as possible):

  • apply a scrape configuration for a GVK
  • create the CRD in the cluster
  • create a CR of its kind
  • wait until the CR has been scraped
  • delete the CRD from the cluster
  • create the CRD in the cluster
  • create a CR of its kind
  • wait until the CR has been scraped
  • repeat those steps a few times
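
The create/delete cycle in the steps above could be scripted roughly as follows. This is only a sketch under assumptions: the manifest file names are hypothetical, `kubectl` and a `sleep` executable are assumed to be on the PATH, the fixed pause is a crude stand-in for "wait until the CR has been scraped", and applying the scrape configuration is not covered.

```python
import subprocess


def repro_commands(crd_file, cr_file, rounds, pause_seconds=30):
    """Build the command sequence for `rounds` create cycles with a CRD delete in between."""
    create_cycle = [
        ["kubectl", "apply", "-f", crd_file],  # create the CRD
        # Wait for the CRD to be served before creating a CR of its kind.
        ["kubectl", "wait", "--for=condition=established", "--timeout=60s", "-f", crd_file],
        ["kubectl", "apply", "-f", cr_file],   # create a CR of its kind
        ["sleep", str(pause_seconds)],         # give kube-state-metrics time to scrape it
    ]
    commands = []
    for i in range(rounds):
        if i > 0:
            # Deleting the CRD also removes its CRs.
            commands.append(["kubectl", "delete", "-f", crd_file])
        commands.extend(create_cycle)
    return commands


def run(commands):
    """Execute the commands in order and stop at the first failure."""
    for command in commands:
        subprocess.run(command, check=True)


if __name__ == "__main__":
    # Hypothetical file names; print instead of run so nothing touches a cluster.
    for command in repro_commands("keda-crd.yaml", "keda-cr.yaml", rounds=3):
        print(" ".join(command))
```

Calling `run(...)` on the returned list executes it against the current kubectl context; printing first makes it easy to review what would be changed.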

Anything else we need to know?:

For an advanced version of this bug, create the following configuration in the cluster:

       - groupVersionKind:
           group: "operator.kyma-project.io"
           kind: "Sample"
           version: "*"
         errorLogV: 0
         metrics:
           - name: module_status
             errorLogV: 10
             help: "status of Module CR"
             each:
               type: StateSet
               stateSet:
                 labelName: state
                 path: [status, state]
                 list: [Ready, Processing, Error, Deleting, Warning]
             labelsFromPath:
               name: [metadata, name]
               namespace: [metadata, namespace]
       - errorLogV: 0
         groupVersionKind:
           group: operator.kyma-project.io
           kind: Keda
           version: '*'
         metrics:
         - each:
             stateSet:
               labelName: state
               list:
               - Ready
               - Processing
               - Error
               - Deleting
               - Warning
               path:
               - status
               - state
             type: StateSet
           errorLogV: 0
           help: status of Module CR
           labelsFromPath:
             name:
             - metadata
             - name
             namespace:
             - metadata
             - namespace
           name: module_status

This configuration puts the metrics of two different CRs into the same metric (kube_customresource_module_status).

Now, if you create both CRDs and a matching CR for each, and then repeatedly create and remove one of the CRDs, you will get output similar to this (here the Sample CRD was deleted):

# HELP kube_customresource_module_status status of Module CR
# TYPE kube_customresource_module_status stateset
kube_customresource_module_status{customresource_group="operator.kyma-project.io",customresource_kind="Sample",customresource_version="v1alpha1",name="sample-yaml",namespace="default",state="Deleting"} 0
kube_customresource_module_status{customresource_group="operator.kyma-project.io",customresource_kind="Sample",customresource_version="v1alpha1",name="sample-yaml",namespace="default",state="Error"} 1
kube_customresource_module_status{customresource_group="operator.kyma-project.io",customresource_kind="Sample",customresource_version="v1alpha1",name="sample-yaml",namespace="default",state="Processing"} 0
kube_customresource_module_status{customresource_group="operator.kyma-project.io",customresource_kind="Sample",customresource_version="v1alpha1",name="sample-yaml",namespace="default",state="Ready"} 0
kube_customresource_module_status{customresource_group="operator.kyma-project.io",customresource_kind="Sample",customresource_version="v1alpha1",name="sample-yaml",namespace="default",state="Warning"} 0
kube_customresource_module_status{customresource_group="operator.kyma-project.io",customresource_kind="Keda",customresource_version="v1alpha1",name="default",namespace="kyma-system",state="Deleting"} 0
kube_customresource_module_status{customresource_group="operator.kyma-project.io",customresource_kind="Keda",customresource_version="v1alpha1",name="default",namespace="kyma-system",state="Error"} 0
kube_customresource_module_status{customresource_group="operator.kyma-project.io",customresource_kind="Keda",customresource_version="v1alpha1",name="default",namespace="kyma-system",state="Processing"} 0
kube_customresource_module_status{customresource_group="operator.kyma-project.io",customresource_kind="Keda",customresource_version="v1alpha1",name="default",namespace="kyma-system",state="Ready"} 1
kube_customresource_module_status{customresource_group="operator.kyma-project.io",customresource_kind="Keda",customresource_version="v1alpha1",name="default",namespace="kyma-system",state="Warning"} 0
# HELP kube_customresource_module_status status of Module CR
# TYPE kube_customresource_module_status stateset
kube_customresource_module_status{customresource_group="operator.kyma-project.io",customresource_kind="Keda",customresource_version="v1alpha1",name="default",namespace="kyma-system",state="Deleting"} 0
kube_customresource_module_status{customresource_group="operator.kyma-project.io",customresource_kind="Keda",customresource_version="v1alpha1",name="default",namespace="kyma-system",state="Error"} 0
kube_customresource_module_status{customresource_group="operator.kyma-project.io",customresource_kind="Keda",customresource_version="v1alpha1",name="default",namespace="kyma-system",state="Processing"} 0
kube_customresource_module_status{customresource_group="operator.kyma-project.io",customresource_kind="Keda",customresource_version="v1alpha1",name="default",namespace="kyma-system",state="Ready"} 1
kube_customresource_module_status{customresource_group="operator.kyma-project.io",customresource_kind="Keda",customresource_version="v1alpha1",name="default",namespace="kyma-system",state="Warning"} 0
# HELP kube_customresource_module_status status of Module CR
# TYPE kube_customresource_module_status stateset
kube_customresource_module_status{customresource_group="operator.kyma-project.io",customresource_kind="Sample",customresource_version="v1alpha1",name="sample-yaml",namespace="default",state="Deleting"} 0
kube_customresource_module_status{customresource_group="operator.kyma-project.io",customresource_kind="Sample",customresource_version="v1alpha1",name="sample-yaml",namespace="default",state="Error"} 1
kube_customresource_module_status{customresource_group="operator.kyma-project.io",customresource_kind="Sample",customresource_version="v1alpha1",name="sample-yaml",namespace="default",state="Processing"} 0
kube_customresource_module_status{customresource_group="operator.kyma-project.io",customresource_kind="Sample",customresource_version="v1alpha1",name="sample-yaml",namespace="default",state="Ready"} 0
kube_customresource_module_status{customresource_group="operator.kyma-project.io",customresource_kind="Sample",customresource_version="v1alpha1",name="sample-yaml",namespace="default",state="Warning"} 0
kube_customresource_module_status{customresource_group="operator.kyma-project.io",customresource_kind="Keda",customresource_version="v1alpha1",name="default",namespace="kyma-system",state="Deleting"} 0
kube_customresource_module_status{customresource_group="operator.kyma-project.io",customresource_kind="Keda",customresource_version="v1alpha1",name="default",namespace="kyma-system",state="Error"} 0
kube_customresource_module_status{customresource_group="operator.kyma-project.io",customresource_kind="Keda",customresource_version="v1alpha1",name="default",namespace="kyma-system",state="Processing"} 0
kube_customresource_module_status{customresource_group="operator.kyma-project.io",customresource_kind="Keda",customresource_version="v1alpha1",name="default",namespace="kyma-system",state="Ready"} 1
kube_customresource_module_status{customresource_group="operator.kyma-project.io",customresource_kind="Keda",customresource_version="v1alpha1",name="default",namespace="kyma-system",state="Warning"} 0
# HELP kube_customresource_module_status status of Module CR
# TYPE kube_customresource_module_status stateset
kube_customresource_module_status{customresource_group="operator.kyma-project.io",customresource_kind="Keda",customresource_version="v1alpha1",name="default",namespace="kyma-system",state="Deleting"} 0
kube_customresource_module_status{customresource_group="operator.kyma-project.io",customresource_kind="Keda",customresource_version="v1alpha1",name="default",namespace="kyma-system",state="Error"} 0
kube_customresource_module_status{customresource_group="operator.kyma-project.io",customresource_kind="Keda",customresource_version="v1alpha1",name="default",namespace="kyma-system",state="Processing"} 0
kube_customresource_module_status{customresource_group="operator.kyma-project.io",customresource_kind="Keda",customresource_version="v1alpha1",name="default",namespace="kyma-system",state="Ready"} 1
kube_customresource_module_status{customresource_group="operator.kyma-project.io",customresource_kind="Keda",customresource_version="v1alpha1",name="default",namespace="kyma-system",state="Warning"} 0
# HELP kube_customresource_module_status status of Module CR
# TYPE kube_customresource_module_status stateset
kube_customresource_module_status{customresource_group="operator.kyma-project.io",customresource_kind="Sample",customresource_version="v1alpha1",name="sample-yaml",namespace="default",state="Deleting"} 0
kube_customresource_module_status{customresource_group="operator.kyma-project.io",customresource_kind="Sample",customresource_version="v1alpha1",name="sample-yaml",namespace="default",state="Error"} 1
kube_customresource_module_status{customresource_group="operator.kyma-project.io",customresource_kind="Sample",customresource_version="v1alpha1",name="sample-yaml",namespace="default",state="Processing"} 0
kube_customresource_module_status{customresource_group="operator.kyma-project.io",customresource_kind="Sample",customresource_version="v1alpha1",name="sample-yaml",namespace="default",state="Ready"} 0
kube_customresource_module_status{customresource_group="operator.kyma-project.io",customresource_kind="Sample",customresource_version="v1alpha1",name="sample-yaml",namespace="default",state="Warning"} 0
kube_customresource_module_status{customresource_group="operator.kyma-project.io",customresource_kind="Keda",customresource_version="v1alpha1",name="default",namespace="kyma-system",state="Deleting"} 0
kube_customresource_module_status{customresource_group="operator.kyma-project.io",customresource_kind="Keda",customresource_version="v1alpha1",name="default",namespace="kyma-system",state="Error"} 0
kube_customresource_module_status{customresource_group="operator.kyma-project.io",customresource_kind="Keda",customresource_version="v1alpha1",name="default",namespace="kyma-system",state="Processing"} 0
kube_customresource_module_status{customresource_group="operator.kyma-project.io",customresource_kind="Keda",customresource_version="v1alpha1",name="default",namespace="kyma-system",state="Ready"} 1
kube_customresource_module_status{customresource_group="operator.kyma-project.io",customresource_kind="Keda",customresource_version="v1alpha1",name="default",namespace="kyma-system",state="Warning"} 0

Environment:

  • kube-state-metrics version: 2.10.0
  • Kubernetes version (use kubectl version): v1.26.7
  • Cloud provider or hardware configuration:
  • Other info:
@k15r k15r added the kind/bug Categorizes issue or PR as related to a bug. label Oct 20, 2023
@k8s-ci-robot k8s-ci-robot added the needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. label Oct 20, 2023
@dashpole
Contributor

dashpole commented Nov 2, 2023

/triage accepted
/assign @CatherineF-dev @rexagod

@k8s-ci-robot k8s-ci-robot added the triage/accepted Indicates an issue or PR is ready to be actively worked on. label Nov 2, 2023
@k8s-ci-robot k8s-ci-robot removed the needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. label Nov 2, 2023
@bergerx

bergerx commented Nov 3, 2023

I came here to open the same issue just to find it's already here.

This issue simply kills the ability to use kind: "*" or version: "*" when there are multiple items under the metrics of that resource.

Here you can find some example manifests and steps to reproduce the issue: https://gist.github.com/bergerx/adad24dcd7cc360e1f36fbb98407b27b

git clone git@gist.github.com:adad24dcd7cc360e1f36fbb98407b27b.git ksm-2223
minikube start
kubectl apply \
  -f ksm-2223/crd-bar.example.com.yaml \
  -f ksm-2223/crd-foo.example.com.yaml
kubectl apply \
  -f ksm-2223/cr-bar.yaml \
  -f ksm-2223/cr-foo.yaml
go run main.go --custom-resource-state-only --custom-resource-state-config-file ksm-2223/custom-resource-config-file.yaml --kubeconfig ~/.kube/config

And here is the output:

$ curl localhost:8080/metrics
# HELP cr_creationtimestamp 
# TYPE cr_creationtimestamp gauge
cr_creationtimestamp{customresource_group="example.com",customresource_kind="Bar",customresource_version="v1",name="mybar"} 1.699031755e+09
# HELP cr_resourceversion 
# TYPE cr_resourceversion gauge
cr_resourceversion{customresource_group="example.com",customresource_kind="Bar",customresource_version="v1",name="mybar"} 508820
# HELP cr_creationtimestamp 
# TYPE cr_creationtimestamp gauge
cr_creationtimestamp{customresource_group="example.com",customresource_kind="Foo",customresource_version="v1",name="myfoo"} 1.699031755e+09
# HELP cr_resourceversion 
# TYPE cr_resourceversion gauge
cr_resourceversion{customresource_group="example.com",customresource_kind="Foo",customresource_version="v1",name="myfoo"} 508819

Prometheus-compatible parsers will throw an error like this on line 8:

second TYPE line for metric name ... or TYPE reported after samples
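
The rule behind that error can be approximated in a few lines. This is a simplified sketch, not the Prometheus parser itself: the function name `first_type_violation` is made up, and suffixed series such as histogram `_bucket` samples are not mapped back to their family name.

```python
def first_type_violation(text):
    """Return the 1-based number of the first TYPE line that repeats a metric
    name or comes after samples of that metric, or None if there is none."""
    typed = set()    # metric names that already have a TYPE line
    sampled = set()  # metric names that already have at least one sample
    for lineno, raw in enumerate(text.splitlines(), start=1):
        line = raw.strip()
        if line.startswith("# TYPE "):
            name = line.split()[2]
            if name in typed or name in sampled:
                return lineno
            typed.add(name)
        elif line and not line.startswith("#"):
            # The metric name ends at the label block or at the first space.
            sampled.add(line.split("{", 1)[0].split(" ", 1)[0])
    return None


# The response from the comment above, without the curl command line.
OUTPUT = """\
# HELP cr_creationtimestamp
# TYPE cr_creationtimestamp gauge
cr_creationtimestamp{customresource_group="example.com",customresource_kind="Bar",customresource_version="v1",name="mybar"} 1.699031755e+09
# HELP cr_resourceversion
# TYPE cr_resourceversion gauge
cr_resourceversion{customresource_group="example.com",customresource_kind="Bar",customresource_version="v1",name="mybar"} 508820
# HELP cr_creationtimestamp
# TYPE cr_creationtimestamp gauge
cr_creationtimestamp{customresource_group="example.com",customresource_kind="Foo",customresource_version="v1",name="myfoo"} 1.699031755e+09
# HELP cr_resourceversion
# TYPE cr_resourceversion gauge
cr_resourceversion{customresource_group="example.com",customresource_kind="Foo",customresource_version="v1",name="myfoo"} 508819
"""

if __name__ == "__main__":
    print(first_type_violation(OUTPUT))
```

For this response the sketch reports line 8, the second `# TYPE cr_creationtimestamp gauge` header.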

In the example above there is a single resource definition in the custom-resource-state-config file, but the same issue also happens if the same metric name is used for different GVKs, which I believe is also a valid scenario. For example, we used to have this item under .spec.resources, repeated for multiple CRDs:

  - groupVersionKind:
      group: our.internal.group    # we have a copy of this whole thing for each internal group
      kind: "*"
      version: "*"
    labelsFromPath:
      name: [metadata, name]
      namespace: [metadata, namespace]
    metricNamePrefix: "cr"
    metrics:
    - name: status
      each:
        type: Gauge
        gauge:
          path: [status, conditions]
          labelsFromPath:
            type: [type]
          valueFrom: [status]

@bergerx

bergerx commented Nov 17, 2023

#1810 seems to be a related issue.

@CatherineF-dev
Contributor

CatherineF-dev commented Nov 21, 2023

I can reproduce this issue (values of the same metric are not grouped together) using #2223 (comment).

$ curl localhost:8089/metrics
# HELP cr_creationtimestamp
# TYPE cr_creationtimestamp gauge
cr_creationtimestamp{customresource_group="example.com",customresource_kind="Bar",customresource_version="v1",name="mybar"} 1.700534671e+09
# HELP cr_resourceversion
# TYPE cr_resourceversion gauge
cr_resourceversion{customresource_group="example.com",customresource_kind="Bar",customresource_version="v1",name="mybar"} 391
# HELP cr_creationtimestamp
# TYPE cr_creationtimestamp gauge
cr_creationtimestamp{customresource_group="example.com",customresource_kind="Foo",customresource_version="v1",name="myfoo"} 1.700534671e+09
# HELP cr_resourceversion
# TYPE cr_resourceversion gauge
cr_resourceversion{customresource_group="example.com",customresource_kind="Foo",customresource_version="v1",name="myfoo"} 392

Quick question: I think the issue is that KSM doesn't group the values of the same metric together. Is that correct? cc @bergerx @k15r

@CatherineF-dev
Contributor

CatherineF-dev commented Nov 21, 2023

I think the issue is here https://github.com/kubernetes/kube-state-metrics/blob/main/internal/store/builder.go#L210

		availableStores[gvrString] = func(b *Builder) []cache.Store {
			return b.buildCustomResourceStoresFunc(
				f.Name(),
				f.MetricFamilyGenerators(),
				f.ExpectedType(),
				f.ListWatch,
				b.useAPIServerCache,
			)
		}

  1. It always sets new values in availableStores[gvrString] and never cleans them up, so it still collects obsolete metrics.
  2. It uses the GVR as a key, so it generates two separate metrics for Foo and Bar.
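
The two points can be illustrated with a toy model. This is not kube-state-metrics code and not the fix itself: the `STORES` layout, the key format, and all function names are hypothetical, chosen only to show why writing headers once per store repeats them, and what grouping by metric name or dropping a stale entry would change.

```python
# Hypothetical stand-in for the store map: GVR string -> list of
# (family name, help text, type, sample lines).
STORES = {
    "example.com/v1/bars": [
        ("cr_resourceversion", "", "gauge",
         ['cr_resourceversion{customresource_kind="Bar",name="mybar"} 391']),
    ],
    "example.com/v1/foos": [
        ("cr_resourceversion", "", "gauge",
         ['cr_resourceversion{customresource_kind="Foo",name="myfoo"} 392']),
    ],
}


def render_per_store(stores):
    """Write each store's families in turn, so headers repeat once per store."""
    lines = []
    for families in stores.values():
        for name, help_text, metric_type, samples in families:
            lines.append(f"# HELP {name} {help_text}")
            lines.append(f"# TYPE {name} {metric_type}")
            lines.extend(samples)
    return "\n".join(lines)


def render_merged(stores):
    """Group samples by family name first, so each header is written once."""
    merged = {}
    for families in stores.values():
        for name, help_text, metric_type, samples in families:
            entry = merged.setdefault(name, (help_text, metric_type, []))
            entry[2].extend(samples)
    lines = []
    for name, (help_text, metric_type, samples) in merged.items():
        lines.append(f"# HELP {name} {help_text}")
        lines.append(f"# TYPE {name} {metric_type}")
        lines.extend(samples)
    return "\n".join(lines)


def forget_store(stores, gvr):
    """Drop the entry of a deleted CRD so its metrics are no longer served."""
    stores.pop(gvr, None)


if __name__ == "__main__":
    print(render_per_store(STORES))
    print("---")
    print(render_merged(STORES))
```

In this model `render_per_store` reproduces the repeated HELP/TYPE headers, while `render_merged` emits one header per metric name with the Foo and Bar samples underneath it.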

@k15r
Author

k15r commented Nov 23, 2023

@CatherineF-dev Thanks for taking care of this issue.

I can reproduce this issue (values of the same metric are not grouped together) using #2223 (comment).

$ curl localhost:8089/metrics
# HELP cr_creationtimestamp
# TYPE cr_creationtimestamp gauge
cr_creationtimestamp{customresource_group="example.com",customresource_kind="Bar",customresource_version="v1",name="mybar"} 1.700534671e+09
# HELP cr_resourceversion
# TYPE cr_resourceversion gauge
cr_resourceversion{customresource_group="example.com",customresource_kind="Bar",customresource_version="v1",name="mybar"} 391
# HELP cr_creationtimestamp
# TYPE cr_creationtimestamp gauge
cr_creationtimestamp{customresource_group="example.com",customresource_kind="Foo",customresource_version="v1",name="myfoo"} 1.700534671e+09
# HELP cr_resourceversion
# TYPE cr_resourceversion gauge
cr_resourceversion{customresource_group="example.com",customresource_kind="Foo",customresource_version="v1",name="myfoo"} 392

Quick question: I think the issue is that KSM doesn't group the values of the same metric together. Is that correct? cc @bergerx @k15r

In my opinion there are multiple issues shown in your output:

  1. It creates duplicate entries for the same metric:
# HELP cr_creationtimestamp
# TYPE cr_creationtimestamp gauge
# HELP cr_resourceversion
# TYPE cr_resourceversion gauge
# HELP cr_creationtimestamp
# TYPE cr_creationtimestamp gauge
# HELP cr_resourceversion
# TYPE cr_resourceversion gauge

It must look like this, because "Only one TYPE line may exist for a given metric name":

# HELP cr_creationtimestamp
# TYPE cr_creationtimestamp gauge
# HELP cr_resourceversion
# TYPE cr_resourceversion gauge
  2. The metric values differ:
# HELP cr_resourceversion
# TYPE cr_resourceversion gauge
cr_resourceversion{customresource_group="example.com",customresource_kind="Bar",customresource_version="v1",name="mybar"} 391
# HELP cr_resourceversion
# TYPE cr_resourceversion gauge
cr_resourceversion{customresource_group="example.com",customresource_kind="Foo",customresource_version="v1",name="myfoo"} 392

Here it displays 392 AND 391 for the same metric with exactly the same values. It is not clear which one to use. For clients trying to parse this text exposition format (TEF) output, there is no way to identify the correct value.

@korjek

korjek commented Nov 28, 2023

Could you please share an ETA (if any) for this bug?
We are affected by this for Vertical Pod Autoscaler metrics when multiple containers run in the same pod.
(The kube-state-metrics CRS is configured according to the doc in this PR.)

@CatherineF-dev
Contributor

CatherineF-dev commented Dec 6, 2023

Hi @k15r, could you provide detailed steps to reproduce this issue?

The first issue I want to fix is this:

$ curl localhost:8089/metrics
# HELP cr_creationtimestamp
# TYPE cr_creationtimestamp gauge
cr_creationtimestamp{customresource_group="example.com",customresource_kind="Bar",customresource_version="v1",name="mybar"} 1.701828773e+09
# HELP cr_resourceversion
# TYPE cr_resourceversion gauge
cr_resourceversion{customresource_group="example.com",customresource_kind="Bar",customresource_version="v1",name="mybar"} 909919
# HELP cr_creationtimestamp
# TYPE cr_creationtimestamp gauge
cr_creationtimestamp{customresource_group="example.com",customresource_kind="Bar",customresource_version="v1",name="mybar"} 1.701828773e+09
# HELP cr_resourceversion
# TYPE cr_resourceversion gauge
cr_resourceversion{customresource_group="example.com",customresource_kind="Bar",customresource_version="v1",name="mybar"} 909919

@CatherineF-dev
Contributor

Could you try #2257 to see whether this issue ("repeated adding and deleting CustomResourceDefinitions causes duplicate metric entries") is fixed?

@cunningr

I was just trying this feature on v2.10.1 with a type: StateSet and I think I see this or something very similar(??) or maybe a different issue(??).

With a config such as:

      containers:
      - args:
        - --port=8080
        - --resources=certificatesigningrequests,configmaps,cronjobs,daemonsets,deployments,endpoints,horizontalpodautoscalers,ingresses,jobs,limitranges,mutatingwebhookconfigurations,namespaces,networkpolicies,nodes,persistentvolumeclaims,persistentvolumes,poddisruptionbudgets,pods,replicasets,replicationcontrollers,resourcequotas,secrets,services,statefulsets,storageclasses,validatingwebhookconfigurations,volumeattachments
        - --telemetry-port=8081
        - --custom-resource-state-config
        - |
          spec:
            resources:
              - groupVersionKind:
                  group: "cluster.x-k8s.io"
                  version: "v1beta1"
                  kind: "Machine"
                metrics:
                  - name: "cunningr"
                    help: "Phase of Machines"
                    each:
                      type: StateSet
                      stateSet:
                        labelName: phase
                        path: ["status","phase"]
                        list: ['Provisioned', 'Pending', 'Running', 'Deleting', 'Failed']

Each of my Machine instances seems to get a new metrics instance:

kube_customresource_cunningr{customresource_group="cluster.x-k8s.io",customresource_kind="Machine",customresource_version="v1beta1",phase="Deleting"} 0
kube_customresource_cunningr{customresource_group="cluster.x-k8s.io",customresource_kind="Machine",customresource_version="v1beta1",phase="Failed"} 0
kube_customresource_cunningr{customresource_group="cluster.x-k8s.io",customresource_kind="Machine",customresource_version="v1beta1",phase="Pending"} 0
kube_customresource_cunningr{customresource_group="cluster.x-k8s.io",customresource_kind="Machine",customresource_version="v1beta1",phase="Provisioned"} 1
kube_customresource_cunningr{customresource_group="cluster.x-k8s.io",customresource_kind="Machine",customresource_version="v1beta1",phase="Running"} 0
kube_customresource_cunningr{customresource_group="cluster.x-k8s.io",customresource_kind="Machine",customresource_version="v1beta1",phase="Deleting"} 0
kube_customresource_cunningr{customresource_group="cluster.x-k8s.io",customresource_kind="Machine",customresource_version="v1beta1",phase="Failed"} 0
kube_customresource_cunningr{customresource_group="cluster.x-k8s.io",customresource_kind="Machine",customresource_version="v1beta1",phase="Pending"} 0
kube_customresource_cunningr{customresource_group="cluster.x-k8s.io",customresource_kind="Machine",customresource_version="v1beta1",phase="Provisioned"} 0
kube_customresource_cunningr{customresource_group="cluster.x-k8s.io",customresource_kind="Machine",customresource_version="v1beta1",phase="Running"} 1
kube_customresource_cunningr{customresource_group="cluster.x-k8s.io",customresource_kind="Machine",customresource_version="v1beta1",phase="Deleting"} 0
kube_customresource_cunningr{customresource_group="cluster.x-k8s.io",customresource_kind="Machine",customresource_version="v1beta1",phase="Failed"} 0
kube_customresource_cunningr{customresource_group="cluster.x-k8s.io",customresource_kind="Machine",customresource_version="v1beta1",phase="Pending"} 0
kube_customresource_cunningr{customresource_group="cluster.x-k8s.io",customresource_kind="Machine",customresource_version="v1beta1",phase="Provisioned"} 1
kube_customresource_cunningr{customresource_group="cluster.x-k8s.io",customresource_kind="Machine",customresource_version="v1beta1",phase="Running"} 0
kube_customresource_cunningr{customresource_group="cluster.x-k8s.io",customresource_kind="Machine",customresource_version="v1beta1",phase="Deleting"} 0
kube_customresource_cunningr{customresource_group="cluster.x-k8s.io",customresource_kind="Machine",customresource_version="v1beta1",phase="Failed"} 0
kube_customresource_cunningr{customresource_group="cluster.x-k8s.io",customresource_kind="Machine",customresource_version="v1beta1",phase="Pending"} 0
kube_customresource_cunningr{customresource_group="cluster.x-k8s.io",customresource_kind="Machine",customresource_version="v1beta1",phase="Provisioned"} 0
kube_customresource_cunningr{customresource_group="cluster.x-k8s.io",customresource_kind="Machine",customresource_version="v1beta1",phase="Running"} 1
kube_customresource_cunningr{customresource_group="cluster.x-k8s.io",customresource_kind="Machine",customresource_version="v1beta1",phase="Deleting"} 0
kube_customresource_cunningr{customresource_group="cluster.x-k8s.io",customresource_kind="Machine",customresource_version="v1beta1",phase="Failed"} 0
kube_customresource_cunningr{customresource_group="cluster.x-k8s.io",customresource_kind="Machine",customresource_version="v1beta1",phase="Pending"} 0
kube_customresource_cunningr{customresource_group="cluster.x-k8s.io",customresource_kind="Machine",customresource_version="v1beta1",phase="Provisioned"} 1
kube_customresource_cunningr{customresource_group="cluster.x-k8s.io",customresource_kind="Machine",customresource_version="v1beta1",phase="Running"} 0

I would have expected those to be aggregated into a single gauge metric for each state?
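
One thing that can be checked in output like this is whether any two series share exactly the same metric name and label set, since such lines cannot be told apart by a scraper. The sketch below counts identical series; it is an illustration only. The trimmed sample text is a stand-in for the output above, and the `name` label with the values `machine-a` and `machine-b` is hypothetical, modeled on the `labelsFromPath` entries used in the configurations earlier in this thread.

```python
import re
from collections import Counter

# metric name, optional {labels}, then the value
SAMPLE = re.compile(r'^(?P<name>[^{\s]+)(?:\{(?P<labels>.*)\})?\s+(?P<value>\S+)$')


def series_counts(text):
    """Count how often each (metric name, label set) pair occurs, ignoring values."""
    counts = Counter()
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        match = SAMPLE.match(line)
        if match:
            counts[(match["name"], match["labels"] or "")] += 1
    return counts


# Two Machines without any per-object label (labels trimmed for readability).
WITHOUT_NAME = """\
kube_customresource_cunningr{customresource_kind="Machine",phase="Provisioned"} 1
kube_customresource_cunningr{customresource_kind="Machine",phase="Running"} 0
kube_customresource_cunningr{customresource_kind="Machine",phase="Provisioned"} 0
kube_customresource_cunningr{customresource_kind="Machine",phase="Running"} 1
"""

# The same two Machines with a hypothetical per-object name label.
WITH_NAME = """\
kube_customresource_cunningr{customresource_kind="Machine",name="machine-a",phase="Provisioned"} 1
kube_customresource_cunningr{customresource_kind="Machine",name="machine-a",phase="Running"} 0
kube_customresource_cunningr{customresource_kind="Machine",name="machine-b",phase="Provisioned"} 0
kube_customresource_cunningr{customresource_kind="Machine",name="machine-b",phase="Running"} 1
"""

if __name__ == "__main__":
    print(max(series_counts(WITHOUT_NAME).values()))
    print(max(series_counts(WITH_NAME).values()))
```

A count above 1 for any pair means two lines describe the same series. Whether that is the cause of the behavior reported here is an assumption and is not confirmed in this thread.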

@jmtt89

jmtt89 commented Aug 12, 2024

Hi, I have this problem when configuring VPA with Goldilocks and using kube-prometheus-stack. For some reason, after a regular upgrade, we started getting this warning from Alertmanager:

[[FIRING:1] PrometheusDuplicateTimestamps](...)
Severity: Warning
Summary: Prometheus is dropping samples with duplicate timestamps.

I used grep against a port-forward to kube-state-metrics to find the duplicated metric. After some research, I identified that the problem occurs when upgrades are applied to CustomResourceDefinitions: this causes kube-state-metrics to refresh and simply add the new CustomResourceDefinitions at the bottom of the /metrics endpoint.

I also see "Custom resource state added metrics" logged five times in the kube-state-metrics logs, all with the same familyNames:

... ...  1 custom_resource_metrics.go:79] "Custom resource state added metrics" familyNames=...
... ...  1 custom_resource_metrics.go:79] "Custom resource state added metrics" familyNames=...
... ...  1 custom_resource_metrics.go:79] "Custom resource state added metrics" familyNames=...
... ...  1 custom_resource_metrics.go:79] "Custom resource state added metrics" familyNames=...
... ...  1 custom_resource_metrics.go:79] "Custom resource state added metrics" familyNames=...

After that, I manually restarted the kube-state-metrics deployment, and once it started again everything worked as expected (without duplicates). I don't know if this experience will help anyone, but I think it may be related to this problem.

9 participants