cadvisor daemonset with containerd goes into crashloopbackoff #2855

Open
madeinindiadot opened this issue Apr 27, 2021 · 8 comments

Comments

@madeinindiadot

I have deployed the cadvisor daemonset in my Kubernetes cluster. My runtime is containerd. I constantly see my pods going into CrashLoopBackOff. I tried versions v0.30.2, v0.37.5, and latest. kubectl logs doesn't show anything; the pods report "OOMKilled". Please advise what is being missed.

NAME READY STATUS RESTARTS AGE
cadvisor-4wl5l 1/1 Running 0 96s
cadvisor-95gqg 1/1 Running 0 96s
cadvisor-pfwpk 1/1 Running 0 98s
cadvisor-xgrp8 1/1 Running 0 99s
cadvisor-4wl5l 0/1 OOMKilled 0 2m
cadvisor-4wl5l 1/1 Running 1 2m1s
cadvisor-xgrp8 0/1 OOMKilled 0 2m9s
cadvisor-xgrp8 1/1 Running 1 2m10s
cadvisor-95gqg 0/1 OOMKilled 0 2m16s
cadvisor-pfwpk 0/1 OOMKilled 0 2m18s
cadvisor-95gqg 1/1 Running 1 2m17s
cadvisor-pfwpk 1/1 Running 1 2m19s
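For reference, the OOMKilled reason can be confirmed from the pod status itself. A minimal sketch, assuming the pods carry an app=cadvisor label (adjust names and namespace to your deployment):

# Show the last termination state of one of the restarting pods (should report OOMKilled)
kubectl describe pod cadvisor-4wl5l | grep -A 5 "Last State"

# Or print the last termination reason for every cadvisor pod
kubectl get pods -l app=cadvisor -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.status.containerStatuses[0].lastState.terminated.reason}{"\n"}{end}'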

@skgsergio
Contributor

skgsergio commented May 3, 2021

You might have encountered the same issue I did. In my case, cAdvisor with containerd collects the environment variables of every container and exports them as metric labels, which results in a huge amount of memory usage.

You can add these flags to the cadvisor container in your daemonset; if that solves your issue, then it is probably the same as mine:

--store_container_labels=false
--whitelisted_container_labels=io.kubernetes.container.name,io.kubernetes.pod.name,io.kubernetes.pod.namespace

These flags are just a workaround for the issue. For Docker there is an explicit whitelist for env vars, which is not yet implemented for containerd (I made a PR, #2857, so it behaves the same way as with Docker).
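For anyone applying this, here is a minimal sketch of appending the two flags to an existing DaemonSet with a JSON patch. It assumes the DaemonSet and its container are named cadvisor, live in the cadvisor namespace, and that the pod spec already defines an args list for the first container; otherwise, edit the manifest directly and re-apply it.

# Append the workaround flags to the cadvisor container's args
kubectl -n cadvisor patch daemonset cadvisor --type=json -p='[
  {"op": "add", "path": "/spec/template/spec/containers/0/args/-", "value": "--store_container_labels=false"},
  {"op": "add", "path": "/spec/template/spec/containers/0/args/-", "value": "--whitelisted_container_labels=io.kubernetes.container.name,io.kubernetes.pod.name,io.kubernetes.pod.namespace"}
]'

With the default RollingUpdate strategy, the DaemonSet controller replaces the pods after the patch, so the new flags take effect without deleting the DaemonSet.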

@madeinindiadot
Author

Thanks @skgsergio. Let me check that.

@madeinindiadot
Author

Thanks @skgsergio. Adding those parameters worked for me on the image gcr.io/cadvisor/cadvisor:v0.37.5.
However, on the image k8s.gcr.io/cadvisor:v0.30.2, it says those parameters are not available. Is there a specific version of the Kubernetes GCR registry image (k8s.gcr.io/cadvisor) from which these parameters are available? I am currently running a daemonset on Kubernetes with containerd.

@skgsergio
Contributor

skgsergio commented May 5, 2021

If it worked, then it is 100% the same issue: cAdvisor is not filtering env vars for containerd containers.

Regarding the image, is there any reason you want to use the k8s.gcr.io repo? The images in that repo don't imply that there is a Kubernetes-specific version of cAdvisor; gcr.io/cadvisor/cadvisor:v0.37.5 works fine in Kubernetes (we are using it in prod).
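If you already have the DaemonSet deployed with the other image, a one-line sketch for switching it over (the DaemonSet, container, and namespace names are assumptions, adjust to yours):

# Point the DaemonSet at the upstream image; the rolling update replaces the pods
kubectl -n cadvisor set image daemonset/cadvisor cadvisor=gcr.io/cadvisor/cadvisor:v0.37.5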

@MonicaMagoniCom

I'm having the same issue. Following the suggestion, I added the flags and there are fewer OOM kills, but memory usage is still high. Consider the attached image: it shows the memory consumption of our 4 instances (the memory limit is set to 700 MiB).
[Screenshot from 2021-05-06 14:19: memory consumption of the 4 cadvisor instances]

@iwankgb
Collaborator

iwankgb commented May 7, 2021

@MonicaMagoniCom, can you provide the following information (a quick way to collect it is sketched below):

  • number of containers running on each host
  • cAdvisor command line parameters used
  • cAdvisor version used.

I agree that close to 700 MiB looks worrying.

CC: @Creatone @dashpole - it seems to be related to the increase discussed in #2853.
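A minimal sketch for collecting that information, assuming the cadvisor pods carry an app=cadvisor label and crictl is available on the nodes (both are assumptions, adjust as needed):

# Number of containers running on a host (run on the node itself)
crictl ps -q | wc -l

# Command line parameters passed to the cadvisor container
kubectl get pods -l app=cadvisor -o jsonpath='{.items[0].spec.containers[0].args}'

# cAdvisor version, taken from the image tag
kubectl get pods -l app=cadvisor -o jsonpath='{.items[0].spec.containers[0].image}'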

@MonicaMagoniCom

MonicaMagoniCom commented May 10, 2021


You can find all the information here: #2856

@skgsergio
Contributor

It looks like @MonicaMagoniCom is using GKE version 1.14.10-gke.42, and as far as I remember it was not until 1.19 that GKE started shipping Kubernetes with containerd, so it might not be related to this specific issue.
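For anyone unsure which runtime their nodes are using, kubectl reports it per node (no assumptions beyond cluster access):

# The CONTAINER-RUNTIME column shows the runtime and its version, e.g. containerd://1.4.x
kubectl get nodes -o wide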
