manager.go:694] Error getting data for container / because of race condition #3407

Closed · sgpinkus opened this issue Oct 7, 2023 · 3 comments · Fixed by #3412

Comments

@sgpinkus

sgpinkus commented Oct 7, 2023

Running cadvisor like this:

# VERSION=v0.47.2 # use the latest release version from https://github.com/google/cadvisor/releases
sudo docker run \
  --volume=/:/rootfs:ro \
  --volume=/var/run:/var/run:ro \
  --volume=/sys:/sys:ro \
  --volume=/var/lib/docker/:/var/lib/docker:ro \
  --volume=/dev/disk/:/dev/disk:ro \
  --publish=8080:8080 \
  --privileged \
  --device=/dev/kmsg \
  gcr.io/cadvisor/cadvisor:$VERSION \
  --docker_only=true \
  --disable_root_cgroup_stats=true

Gives this error log every 15s:

manager.go:694] Error getting data for container / because of race condition

Setting --disable_root_cgroup_stats=false makes this error log go away.

@hhromic
Contributor

hhromic commented Oct 10, 2023

This happens to us as well when we disable root cgroup stats.

That log message actually appears when something scrapes the /metrics endpoint of cadvisor.
If you do curl -s http://localhost:8080/metrics, you will notice it appears with every curl call.

The 15s interval you are seeing is likely your Prometheus server scraping every 15s (the default)?

There was recently a PR fixing related log spam: #3341 (not yet released).
Perhaps the same kind of fix can be applied to this error here:

klog.Warningf("Error getting data for container %s because of race condition", name)

The error message itself is a bit misleading, as this is not really a race condition.
The / container should not be added to the entities to collect data for when root cgroup stats are disabled.
But I have not dug deeper into where that is done to propose a proper fix.
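
For illustration, a minimal sketch of what that kind of demotion could look like, modeled on the approach in #3341 and assuming cadvisor keeps using klog here (a sketch only, not the actual patch):

// Before: logged as a warning on every affected scrape of /metrics.
klog.Warningf("Error getting data for container %s because of race condition", name)

// After (sketch): only emitted when cadvisor runs with verbosity 4 or higher
// (klog's -v flag), so default logs stay quiet.
klog.V(4).Infof("Error getting data for container %s because of race condition", name)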

@sgpinkus
Author

The 15s interval you are seeing is likely your Prometheus server scraping every 15s (the default)?

Yes, that would be it!

@hhromic
Contributor

hhromic commented Oct 10, 2023

The error message itself is a bit misleading, as this is not really a race condition.
The / container should not be added to the entities to collect data for when root cgroup stats are disabled.
But I have not dug deeper into where that is done to propose a proper fix.

I got some time now and did dig deeper.
It turns out that container metrics are collected recursively, starting from /, here:

containers, err := c.infoProvider.GetRequestedContainersInfo("/", c.opts)

Therefore it is normal to hit / during collection even when root cgroup stats are disabled.
That being said, I think that refactoring the error logging to V(4), as done in #3341, is indeed an appropriate solution.
I will open a PR for it now :)
