Cyclic load peaks induced by node_exporter #1963
Comments
See also #1880. Anybody hitting strange load issues ought to first try removing the cpufreq collector.
Yeah, please try to see if it happens without the cpu collector. If that's the case, you can close this issue and we'll follow up in #1880.
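A minimal sketch of how that test might look, assuming the flag-based collector toggles described in the node_exporter README (binary path is an example):

```sh
# Run node_exporter with the cpufreq collector disabled and watch whether
# the cyclic load peaks disappear. Collectors are toggled with
# --collector.<name> / --no-collector.<name> flags.
/usr/local/bin/node_exporter --no-collector.cpufreq
```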
It would be useful to provide metric data, specifically what metrics are in your graphs, and the results of …
I think you misunderstand how the node_exporter works. There are no intervals in the node_exporter, or most Prometheus exporters for that matter. The node_exporter collects data on demand when Prometheus scrapes it.
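To illustrate the on-demand model: each scrape runs the collectors at that moment and reports per-collector timings via `node_scrape_collector_duration_seconds`. A rough way to inspect them, assuming the default port 9100:

```sh
# Every request to /metrics triggers a fresh collection; the per-collector
# durations below are measured for this scrape, not on an internal interval.
curl -s http://localhost:9100/metrics | grep node_scrape_collector_duration_seconds
```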
It would also be useful to include some system information:
Based on what evidence is provided so far, this seems like a graph of load average. This is going to be highly susceptible to run queue noise, and isn't exactly a good metric to rely on for "load". I'm also suspecting that if you set the environment variable …
If my theory about …
Can you maybe also provide graphs for CPU usage split by mode? I'd like to see if this is actual CPU usage in userspace or in kernel space, indicating some issue there.
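One way such a breakdown could be pulled, as a sketch assuming a Prometheus server at localhost:9090 (the query uses the standard `node_cpu_seconds_total` metric; it is not quoted from the thread):

```sh
# CPU usage split by mode (user, system, iowait, ...) over the last 5 minutes,
# queried through the Prometheus HTTP API.
curl -sG 'http://localhost:9090/api/v1/query' \
  --data-urlencode 'query=sum by (mode) (rate(node_cpu_seconds_total[5m]))'
```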
I think we should collect the following information for each case:
@xelatirdan Can you provide these details?
Hello @discordianfish
Debian 10.10
I see the issue on all servers; it doesn't depend on which CPU is used.
On a 24-core EPYC (48 HT) and on a 16-core Xeon (32 HT).
On the CPU usage graph I don't see any spikes. I run node-exporter in a Docker container: …
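Purely as an illustrative sketch (not the reporter's actual command), a containerized node_exporter setup typically looks something like this, with host mounts so the exporter can read the host's /proc and /sys:

```sh
# Illustrative only -- image tag and mount paths are an example, not the
# configuration from this report.
docker run -d --name node-exporter \
  --net host --pid host \
  -v /proc:/host/proc:ro \
  -v /sys:/host/sys:ro \
  -v /:/rootfs:ro \
  prom/node-exporter:latest \
  --path.procfs=/host/proc \
  --path.sysfs=/host/sys \
  --path.rootfs=/rootfs
```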
Thanks! Yeah, looks like it's the issue in #1880. Let's close this and continue over there.
We're running node_exporter on a bunch of Ubuntu 18.04 servers within a Docker container. We were experiencing mysterious cyclic load peaks every 105 minutes, and by gradually switching off services we identified node_exporter as the culprit. We were running node_exporter with some collectors excluded at first. Now we have almost eliminated the problem by excluding most of the collectors. Here's the git diff:
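As a rough sketch of that kind of change (not the actual diff, and assuming the collectors were disabled via command-line flags on the container's node_exporter invocation), the result is simply a longer list of `--no-collector.<name>` flags:

```sh
# Sketch only: exclude the collectors suspected of causing periodic work.
# The specific collector names below are an example, not the reporter's list.
node_exporter \
  --no-collector.cpufreq \
  --no-collector.hwmon \
  --no-collector.thermal_zone \
  --no-collector.wifi
```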
Just to give you a sense of the issue, here are some screenshots of the load graphs. As you can tell, we deployed node_exporter with reduced collectors at about 0:30.
Fileservers 24h:
Streamingserver 24h:
Strangely, there seems to be a second-order interference. Here's one server over a 9-day time period:
There is a wave of peaks at 16:30, 1:00, 14:00, 23:00, 11:30, 20:30, 9:00, 18:00.
Conclusion:
I think node_exporter should default to fewer collectors, and people should be warned about activating too many collectors, even the more harmless ones. If possible, the intervals at which work runs inside node_exporter should somehow be synchronized.
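As a sketch of the "fewer collectors by default" idea: node_exporter already offers an opt-in mode via `--collector.disable-defaults`; the particular collectors re-enabled below are only an example of what a minimal set might look like.

```sh
# Opt-in approach: disable all default collectors, then enable only what
# the dashboards actually need (this selection is illustrative).
node_exporter \
  --collector.disable-defaults \
  --collector.cpu \
  --collector.meminfo \
  --collector.filesystem \
  --collector.loadavg \
  --collector.netdev
```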
I don't want to sound too harsh. node_exporter is a great tool for our infrastructure and I really love Prometheus! Thanks!