
Default GOMAXPROCS to 1 #1964

Closed · wants to merge 1 commit
Conversation

@SuperQ (Member) commented Feb 10, 2021

Avoid running on all CPUs by limiting the Go runtime to one CPU by default. This avoids having goroutines scheduled on every CPU, which drives up the visible run queue length on high-CPU-count systems.

#1880

Signed-off-by: Ben Kochie <superq@gmail.com>

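For readers who want to see what the change amounts to, here is a minimal sketch in Go of defaulting GOMAXPROCS to 1 while keeping it overridable via a flag. The flag name follows the -runtime.gomaxprocs mentioned later in this thread, and the kingpin wiring is an assumption; the actual PR may be wired differently.

```go
// Minimal sketch (not the actual PR diff): default the Go runtime to one CPU,
// overridable with a --runtime.gomaxprocs flag.
package main

import (
	"fmt"
	"runtime"

	kingpin "gopkg.in/alecthomas/kingpin.v2"
)

var maxProcs = kingpin.Flag(
	"runtime.gomaxprocs",
	"The target number of CPUs the Go runtime will run on (GOMAXPROCS).",
).Default("1").Int()

func main() {
	kingpin.Parse()
	runtime.GOMAXPROCS(*maxProcs)
	// GOMAXPROCS(0) reports the current value without changing it.
	fmt.Printf("GOMAXPROCS is now %d\n", runtime.GOMAXPROCS(0))
}
```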
@hhoffstaette (Contributor) commented Feb 24, 2021

I was very enthusiastic about this change since it really noticeably dropped the number of processes in the queue:
[graph: processes-1]

Unfortunately it also destroyed the accuracy of some exporters, probably because a collector gets preempted a lot, especially when it does any kind of I/O:
[graph: ntp-1]

Bumping GOMAXPROCS to 2 partially resolved the problem, but it's still badly misleading:
[graph: ntp-2]

With 4 (at ~19.45) everything is back to normal:
[graph: ntp-4]

I don't know if there's an easy way to run goroutines without preemption (in order), but defaulting to 1 is IMHO a no-go.

Edit: so I just found asyncpreemptoff=1.. 😆
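(For reference, asyncpreemptoff is a Go runtime GODEBUG knob, so it can be tried without rebuilding the exporter. A sketch of how it might be set, assuming a local binary named node_exporter:)

```
# Assumption: disable asynchronous goroutine preemption for one run of the exporter.
GODEBUG=asyncpreemptoff=1 ./node_exporter
```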

@SuperQ (Member, Author) commented Feb 24, 2021

Yes, it's an impossible scheduling problem.

Also, the NTP collector is basically a terrible implementation. Really, you want to monitor the NTP daemon directly, rather than try to monitor it with a packet. It's been on my TODO list for a while to implement the C protocol for Chrony; IIRC it's similar in implementation to ntpdc.

There are two much better options for time already: the timex collector, and timestamp(node_time_seconds) - node_time_seconds.

@hhoffstaette (Contributor)

> Also, the NTP collector is basically a terrible implementation.

Sure, I understand all that. For a while I used a Python-based chrony textfile exporter which worked correctly, but the idea of cron/Python/two shell-outs/text scraping is… brrr. However, RTT always seemed a bit useless anyway, so I might as well get offset/root delay/dispersion from chrony, where they are at least correct and undistorted. 😞

@hhoffstaette (Contributor) commented Feb 24, 2021

I guess since there are no other collectors that are distorted by the runtime, going with GOMAXPROCS=1 is OK after all. Maybe add a line to the release notes that this affects the NTP collector and that -runtime.gomaxprocs can simply be increased again.

@hhoffstaette (Contributor)

Sorry if this is a stupid question, but wouldn't it be much easier to simply not start any goroutines by default and instead just run all collectors in order on the main thread? That would prevent the preemption interruptions and likely be faster, too.
Concurrent execution could still be enabled when runtime.gomaxprocs is set to a value >1.
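(A rough sketch of that idea, under an assumed, simplified Collector interface; the real node_exporter collector types and registration are more involved. The point is only the control flow: run collectors in order when one CPU is in play, fan out to goroutines otherwise.)

```go
// Sketch only: serial vs. concurrent collection, keyed off GOMAXPROCS.
package main

import (
	"fmt"
	"runtime"
	"sync"
)

// Collector is a stand-in for the exporter's per-subsystem collectors.
type Collector interface {
	Collect() error
}

type dummy struct{ name string }

func (d dummy) Collect() error { fmt.Println("collected", d.name); return nil }

// collectAll runs all collectors serially when only one CPU is available,
// and concurrently (one goroutine per collector) otherwise.
func collectAll(collectors []Collector) {
	if runtime.GOMAXPROCS(0) == 1 {
		for _, c := range collectors {
			_ = c.Collect()
		}
		return
	}
	var wg sync.WaitGroup
	for _, c := range collectors {
		wg.Add(1)
		go func(c Collector) {
			defer wg.Done()
			_ = c.Collect()
		}(c)
	}
	wg.Wait()
}

func main() {
	collectAll([]Collector{dummy{"cpu"}, dummy{"meminfo"}, dummy{"ntp"}})
}
```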

@discordianfish (Member)

Yeah, I'm not sure that parallelizing proc access was a good idea to begin with. If reading a few bytes from /proc is so slow that it causes worrisome scrape durations, that ought to be fixed/changed.
So I feel like there are two kernel issues now:

  1. It's impossible to retrieve CPU metrics serially in a timely fashion.
  2. If it's parallelized, we trigger bugs on at least some architectures, e.g. "Cyclic load peaks induced by node_exporter" (#1963).

@SuperQ (Member, Author) commented Feb 28, 2021

Concurrent operation is not typically a problem. We're still only executing one goroutine at a time per POSIX thread. It's more efficient to let the kernel scheduler and the goroutine scheduler do their jobs. Much of the time spent gathering is waiting on I/O, which has no impact on the system.

For example, #1963 doesn't actually seem to be a problem with the node_exporter; it's a perception problem with load average, and the node_exporter is just making the underlying system issue visible by slightly increasing the run queue.

Really, I think the only real issue is #1880, which is some corner case where we're triggering a spinlock problem. We need to bpftrace the node_exporter's interaction with the kernel to figure out exactly what calls are triggering the race in the spinlock.
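(As an illustration only, not the exact trace the maintainers would run, one hypothetical starting point for that kind of investigation is counting the syscalls node_exporter issues during a scrape:)

```
# Hypothetical example: count syscalls made by node_exporter to see which
# kernel paths are exercised during a scrape.
bpftrace -e 'tracepoint:raw_syscalls:sys_enter /comm == "node_exporter"/ { @[args->id] = count(); }'
```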

@SuperQ (Member, Author) commented Mar 2, 2021

Since this doesn't actually fix the spinlock problem, and has other side effects, I'm going to close it. Users can still set GOMAXPROCS themselves, so it's up to them to decide depending on their use case.

Fixing the spinlock problem needs to be directly debugged and addressed.

SuperQ closed this on Mar 2, 2021
SuperQ deleted the superq/maxproc branch on March 2, 2021, 08:51