
Default GOMAXPROCS to 1 #1964

Closed · wants to merge 1 commit
Conversation

@SuperQ (Member) commented Feb 10, 2021

Avoid running on all CPUs by limiting the Go runtime to one CPU by default. This avoids having goroutines scheduled on every CPU, which drives up the visible run queue length on high-CPU-count systems.

#1880

Signed-off-by: Ben Kochie <superq@gmail.com>

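For readers who want to see what the change amounts to, here is a minimal sketch in Go of defaulting GOMAXPROCS to 1 while keeping it overridable via a flag. The flag name follows the -runtime.gomaxprocs mentioned later in this thread, and the kingpin wiring is an assumption; the actual PR may be wired differently.

```go
// Minimal sketch (not the actual PR diff): default the Go runtime to one CPU,
// overridable with a --runtime.gomaxprocs flag.
package main

import (
	"fmt"
	"runtime"

	kingpin "gopkg.in/alecthomas/kingpin.v2"
)

var maxProcs = kingpin.Flag(
	"runtime.gomaxprocs",
	"The target number of CPUs the Go runtime will run on (GOMAXPROCS).",
).Default("1").Int()

func main() {
	kingpin.Parse()
	runtime.GOMAXPROCS(*maxProcs)
	// GOMAXPROCS(0) reports the current value without changing it.
	fmt.Printf("GOMAXPROCS is now %d\n", runtime.GOMAXPROCS(0))
}
```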
@hhoffstaette (Contributor) commented Feb 24, 2021

I was very enthusiastic about this change since it really noticeably dropped the number of processes in the queue:
[graph: processes-1]

Unfortunately it also destroyed the accuracy of some exporters, probably because a collector gets preempted a lot, especially when it does any kind of I/O:
[graph: ntp-1]

Bumping GOMAXPROCS to 2 partially resolved the problem, but it's still badly misleading:
[graph: ntp-2]

With 4 (at ~19.45) everything is back to normal:
[graph: ntp-4]

I don't know if there's an easy way to run goroutines without preemption (in order), but defaulting to 1 is IMHO a no-go.

Edit: so I just found asyncpreemptoff=1.. 😆
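(For reference, asyncpreemptoff is a Go runtime GODEBUG knob, so it can be tried without rebuilding the exporter. A sketch of how it might be set, assuming a local binary named node_exporter:)

```
# Assumption: disable asynchronous goroutine preemption for one run of the exporter.
GODEBUG=asyncpreemptoff=1 ./node_exporter
```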

@SuperQ (Member, Author) commented Feb 24, 2021

Yes, it's an impossible scheduling problem.

Also, the NTP collector is basically a terrible implementation. Really, you want to monitor the NTP daemon directly, rather than try to monitor it with a packet. It's been on my TODO list for a while to implement the C protocol for Chrony; IIRC it's similar in implementation to ntpdc.

There are two much better options for time already: the timex collector, and timestamp(node_time_seconds) - node_time_seconds.

@hhoffstaette (Contributor)

> Also, the NTP collector is basically a terrible implementation.

Sure, I understand all that. For a while I used a Python-based chrony textfile exporter which worked correctly, but the idea of cron/Python/two shell-outs/text scraping is… brrr. However, RTT always seemed a bit useless anyway, so I might as well get offset/root delay/dispersion from chrony, where they are at least correct and undistorted. 😞

@hhoffstaette (Contributor) commented Feb 24, 2021

I guess since there are no other collectors that are distorted by the runtime, going with GOMAXPROCS=1 is OK after all. Maybe add a line to the release notes that this affects the NTP collector and that -runtime.gomaxprocs can simply be increased again.

@hhoffstaette (Contributor)

Sorry if this is a stupid question, but wouldn't it be much easier to simply not start any goroutines by default and instead just run all collectors in order on the main thread? That would prevent the preemption interruptions and likely be faster, too.
Concurrent execution could still be enabled when runtime.gomaxprocs is set to a value >1.
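(A rough sketch of that idea, under an assumed, simplified Collector interface; the real node_exporter collector types and registration are more involved. The point is only the control flow: run collectors in order when one CPU is in play, fan out to goroutines otherwise.)

```go
// Sketch only: serial vs. concurrent collection, keyed off GOMAXPROCS.
package main

import (
	"fmt"
	"runtime"
	"sync"
)

// Collector is a stand-in for the exporter's per-subsystem collectors.
type Collector interface {
	Collect() error
}

type dummy struct{ name string }

func (d dummy) Collect() error { fmt.Println("collected", d.name); return nil }

// collectAll runs all collectors serially when only one CPU is available,
// and concurrently (one goroutine per collector) otherwise.
func collectAll(collectors []Collector) {
	if runtime.GOMAXPROCS(0) == 1 {
		for _, c := range collectors {
			_ = c.Collect()
		}
		return
	}
	var wg sync.WaitGroup
	for _, c := range collectors {
		wg.Add(1)
		go func(c Collector) {
			defer wg.Done()
			_ = c.Collect()
		}(c)
	}
	wg.Wait()
}

func main() {
	collectAll([]Collector{dummy{"cpu"}, dummy{"meminfo"}, dummy{"ntp"}})
}
```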

@discordianfish (Member)

Yeah, I'm not sure that parallelizing proc access was a good idea to begin with. If reading a few bytes from /proc is so slow that it causes worrisome scrape durations, that ought to be fixed/changed.
So I feel like there are two kernel issues now:

  1. It's impossible to retrieve CPU metrics serially in a timely fashion.
  2. If it's parallelized, we trigger bugs on at least some architectures, e.g. "Cyclic load peaks induced by node_exporter" (#1963).

@SuperQ (Member, Author) commented Feb 28, 2021

Concurrent operation is not typically a problem. We're still only executing one goroutine at a time per POSIX thread. It's more efficient to let the kernel scheduler and the goroutine scheduler do their jobs. Much of the time spent gathering is waiting on I/O, which has no impact on the system.

For example, #1963 doesn't actually seem to be a problem with the node_exporter; it's a perception problem with load average, and the node_exporter is just making the underlying system issue visible by slightly increasing the run queue.

Really, I think the only real issue is #1880, which is some corner case where we're triggering a spinlock problem. We need to bpftrace the node_exporter's interaction with the kernel to figure out exactly what calls are triggering the race in the spinlock.
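(As an illustration only, not the exact trace the maintainers would run, one hypothetical starting point for that kind of investigation is counting the syscalls node_exporter issues during a scrape:)

```
# Hypothetical example: count syscalls made by node_exporter to see which
# kernel paths are exercised during a scrape.
bpftrace -e 'tracepoint:raw_syscalls:sys_enter /comm == "node_exporter"/ { @[args->id] = count(); }'
```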

@SuperQ (Member, Author) commented Mar 2, 2021

Since this doesn't actually fix the spinlock problem, and has other side effects, I'm going to close it. Users can still set GOMAXPROCS themselves, so it's up to them to decide depending on their use case.

Fixing the spinlock problem needs to be directly debugged and addressed.

SuperQ closed this on Mar 2, 2021
SuperQ deleted the superq/maxproc branch on March 2, 2021, 08:51