
Abnormal CPU consumption #2977

Closed
NicolasDorier opened this issue Aug 20, 2019 · 31 comments · Fixed by #3101
Assignees
Labels
voodoo (Suspect user, issue or code has ancient curse)

Comments

@NicolasDorier
Collaborator

NicolasDorier commented Aug 20, 2019

When I develop for BTCPayServer, I use a docker-compose setup which runs all the dependencies needed for a comfortable dev environment on regtest (on 0.7.2).

In this docker-compose setup I have two c-lightning nodes, A and B, where A opens a channel to B and sends payments to B (never the reverse).

I noticed that A was taking 100% of my CPU while being idle. Everything seemed to work fine, just that the CPU usage was strangely high.

I have not yet tried to update to 0.7.2 in production, so I don't know yet whether this is a general problem, specific to 0.7.2, or specific to regtest and my own setup.

I thought I'd report it in case somebody can reproduce.

@NicolasDorier
Collaborator Author

OK, it did not happen only to me.

btcpayserver/btcpayserver#978

I have good tracing tools and will eventually figure out the issue, but there is a bug somewhere in 0.7.2.

@NicolasDorier
Collaborator Author

NicolasDorier commented Aug 21, 2019

I tested again with 0.7.1: no problem. In prod, 0.7.2 does not show this problem either, so it seems to affect only 0.7.2 on regtest.

@ZmnSCPxj
Contributor

Can you also make a flamegraph for 0.7.1? Thanks.

The flamegraphs for 0.7.2 seem to blame something called _arch_local_irq_enable; I am uncertain whether this is relevant or not.

@NicolasDorier
Collaborator Author

NicolasDorier commented Aug 21, 2019

@ZmnSCPxj it appears relevant: https://aois.blob.core.windows.net/public/3.svg

Note that in 3.svg, most of the traces are inside bitcoin-cli, not inside lightningd, because lightningd is mainly idle.

@ZmnSCPxj
Contributor

Ouch. I have no idea what _arch_local_irq_enable is; it is nowhere in our codebase. Looks like digging through the libc and/or kernel is in order...

@NicolasDorier
Collaborator Author

NicolasDorier commented Aug 21, 2019

@ZmnSCPxj unsure if it can help you, but our Dockerfile is the same as yours apart from some additions for tracing: https://github.com/btcpayserver/lightning/blob/master/Dockerfile

@NicolasDorier
Collaborator Author

Might it be the fact that we run on Alpine? But it's strange that the problem was not there before. And any idea about __clone?

@ZmnSCPxj
Contributor

ZmnSCPxj commented Aug 21, 2019

__clone either means fork or pthread_create, and since we mostly use fork (and we use it a lot), it is likely just fork, unless some library we use actually uses pthread_create underneath. And since __clone is likely to reset the stack for the new process, that is likely the reason why you do not see it attributed properly to whoever called the fork: the information is erased by __clone clearing the stack.

I think there was a recent thing where we had to close fds when starting plugins, lemme try to dig that...

@ZmnSCPxj
Contributor

Could you check before and after this commit to see if this introduced the slowdown? 25ddb80

If not then maybe bisect from there, haha...

@NicolasDorier
Collaborator Author

@ZmnSCPxj or maybe I can just remove stack clearing with some magic compile flag?

@NicolasDorier
Collaborator Author

testing without the commit

@NicolasDorier
Collaborator Author

NicolasDorier commented Aug 21, 2019

@ZmnSCPxj that's an incredible nose for smelling bugs that you have. Reverting this commit fixes the issue. But why can't I see the problem in prod? Maybe because my node has no connections.

@ZmnSCPxj
Contributor

ZmnSCPxj commented Aug 21, 2019

@ZmnSCPxj or maybe I can just remove stack clearing with some magic compile flag?

No, clone is a Linux kernel thing. You have to modify the kernel, not just C-Lightning.

@ZmnSCPxj that's an incredible nose for smelling bugs that you have.

Thank you very much. The principle is: new code is always suspicious.

The other clue is that the flamegraph partially blames close from within the __clone call. And the commit in question adds a loop of close calls after fork but before exec.
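
For illustration, here is a minimal sketch of the pattern being described (an assumed helper, not the actual lightningd code): after fork() the child loops over every possible fd up to sysconf(_SC_OPEN_MAX) and closes it before exec, so the cost of each spawn scales directly with that limit.

#include <unistd.h>

/* Sketch only: spawn a child process, closing all inherited fds except
 * stdin/stdout/stderr before exec. With _SC_OPEN_MAX at 1024 the loop is
 * cheap; at 1048576 it is roughly a million close() calls per spawn. */
static void spawn_cmd(const char *path, char *const argv[])
{
  pid_t pid = fork();
  if (pid == 0) {
    long max = sysconf(_SC_OPEN_MAX);
    for (long fd = 3; fd < max; fd++)
      close(fd);        /* most of these fds were never even open */
    execvp(path, argv);
    _exit(127);         /* exec failed */
  }
  /* Parent: reap the child elsewhere, e.g. with waitpid(pid, ...). */
}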

Reverting this commit fixes the issue. But why can't I see the problem in prod? Maybe because my node has no connections.

I am unsure. The code is correct functionally. Time to bring in @rustyrussell .

It could be that we are now doing The Right Thing but we erroneously did not do this before. Maybe the extra CPU load is fine?

ZmnSCPxj added the voodoo (Suspect user, issue or code has ancient curse) label on Aug 22, 2019
@NicolasDorier
Collaborator Author

I have gone back to 0.7.1 for our tests. I am keeping 0.7.2 in production and will report if users complain.

@NicolasDorier
Collaborator Author

Will check the value of _SC_OPEN_MAX

@NicolasDorier
Collaborator Author

So I tried to debug with printf by logging before and after the call.

Sadly nothing gets printed. :(
Let me know if I can do something to help debug this.

@NicolasDorier
Collaborator Author

Can the wizard take a look at the voodoo? Or let me know if there is anything I can do to help.

@ZmnSCPxj
Contributor

Get me a replacement for my day job?

Sorry, this is some rather low-level OS stuff. The code added in the commit is correct, from what I know of Linux and POSIX. The extra CPU load might be acceptable. We also run the same code when starting a subdaemon. The difference is that now we do the Right Thing (TM) on a bunch of other calls, including bitcoin-cli, which I suspect is the main cause in your particular case. Pinging @rustyrussell again, as he has more OS-level knowledge, I suspect.

@NicolasDorier
Collaborator Author

NicolasDorier commented Aug 28, 2019

Mmh, maybe another issue might be that the docker image is on musl (fetching glibc from another source).
I vaguely remember complaints about musl having low performance on some low-level stuff like this.

Might try to switch to another base image.

@ZmnSCPxj
Contributor

The added loop is a loop on calls to close, and I find it hard to imagine that such a simple syscall could have additional overhead from the C library. Still, it does not hurt to try.

We also now call into bitcoin-cli more often, and I also noticed that we recently greatly increased our running time in our Travis tests, so it seems to me likely that these changes in total greatly increased our run time for calling into bitcoin-cli (more calls into bitcoin-cli, and more overhead per call due to closing unused fds before calling into bitcoin-cli). So it might not be just specific to your environment.

@NicolasDorier
Collaborator Author

I tried moving the docker image from Alpine to stretch. Same issue.

@cdecker
Member

cdecker commented Sep 24, 2019

I finally took a look at the issue, and it seems like _SC_OPEN_MAX in docker images is huge:

#include <stdio.h>
#include <unistd.h>
int main() {
  int max = sysconf(_SC_OPEN_MAX);
  printf("_SC_OPEN_MAX %d\n", max);
  return 0;
}

This returns the following:

Context         Value
Host (user)     1024
Host (root)     1024
Docker (root)   1048576
Docker (user)   1024

So no wonder that got a lot slower: we are issuing 1024 times as many close() syscalls on each subprocess spawn (in particular when calling bitcoind for polling). So my suggested quickfix would be to not run as root in docker (might be a good idea anyway).

@NicolasDorier would that be an option?

@NicolasDorier
Collaborator Author

NicolasDorier commented Sep 25, 2019

Mmh, I don't know, I can try. The big problem with docker and permissions is that images which use non-root users can't share data via volume sharing with other containers.

The c-lightning container is sharing several directories:

  1. The Tor hidden_service directory, to get the onion address
  2. The Bitcoin data directory, to get the cookie file for authenticating to RPC
  3. The lightning-rpc socket that the btcpay container is using

The Tor and Bitcoin containers are using their own non-root users with different UIDs, and the btcpay container is using the root user.

I don't want to make changes to several different images to fix this situation, as this can be a can of worms. Fixing one image which also shares data might make another image fail, which I then need to fix, etc., and I am not myself the maintainer of some of these images.

I can probably work around 2. by not relying on cookie auth for RPC authentication and using a fixed RPC password instead (we do this when the user selects LND, because LND caches the cookie file value and ends up with bad data in the cache if bitcoind reboots). I don't like this because all bitcoin RPC in BTCPay would have the same password... We don't expose it on the internet, but I still don't like it.

I think 3. should not be a problem: because the btcpay container runs as root, it will still have access to lightning-rpc.

For 1., I can fix it by using the same UID for the Tor container and c-lightning.

On top of this I will need to change the permissions of all files in the c-lightning data directory, because existing users will have the files owned by root. Maybe it is easy, maybe not. I can try...

@NicolasDorier
Collaborator Author

Frankly, with all those problems I wonder if it is not just easier to cap the max variable to 1024.

@cdecker
Member

cdecker commented Sep 25, 2019

Frankly, with all those problems I wonder if it is not just easier to cap the max variable to 1024.

That wouldn't really work: though unlikely, a process might open >1024 fds, since running under root allows us to do so, and then only closing the first 1024 would leave the remainder lingering.

So far I have only found one way that works reliably in docker: run the container with --ulimit nofile=1024. This is also supported in docker-compose files. I'm still trying to see if there is a way to build this limit into the docker image itself, since having everybody add that will lead to some confusion, but I think for your use case it is a solution.

cdecker added a commit to cdecker/lightning that referenced this issue Sep 25, 2019
We should never open more than 1024 file descriptors anyway, and some
situations, namely running as root or in docker, would give us huge
allowances. This then results in a huge, unneeded cleanup for subprocesses,
which we use a lot.

Fixes ElementsProject#2977
@cdecker
Member

cdecker commented Sep 25, 2019

OK, I found another variant that might work: limit ourselves to only ever opening up to 1024 fds, so we don't have to close() anything beyond that range.
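
Presumably that means clamping the soft RLIMIT_NOFILE at startup, since sysconf(_SC_OPEN_MAX) reflects that limit. A rough sketch of the idea, assuming setrlimit is used (the actual change merged for this issue may differ in its details):

#include <stdio.h>
#include <sys/resource.h>

/* Sketch: lower the soft fd limit to 1024 so the close() loop run before
 * exec'ing subprocesses is bounded, even as root or inside docker. */
static void limit_open_fds(void)
{
  struct rlimit rl;

  if (getrlimit(RLIMIT_NOFILE, &rl) != 0)
    return;
  if (rl.rlim_cur > 1024) {
    rl.rlim_cur = 1024;   /* soft limit: what sysconf(_SC_OPEN_MAX) reports */
    if (setrlimit(RLIMIT_NOFILE, &rl) != 0)
      perror("setrlimit");
  }
}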

rustyrussell pushed a commit that referenced this issue Sep 25, 2019
We should never open more than 1024 file descriptors anyway, and some
situations, namely running as root or in docker, would give us huge
allowances. This then results in a huge, unneeded cleanup for subprocesses,
which we use a lot.

Fixes #2977
@NicolasDorier
Collaborator Author

Thanks a lot!

NicolasDorier pushed a commit to btcpayserver/lightning that referenced this issue Sep 29, 2019
We should never open more than 1024 file descriptors anyway, and some
situations, namely running as root or in docker, would give us huge
allowances. This then results in a huge, unneeded cleanup for subprocesses,
which we use a lot.

Fixes ElementsProject#2977
@whitslack
Collaborator

The implemented workaround for this issue caused my node to break as it neared 1024 open channels. (Reported in #4868.)

Rather than arbitrarily clamping down the limit on open file descriptors, couldn't the syscall slowdown have been avoided by making use of the close_range syscall instead of hammering on close in a loop? close_range is somewhat new (first appeared in Linux 5.9), but the code can fall back to the existing (slow) loop in the case that close_range returns ENOSYS.
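
A rough sketch of that suggestion (a hypothetical helper, not actual c-lightning code), calling close_range via syscall() so it still compiles against a libc without the wrapper, and falling back to the per-fd loop when the kernel does not support it:

#include <unistd.h>
#include <sys/syscall.h>

/* Sketch: close every fd from 'first' upward, preferring the close_range
 * syscall (Linux >= 5.9) and falling back to the old per-fd loop. */
static void close_fds_from(int first)
{
#ifdef SYS_close_range
  if (syscall(SYS_close_range, (unsigned int)first, ~0U, 0U) == 0)
    return;
  /* ENOSYS on older kernels (or any other failure): use the slow loop. */
#endif
  long max = sysconf(_SC_OPEN_MAX);
  for (long fd = first; fd < max; fd++)
    close(fd);
}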

@ZmnSCPxj
Contributor

We also support running on the BSDs and macOS, so we need to detect whether we are being compiled targeting Linux. It is probably a good idea to see if a similar syscall or function exists on those OSes as well.

also grats on 1024 chans

@whitslack
Collaborator

close_range does exist on FreeBSD as well, albeit with no flags defined.

I don't know about Mac.
