100% nginx usage #80
@eustas Does this mean that you've officially taken over maintenance of this at Google? Great news if so. I think some other developers might have been waiting for the "official" repo to be considered active before progressing. |
Yes, finally, the brotli team is able to support ngx_brotli =) Sorry for the super-late response, I thought I had answered already, but perhaps that was in another issue =) |
@kapouer @HansVanEijsden @hroost does it still happen after #83? |
Currently I'm testing it on two production servers. Compiled and enabled it 15 minutes ago. Up & running with no issues, yet. I'll let it run for some hours and I'll report it back here. Thnx! |
Unfortunately, I still have the problems. After ~18 and ~30 minutes Nginx was hanging on one of the four cores with 100%. And as a consequence 25% of the web requests were hanging too, without getting any response from Nginx on that core. After 40 seconds the CPU usage goes down to normal. My Nginx error log shows this:
The 100% CPU problems appeared exactly on those timestamps. My Nginx Brotli config:
|
Do you have core dumps enabled? It can be easily achieved by using
|
Yes I have, thanks for the suggestion. It's strange: no core dumps are being generated.
Nothing. Also, do you have any other suggestions? |
@chipitsine @HansVanEijsden
( |
That's good to know! |
It depends: if nginx is started as root, it can write dumps to /root; if nginx is started as another user, another folder should be chosen |
My Nginx runs as |
Another update. Monit was running on my servers and was automatically restarting Nginx after those ~40 seconds. That was probably throwing the My syslog:
So, I disabled Monit and tried again. Nginx keeps hanging forever at 100%, without segfaults, without core dumps and without any other log entries. |
is it production environment ? if "no" (i.e. test env), can you provide an ssh access to it ? |
We have the same issue. Nginx 1.16.1 and libbrotli 1.0.7. We deployed ngx_brotli on ~150 servers (with 70k domains) and noticed 100% CPU usage on 3 servers. All servers have the same configuration. We tried to find the domain / request responsible (the problem usually starts 1-2 hours after a restart) and managed to identify the domain; the last request processed by the hanging process was always a 404. If I try to repeat the query that resulted in the error, it never happens; it happens on a random 404 after some time. This suggests the problem is linked to 404 handling. We are using custom error pages generated with SSI using the "include" directive (debug logs show SSI processed just before brotli). Removing the "include" directive from the error page fixed the problem on all 3 servers; the other 147 servers are fine with the "include" directive and ngx_brotli. The error log during the problem (debug log level) shows, repeated:
I can provide full nginx error log if needed. |
@chipitsine I cannot reproduce it in a test environment. It needs some traffic to trigger, as far as I know. Currently I'm reproducing it on a VPS with only one website (a local news website). I managed to get an Nginx debug log while one of the processes was hanging at 100% CPU. I can't find anything special in that log though. Just the normal epoll and thread things. The only thing which I could find, right before it went to 100% CPU, was this (
I don't know if it's something usable though. Nginx 1.17.3 (and also 1.17.4, today's release) and libbrotli 1.0.7. An up-to-date Debian Stretch system.
|
@HansVanEijsden , please contact me in private (either email or skype: chipitsine@gmail.com). |
@chipitsine done! |
@HansVanEijsden 100% CPU won't result in a segfault (so those are 2 separate issues), but usually indicates that there is an infinite loop somewhere. If you run NGINX under |
Does upgrading to nginx 1.16.1 help, see |
I tried using ngx_brotli half a year ago (from the eustas repo) and ran into this bug with 100% CPU. Can you clarify whether this has been resolved? |
AFAICT the bug is still in nginx 1.16.1. |
Same here. Nginx 1.17.6, brotli 1.0.7. |
I have run into this with the fancyindex module in a couple of large directories on my server. On those directories it reproduces 100% of the time with some clients and HTTP/2, and the partial truncated responses the browser receives always end at the same point regardless of specific compression settings. Firefox is the only client I've found that works with brotli compression and HTTP/2. Chromium and curl do not, even when curl is sending the same headers as Firefox. When downgrading to HTTP/1.1 (whether on the client or by disabling it on the server) all clients work in every case I can test. |
I took a stab at it but what I thought was an incompatibility between filters turned out to be a trickier bug. The filter doesn't handle the case where the input is too small to be compressed by brotli, and assumes that flushes are guaranteed to produce output. It ended up not being a simple or clean fix so I'll leave it to the maintainer. I've seen this same bug crop up in other wrappers around brotli, now that I know the cause and symptoms. The |
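For illustration, the failure mode described above can be sketched against the brotli encoder C API. This is only a sketch of the general wrapper pattern, not ngx_brotli's actual code; emit_buffer is a hypothetical callback standing in for "hand a finished buffer downstream":

#include <brotli/encode.h>
#include <stddef.h>
#include <stdint.h>

/* Illustrative sketch: how a wrapper can spin if it assumes a FLUSH
 * always yields compressed bytes. emit_buffer is a hypothetical callback
 * that passes finished output downstream. */
static int
flush_encoder(BrotliEncoderState *enc,
              void (*emit_buffer)(const uint8_t *p, size_t n))
{
    uint8_t out[4096];

    for (;;) {
        size_t          avail_in = 0;      /* no new input, just a flush */
        const uint8_t  *next_in = NULL;
        size_t          avail_out = sizeof(out);
        uint8_t        *next_out = out;
        size_t          total_out = 0;

        if (!BrotliEncoderCompressStream(enc, BROTLI_OPERATION_FLUSH,
                                         &avail_in, &next_in,
                                         &avail_out, &next_out, &total_out)) {
            return -1;
        }

        if (sizeof(out) - avail_out > 0) {
            emit_buffer(out, sizeof(out) - avail_out);
        }

        /* Bug pattern: looping "until some output was produced" never
         * terminates when the encoder has nothing buffered. The safe exit
         * condition is the encoder reporting no more pending output. */
        if (!BrotliEncoderHasMoreOutput(enc)) {
            return 0;
        }
    }
}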
Thanks for the investigation. Will try to reproduce / fix when get back from vacation. |
@eustas Any news on this one? I was thinking on putting this in out server, but this is a bit scary. Thanks! |
Hi. I'm finally back =) Will work on that this week |
Added basic http2 tests. Still can not reproduce (with |
The only consistent repro I have happens to involve too much private data. Just throwing random data of sufficient length should be enough to hit it eventually; the bug is such that any server running this on dynamic data can expect to run into it. Code inspection is enough to tell that the code does not behave properly when a flush is requested but |
Hmmm... Brotli obeys flushing, so if there was some input (after pushing the last output block) it will produce output again. On the other hand, it won't produce more output if no more input has arrived and just another flush is requested... Will try to reproduce that. Thanks. |
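As a quick way to check that behavior, a minimal standalone program (a sketch assuming libbrotli 1.0.x, linked with -lbrotlienc and -lbrotlicommon) could flush twice and compare the byte counts; the second flush, issued with no new input, is the case in question:

#include <brotli/encode.h>
#include <stdint.h>
#include <stdio.h>

int main(void) {
    BrotliEncoderState *enc = BrotliEncoderCreateInstance(NULL, NULL, NULL);
    const uint8_t *next_in = (const uint8_t *) "hello";
    size_t avail_in = 5;
    uint8_t out[256];

    for (int i = 0; i < 2; i++) {
        uint8_t *next_out = out;
        size_t avail_out = sizeof(out);
        size_t total_out = 0;

        /* First flush has 5 bytes of pending input and should produce
         * output; the second flush arrives with no new input and is
         * expected to produce 0 bytes. */
        BrotliEncoderCompressStream(enc, BROTLI_OPERATION_FLUSH,
                                    &avail_in, &next_in,
                                    &avail_out, &next_out, &total_out);
        printf("flush %d produced %zu bytes\n", i, sizeof(out) - avail_out);
    }

    BrotliEncoderDestroyInstance(enc);
    return 0;
}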
So, brotli should be a second-level compressor then? ngx_brotli marks a buffer with flush only if the corresponding input buffer was marked so. So what nginx module could help me with simulating this? (Alternatively, I could try to activate some scripting, e.g. php, and try to force a flush out there.) |
Was able to set up http2 + ssl + perl in docker. Will try to set up PS:
hello.pl:

package hello;
use nginx;

# Minimal nginx embedded-perl handler: emits a head line, then ten body
# lines each followed by an explicit flush, then a tail line.
sub handler {
    my $r = shift;
    $r->send_http_header("text/html");
    return OK if $r->header_only;
    $r->print("head\n<br/>");
    for (my $i = 0; $i <= 9; $i++) {
        $r->print("body\n<br/>");
        $r->flush();
    }
    $r->print("tail\n<br/>");
    return OK;
}
1;
__END__ |
(though I believe filters below compression should not affect the workflow, as we've seen with perl; most likely it is something above the compression; @awused, what other filters / modules are configured?) |
@eustas thanks for looking into it.
System (uname -a): |
I played around with it some more. The particular location block where I have a reproduction is also running http_image_filter. If I disable that filter it changes where in the response it happens to spin but doesn't change the nature of the spin. The only other filters running are fancyindex and brotli. What I do see is that brotli is calling ngx_http_next_body_filter while holding an out_buf it believes to be of non-zero size but http_write_filter_module thinks it has been called with a chain of size 0. This is repeated in a tight loop.
I still believe there's a problem with empty output compressor flushes, but that might not be the cause of this error. |
That is interesting: |
NB: also, there is no place in the code that logs like " |
That was something I added immediately before the call to ngx_http_next_body_filter. |
Please add flags to that output: Tried https + h2 + fancy index. Works well on folders with 10-20-30-40-50k empty files with uuid names. |
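As a point of reference, logging those flags just before the chain is handed to the next body filter could look roughly like this. It is a sketch only; ngx_http_brotli_log_chain is a hypothetical helper, not part of the module:

#include <ngx_config.h>
#include <ngx_core.h>
#include <ngx_http.h>

/* Hypothetical helper: dump the size and flags of every buffer in a chain
 * right before it is passed to ngx_http_next_body_filter. */
static void
ngx_http_brotli_log_chain(ngx_http_request_t *r, ngx_chain_t *in)
{
    ngx_chain_t  *cl;

    for (cl = in; cl; cl = cl->next) {
        ngx_buf_t  *b = cl->buf;

        ngx_log_debug5(NGX_LOG_DEBUG_HTTP, r->connection->log, 0,
                       "brotli out buf size:%O flush:%d sync:%d "
                       "last_buf:%d last_in_chain:%d",
                       ngx_buf_size(b), b->flush, b->sync,
                       b->last_buf, b->last_in_chain);
    }
}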
If it is an infinite loop with If that is the case, we could check which filter in the filter stack holds the buffer. If those cannot be fixed, then we could request a flush in situations like that and see if this would help. |
I've done some testing and:
Setup for testing: openresty + brotli (over http 1.1). brunzip compiled in but not enabled. @eustas it sounds like you were on to something on Mar 6. Do you have anything I can test? I'd be happy to help in that way. |
Thank you @eustas I will test in the next few days (tomorrow maybe) and report back. Crazy busy at the moment (in part due to trying to fix similar issues with my ngx_brunzip) but I promise I'll get back to you ASAP. |
Pushing it to production now. Will keep you up-to-date! 👍🏻 Thanks! |
Woo-hoo =) |
Nice moment to cut a release #90 =) |
Does not fix the issue. Perf results after the upgrade on a spinning worker show that it's still brotli doing the spinning.
Our workload typically means output pressure as the backends are fast. |
nginx/1.17.10 with the latest ngx_brotli in filter mode – 3 days running without problems. |
@gzzz I can also confirm in my case: all the servers in production (even without restarts or reloads of nginx) are running continuously without problems. |
Actually, ignore my report; I incorrectly cherry-picked the commit. Everything has been good in testing for 12 hours now. |
See eustas#30