Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Redash doesn't start if CPU is too slow #1866

Closed
unixwitch opened this issue Jul 4, 2017 · 8 comments
Closed

Redash doesn't start if CPU is too slow #1866

unixwitch opened this issue Jul 4, 2017 · 8 comments

Comments

@unixwitch
Copy link
Contributor

Issue Summary

Redash attempts to compile something (I'm not sure what) during startup. If the CPU isn't fast enough, this takes so long that Gunicorn times out the workers:

[2017-07-04 22:51:53 +0000] [599] [INFO] Booting worker with pid: 599
[2017-07-04 22:51:53 +0000] [600] [INFO] Booting worker with pid: 600
[2017-07-04 22:51:53 +0000] [605] [INFO] Booting worker with pid: 605
[2017-07-04 22:51:53 +0000] [606] [INFO] Booting worker with pid: 606
[2017-07-04 22:52:14,104][PID:599][INFO][root] Generating grammar tables from /usr/lib/python2.7/lib2to3/Grammar.txt
[2017-07-04 22:52:14,407][PID:599][INFO][root] Generating grammar tables from /usr/lib/python2.7/lib2to3/PatternGrammar.txt
[2017-07-04 22:52:15,398][PID:600][INFO][root] Generating grammar tables from /usr/lib/python2.7/lib2to3/Grammar.txt
[2017-07-04 22:52:16,397][PID:600][INFO][root] Generating grammar tables from /usr/lib/python2.7/lib2to3/PatternGrammar.txt
[2017-07-04 22:52:17,405][PID:606][INFO][root] Generating grammar tables from /usr/lib/python2.7/lib2to3/Grammar.txt
[2017-07-04 22:52:17,998][PID:606][INFO][root] Generating grammar tables from /usr/lib/python2.7/lib2to3/PatternGrammar.txt
[2017-07-04 22:52:19,700][PID:605][INFO][root] Generating grammar tables from /usr/lib/python2.7/lib2to3/Grammar.txt
[2017-07-04 22:52:20,099][PID:605][INFO][root] Generating grammar tables from /usr/lib/python2.7/lib2to3/PatternGrammar.txt
[2017-07-04 22:52:23 +0000] [1] [CRITICAL] WORKER TIMEOUT (pid:599)
[2017-07-04 22:52:23 +0000] [1] [CRITICAL] WORKER TIMEOUT (pid:600)
[2017-07-04 22:52:23 +0000] [1] [CRITICAL] WORKER TIMEOUT (pid:605)
[2017-07-04 22:52:23 +0000] [1] [CRITICAL] WORKER TIMEOUT (pid:606)

This repeats in an endless loop and it never manages to start.

Steps to Reproduce

  1. Start Redash on a slow CPU or, in our case, by artificially limiting its CPU use; I noticed this when Redash was limited to 20% of a 2.20GHz CPU.
  2. Observe Redash sitting in a restart loop and never serving requests.

Any other info e.g. Why do you consider this to be a bug? What did you expect to happen instead?

This may seem like a "well don't do that then" issue, but we generally limit applications' CPU use based on how much they typically need while running, to avoid a broken application slowing down everything else. I don't mind Redash being a bit slow to start as long as it actually does.

Technical details:

  • Redash Version: 1.0.3.b2896
  • Browser/OS: Linux
  • How did you install Redash: Docker
@arikfr
Copy link
Member

arikfr commented Jul 8, 2017

Redash attempts to compile something

Can you provide more details on this -- I mean, how did you deduce it tries to compile something?
Did you see a forked process that starts gcc?

@unixwitch
Copy link
Contributor Author

unixwitch commented Jul 8, 2017

how did you deduce it tries to compile something

I saw gcc in the process list.

To reproduce, start an instance of redash:

$ docker run --rm -ti --name=redash redash/redash:1.0.3.b2896 server

In another terminal, look at the container's process list:

$ docker exec -ti redash ps auxww
USER        PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
redash        1 15.0  0.4  47496 17872 ?        Ss+  20:50   0:00 /usr/bin/python /usr/local/bin/gunicorn -b 0.0.0.0:5000 --name redash -w4 redash.wsgi:app
redash       12  0.0  1.2 130796 51448 ?        S+   20:50   0:00 /usr/bin/python /usr/local/bin/gunicorn -b 0.0.0.0:5000 --name redash -w4 redash.wsgi:app
redash       15  0.0  1.2 126688 51348 ?        S+   20:50   0:00 /usr/bin/python /usr/local/bin/gunicorn -b 0.0.0.0:5000 --name redash -w4 redash.wsgi:app
redash       16  0.0  1.2 120132 51144 ?        R+   20:50   0:00 /usr/bin/python /usr/local/bin/gunicorn -b 0.0.0.0:5000 --name redash -w4 redash.wsgi:app
redash       21  0.0  1.2 118084 51136 ?        R+   20:50   0:00 /usr/bin/python /usr/local/bin/gunicorn -b 0.0.0.0:5000 --name redash -w4 redash.wsgi:app
redash      206  0.0  0.0   4508   756 ?        S+   20:50   0:00 sh -c if type gcc >/dev/null 2>&1; then CC=gcc; elif type cc >/dev/null 2>&1; then CC=cc;else exit 10; fi;LANG=C LC_ALL=C $CC -Wl,-t -o /tmp/tmpW6kUAI 2>&1 -l/usr/local/lib/python2.7/dist-packages/Crypto/Util/../Cipher/_raw_ocb.x86_64-linux-gnu.so
redash      208  0.0  0.0  34424  2820 ?        Rs+  20:50   0:00 ps auxww
redash      214  0.0  0.0   5320   708 ?        S+   20:50   0:00 gcc -Wl,-t -o /tmp/tmpW6kUAI -l/usr/local/lib/python2.7/dist-packages/Crypto/Util/../Cipher/_raw_ocb.x86_64-linux-gnu.so
redash      215  0.0  0.0   4896   832 ?        S+   20:50   0:00 /usr/lib/gcc/x86_64-linux-gnu/5/collect2 -plugin /usr/lib/gcc/x86_64-linux-gnu/5/liblto_plugin.so -plugin-opt=/usr/lib/gcc/x86_64-linux-gnu/5/lto-wrapper -plugin-opt=-fresolution=/tmp/ccmBezED.res -plugin-opt=-pass-through=-lgcc -plugin-opt=-pass-through=-lgcc_s -plugin-opt=-pass-through=-lc -plugin-opt=-pass-through=-lgcc -plugin-opt=-pass-through=-lgcc_s --sysroot=/ --build-id --eh-frame-hdr -m elf_x86_64 --hash-style=gnu --as-needed -dynamic-linker /lib64/ld-linux-x86-64.so.2 -z relro -o /tmp/tmpW6kUAI /usr/lib/gcc/x86_64-linux-gnu/5/../../../x86_64-linux-gnu/crt1.o /usr/lib/gcc/x86_64-linux-gnu/5/../../../x86_64-linux-gnu/crti.o /usr/lib/gcc/x86_64-linux-gnu/5/crtbegin.o -L/usr/lib/gcc/x86_64-linux-gnu/5 -L/usr/lib/gcc/x86_64-linux-gnu/5/../../../x86_64-linux-gnu -L/usr/lib/gcc/x86_64-linux-gnu/5/../../../../lib -L/lib/x86_64-linux-gnu -L/lib/../lib -L/usr/lib/x86_64-linux-gnu -L/usr/lib/../lib -L/usr/lib/gcc/x86_64-linux-gnu/5/../../.. -t -l/usr/local/lib/python2.7/dist-packages/Crypto/Util/../Cipher/_raw_ocb.x86_64-linux-gnu.so -lgcc --as-needed -lgcc_s --no-as-needed -lc -lgcc --as-needed -lgcc_s --no-as-needed /usr/lib/gcc/x86_64-linux-gnu/5/crtend.o /usr/lib/gcc/x86_64-linux-gnu/5/../../../x86_64-linux-gnu/crtn.o
redash      216  0.0  0.0   4508   744 ?        R+   20:50   0:00 sh -c if type gcc >/dev/null 2>&1; then CC=gcc; elif type cc >/dev/null 2>&1; then CC=cc;else exit 10; fi;LANG=C LC_ALL=C $CC -Wl,-t -o /tmp/tmpu0G_kN 2>&1 -l/usr/local/lib/python2.7/dist-packages/Crypto/Util/../Protocol/_scrypt.x86_64-linux-gnu.so
redash      217  0.0  0.0      0     0 ?        Z+   20:50   0:00 [gcc] <defunct>
redash      221  0.0  0.1  16948  5112 ?        R+   20:50   0:00 /usr/bin/ld -plugin /usr/lib/gcc/x86_64-linux-gnu/5/liblto_plugin.so -plugin-opt=/usr/lib/gcc/x86_64-linux-gnu/5/lto-wrapper -plugin-opt=-fresolution=/tmp/ccmBezED.res -plugin-opt=-pass-through=-lgcc -plugin-opt=-pass-through=-lgcc_s -plugin-opt=-pass-through=-lc -plugin-opt=-pass-through=-lgcc -plugin-opt=-pass-through=-lgcc_s --sysroot=/ --build-id --eh-frame-hdr -m elf_x86_64 --hash-style=gnu --as-needed -dynamic-linker /lib64/ld-linux-x86-64.so.2 -z relro -o /tmp/tmpW6kUAI /usr/lib/gcc/x86_64-linux-gnu/5/../../../x86_64-linux-gnu/crt1.o /usr/lib/gcc/x86_64-linux-gnu/5/../../../x86_64-linux-gnu/crti.o /usr/lib/gcc/x86_64-linux-gnu/5/crtbegin.o -L/usr/lib/gcc/x86_64-linux-gnu/5 -L/usr/lib/gcc/x86_64-linux-gnu/5/../../../x86_64-linux-gnu -L/usr/lib/gcc/x86_64-linux-gnu/5/../../../../lib -L/lib/x86_64-linux-gnu -L/lib/../lib -L/usr/lib/x86_64-linux-gnu -L/usr/lib/../lib -L/usr/lib/gcc/x86_64-linux-gnu/5/../../.. -t -l/usr/local/lib/python2.7/dist-packages/Crypto/Util/../Cipher/_raw_ocb.x86_64-linux-gnu.so -lgcc --as-needed -lgcc_s --no-as-needed -lc -lgcc --as-needed -lgcc_s --no-as-needed /usr/lib/gcc/x86_64-linux-gnu/5/crtend.o /usr/lib/gcc/x86_64-linux-gnu/5/../../../x86_64-linux-gnu/crtn.o

(This may take a few tries depending on the timing.)

Using ps -eaf instead of ps auxww indicates that the parent of the sh -c if type gcc ... command is the Gunicorn worker.

@unixwitch
Copy link
Contributor Author

Actually, it's easier to reproduce it if you start it like this:

$ docker run --rm -ti --cpus=0.1 --name=redash redash/redash:1.0.3.b2896 server

This limits it to 10% of one CPU. On my development environment (a Linux Hyper-V VM on an Intel i7-3770K 3.5GHz CPU) this is sufficient to make the gcc processes take over 30 seconds to complete, so it's easy to see them in docker exec ... ps. After 30 seconds it will then reproduce the original issue:

[2017-07-08 21:00:02 +0000] [1] [INFO] Starting gunicorn 19.4.5
[2017-07-08 21:00:02 +0000] [1] [INFO] Listening at: http://0.0.0.0:5000 (1)
[2017-07-08 21:00:02 +0000] [1] [INFO] Using worker: sync
[2017-07-08 21:00:02 +0000] [11] [INFO] Booting worker with pid: 11
[2017-07-08 21:00:03 +0000] [15] [INFO] Booting worker with pid: 15
[2017-07-08 21:00:03 +0000] [14] [INFO] Booting worker with pid: 14
[2017-07-08 21:00:03 +0000] [16] [INFO] Booting worker with pid: 16
[2017-07-08 21:00:33 +0000] [1] [CRITICAL] WORKER TIMEOUT (pid:16)
[2017-07-08 21:00:33 +0000] [1] [CRITICAL] WORKER TIMEOUT (pid:11)
[2017-07-08 21:00:33 +0000] [11] [INFO] Worker exiting (pid: 11)
[2017-07-08 21:00:33 +0000] [1] [CRITICAL] WORKER TIMEOUT (pid:14)
[2017-07-08 21:00:33 +0000] [1] [CRITICAL] WORKER TIMEOUT (pid:15)

@arikfr
Copy link
Member

arikfr commented Jul 8, 2017

Thanks! I recently started seeing similar symptoms in my environment, and wanted to confirm we're talking about the same thing.

I'm now trying to figure out what causes this and whether it can be avoided. Running gcc in runtime feels crazy. 😮

@unixwitch
Copy link
Contributor Author

This is coming from pycryptodome, and I'm fairly sure it's related to how it uses CFFI at runtime. Unfortunately, I know almost nothing about cffi so I don't know why it would need to invoke gcc. (The cffi documentation does mention it would do this in "out-of-line API mode".)

Perhaps it's sufficient to import the relevant pycryotodome modules in Dockerfile during build, so it pre-compiles them? That would obviously only fix it for Docker though; I don't know if this problem happens on non-Docker installations as well.

@deecay
Copy link
Contributor

deecay commented Jul 9, 2017

Have you tried a longer gunicorn timeout like this in bin/docker-entrypoint?

server() {
  exec /usr/local/bin/gunicorn -b 0.0.0.0:5000 --name redash -w4 redash.wsgi:app -t 120
}

@arikfr
Copy link
Member

arikfr commented Jul 9, 2017

@deecay while extending the timeout might help with this specific issue, there is no reason Redash should take so long to boot.

@unixwitch the Snowflake query runner was bringing in pycryptodome, and causing this. Just disabling the Snowflake query runner isn't enough to resolve this, and you will need to remove its library from the requirements_all_ds.txt list. I think I will remove it entirely in future releases, as most users don't use it so not worth the performance penalty it causes.

Btw, how did you find it's pycryptodome?

Perhaps it's sufficient to import the relevant pycryotodome modules in Dockerfile during build, so it pre-compiles them?

From what I've seen, it runs this step on every run of the code and doesn't cache this compilation step :-(

@arikfr
Copy link
Member

arikfr commented Aug 6, 2017

#1912.

@arikfr arikfr closed this as completed Aug 6, 2017
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants