pyzmq 17.0.0 can cause circus to busy loop on epoll_wait #1055

Closed
chris-bornmann opened this issue Feb 26, 2018 · 5 comments

chris-bornmann commented Feb 26, 2018

We run circus in a Docker container. When we rebuilt our containers, they pulled in a new version of pyzmq, from 16.0.3 to 17.0.0, and an incompatibility seems to have crept in: circusd's CPU utilization went to 100%. strace shows:

...
     0.000057 epoll_wait(5<anon_inode:[eventpoll]>, [{EPOLLHUP, {u32=26, u64=21436748710019098}}], 1023, 1750) = 1
     0.000074 clock_gettime(CLOCK_REALTIME, {1519664951, 681411890}) = 0
     0.000056 clock_gettime(CLOCK_REALTIME, {1519664951, 681467241}) = 0
     0.000055 clock_gettime(CLOCK_MONOTONIC, {13571888, 686821970}) = 0
     0.000051 epoll_wait(5<anon_inode:[eventpoll]>, [{EPOLLHUP, {u32=26, u64=21436748710019098}}], 1023, 1749) = 1
     0.000089 clock_gettime(CLOCK_REALTIME, {1519664951, 681662917}) = 0
     0.000042 clock_gettime(CLOCK_REALTIME, {1519664951, 681697351}) = 0
...
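One plausible reading of this trace, offered as an assumption and not a confirmed diagnosis: fd 26 is the read end of a pipe whose write end has been closed. epoll always reports EPOLLHUP for that state (it cannot be masked out and stays asserted), so an event loop that never unregisters or closes the fd wakes immediately on every call and never reaches its 1750 ms timeout. The Linux-only sketch below reproduces that pattern with the standard library's select.epoll; the function name is hypothetical and this is not circus or pyzmq code.

```python
import os
import select
import time


def hup_busy_loop_demo(iterations=5, timeout=1.75):
    """Reproduce a level-triggered EPOLLHUP wake-up loop on a pipe (Linux only).

    Returns (masks, elapsed, events_after_unregister).
    """
    read_fd, write_fd = os.pipe()
    os.close(write_fd)  # assumption: whoever held the write end has closed it

    ep = select.epoll()
    try:
        # Only EPOLLIN is requested; the kernel reports EPOLLHUP regardless.
        ep.register(read_fd, select.EPOLLIN)
        masks = []
        start = time.monotonic()
        for _ in range(iterations):
            # The hang-up condition persists, so every poll() returns at once
            # instead of sleeping for `timeout` seconds (cf. 1750 ms in strace).
            for _fd, mask in ep.poll(timeout):
                masks.append(mask)
        elapsed = time.monotonic() - start

        # Unregistering (or closing) the fd on HUP is what breaks the cycle.
        ep.unregister(read_fd)
        after = ep.poll(0.05)
    finally:
        ep.close()
        os.close(read_fd)
    return masks, elapsed, after


if __name__ == "__main__":
    masks, elapsed, after = hup_busy_loop_demo()
    print(f"{len(masks)} wake-ups in {elapsed:.3f}s; after unregister: {after}")
```

Five polls with a 1.75 s timeout each finish almost instantly, which matches the shape of the strace output; once the fd is unregistered, poll() blocks for its timeout and returns an empty list.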

The fd that epoll_wait keeps reporting (26) is a pipe:

ls -lrtah /proc/97/fd/26
lr-x------ 1 root root 64 Feb 26 17:15 /proc/97/fd/26 -> pipe:[1327394133]
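The same lookup can be scripted: /proc/<pid>/fd/<fd> is a symlink whose target reads "pipe:[<inode>]" for pipes. A minimal Linux-only sketch; the helper name is hypothetical.

```python
import os
import re

# Symlink targets for pipes look like "pipe:[1327394133]".
_PIPE_LINK = re.compile(r"^pipe:\[(\d+)\]$")


def pipe_inode(pid, fd):
    """Return the pipe inode behind /proc/<pid>/fd/<fd>, or None if it is not a pipe."""
    target = os.readlink(f"/proc/{pid}/fd/{fd}")
    match = _PIPE_LINK.match(target)
    return int(match.group(1)) if match else None


if __name__ == "__main__":
    r, w = os.pipe()
    # The inode in the link matches st_ino from fstat() on the same fd.
    print(pipe_inode(os.getpid(), r), os.fstat(r).st_ino)
```

For the report above, pipe_inode(97, 26) would be expected to return 1327394133.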

There appear to be three circusd threads, and only one of them is spinning (this is from "top"):

   97 root      20   0  219164  27908  11364 R 99.9  0.3  56:36.67 circusd
  103 root      20   0  219164  27908  11364 S  0.0  0.3   0:00.00 circusd
  104 root      20   0  219164  27908  11364 S  0.0  0.3   0:00.00 circusd
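In this view top is listing threads, so 97, 103 and 104 are thread IDs; 97 is the main thread, since on Linux the main thread's TID equals the PID. The same per-thread CPU figures can be read from /proc/<pid>/task/<tid>/stat, where fields 14 and 15 are user and system time in clock ticks. A minimal Linux-only sketch; the function name is hypothetical.

```python
import os


def thread_cpu_seconds(pid):
    """Map each thread id of `pid` to its user+system CPU time in seconds (Linux /proc)."""
    ticks_per_sec = os.sysconf("SC_CLK_TCK")
    result = {}
    task_dir = f"/proc/{pid}/task"
    for tid in os.listdir(task_dir):
        try:
            with open(f"{task_dir}/{tid}/stat") as f:
                stat = f.read()
        except FileNotFoundError:
            continue  # thread exited while we were scanning
        # comm (field 2) may contain spaces, so split only after the last ')'.
        # fields[0] is then field 3 (state); utime/stime are fields 14 and 15.
        fields = stat[stat.rindex(")") + 2:].split()
        utime, stime = int(fields[11]), int(fields[12])
        result[int(tid)] = (utime + stime) / ticks_per_sec
    return result
```

Sampling this twice a few seconds apart and subtracting shows which TID is accumulating CPU, without needing an interactive top session.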

And here's output from lsof:

lsof | grep 1327394133
COMMAND    PID TID USER   FD      TYPE             DEVICE SIZE/OFF       NODE NAME
circusd     97     root   26r     FIFO               0,10      0t0 1327394133 pipe
circusd     97 103 root   26r     FIFO               0,10      0t0 1327394133 pipe
circusd     97 104 root   26r     FIFO               0,10      0t0 1327394133 pipe
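lsof prints one row per thread (TIDs 103 and 104 alongside PID 97), but the threads of a process share a single file-descriptor table, so all three rows describe the same fd 26. Both ends of a pipe share one inode, so if this lsof run covered every process in the container, the absence of any write-mode fd for pipe 1327394133 would be consistent with the EPOLLHUP seen in strace. A minimal Linux-only sketch that performs the same per-thread scan through /proc; the function name is hypothetical.

```python
import os
import re

# Symlink targets for pipes look like "pipe:[1327394133]".
_PIPE_LINK = re.compile(r"^pipe:\[(\d+)\]$")


def tasks_holding_pipe(pid, inode):
    """List (tid, fd) pairs under /proc/<pid>/task whose fd points at pipe:[inode]."""
    hits = []
    task_root = f"/proc/{pid}/task"
    for tid in sorted(os.listdir(task_root), key=int):
        fd_dir = f"{task_root}/{tid}/fd"
        try:
            fds = os.listdir(fd_dir)
        except FileNotFoundError:
            continue  # thread exited during the scan
        for fd in fds:
            try:
                match = _PIPE_LINK.match(os.readlink(f"{fd_dir}/{fd}"))
            except OSError:
                continue  # fd closed between listdir() and readlink()
            if match and int(match.group(1)) == inode:
                hits.append((int(tid), int(fd)))
    return hits
```

Every thread reports the same fd numbers, which is the same duplication lsof shows above.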

Backing off to pyzmq 16.0.4 (or 16.0.3) fixes the issue. However, we have multiple containers running apps that use circus, and all of them were upgraded to pyzmq 17.0.0, yet only a subset exhibit this behavior. My only reason for suspecting pyzmq is that pinning it to an older version makes the problem go away. I haven't yet figured out what distinguishes a system that works with 17.0.0 from one that doesn't. The other dependencies of circus (iowait, psutil, tornado) are the same on all systems:

  - iowait [required: Any, installed: 0.2]
  - psutil [required: Any, installed: 5.4.3]
  - pyzmq [required: >=13.1.0, installed: 17.0.0]
  - tornado [required: >=3.0, installed: 4.5.3]
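Since the difference between working and failing hosts has not been identified yet, one way to gather comparable data is to record installed versions on each host and diff them. A minimal standard-library sketch (Python 3.8+ for importlib.metadata); the function names are hypothetical.

```python
from importlib import metadata

# Distribution names taken from the dependency list in this report.
CIRCUS_DEPS = ("circus", "iowait", "psutil", "pyzmq", "tornado")


def installed_versions(names=CIRCUS_DEPS):
    """Return {distribution name: installed version, or None if not installed}."""
    versions = {}
    for name in names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


def diff_versions(working, broken):
    """Return {name: (working_version, broken_version)} for every name that differs."""
    names = sorted(set(working) | set(broken))
    return {
        n: (working.get(n), broken.get(n))
        for n in names
        if working.get(n) != broken.get(n)
    }
```

Python package versions alone may not explain the difference: pyzmq can be built against a bundled or a system libzmq, so recording zmq.pyzmq_version() and zmq.zmq_version() on each host may also be worthwhile.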

If there's any additional information I can gather, please let me know. This could absolutely be a pyzmq issue, but I'm starting here in case they changed something that requires a change in circusd.

k4nar (Contributor) commented Feb 27, 2018

Thanks for this thorough report!

Reading the PyZMQ changelog, I see that there were some major changes regarding the Tornado event loop. They caused some issues in IPython, for example: ipython/ipykernel#307

If someone wants to open a PR to fix this, that would be very nice. Otherwise, we can put pyzmq<17 in our setup.py, but I don't like that…
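A pin like pyzmq<17 is a PEP 440 version specifier; combined with the lower bound from the dependency list above it reads "pyzmq>=13.1.0,<17" in install_requires. pip and setuptools evaluate such specifiers with the packaging library (for example, packaging.specifiers.SpecifierSet(">=13.1.0,<17").contains("16.0.4")). As a dependency-free illustration only, the sketch below hand-rolls a simplified version of that check; the function names are hypothetical, and it ignores pre-release, post-release and local-version rules, so it is not a substitute for packaging.

```python
def _version_tuple(version):
    """Parse the leading numeric components of e.g. '16.0.4' into (16, 0, 4)."""
    parts = []
    for piece in version.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break
        parts.append(int(digits))
    # Pad/truncate to three components so '13.1' compares like '13.1.0'.
    return tuple((parts + [0, 0, 0])[:3])


def satisfies_pre17_pin(version):
    """True if `version` meets 'pyzmq>=13.1.0,<17' (simplified comparison)."""
    return (13, 1, 0) <= _version_tuple(version) < (17, 0, 0)
```

Under this check, 16.0.3 and 16.0.4 are accepted and 17.0.0 is rejected, which is the effect the proposed pin would have on installs.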

sboisson (Contributor) commented:

ZMQPoller is deprecated in PyZMQ 17, which causes many circus unit tests to fail.

thefab (Contributor) commented May 4, 2018

Same problem here; reverting to pyzmq 16.0.4 fixed the issue.

k4nar (Contributor) commented Jun 15, 2018

We've pinned pyzmq to < 17 in the latest release, but let's keep this issue open as a reminder to update someday.

ltalirz (Contributor) commented Jan 17, 2020

Note that the latest circus version supports pyzmq 17+ https://github.com/circus-tent/circus/releases

I think this issue can be closed.

k4nar closed this as completed Jan 17, 2020
5 participants