-
Notifications
You must be signed in to change notification settings - Fork 653
unix: Assertion `events == 1' failed #826
Comments
Can you post the output of |
Unfortunately,I cannot reproduce this anymore. I will report back when the error is back. |
I am occasionally seeing this with node 0.10.11. The strace output should be available here: https://mega.co.nz/#!VUsC3RCY!XRDjrwj0b-Xm0Ayqz6gJ5hKpW-8ipaow9DFzCOZ4iCQ Thanks in advance, and let me know if there is anything else I can provide. |
@cmisare Can you gist the log? That website is quite annoying and doesn't work all that great in my browser. |
Apologies for that. I had run the strace with the -ff option and wasn't sure where to upload a bundle of the output files. Here is a gist of the output without the -ff: |
Thanks. Are you on a recent-ish Fedora system, per chance? Your version of strace suffers from a bug that I fixed a while ago where it doesn't process the return value from process_vm_readv() correctly. Can you clone git://strace.git.sourceforge.net/gitroot/strace/strace and run Failing that, do you have some kind of test case that I can try? |
I forget to mention what the effect of the bug is... effectively you get this:
Rather than this:
|
I am indeed on Fedora 18. Good catch. It looks like configure is gitignore'd in the strace repo, so I tried creating it using "aclocal -I m4 && autoheader && automake --add-missing && autoconf". If that is incorrect please let me know what I should do instead. I do now see lines with epoll_wait and EPOLLIN, so here is a gist of the output from running with the compiled strace: https://gist.github.com/cmisare/b72bfdc0b2655a3b208b Unfortunately, this isn't my code to share. If nothing comes out of this strace log I can try to isolate a test case to reproduce but I will need to get approval for that. Sorry. |
I am also seeing this when using The crash doesn't occur if I don't use clustering, and it also doesn't occur if I disable the logger lib we're using (ain2). The ain2 logger seems to be using UDP to write to syslogs, and removing the dgram socket send, causes it not to crash. |
I have the same error on Ubuntu 12.04 LTS x64 (it works on OS X). I have isolated the bug in a small test case: https://gist.github.com/dgrelaud/5872380 |
I discovered the cause of our (= @dule's) problem. When a file descriptor is closed, it is not removed from epoll. As described in epoll(7), there are cases in which a file description will stay registered when the file descriptor is closed. If the same fd number is reused in a future socket (specifically, a server socket), then it will receive epoll events for the old file as well, and libuv will not be able to distinguish between the two types of events, because the fds are the same. So we might have e.g. EPOLLOUT events sent to a watcher that only expects EPOLLIN events - which is what causes this assertion failure. I've created a small testcase here: https://gist.github.com/goffrie/5878871 + its strace. I also wrote a patch that fixes the problem for me ( @dgrelaud's testcase appears to hit a different problem, where |
Hi guys, I experiencing the same issue here when I'm using Node v0.10.12. Like @cmisare said, it seem to be the issue happened occasionally. I experiencing this issue when I'm using https://github.com/yeoman/generator-webapp and the bottom line is it seem the issue originating from the libuv based on the log.
Btw, here is my dev environment details when this issue happening:
If you guys need any more details, please let me know. Thanks, |
Using cluster module and forking processes, which created http server per process listening to a port, caused this issue. Specifically when you set .listen(80), you get this error on init of your cluster. A workaround was to use setTimeout with 5ms delay when using .listen(PORT_NUM) |
Does anyone have a reduced test case that triggers the assert and doesn't tamper with libuv/node.js internals? @goffrie is correct that:
But that cannot happen unless you start messing with libuv or node.js internals or call fork() yourself. Libuv doesn't dup the file descriptor by itself.
I've never seen this. The only explanation I can think of is a race between a (probably accidental) close() of the listen socket file descriptor in another thread when the event loop has entered epoll_wait(). I wouldn't expect EPOLLIN to be set in that case, though. With what kernel is this? (For context: the kernel only sets the EPOLLHUP flag when a TCP socket is in the shutdown or closed state. I'm reasonably sure that for listen sockets that only happens when someone calls close().) |
@bnoordhuis did you look at the Gist I linked? I have an all-Node test case that reliably triggers the bug. The only "internal" it uses is the fd number, which it does for expediency; we have triggered this bug in regular production code, but it only happens on one server, which isn't too surprising given that it relies on a very unlikely fd race. And the patch that I posted does fix our problem (though we're just using node 0.8 for now). |
One more thing - the cluster module does duplicate fds: the same fd being open in another process is duplicate enough for this bug. |
Here's a test case that doesn't use any internal-y stuff for its operation, using the cluster module: http://sprunge.us/FPjU |
Well, the fact that it's called
If you mean that the master sends a copy of the file descriptor to worker processes, then that's true. Libuv and node.js go to some lengths to make sure everything cooperates peacefully however. It's possible there's a bug somewhere but I'll need a little more to go on. As to your patch, I'm afraid it's not suitable for inclusion even if it fixes the issue for you. epoll_ctl() is an expensive system call; calling it indiscriminately has a pretty severe performance impact. Compare the libuv benchmark numbers with and without your patch. EDIT: That said, keeping track of whether the file descriptor has been shared and deregistering based on that might be an option. That won't fix all cases though. |
Right, I see what you mean. That's fixed in master actually. It's probably a bug in the v0.10 cluster module. I'll look into it. |
https://gist.github.com/goffrie/6195465. I'm not seeing any significant changes.
I think this could be reasonable.
Thanks for looking :) By the way, http://sprunge.us/OZJh is another incarnation of the same bug, that is still failing on the latest version of node (0.11.5). Also, in the original cluster test case, even though the assertion failure doesn't occur, strace output still has thousands of lines like this:
so there's still a problem there. |
I just ran across this same issue in node 0.8.25.
|
Worth saying that the process consumes 100% CPU in this state. |
When fd is closed and new one (with the same number) is opened inside kqueue/epoll/port loop's callback - stale events might invoke callbacks on wrong watchers. Check if watcher was changed after invocation and invalidate all events with the same fd. fix #826
Seeing this in node
|
@thefosk could you please try running |
Same problem here unfortunately.
Append every-time after a couple of hour :/ |
This is very very odd... What I do see from your log is:
and then:
Are you sure that are kernel is up-to-date on your server? |
Also note these:
Are you sending handles over pipes? |
i.e. are you using cluster? |
No cluster. |
Thank you! |
node: ../deps/uv/src/unix/stream.c:494: uv__server_io: Assertion `events == 1' failed.
Im on node 0.10.10 and therefore livuv 0.10.10.
This happens when node-phantom tries to spawn phantom's process.
On Debian 7.0
The text was updated successfully, but these errors were encountered: