I/O operation on closed file #979
Comments
Thanks for this report. There is a pretty big changeset between 0.11.1 and 0.13, so it probably won't be easy to find the guilty change. Maybe it has something to do with this PR: #885?
Yep, #933 seems to be linked too.
Some progress here: it's the same bug as #933. With "circusctl ipython", when the problem occurred I see:

```
# stage2 is the watcher object
In [30]: stage2.status()
Out[30]: 'active'

In [31]: stage2.processes
Out[31]: {}

In [32]: stage2.numprocesses
Out[32]: 1

In [33]: stage2.respawn
Out[33]: True
```

More interesting (linked with #933):

```
In [22]: stage2.stream_redirector
Out[22]: <circus.stream.redirector.Redirector at 0x7f2343504f90>

In [23]: stage2.stream_redirector.pipes
Out[23]:
{254: ('stdout',
  <circus.process.Process at 0x7f2343504190>,
  <closed file '<fdopen>', mode 'rb' at 0x7f2341e7c420>)}

In [24]: stage2.stream_redirector._active
Out[24]: {254: <circus.stream.redirector.Handler at 0x7f2343504f10>}

In [25]: stage2.stream_redirector.running
Out[25]: True

In [26]: stage2.stream_redirector.redirect
Out[26]:
{'stderr': <circus.stream.file_stream.FileStream at 0x7f234378a810>,
 'stdout': <circus.stream.file_stream.FileStream at 0x7f234378a7d0>}

In [27]: stage2.stream_redirector.pipes[254][2].fileno()
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
/home/synext/lib/python2.7/site-packages/circus-0.13.0-py2.7.egg/circus/commands/ipythonshell.pyc in <module>()
----> 1 stage2.stream_redirector.pipes[254][2].fileno()

ValueError: I/O operation on closed file
```

So in this particular case we have a closed pipe in the "stream_redirector", and the watcher can't spawn another process because the "stream_redirector.start()" call fails at "pipe.fileno()". But I don't understand the root cause. The problem seems to occur when the managed process exits by itself, and it's not easily reproducible, so there is probably a concurrency issue here. Any other ideas?
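For what it's worth, the exception itself is just standard Python file-object behaviour: calling fileno() on a file object that has already been closed raises this ValueError. A minimal standalone sketch (plain Python, not circus code) of what stream_redirector.start() runs into:

```python
# Minimal sketch (not circus code): fileno() on an already-closed file object
# raises ValueError, which is what the traceback above shows.
import os

r, w = os.pipe()
pipe = os.fdopen(r, 'rb')   # same kind of <fdopen> object as in the pipes dict
pipe.close()                # something closed it earlier (root cause unknown)

try:
    pipe.fileno()           # what the redirector ends up doing on start
except ValueError as exc:
    print(exc)              # "I/O operation on closed file"
finally:
    os.close(w)             # clean up the write end of the pipe
```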
Note: when everything is OK, I have two "stream_redirector" entries for this watcher (one for stdout and one for stderr). The last action for the watcher seems to be "reaping process xxxxx" (in the logs).
Maybe we are ignoring errors when closing the …
I still don't understand :-/ Maybe a kind of race condition between file descriptors of different watchers: a descriptor is closed in one watcher and reused by another one at the OS level, circus gets out of sync at the stream_redirector level and closes the wrong one? I'm lost :-(
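To make that hypothesis concrete, here is a tiny illustration (plain Python, nothing to do with circus internals) of why bookkeeping keyed on raw file descriptor numbers can go stale: POSIX hands out the lowest free descriptor number, so once a descriptor is closed, the very next open can reuse the same integer for a completely different file.

```python
# Illustrative sketch of the fd-reuse hazard described above (not circus code).
import os

fd_a = os.open('/dev/null', os.O_RDONLY)
registry = {fd_a: 'watcher A stdout pipe'}   # bookkeeping keyed by fd number

os.close(fd_a)                               # watcher A's pipe is closed...
fd_b = os.open('/dev/zero', os.O_RDONLY)     # ...and the next open reuses the number

print(fd_a == fd_b)                          # True on POSIX: same integer, different file
print(registry[fd_b])                        # stale entry now "matches" watcher B's fd

os.close(fd_b)
```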
Avoid the main and blocking exception about "I/O operation on closed file" with minimal changes and risks
Some progress with the referenced commit: after a few hours with this commit, I have only one exception in the log file:
But our processes are alive :-) I still don't have the root cause (probably a concurrency issue between process reaping and process killing after the lifetime limit, but there is still something nasty). The referenced commit just limits the consequences. Note: #933 is a duplicate of this one, because after fixing a logging configuration issue in a circus hook I get the same exception.
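For reference, the kind of "limit the consequences" guard discussed here could look something like the sketch below. The function name is hypothetical and this is not circus's actual API; it only papers over the symptom rather than fixing the underlying close/reap race.

```python
# Hypothetical guard (illustrative only, not circus's actual code): drop pipe
# entries whose file object is already closed instead of letting fileno()
# raise and abort the whole respawn path.
def prune_closed_pipes(pipes):
    """Remove entries like {fd: (name, process, pipe)} whose pipe is closed."""
    for fd, (name, process, pipe) in list(pipes.items()):
        if pipe.closed:
            del pipes[fd]        # stale entry left behind by an earlier close
    return pipes
```

A real fix would still need to explain who closed the pipe in the first place.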
I fixed a similar issue with the streams in PR #984. I'll let you know when it's merged.
At Meteo-France, we were happy with circus 0.11.1.
We recently upgraded to 0.13.0,
and now we have a nasty bug that is very difficult to reproduce.
Context:
After a few hours, some watchers are in a bad state:
I'm still debugging this one, but I think there is a hidden exception somewhere (probably around stdout/stderr management, which would explain "I/O operation on closed file"). It is probably a kind of concurrency issue.
If someone has ideas to share :-)
Thanks