How to handle defunct processes in child processes? #34
Comments
This is indeed a situation that MPIRE doesn't handle right now. When a child process dies, I would expect the other children to die as well and the main process to throw an exception. Do you have any suggestions for handling this? I guess on Linux you can set up a signal handler (https://stackoverflow.com/questions/3675675/how-do-i-know-when-a-child-process-died). We would need to check how to do this on Windows.
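To make the signal-handler idea concrete, here is a minimal sketch, assuming a Unix system and plain `os.fork` rather than MPIRE's own worker machinery. The names `on_sigchld`, `spawn_and_watch` and `dead_children` are made up for illustration. The parent installs a `SIGCHLD` handler that reaps exited children and records their exit codes.

```python
import os
import signal
import time

dead_children = {}  # pid -> exit code (negative means "killed by that signal")


def _exit_code(status):
    # Decode the raw wait status returned by os.waitpid.
    if os.WIFSIGNALED(status):
        return -os.WTERMSIG(status)
    return os.WEXITSTATUS(status)


def on_sigchld(signum, frame):
    # Reap every child that has exited so none is left defunct.
    # Note: waitpid(-1, ...) reaps *any* child of this process.
    while True:
        try:
            pid, status = os.waitpid(-1, os.WNOHANG)
        except ChildProcessError:
            return  # no children left at all
        if pid == 0:
            return  # children exist, but none has exited yet
        dead_children[pid] = _exit_code(status)


def spawn_and_watch(exit_code=3, timeout=5.0):
    """Fork a child that exits at once; return (pid, exit code seen by the handler)."""
    previous = signal.signal(signal.SIGCHLD, on_sigchld)  # main thread only
    try:
        pid = os.fork()
        if pid == 0:
            os._exit(exit_code)  # child: die immediately
        deadline = time.monotonic() + timeout
        while pid not in dead_children and time.monotonic() < deadline:
            time.sleep(0.01)
        return pid, dead_children.get(pid)
    finally:
        # Restore whatever handler was installed before.
        signal.signal(signal.SIGCHLD,
                      previous if previous is not None else signal.SIG_DFL)


if __name__ == "__main__":
    print(spawn_and_watch())
```

In a real pool the handler would probably just set a flag that the main loop checks, instead of doing the bookkeeping inside the handler. `signal.SIGCHLD` does not exist on Windows, so this does not answer the Windows question.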
Signal handlers may be an improvement, but many issues remain. For example, when a subprocess is killed with kill -9, it never receives a TERM signal. Perhaps we could add a heartbeat mechanism between the parent and child processes: each child sends a heartbeat to the parent every second to show that it is still alive. If the heartbeat stops, the child can be assumed to have exited abnormally, so that the parent process can be terminated and an exception thrown.
Sounds like a good option to me. I was thinking about what the implementation would look like. We could set an Event (or boolean value) for each worker and check it at regular intervals. The main process would then reset the Event, and if the Event wasn't set it would raise. The only thing to keep in mind is that the timing for checking these values has to be right. You could let the workers update the Event every 0.1 seconds in a separate thread and check it in the main process only every second or so. In that case, you can be pretty sure that the workers had the chance to set the value to True again.
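A rough sketch of the heartbeat idea described above, not MPIRE's actual implementation. It assumes a Unix `fork` start method and uses the "boolean value" variant: a lock-free shared byte instead of an `Event`, so that a worker killed in the middle of an update cannot leave a lock held. The names `worker`, `missed_heartbeat` and `demo`, and the intervals, are illustrative choices.

```python
import multiprocessing as mp
import threading
import time

# Assumption: Unix only. With "fork" the shared flag is inherited directly.
ctx = mp.get_context("fork")


def worker(alive_flag, beat_interval=0.05):
    def beat():
        # Heartbeat thread: keep telling the parent we are alive.
        while True:
            alive_flag.value = 1
            time.sleep(beat_interval)

    # Daemon thread, so it stops together with the worker process.
    threading.Thread(target=beat, daemon=True).start()
    time.sleep(60)  # stand-in for the real task


def missed_heartbeat(alive_flag, check_interval=0.5):
    """Reset the flag, wait one check interval, report whether it stayed unset."""
    alive_flag.value = 0
    time.sleep(check_interval)
    return alive_flag.value == 0


def demo():
    alive_flag = ctx.Value("b", 0, lock=False)  # raw shared byte, no lock
    proc = ctx.Process(target=worker, args=(alive_flag,))
    proc.start()
    time.sleep(0.2)  # give the worker time to start beating
    while_running = missed_heartbeat(alive_flag)
    proc.kill()  # SIGKILL, like `kill -9`
    proc.join()
    after_kill = missed_heartbeat(alive_flag)
    return while_running, after_kill


if __name__ == "__main__":
    print(demo())
```

The check interval is deliberately much longer than the beat interval, matching the timing concern above. A main process using this would raise as soon as `missed_heartbeat` returns `True` for any worker.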
Yes, that's exactly what I was thinking. It's better to maintain this state only in the main process, and we shouldn't forget to stop those threads in the child processes.
I'm not sure heartbeats would work very well, because a worker may not be able to send a heartbeat in time if it's stuck in a long-running operation that doesn't release the GIL (either in CPython itself or in an external C module).
You're right, @towr. I experienced this already when I tried to implement it. I already have a different approach which does seem to work, at least 99% of the time. Still working on that 1%. I'm using
Another request: provide a mode of automatic recovery that restarts a child process when it dies, so that we can implement a supervisor mode similar to Erlang's. That would be a bit more practical than throwing exceptions.
@sailxjx Changing this in mpire would require quite a few assumptions about how the end user would want to handle it. You can catch the exception that is thrown and restart manually if you need to. If there's more interest in changing this, I will reconsider. For now, I'll leave it to the end user.
A new release, v2.3.5, is now available that deals with defunct processes.
This issue is not caused by mpire, but I thought I'd discuss how to deal with it or improve it here.
When I start multiple child processes with the Pool module and call exit() in one of them, or just use the kill command to kill the child process, it becomes a defunct process and prevents the parent process from exiting.
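For readers unfamiliar with the term, a defunct (zombie) process is a child that has exited but has not yet been reaped by its parent. The following is a minimal illustration of that state, assuming Linux (it reads `/proc`) and plain `os.fork`. It is not the reproduction code from this issue, and `child_state` and `make_zombie` are names made up for the example.

```python
import os
import time


def child_state(pid):
    # The process state is the first field after the parenthesised command
    # name in /proc/<pid>/stat ("Z" means zombie/defunct).
    with open(f"/proc/{pid}/stat") as f:
        data = f.read()
    return data.rsplit(")", 1)[1].split()[0]


def make_zombie(timeout=5.0):
    """Fork a child that exits at once; return (state before reaping, still listed after)."""
    pid = os.fork()
    if pid == 0:
        os._exit(0)  # child: exit immediately
    # The parent has not called wait() yet, so the child stays defunct.
    deadline = time.monotonic() + timeout
    while child_state(pid) != "Z" and time.monotonic() < deadline:
        time.sleep(0.01)
    before = child_state(pid)
    os.waitpid(pid, 0)  # reaping removes the defunct entry
    still_listed = os.path.exists(f"/proc/{pid}")
    return before, still_listed


if __name__ == "__main__":
    print(make_zombie())
```

A defunct child disappears once the parent reaps it with `os.waitpid`.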
The code to reproduce this problem is very simple:
So how can I make the parent process exit normally in this case?