Commit
Reap runner and sub-slave processes in slaves.
When running a command, the runner process correctly waits for termination of that command, but the slave also needs to wait for the runner process. This adds a set of child pids that get waitpid'd on (with WNOHANG) every time a command is read.
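A minimal sketch of the approach described above, not the actual Zeus code: the `ChildReaper` class and its method names are hypothetical, introduced only for illustration. It keeps a set of child pids and calls `Process.waitpid` with `Process::WNOHANG` on each, so the caller never blocks on children that are still running.

```ruby
require 'set'

# Hypothetical helper (not part of Zeus): tracks child pids and reaps the
# ones that have already exited, without blocking on the ones still running.
class ChildReaper
  def initialize
    @pids = Set.new
  end

  # Remember a child pid so a later reap pass can collect it.
  def track(pid)
    @pids << pid
  end

  # Non-blocking pass over all tracked pids.
  # Returns the pids that are no longer tracked after this pass.
  def reap
    done = []
    @pids.each do |pid|
      begin
        # With WNOHANG, waitpid returns nil while the child is still alive
        # and the pid once the exited child has been collected.
        done << pid if Process.waitpid(pid, Process::WNOHANG)
      rescue Errno::ECHILD
        # The child was already collected elsewhere; stop tracking it.
        done << pid
      end
    end
    @pids.subtract(done)
    done
  end

  # Pids that have not been reaped yet.
  def pending
    @pids.to_a
  end
end
```

In a loop that reads commands, one would presumably call `track` after each `fork` and `reap` each time a command is read, so exited children are collected shortly after they die instead of lingering as zombies.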
fed0652
@antifuchs, I was digging around Zeus a bit more and got confused by this change. Why do we want the runner waiting on its children to exit? The comment also mentions "reaping" but I don't see anything getting killed or otherwise cleaned up.
fed0652
I have a hazy memory of this, but on shared machines, we'd run out of PIDs with long-running zeuses that left zombie processes around. This is because the various processes get killed by the go portion of the code as reloads happen, but the ruby processes never retrieved their children's corpses (via the nohang-ed `waitpid` there). Those stick around and continue to consume one process table entry per child, until you kill the entire zeus process tree (or the `boot` entry gets restarted, which ~never happens, AIUI).