-
Notifications
You must be signed in to change notification settings - Fork 160
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix the interaction of signals in GAP and the IO package #1851
Conversation
The IO part of this is at: gap-packages/io#55 (just for reference). |
/* Close down the child */ | ||
int status; | ||
int retcode = close(PtyIOStreams[pty].ptyFD); | ||
if (retcode) | ||
Pr("Strange close return code %d\n", retcode, 0); | ||
kill(PtyIOStreams[pty].childPID, SIGTERM); | ||
// GAP (or another library) might wait on this PID before | ||
// we handle it. If that happens, waitpid will return -1. | ||
retcode = waitpid(PtyIOStreams[pty].childPID, &status, WNOHANG); | ||
if (retcode == 0) { | ||
// Give process a second to quit |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I know this wasn't added in this PR (but the previous one which I didn't review), but the one second seems rather arbitrary (if usually enough for a process to quit). Does it make sense to have this waiting time configurable, or at least definable?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I admit it's arbitary, but I'm not sure I'd want to add a configuration option people might not understand.
In general, if an app hasn't quite within a second of being sent a SIGTERM
, then it's a fairly badly behaved program.
One option would be to wait less long before going back to see if the program has stopped, maybe waiting 0.1 seconds 10 times, so GAP can continue faster.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Now you bring it up, I'm tempted to make the timeout much longer (say 10 seconds), and check more often. That should deal with almost every case in practice, and make sure GAP keeps working for particularly badly behaved subproblems.
We would never have found this if there wasn't buggy copies of rev
floating around which swallow all signals and enter an infinite loop.
Codecov Report
@@ Coverage Diff @@
## master #1851 +/- ##
==========================================
+ Coverage 63.63% 65.93% +2.29%
==========================================
Files 946 513 -433
Lines 283525 268703 -14822
Branches 12686 11558 -1128
==========================================
- Hits 180426 177166 -3260
+ Misses 100313 88819 -11494
+ Partials 2786 2718 -68
|
|
||
# If IO is loaded, disable it's signal handler | ||
gap> if IsBoundGlobal("IO_RestoreSIGCHLDHandler") then | ||
> ValueGlobal("IO_RestoreSIGCHLDHandler")(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I assume that's just for the current io
version, until gap-packages/io#55 is merged?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, if you comment this out, the test fails (it "should" fail, the manual explictly says that GAP's process stuff can't be used if IO's signal handler is enabled, it just got more likely to trigger).
a77cd65
to
c0380c3
Compare
Hopefully all issues fixed. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Third commit message starts with "xAdd function" -> remove x, and also fix that comment typo?
Otherwise looks good (and once it's merge, I'll look into update io
)
gap> scriptdir := DirectoriesLibrary( "tst/teststandard/processes/" );; | ||
gap> checkpl := Filename(scriptdir, "check.pl");; | ||
|
||
# If IO is loaded, disable it's signal handler |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"it's" -> "its"
This patch ensures that if we catch the child dying in the signal handler, it is correctly marked as finished. This change makes it easier to keep track of child processes, rather than ignoring signals we are not expecting.
c0380c3
to
cd4cf85
Compare
This patch is part 1 of fixing the interaction of signals in GAP and the IO package.
This adds a new function (which I imagine might also be useful to anyone making GAP into a library) to pass information about dead child processes to GAP.
Once this is merged in, I have another PR for IO, which will call this function. Then I can remove the line in the test which forces IO's signal handler to be disabled (because then the test will still work even with IO's signal handler).