Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix the interaction of signals in GAP and the IO package #1851

Merged
merged 3 commits into from
Nov 11, 2017
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
32 changes: 30 additions & 2 deletions src/iostream.c
Original file line number Diff line number Diff line change
Expand Up @@ -242,6 +242,31 @@ static UInt OpenPty(int * master, int * slave)
** Start a subprocess using ptys. Returns the stream number of the IOStream
** that is connected to the new processs
*/


// Clean up a signalled or exited child process
// CheckChildStatusChanged must be called by libraries which replace GAP's
// signal handler, or call 'waitpid'.
// The function should be passed a PID, and the return value of waitpid.
// Returns 1 if that PID was a child owned by GAP, or 0 otherwise.
int CheckChildStatusChanged(int childPID, int status)
{
GAP_ASSERT(childPID > 0);
GAP_ASSERT((WIFEXITED(status) || WIFSIGNALED(status)));
HashLock(PtyIOStreams);
for (UInt i = 0; i < MAX_PTYS; i++) {
if (PtyIOStreams[i].inuse && PtyIOStreams[i].childPID == childPID) {
PtyIOStreams[i].changed = 1;
PtyIOStreams[i].status = status;
PtyIOStreams[i].blocked = 0;
HashUnlock(PtyIOStreams);
return 1;
}
}
HashUnlock(PtyIOStreams);
return 0;
}

static void ChildStatusChanged(int whichsig)
{
UInt i;
Expand Down Expand Up @@ -593,14 +618,14 @@ Obj FuncCLOSE_PTY_IOSTREAM(Obj self, Obj stream)
{
UInt pty = HashLockStreamIfAvailable(stream);

PtyIOStreams[pty].inuse = 0;

/* Close down the child */
int status;
int retcode = close(PtyIOStreams[pty].ptyFD);
if (retcode)
Pr("Strange close return code %d\n", retcode, 0);
kill(PtyIOStreams[pty].childPID, SIGTERM);
// GAP (or another library) might wait on this PID before
// we handle it. If that happens, waitpid will return -1.
retcode = waitpid(PtyIOStreams[pty].childPID, &status, WNOHANG);
if (retcode == 0) {
// Give process a second to quit
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I know this wasn't added in this PR (but the previous one which I didn't review), but the one second seems rather arbitrary (if usually enough for a process to quit). Does it make sense to have this waiting time configurable, or at least definable?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I admit it's arbitary, but I'm not sure I'd want to add a configuration option people might not understand.

In general, if an app hasn't quite within a second of being sent a SIGTERM, then it's a fairly badly behaved program.

One option would be to wait less long before going back to see if the program has stopped, maybe waiting 0.1 seconds 10 times, so GAP can continue faster.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Now you bring it up, I'm tempted to make the timeout much longer (say 10 seconds), and check more often. That should deal with almost every case in practice, and make sure GAP keeps working for particularly badly behaved subproblems.

We would never have found this if there wasn't buggy copies of rev floating around which swallow all signals and enter an infinite loop.

Expand All @@ -612,6 +637,9 @@ Obj FuncCLOSE_PTY_IOSTREAM(Obj self, Obj stream)
kill(PtyIOStreams[pty].childPID, SIGKILL);
retcode = waitpid(PtyIOStreams[pty].childPID, &status, 0);
}

PtyIOStreams[pty].inuse = 0;

FreeStream(pty);
HashUnlock(PtyIOStreams);
return 0;
Expand Down
4 changes: 4 additions & 0 deletions src/iostream.h
Original file line number Diff line number Diff line change
Expand Up @@ -13,7 +13,11 @@
#ifndef GAP_IOSTREAM_H
#define GAP_IOSTREAM_H

// Provide a feature macro to let libraries check if GAP supports
// CheckChildStatusChanged.
#define GAP_HasCheckChildStatusChanged

int CheckChildStatusChanged(int childPID, int status);

/*F * * * * * * * * * * * * * initialize package * * * * * * * * * * * * * * *
*/
Expand Down
12 changes: 12 additions & 0 deletions tst/teststandard/processes/check.pl
Original file line number Diff line number Diff line change
@@ -0,0 +1,12 @@
#!/usr/bin/env perl
use strict;
use warnings;

use Time::HiRes;

if ( $ARGV[1] == 1 ) {
$SIG{INT} = 'IGNORE';
$SIG{TERM} = 'IGNORE';
}

Time::HiRes::sleep($ARGV[0]/1000)
18 changes: 18 additions & 0 deletions tst/teststandard/processes/children.tst
Original file line number Diff line number Diff line change
@@ -0,0 +1,18 @@
gap> d := DirectoryCurrent();;
gap> scriptdir := DirectoriesLibrary( "tst/teststandard/processes/" );;
gap> checkpl := Filename(scriptdir, "check.pl");;

# If IO is loaded, disable its signal handler
gap> if IsBoundGlobal("IO_RestoreSIGCHLDHandler") then
> ValueGlobal("IO_RestoreSIGCHLDHandler")();
> fi;
gap> runChild := function(ms, ignoresignals)
> local signal;
> if ignoresignals then signal := "1"; else signal := "0"; fi;
> return InputOutputLocalProcess(d, checkpl, [ String(time), signal]);
> end;;
gap> for i in [1..200] do
> children := List([1..20], x -> runChild(Random([1..2000]), Random([false,true])));;
> if ForAny(children, x -> x=fail) then Print("Failed producing child\n"); fi;
> Perform(children, CloseStream);
> od;