Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix the interaction of signals in GAP and the IO package #1851

Merged
merged 3 commits into from
Nov 11, 2017

Conversation

ChrisJefferson
Copy link
Contributor

This patch is part 1 of fixing the interaction of signals in GAP and the IO package.

This adds a new function (which I imagine might also be useful to anyone making GAP into a library) to pass information about dead child processes to GAP.

Once this is merged in, I have another PR for IO, which will call this function. Then I can remove the line in the test which forces IO's signal handler to be disabled (because then the test will still work even with IO's signal handler).

@ChrisJefferson
Copy link
Contributor Author

The IO part of this is at:

gap-packages/io#55 (just for reference).

/* Close down the child */
int status;
int retcode = close(PtyIOStreams[pty].ptyFD);
if (retcode)
Pr("Strange close return code %d\n", retcode, 0);
kill(PtyIOStreams[pty].childPID, SIGTERM);
// GAP (or another library) might wait on this PID before
// we handle it. If that happens, waitpid will return -1.
retcode = waitpid(PtyIOStreams[pty].childPID, &status, WNOHANG);
if (retcode == 0) {
// Give process a second to quit
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I know this wasn't added in this PR (but the previous one which I didn't review), but the one second seems rather arbitrary (if usually enough for a process to quit). Does it make sense to have this waiting time configurable, or at least definable?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I admit it's arbitary, but I'm not sure I'd want to add a configuration option people might not understand.

In general, if an app hasn't quite within a second of being sent a SIGTERM, then it's a fairly badly behaved program.

One option would be to wait less long before going back to see if the program has stopped, maybe waiting 0.1 seconds 10 times, so GAP can continue faster.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Now you bring it up, I'm tempted to make the timeout much longer (say 10 seconds), and check more often. That should deal with almost every case in practice, and make sure GAP keeps working for particularly badly behaved subproblems.

We would never have found this if there wasn't buggy copies of rev floating around which swallow all signals and enter an infinite loop.

@codecov
Copy link

codecov bot commented Nov 6, 2017

Codecov Report

Merging #1851 into master will increase coverage by 2.29%.
The diff coverage is 7.69%.

@@            Coverage Diff             @@
##           master    #1851      +/-   ##
==========================================
+ Coverage   63.63%   65.93%   +2.29%     
==========================================
  Files         946      513     -433     
  Lines      283525   268703   -14822     
  Branches    12686    11558    -1128     
==========================================
- Hits       180426   177166    -3260     
+ Misses     100313    88819   -11494     
+ Partials     2786     2718      -68
Impacted Files Coverage Δ
src/iostream.c 49.05% <7.69%> (+2.04%) ⬆️
src/hpc/tls.h 66.66% <0%> (-33.34%) ⬇️
hpcgap/lib/system.g 57.07% <0%> (-8.97%) ⬇️
src/hpc/aobjects.h 62.5% <0%> (-6.74%) ⬇️
src/hpc/misc.c 23.07% <0%> (-5.96%) ⬇️
src/intobj.h 80.39% <0%> (-2.22%) ⬇️
hpcgap/src/c_oper1.c 84.7% <0%> (-1.77%) ⬇️
src/hpc/traverse.c 76.57% <0%> (-1.72%) ⬇️
src/sysfiles.c 30.45% <0%> (-0.94%) ⬇️
src/saveload.c 56.1% <0%> (-0.61%) ⬇️
... and 501 more


# If IO is loaded, disable it's signal handler
gap> if IsBoundGlobal("IO_RestoreSIGCHLDHandler") then
> ValueGlobal("IO_RestoreSIGCHLDHandler")();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I assume that's just for the current io version, until gap-packages/io#55 is merged?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, if you comment this out, the test fails (it "should" fail, the manual explictly says that GAP's process stuff can't be used if IO's signal handler is enabled, it just got more likely to trigger).

@ChrisJefferson
Copy link
Contributor Author

Hopefully all issues fixed.

@markuspf markuspf added this to the GAP 4.9.0 milestone Nov 10, 2017
Copy link
Member

@fingolfin fingolfin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Third commit message starts with "xAdd function" -> remove x, and also fix that comment typo?

Otherwise looks good (and once it's merge, I'll look into update io)

gap> scriptdir := DirectoriesLibrary( "tst/teststandard/processes/" );;
gap> checkpl := Filename(scriptdir, "check.pl");;

# If IO is loaded, disable it's signal handler
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"it's" -> "its"

This patch ensures that if we catch the child dying in the signal
handler, it is correctly marked as finished.

This change makes it easier to keep track of child processes,
rather than ignoring signals we are not expecting.
@olexandr-konovalov olexandr-konovalov changed the title Signals Fixing the interaction of signals in GAP and the IO package. Nov 10, 2017
@fingolfin fingolfin merged commit 51f74f1 into gap-system:master Nov 11, 2017
@olexandr-konovalov olexandr-konovalov changed the title Fixing the interaction of signals in GAP and the IO package. Fix the interaction of signals in GAP and the IO package Jan 22, 2018
@olexandr-konovalov olexandr-konovalov added the release notes: added PRs introducing changes that have since been mentioned in the release notes label Jan 22, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
release notes: added PRs introducing changes that have since been mentioned in the release notes
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants