Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

libgap: save/restore recursion depth #4258

Merged
merged 1 commit into from
Feb 23, 2021

Conversation

embray
Copy link
Contributor

@embray embray commented Feb 17, 2021

Description

At other places in GAP where setjmp is called, notably GAP_TRY/CATCH the recursion depth counter also needs to be saved/restored.

It appears libgap needs to do this as well. If you have code like:

int ok = GAP_Enter();
if (ok) {
    ...
    GAP_CallFuncArray(...);
}
GAP_Leave();

if an error occurs in the function being called, the recursion depth counter will not get reset.

As a result, after enough errors in GAP functions, eventually the recursion depth counter reaches its limit (default 5000) and enters RecursionDepthTrap. Here something strange happens that I admit I don't fully understand, and could be something I'm doing wrong: Even though RecursionDepthTrap resets the recursion depth to zero before calling the error handler, at this point it actually goes into an infinite recursion and eventual stack overflow.

Here is some code in Sage and gappy that reproduced the problem:

for _ in range(5000):
    try:
        # It doesn't really matter what you put here as long as it will
        # result in an error
        gap.Sum(0, 1, 2)
    except GAPError:
        pass

I suspect you could reproduce the problem similarly with GAP.jl.

More broadly speaking, GAP_TRY/CATCH are new since I last looked closely at GAP error handling. I wonder if they couldn't be generalized a bit and used in the implementations of GAP_Enter() and GAP_Leave().

Text for release notes

Other fixed bugs

  • libgap: Fixed GAP_Enter() macro so that GAP's recursion depth counter is saved/restored. Without this, if too many GAP errors occurred during runtime a segmentation fault could occur in the program using libgap.

Checklist for pull request reviewers

  • proper formatting

If your code contains kernel C code, run clang-format on it; the
simplest way is to use git clang-format, e.g. like this (don't
forget to commit the resulting changes):

git clang-format $(git merge-base HEAD master)
  • usage of relevant labels

    1. either release notes: not needed or release notes: to be added
    2. at least one of the labels bug or enhancement or new feature
    3. for changes meant to be backported to stable-4.X add the backport-to-4.X label
    4. consider adding any of the labels build system, documentation, kernel, library, tests
  • runnable tests

  • lines changed in commits are sufficiently covered by the tests

  • adequate pull request title

  • well formulated text for release notes

  • relevant documentation updates

  • sensible comments in the code

@codecov
Copy link

codecov bot commented Feb 17, 2021

Codecov Report

Merging #4258 (dc85ec6) into master (13a00a4) will decrease coverage by 14.56%.
The diff coverage is 0.00%.

@@             Coverage Diff             @@
##           master    #4258       +/-   ##
===========================================
- Coverage   82.26%   67.69%   -14.57%     
===========================================
  Files         686      630       -56     
  Lines      296678   268108    -28570     
===========================================
- Hits       244048   181486    -62562     
- Misses      52630    86622    +33992     
Impacted Files Coverage Δ
src/libgap-api.c 3.60% <0.00%> (-60.40%) ⬇️
src/funcs.h 0.00% <0.00%> (-100.00%) ⬇️
src/rational.h 0.00% <0.00%> (-100.00%) ⬇️
src/exprs.h 7.14% <0.00%> (-86.61%) ⬇️
lib/ctblauto.gi 4.38% <0.00%> (-85.20%) ⬇️
lib/teachm2.g 15.38% <0.00%> (-84.62%) ⬇️
lib/helpt2t.gi 0.23% <0.00%> (-83.14%) ⬇️
lib/proto.gi 1.03% <0.00%> (-82.48%) ⬇️
src/compiler.c 8.04% <0.00%> (-80.91%) ⬇️
lib/ctbllatt.gi 0.44% <0.00%> (-80.36%) ⬇️
... and 411 more

@fingolfin
Copy link
Member

Thanks! GAP Days are ongoing so I can't review this immediately, but in the meantime I thought I should bring oscar-system/GAP.jl#597 to your attention as I think something similar might affect you as well.

@fingolfin fingolfin added backport-to-4.11 topic: kernel topic: libgap things related to libgap release notes: to be added PRs introducing changes that should be (but have not yet been) mentioned in the release notes labels Feb 17, 2021
Copy link
Member

@fingolfin fingolfin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@ChrisJefferson
Copy link
Contributor

Does this have a problem if we end up in nested prejumps?

@fingolfin
Copy link
Member

What's a prejump?

@ChrisJefferson
Copy link
Contributor

Sorry, it's the name of the functions (prejump and postjump) the new variable is used in.

@embray
Copy link
Contributor Author

embray commented Feb 17, 2021

@ChrisJefferson It only saves the current recursion depth and restores it in the outer-most GAP_Enter/Leave. I think in general this is ok. Though maybe it would be possible to concoct a scenario where GAP_Enter is called with the recursion depth just one or two below the limit. Then RecursionDepthTrap is called, which resets the recursion depth to zero, calls the error handler, and then jumps back to the jump point set by GAP_Enter, thus restoring the old recursion depth. The result would be the RecursionDepthTrap still getting called repeatedly.

Maybe, related also to oscar-system/GAP.jl#597, rather than having a single static variable like the RecursionDepth one I added just for libgap, there should be a standard place in the GapState structure for "this is stuff that needs to be saved/restore before/after a long jump, alongside ReadJmpError itself". That could also include the RecursionDepth counter, rather than that being part of the funcs module state. I don't know if that makes any sense.

@embray
Copy link
Contributor Author

embray commented Feb 17, 2021

I figured out the other part of this problem, which was the infinite recursion that appeared after the initial call to RecursionDepthTrap. This was happening in gappy's error handling code which similarly to GAP.jl closes the ERROR_OUTPUT stream and replaces it with a text stream. Interestingly, the code in GAP.jl has the CloseStream call commented out but for seemingly unrelated reasons.

What happens is that my error handler calls CloseStream(ERROR_OUTPUT) but then the recursion depth error occurs while trying to replace it with a new stream. Upon getting into ErrorInner, if it tries to print to ERROR_OUTPUT, a closed stream, this results in an infinite recursion.

I think maybe there should be a separate fix to ErrorInner to prevent this: Check if the ERROR_OUTPUT stream is closed, and if so re-open it to *errout*.

embray added a commit to embray/gappy that referenced this pull request Feb 17, 2021
Kinda ugly workaround that unfortunately needs to reach for GAP
internals, though is effectively equivalent to the fix from
gap-system/gap#4258 so when this fix is
available in new versions of GAP we can enable the workaround just
for older versions so long as they're supported.

Also probably need to switch to using sig_GAP_Enter() and
sig_GAP_Leave() in more places, GapFunction.__call__ is just the
most common.  But it should probably be used in almost all cases...?
src/libgap-api.c Outdated Show resolved Hide resolved
@fingolfin
Copy link
Member

We should move some of the points raised here to separate issue(s) after this PR is merged, so they are not lost...

Maybe, related also to oscar-system/GAP.jl#597, rather than having a single static variable like the RecursionDepth one I added just for libgap, there should be a standard place in the GapState structure for "this is stuff that needs to be saved/restore before/after a long jump, alongside ReadJmpError itself". That could also include the RecursionDepth counter, rather than that being part of the funcs module state. I don't know if that makes any sense.

Absolutely. Getting to that has been a goal for me for quite some time; I've been working towards it by slowly whittling away at the kernel via refactorings. Found & fixed a lot of bugs in GAP itself this way, too. I don't claim to have a full picture of what really is needed in my, though; but I do think that GAP still has some potential bugs related to that, it's just that trying to either find a testcase that proves them to be bugs, or to analyze the code to refute them, is very tricky (and perhaps not worth the effort). So I'd just say we go ahead and try this, and simply are prepared to be agile and react to bugs as we encounter them.

So my idea would be to move the state that needs to be preserved into a little struct; an instances of this is a member of GapStat. Then your static variable also can be of this type, and we can memcpy between the two (or just use struct assignment). Candidates for things that should be preserved (plus places that currently do that):

  • RecursionDepth (see GAP_TRY and hence also Call1ArgsInNewReader, CALL_WITH_CATCH)
  • CurrLVars (see Call1ArgsInNewReader, CALL_WITH_CATCH)
  • UserHasQuit (see Call1ArgsInNewReader)
  • Tilde (see CALL_WITH_CATCH)

BTW, I now kinda regret that we don't pass the "payload" to the JumpToCatchCallback.

Another offside note: I used to have some experimental code for an alternative GAP unit testing setup, which relied on using CALL_WITH_CATCH / JUMP_TO_CATCH as a poor man's try/catch in GAP. This code of course doesn't work when using a JumpToCatchCallback. And the SCSCP package for GAP still uses it, so that package won't work fully in GAP.jl and gappy and sage (not saying this is a serious problem, it's simply that I realized this just now and thought I should record it)

@embray
Copy link
Contributor Author

embray commented Feb 18, 2021

BTW, I now kinda regret that we don't pass the "payload" to the JumpToCatchCallback.

What is the payload currently used for? It would be really useful for errors if it included the error message--this would reduce a lot of wrangling we have to do for error handling with libgap.

@fingolfin
Copy link
Member

payload is right now only used for some immediate integer values.

Having had a look at the code again, I think the best bet for better error integration is to simply patch ErrorInner / override it with a custom such function, tailored for the needs in the Julia/Python/... wrapper. Then we have access to all error options, the error message, and more; and can in principle even do without the JumpToCatchCallback callback... But let's perhaps discuss this elsewhere, e.g. on a new issue.

I'll merge this once the CI tests completed (I've rebased this to resolve a weird error the GitHub interface gave me that prevented me from merging)

@fingolfin fingolfin merged commit 54a2f86 into gap-system:master Feb 23, 2021
@embray embray deleted the libgap/recursion-depth branch February 26, 2021 10:49
@embray
Copy link
Contributor Author

embray commented Feb 26, 2021

I can't remember the details, but I seem to recall a long time ago trying to replace ErrorInner and running into trouble. But I agree that should be the easiest way to do it. I'm not sure we don't need JumpToCatchCallback though, unless you have some other idea in mind for how to run an arbitrary C function when a GAP error occurs.

@fingolfin
Copy link
Member

Backported to stable-4.11 in 23bc0af

ThomasBreuer added a commit to ThomasBreuer/gap that referenced this pull request Feb 26, 2021
Co-authored-by: Max Horn <max@quendi.de>
@olexandr-konovalov olexandr-konovalov removed the release notes: to be added PRs introducing changes that should be (but have not yet been) mentioned in the release notes label Mar 20, 2021
@olexandr-konovalov olexandr-konovalov added the release notes: added PRs introducing changes that have since been mentioned in the release notes label Mar 20, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
backport-to-4.11-DONE release notes: added PRs introducing changes that have since been mentioned in the release notes topic: kernel topic: libgap things related to libgap
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants