-
Notifications
You must be signed in to change notification settings - Fork 160
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
libgap: save/restore recursion depth #4258
Conversation
Codecov Report
@@ Coverage Diff @@
## master #4258 +/- ##
===========================================
- Coverage 82.26% 67.69% -14.57%
===========================================
Files 686 630 -56
Lines 296678 268108 -28570
===========================================
- Hits 244048 181486 -62562
- Misses 52630 86622 +33992
|
Thanks! GAP Days are ongoing so I can't review this immediately, but in the meantime I thought I should bring oscar-system/GAP.jl#597 to your attention as I think something similar might affect you as well. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
Does this have a problem if we end up in nested prejumps? |
What's a prejump? |
Sorry, it's the name of the functions (prejump and postjump) the new variable is used in. |
@ChrisJefferson It only saves the current recursion depth and restores it in the outer-most GAP_Enter/Leave. I think in general this is ok. Though maybe it would be possible to concoct a scenario where GAP_Enter is called with the recursion depth just one or two below the limit. Then RecursionDepthTrap is called, which resets the recursion depth to zero, calls the error handler, and then jumps back to the jump point set by GAP_Enter, thus restoring the old recursion depth. The result would be the RecursionDepthTrap still getting called repeatedly. Maybe, related also to oscar-system/GAP.jl#597, rather than having a single static variable like the |
I figured out the other part of this problem, which was the infinite recursion that appeared after the initial call to What happens is that my error handler calls I think maybe there should be a separate fix to |
Kinda ugly workaround that unfortunately needs to reach for GAP internals, though is effectively equivalent to the fix from gap-system/gap#4258 so when this fix is available in new versions of GAP we can enable the workaround just for older versions so long as they're supported. Also probably need to switch to using sig_GAP_Enter() and sig_GAP_Leave() in more places, GapFunction.__call__ is just the most common. But it should probably be used in almost all cases...?
We should move some of the points raised here to separate issue(s) after this PR is merged, so they are not lost...
Absolutely. Getting to that has been a goal for me for quite some time; I've been working towards it by slowly whittling away at the kernel via refactorings. Found & fixed a lot of bugs in GAP itself this way, too. I don't claim to have a full picture of what really is needed in my, though; but I do think that GAP still has some potential bugs related to that, it's just that trying to either find a testcase that proves them to be bugs, or to analyze the code to refute them, is very tricky (and perhaps not worth the effort). So I'd just say we go ahead and try this, and simply are prepared to be agile and react to bugs as we encounter them. So my idea would be to move the state that needs to be preserved into a little struct; an instances of this is a member of GapStat. Then your static variable also can be of this type, and we can memcpy between the two (or just use struct assignment). Candidates for things that should be preserved (plus places that currently do that):
BTW, I now kinda regret that we don't pass the "payload" to the Another offside note: I used to have some experimental code for an alternative GAP unit testing setup, which relied on using |
3d6cab0
to
dc85ec6
Compare
What is the payload currently used for? It would be really useful for errors if it included the error message--this would reduce a lot of wrangling we have to do for error handling with libgap. |
restored afterwards.
dc85ec6
to
037fb3e
Compare
Having had a look at the code again, I think the best bet for better error integration is to simply patch I'll merge this once the CI tests completed (I've rebased this to resolve a weird error the GitHub interface gave me that prevented me from merging) |
I can't remember the details, but I seem to recall a long time ago trying to replace ErrorInner and running into trouble. But I agree that should be the easiest way to do it. I'm not sure we don't need JumpToCatchCallback though, unless you have some other idea in mind for how to run an arbitrary C function when a GAP error occurs. |
Backported to stable-4.11 in 23bc0af |
Co-authored-by: Max Horn <max@quendi.de>
Description
At other places in GAP where
setjmp
is called, notably GAP_TRY/CATCH the recursion depth counter also needs to be saved/restored.It appears libgap needs to do this as well. If you have code like:
if an error occurs in the function being called, the recursion depth counter will not get reset.
As a result, after enough errors in GAP functions, eventually the recursion depth counter reaches its limit (default 5000) and enters RecursionDepthTrap. Here something strange happens that I admit I don't fully understand, and could be something I'm doing wrong: Even though
RecursionDepthTrap
resets the recursion depth to zero before calling the error handler, at this point it actually goes into an infinite recursion and eventual stack overflow.Here is some code in Sage and gappy that reproduced the problem:
I suspect you could reproduce the problem similarly with GAP.jl.
More broadly speaking,
GAP_TRY/CATCH
are new since I last looked closely at GAP error handling. I wonder if they couldn't be generalized a bit and used in the implementations ofGAP_Enter()
andGAP_Leave()
.Text for release notes
Other fixed bugs
GAP_Enter()
macro so that GAP's recursion depth counter is saved/restored. Without this, if too many GAP errors occurred during runtime a segmentation fault could occur in the program using libgap.Checklist for pull request reviewers
If your code contains kernel C code, run
clang-format
on it; thesimplest way is to use
git clang-format
, e.g. like this (don'tforget to commit the resulting changes):
usage of relevant labels
release notes: not needed
orrelease notes: to be added
bug
orenhancement
ornew feature
stable-4.X
add thebackport-to-4.X
labelbuild system
,documentation
,kernel
,library
,tests
runnable tests
lines changed in commits are sufficiently covered by the tests
adequate pull request title
well formulated text for release notes
relevant documentation updates
sensible comments in the code