-
-
Notifications
You must be signed in to change notification settings - Fork 503
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
libgap memory allocation on Cygwin #27214
Comments
comment:1
I think this is even sort of implicitly acknowledged in the GAP source code where it sets the default I found that as a short term workaround passing |
comment:2
Replying to @embray:
That makes me wonder... if it's good enough for Cygwin, would it be good enough to do that on all systems unconditionally? |
comment:3
I don't know. The default on 64-bit systems is It's also a question of what is meant by "good enough". Like I wrote, this likely does have impact when allocating a number of large GAP objects; or at least it ensures that there is enough memory reserved for GAP. But I don't really have a good way at hand to measure that. Might have to dig through the history to see what motivated this in the first place. For my purposes "good enough" is "it works, and most users will never tell the difference either way". |
comment:4
The current use of |
comment:5
Okay, I think I misunderstood exactly what In fact it only uses the manual Cygwin does have |
comment:6
Replying to @jdemeyer:
So to answer your question, no that would not be a good idea on any systems, really. |
comment:7
This issue is actually pretty bad I think. It doesn't immediately break anything, but it can become a real drag when doing anything that involves forking a lot of new processes, and in particular it can be bad when running the doctests in parallel. |
comment:8
What I do wonder is what changed that I'm only starting to notice this now. Before upgrading to GAP 4.10, we were still passing the same flags to Volker's libGAP, and I'm pretty sure it would have had the same behavior w.r.t. memory allocation. Maybe just since we've merged some other changes to make more use of the libGAP interface the problem has become more pronounced. I'd have to install an older version of Sage to compare. |
comment:9
I'm actually moving this up to blocker: I cannot, even with all other existing fixes applied, get the tests for I'm still a little unclear as to why this has gotten so bad only with the recent GAP updates, but at least there are a couple possible workarounds, which I'll try shortly... |
comment:10
I tried applying a patch for GAP (just on Cygwin) to use MAP_NORESERVE when mmap-ing the GAP pool. When I run GAP directly, or even when I run the GAP API tests (in the GAP sources, that is) it appears to work fine, at least superficially. However, when I use the libgap interface in Sage with this patch, it immediately segfaults:
which is completely bizarre and I can't explain it. I will dig in a bit to see what is happening here... |
comment:11
I see: When you create an mmap with MAP_NORESERVE on Cygwin, if you try to access the allocated region it creates without committing it results in an access violation--you can't read or write to that memory until it's been explicitly committed with an additional Normally Cygwin handles this transparently: when its vectored exception handler catches a This works fine normally, which is why I don't see any problem when running GAP directly. However, here's the tricky part: The reason this results in a SIGSEGV in Sage is because of Cysignals. Specifically, the custom "vectored continue handler" I added in cysignals#83 in order to better handle exceptions that occur on a sigaltstack. The reason I used a "vector continue handler" here is that, according to this invaluable analysis, a vectored continue handler can be run even when the kernel detects a failed SEH validation, which is what happens when you're suddenly running on a stack that Windows thinks is off in the weeds, as is the case with sigaltstack. The problem is that vectored continue handlers also run after a normal exception occurs and the SEH handler returns One aspect of this that is not totally clear to me is why this doesn't happen normally: That is, why is this behavior normally avoided in cysignals' signal handling? If I had to wager a guess, it's simply that when cysignals does a longjmp from within the signal handler, we never return to the original exception handler in Cygwin and the vectored continue handler isn't run. Whereas, in this one obscure case, we wind up in the I need to see if there's way to modify |
comment:13
One quite easy workaround: Just bail early out of This could still result in the undesired behavior if we happen to access some uninitialized memory in a |
Dependencies: #27384 |
Commit: |
Author: Erik Bray |
comment:15
This adds a patch to GAP, which works fine on plain GAP but for use in Sage also requires #27384. By design the patch only affects GAP on Cygwin. In principle the same change could be made on other systems that support It just means we can allocate a virtual address range for GAP's objects to live in up to any arbitrary size supported by the system, without regard to whether there is enough physical memory to use that entire address space (either at the time of allocation or sometime in the future) so it's still possible to run out of memory and get memory errors. The reason this is needed on Cygwin is, as explained above, due to strange notions in Windows as to how copy-on-write pages are supposed to work, particularly in the context of making a New commits:
|
Upstream: Not yet reported upstream; Will do shortly. |
comment:17
|
Changed upstream from Reported upstream. Developers acknowledge bug. to Fixed upstream, but not in a stable release. |
comment:25
After looking into this a bit more and obtaining some more stack traces, plus taking a look again at the In my example code in comment:23 the strange thing was that on the first Thus, the N processes created at the time of However, I'm correct, then Cygwin 3.0.0 should fix that. I'm trying that next. In the meantime a workaround is still needed. I would propose re-introducing the option to build the docs in serial, without using Another option we've talked about before is replacing |
comment:26
Yep, Cygwin 3.0 fixes it. So that's good news at least. I'll make sure Cygwin>=3.0 is used for new builds of Sage and for the buildbot (probably >=3.0.4 since 3.0.0 had some unrelated problems with fork()). It will still be good to have a workaround though. |
comment:27
Okay, I'm going to rebase this to work with #27070, and I'm adding a little patch I'm experimenting with currently to work around the docbuild issue on older Cygwins that have this bug. |
comment:29
I think I mentioned the Realistically, this ticket is not going to make it for Sage 8.7 though. |
comment:31
Replying to @vbraun:
What does "realistically" mean in this case? On what basis? I gave you plenty of advance notice that it's needed. |
comment:32
Did you forget to update the branch to #27070? |
comment:34
I thought I did, but apparently not? I must have pushed to a different remote. |
comment:35
Please add a reference to the upstream PR in the |
Reviewer: Jeroen Demeyer |
Changed branch from u/embray/cygwin/patch-gap-mmap-noreserve-pool to |
…regular-guess * u/dkrenn/sequences/rec-hash: (8211 commits) Updated SageMath version to 8.7 Updated SageMath version to 8.7.rc0 Trac sagemath#27490: Moved the alternate build_many implementation into a sage_setup.docbuild.utils module. Trac sagemath#27490: Further fixes in use of os.wait() Trac sagemath#27214: Patch GAP to allocate its memory pool using MAP_NORESERVE on Cygwin Trac sagemath#27490: Address some review comments and other cleanup: A little bit of import cleanup Trac sagemath#27490: Simplistic multiprocessing.Pool replacement for parallel docbuild on older Cygwin Fix alarm() test when cysignals was compiled with debugging Trac sagemath#27485: Use sdh_cmake in the spkg-install for primecount. Trac sagemath#27484: Add shd_cmake helper for running cmake with the correct flags for building Sage SPKGs. cysignals should be a normal dependency Upgrade to Cysignals 1.10.2 Upgrade to notebook-5.7.6 Trac sagemath#27461: Add abs tol on this test to account for minor numerical difference on Cygwin due to libm differences. Replacing None < infinity comparison with equivalent code. Some last little tidbits for uniformity. Removing some code duplication for __pth_root (changed to _pth_root_func). One more xderinv added. trac 27474: move some references to the master bibliography file. ...
…ular-warning * u/dkrenn/sequences/k-regular-guess: (8211 commits) Updated SageMath version to 8.7 Updated SageMath version to 8.7.rc0 Trac sagemath#27490: Moved the alternate build_many implementation into a sage_setup.docbuild.utils module. Trac sagemath#27490: Further fixes in use of os.wait() Trac sagemath#27214: Patch GAP to allocate its memory pool using MAP_NORESERVE on Cygwin Trac sagemath#27490: Address some review comments and other cleanup: A little bit of import cleanup Trac sagemath#27490: Simplistic multiprocessing.Pool replacement for parallel docbuild on older Cygwin Fix alarm() test when cysignals was compiled with debugging Trac sagemath#27485: Use sdh_cmake in the spkg-install for primecount. Trac sagemath#27484: Add shd_cmake helper for running cmake with the correct flags for building Sage SPKGs. cysignals should be a normal dependency Upgrade to Cysignals 1.10.2 Upgrade to notebook-5.7.6 Trac sagemath#27461: Add abs tol on this test to account for minor numerical difference on Cygwin due to libm differences. Replacing None < infinity comparison with equivalent code. Some last little tidbits for uniformity. Removing some code duplication for __pth_root (changed to _pth_root_func). One more xderinv added. trac 27474: move some references to the master bibliography file. ...
SageMath version 8.7, Release Date: 2019-03-23 * tag '8.7': (943 commits) Updated SageMath version to 8.7 Updated SageMath version to 8.7.rc0 Trac sagemath#27490: Moved the alternate build_many implementation into a sage_setup.docbuild.utils module. Trac sagemath#27490: Further fixes in use of os.wait() Trac sagemath#27214: Patch GAP to allocate its memory pool using MAP_NORESERVE on Cygwin Trac sagemath#27490: Address some review comments and other cleanup: A little bit of import cleanup Trac sagemath#27490: Simplistic multiprocessing.Pool replacement for parallel docbuild on older Cygwin Fix alarm() test when cysignals was compiled with debugging Trac sagemath#27485: Use sdh_cmake in the spkg-install for primecount. Trac sagemath#27484: Add shd_cmake helper for running cmake with the correct flags for building Sage SPKGs. cysignals should be a normal dependency Upgrade to Cysignals 1.10.2 Upgrade to notebook-5.7.6 Trac sagemath#27461: Add abs tol on this test to account for minor numerical difference on Cygwin due to libm differences. Replacing None < infinity comparison with equivalent code. Some last little tidbits for uniformity. Removing some code duplication for __pth_root (changed to _pth_root_func). One more xderinv added. trac 27474: move some references to the master bibliography file. ...
…ounded * u/dkrenn/k-regular-warning: (8211 commits) Updated SageMath version to 8.7 Updated SageMath version to 8.7.rc0 Trac sagemath#27490: Moved the alternate build_many implementation into a sage_setup.docbuild.utils module. Trac sagemath#27490: Further fixes in use of os.wait() Trac sagemath#27214: Patch GAP to allocate its memory pool using MAP_NORESERVE on Cygwin Trac sagemath#27490: Address some review comments and other cleanup: A little bit of import cleanup Trac sagemath#27490: Simplistic multiprocessing.Pool replacement for parallel docbuild on older Cygwin Fix alarm() test when cysignals was compiled with debugging Trac sagemath#27485: Use sdh_cmake in the spkg-install for primecount. Trac sagemath#27484: Add shd_cmake helper for running cmake with the correct flags for building Sage SPKGs. cysignals should be a normal dependency Upgrade to Cysignals 1.10.2 Upgrade to notebook-5.7.6 Trac sagemath#27461: Add abs tol on this test to account for minor numerical difference on Cygwin due to libm differences. Replacing None < infinity comparison with equivalent code. Some last little tidbits for uniformity. Removing some code duplication for __pth_root (changed to _pth_root_func). One more xderinv added. trac 27474: move some references to the master bibliography file. ...
As discussed in #27213, when we initialize libgap, we pass it the
-s
flag which according to the docs is 'set the initially mapped virtual memory', to a size determined bysage.interfaces.gap.get_gap_memory_pool_size()
, which on my system happens to be ~3GB.This is fine, in general, as it's an amount that's available to my system. Nonetheless, in most usage (especially e.g. in the test suite) most of this will not be used.
I have to look into exactly how this memory gets allocated, but however it's being allocated it is unfortunately "committed" memory, meaning that although those pages are allocated lazily, they are guaranteed by the system to be made available for that process, so regardless whether those pages are actually in RAM, space is still reserved for them. So perhaps it uses sbrk to expand the process's heap. When the process is forked, Cygwin's
fork()
has to copy the parent process's heap into the child. This has the unfortunate effect of accessing those pages, causing them to become allocated in physical memory (even, apparently, if they're clean / unused).This is essentially the same issue we faced with PARI in #22633, but applied to GAP. In fact, GAP's code sets the default value of the
-s
flag to 0 on Cygwin, presumably for reasons related to this. This might be possible to avoid, as was done in PARI, by instead allocating this memory withmmap
andMAP_NORESERVE
.Upstream PR: gap-system/gap#3335
Depends on #27070
Depends on #27490
Upstream: Fixed upstream, but not in a stable release.
Component: porting: Cygwin
Author: Erik Bray
Branch/Commit:
4b359de
Reviewer: Jeroen Demeyer
Issue created by migration from https://trac.sagemath.org/ticket/27214
The text was updated successfully, but these errors were encountered: