Regression in testextra/grpauto.tst (memory usage) #2377
Comments
@Stefan-Kohl I can trace this to the following lines:

#############################################################################
##
#F  Stabilizer( <arg> ) . . . . . . . . . . use StabilizerOp for rcwa groups
##
MakeReadWriteGlobal( "Stabilizer" ); Unbind( Stabilizer ); # dirty hack ...
BindGlobal( "Stabilizer", #
  function ( arg )
    if Length( arg ) = 1 then return StabilizerOfExternalSet( arg[1] );
    elif IsRcwaGroup( arg[1] ) then return CallFuncList( StabilizerOp, arg );
    else return CallFuncList( StabilizerFunc, arg ); fi;
    return;
  end );

Will you be able to replace this "dirty hack", as the comment says, by some proper implementation urgently for the GAP 4.9.1 release, please? |
Woah, not so fast. This is the original implementation:

InstallGlobalFunction( Stabilizer, function( arg )
    if Length( arg ) = 1 then
      return StabilizerOfExternalSet( arg[ 1 ] );
    else
      return CallFuncList( StabilizerFunc, arg );
    fi;
end );

So while the above hack is, well, a hack, I don't quite see how it would cause the problem at hand, given that we correctly branch into |
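As a quick sanity check on that branching - a minimal, hypothetical snippet one could run in a session where the redefined Stabilizer above is in effect; the particular group and point are chosen purely for illustration and are not taken from the failing test:

G := SymmetricGroup( 4 );;
S := Stabilizer( G, 1 );;   # two arguments, not an rcwa group, so this should go
                            # through the StabilizerFunc branch, exactly as before
Size( S );                  # 6: the stabilizer of the point 1 is Sym({2,3,4})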
Yes, I just came to the same conclusion looking at the same code of |
Looking now at some packages that appear in the results of profiling and are not loaded by default. That's what I've got so far:
|
I wonder if some package(s) install immediate methods for some attributes, which then lead to increases in both time and memory usage |
@james-d-mitchell I think I've told you earlier that I have reasons to suspect that semigroups is involved in this, but at that time I had no convincing arguments. It seems like I have them now - none of the packages that semigroups requires increases the memory usage as much as semigroups itself. On the other hand, there is no trading of memory for performance - the runtime with the semigroups package loaded is more than a minute longer than without it. These are the files that show non-zero coverage:
|
@alex-konovalov Best,
|
@hulpke all timings and memory usage figures in the table above, as well as the break loop backtrace, are reproducible with assertions turned off. |
@alex-konovalov |
@hulpke this is the case when the groups are isomorphic. Maybe it takes longer to find an isomorphism because of different random states / heuristics / ordering of things, etc. |
I've now tried a few runs to check the impact of immediate methods. (Yes, I know that I am the curmudgeon who has always whined against the concept of immediate methods, so take the following with a grain of salt if you want.) After an initial run to ensure memory has been allocated, I get for the Isomorphism calculation in the master branch (no assertions turned on):

plain: 170s

So there are two issues: immediate methods in the main library, and ordinary methods in semigroups. While it is entirely possible that my code is badly written, is that what we want immediate methods to do? Would it make sense to also have a test run with immediate methods turned off, to spot such performance hits? (Caveat: I have a patch that speeds the whole calculation up far more, so one could have this issue go away.)

Testing with

Some of these are old, some of them have been added just recently. Some likely could be replaced by

While they are short, part of the cost is the iterated type change, as a few basic properties are deduced one-by-one.

#I immediate: Size

would set Size to infinity once IsFinite is set to false. Is this really so valuable?

The semigroups package seems to be involved only with

Should we make these immediate methods another issue? |
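For readers following along, the "#I immediate: Size" line quoted above is produced by GAP's immediate-method tracing. A minimal sketch of how one can watch the Size/IsFinite interaction described there - the infinite dihedral presentation is chosen purely for illustration and is not part of the test under discussion:

TraceImmediateMethods( true );
F := FreeGroup( "a", "b" );;
G := F / [ F.1^2, F.2^2 ];;   # infinite dihedral group; finiteness not yet stored
SetIsFinite( G, false );      # storing IsFinite triggers immediate methods,
                              # printed as "#I  immediate: ..." lines
HasSize( G );                 # should now be true: Size was set to infinity
                              # without ever having been asked for
TraceImmediateMethods( false );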
I think we should make it an issue. I knew immediate methods could slow things down, but I didn't guess it would be a 2x slowdown. |
Thanks - I am not sure I understand exactly what you mean by "this issue" - #2377 or just the performance regression. Just to point out that, in order to resolve #2377, one needs to fix running out of the memory limit. |
@alex-konovalov |
I agree that we should look into (a) improving existing immediate methods, if possible (i.e. making them faster), (b) reducing the number of them, (c) making the code which handles immediate methods faster. I have several ideas for that. E.g. for immediate methods on properties, in most (all) cases in the library,

And yeah, we should measure performance regressions more closely (see also issues #260 and #636). In fact I know that @alex-konovalov used to do that, but it is unfortunately not quite easy to set up and maintain (and we cannot use Travis for that, as its runtimes fluctuate very heavily; we need to use dedicated hardware, which exists in St Andrews, but AFAIK the results are currently not visible unless you can VPN into the St Andrews network). We discussed ways to change this in the past, but so far nothing has happened (as usual, it's a matter of having the time to work on it). I dream of having something like http://speed.pypy.org for GAP, but right now we are pretty far away from that. |
Some more notes: If I load

I think in the tests, it is also instructive to see what happens for

G := PcGroupCode( 741231213963541373679312045151639276850536621925972119311, 11664 );;
IsomorphismGroups( G, PcGroupCode( CodePcGroup( G ), Size( G ) ) ) = fail; time; |
@alex-konovalov you call this a regression, but I don't understand why. This test was not present in GAP 4.8, so it cannot have regressed "as a test". The only thing that regressed in this sense is the overall test suite, in the sense that it now may not pass, resp. may take too long, on some configurations. But the examples given here themselves do not seem to have regressed. To the contrary: in some quick tests I just ran, performance seems to be far worse for them in GAP 4.8, worse than the worst I've seen in

UPDATE: one of the two computations completed, after 9818864 msec, or 2 hours and 43 minutes. The other (started with |
I'm slightly puzzled by your comment @fingolfin that "many more of its immediate methods beyond IsSemigroupIdeal are triggered in my tests". In

The immediate method for

the "many more" other 4 are:
If it is somehow desirable to remove/improve these methods, then please file an issue on the Semigroups package issue tracker, and I'll be happy to resolve them. |
@james-d-mitchell My apologies, I misread a swarming output of

Here is an example log:
I think a major culprit here is

That said, there must be other factors at work here; as I said, we are still exploring. |
@fingolfin Thanks for clearing that up. Just to note further that |
I do not remember why I made the immediate method. I'll make a PR to Semigroups for removing it. |
@fingolfin I call it a regression because |
@alex-konovalov if that's how you define it, then moving this new test from |
@fingolfin I agree without hesitation - although not to testextra (those tests are performed as part of |
P.S. @fingolfin, should the PR for removing this example from teststandard go into master or stable-4.9? |
Just to mention that the immediate method for |
This will allow us to avoid the test failure reported in #2377 in the stable-4.9 branch. The fix in the master branch will be done differently. This test is duplicated in benchmarks, so we will monitor its status there.
This now happens only in the master branch. In stable-4.9 the test causing the problem has been removed in #2407. |
The following is observed in teststandard tests on Jenkins:

In the master branch this happens on both the macOS and Linux Jenkins slaves; in the stable-4.9 branch, only on macOS. I've tried to run just that example, starting GAP with the -o 1g option, and so far was able to reproduce this problem only once. Apparently, there is some interaction between other tests and loaded packages, and balancing around the 1g limit with some randomness and garbage collections makes this hard to reproduce.

This is the input:

and this is the memory usage that I observe under macOS:

(the last 1.01g value is measured after a break loop happened and I entered "return" to resume the calculation).
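For reference, a sketch of how one might try to observe this by hand. The -o 1g limit is the one discussed in this issue; the computation shown is the IsomorphismGroups example quoted earlier in the thread, standing in for the elided test input, and TotalMemoryAllocated is merely one convenient way to watch allocation against the limit:

# GAP started as:   gap -o 1g     (which packages get loaded matters here)
G := PcGroupCode( 741231213963541373679312045151639276850536621925972119311, 11664 );;
IsomorphismGroups( G, PcGroupCode( CodePcGroup( G ), Size( G ) ) ) = fail;
time;                      # runtime of the last command, in milliseconds
TotalMemoryAllocated();    # bytes allocated so far by GAP's memory manager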
The problem can be reproduced more easily if GAP is started with a slightly smaller memory limit, -o 1000m. Then, after

Error, reached the pre-set memory limit

I see this: