Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Investigate reducing the memory usage of the C module #433

Closed
wilfwilson opened this issue Mar 4, 2021 · 6 comments
Closed

Investigate reducing the memory usage of the C module #433

wilfwilson opened this issue Mar 4, 2021 · 6 comments
Assignees
Labels
C language A label for issues or pull requests relating to the kernel module of the package resolved-pending-release A label for issues that have been resolved and that can be closed when a new version is released.

Comments

@wilfwilson
Copy link
Collaborator

wilfwilson commented Mar 4, 2021

...particularly in the Schreier-Sims implementation.

@ChrisJefferson reported in semigroups/Semigroups#634 (comment) that when using the homomorphism finder for the first time, Digraphs allocates 1.25 GB of memory, and that moreover, this memory is kept around to be used for any subsequent runs. This is not really a problem in 64-bit, but it presents a potential problem in 32-bit mode.

This is my attempt at counting the big bits of memory that are assigned (thanks to Chris for helping):

  • CONDITIONS requires 132 MB.
  • STAB_GENS requires 258 MB
    • (space for a stabiliser chain that is MAXVERTS deep, with MAXVERTS generators on MAXVERTS points at each level.)
  • The array of strong generators requires 258 MB.
    • (something very similar to the above? I don't remember how these two things are related.)
  • The array of transversal elements requires 256 MB.
    • (space for MAXVERTS coset representatives of degree MAXVERTS at each of the MAXVERTS levels of the stabiliser chain.)
  • Their inverses also require 256 MB.

This is over 1.1 GB already.

Allocating over 770 MB of memory for a stabiliser chain of a group on at most 512 points is a bit extravagant, especially since, I gather, the groups we'll come across in practice will probably be quite small, and on a much smaller number of points. A stabiliser chain could be computed and stored with a lot less memory, although it's unclear what the performance tradeoff would be. (Hence: investigate!).

We should at least decide whether we're happy with this situation, especially for 32-bit system. Note that the memory requirement is roughly cubic in MAXVERTS, so even a small reduction in MAXVERTS on 32-bit systems would lead to a large decrease in memory use. See #432.

@wilfwilson wilfwilson added the C language A label for issues or pull requests relating to the kernel module of the package label Mar 4, 2021
@james-d-mitchell
Copy link
Member

I'm happy for this to be investigated, and support anything sensible that preserves the performance of the homomorphism finding code and possibly reduces the memory requirement. I'm not personally all that worried about allocating a GB of memory, but I can see why some might be worried, and that this might be one cause of the failing CI for the Semigroups package. The key reason we have things arranged like this is for performance, previously (before you wrote the Schreier-Sims in C) we used GAP stabiliser chains and the code spent 98% of its time using those GAP stabiliser chains. The intention was also to avoid repeatedly malloc and freeing memory for the homomorphisms code, so that we can compute many homomorphisms of relatively small graphs without the time penalty for malloc and free. The approach used is probably rather simple minded, I'd be happy to consider another approach!

@wilfwilson
Copy link
Collaborator Author

I'm happy for us to continue to prioritise time performance, and not make any tradeoffs in that regard. As you say, and as Chris reassured me, there is no need to worry about the amount that we're allocating, in 64-bit.

Yes, it's a concern in 32-bit mode... but do we care? Perhaps this is a good opportunity to reflect on the 32-bit support of Digraphs. Not to abandon it now, while everything works pretty well, but perhaps to decide how much (if any) effort we want to spend maintaining it; and to decide how and when we might drop this support. (Perhaps we should document that 64-bit is recommended).

I'll think about ways of reducing the Schreier-Sims memory usage without reducing performance. Obviously I'll need to be able to check that any changes don't affect performance: do you feel that the tests (standard/extreme) are extensive enough that their timings can be used to decide whether performance has been changed, one way the another?

@james-d-mitchell
Copy link
Member

I'll think about ways of reducing the Schreier-Sims memory usage without reducing performance. Obviously I'll need to be able to check that any changes don't affect performance: do you feel that the tests (standard/extreme) are extensive enough that their timings can be used to decide whether performance has been changed, one way the another?

Yes I think this should be noticeable in the tests, and I'm happy to see what you come up with. I think one major saving could be found by using an upper triangular array (if you see what I mean) rather than a square array to store orbits, transversals, etc in the Schreier-Sims implementation. This would complicate the arithmetic to access entries, but would cut the space required by approx. 50%. Unless I'm misremembering how this works....

@james-d-mitchell
Copy link
Member

Yes, it's a concern in 32-bit mode... but do we care?

Probably not much, but at the same time, we could fairly easily use 128 for MAXVERTS on a 32-bit system, and then we'd more or less resolve this, no? Again, maybe this will fail horribly, but there's no reason in principle that it should!

@wilfwilson
Copy link
Collaborator Author

I think one major saving could be found by using an upper triangular array (if you see what I mean) rather than a square array to store [..] transversals

Yes this is where I'd start, so the transversals would together be about 256 MB instead of 512 MB.

@james-d-mitchell
Copy link
Member

This will be resolved when #626 is merged!

@james-d-mitchell james-d-mitchell added resolved-pending-merge A label for issues that are resolved in a PR that is not yet merged. resolved-pending-release A label for issues that have been resolved and that can be closed when a new version is released. and removed resolved-pending-merge A label for issues that are resolved in a PR that is not yet merged. labels Jun 21, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
C language A label for issues or pull requests relating to the kernel module of the package resolved-pending-release A label for issues that have been resolved and that can be closed when a new version is released.
Projects
Development

No branches or pull requests

3 participants