Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

SET_METHODS_OPERATION does not flush the methods cache. #2594

Closed
stevelinton opened this issue Jun 29, 2018 · 3 comments
Closed

SET_METHODS_OPERATION does not flush the methods cache. #2594

stevelinton opened this issue Jun 29, 2018 · 3 comments
Labels
kind: bug: unexpected error Issues describing bugs in which computation unexpectedly encounters an error, and PRs fixing them kind: bug Issues describing general bugs, and PRs fixing them topic: HPC-GAP Issues and PRs related to HPC-GAP topic: kernel

Comments

@stevelinton
Copy link
Contributor

stevelinton commented Jun 29, 2018

Observed behaviour

tests in #2521 fail in HPCGAP and I can't see how to rewrite the critical function to make them pass

Expected behaviour

There should be a way to flush the methods cache (of an operation and number of arguments)
when needed.

Copy and paste GAP banner (to tell us about your setup)

 *********   HPC-GAP 4.10dev-862-g3c8b617 of today
 *  GAP  *   https://www.gap-system.org
 *********   Architecture: x86_64-apple-darwin17.6.0-hpcgap64
             Maximum concurrent threads: 4
 Configuration:  gmp 6.0.0

#W <<< This is an alpha release.      >>>
#W <<< Do not use for important work. >>>

It seems (looking at INSTALL_METHOD_FLAGS in oper1.g for instance, that SET_METHODS_OPERATION(oper,n,MakeReadOnlySingleObj(meths));
(after copying the old methods list into meths and editing it, should be the correct way to modify a methods list in HPCGAP, replacing editing it in-place and calling CHANGED_METHODS_OPERATION(oper,n); but this doesn't work. Nothing in SET_METHODS_OPERATION flushes the method cache.

@stevelinton stevelinton added topic: HPC-GAP Issues and PRs related to HPC-GAP kind: bug: unexpected error Issues describing bugs in which computation unexpectedly encounters an error, and PRs fixing them topic: kernel labels Jun 29, 2018
@fingolfin
Copy link
Member

This is a bit tricky, because every thread has its own method cache -- so we can't just flush it in FuncSET_METHODS_OPERATION -- instead we somehow need to mark it as "dirty", so that all threads will flush that particular method cache the next time they access it.

The simplest way to do that I can think of is to reset CACHE_OPER(oper, n) to null -- but that introduces a leak in STATE(MethodCache), as the old slot for the pair oper,n will never be reclaimed. So I think this idea is out.

Next idea: each thread stores a copy of the bag ref to METHS_OPER(oper, n) in its cache; when accessing the cache, compare the two; if they don't match, we must flush the cache. Downside is that we need to add these checks. On the upside, this just means one read to the current struct OperBag, and we just read from that anyway to get the cache bag, so this should be an access to data in the L1 cache anyway.

@fingolfin
Copy link
Member

There is also a potential race issue in DoOperationNArgs: we first call GetMethodCached, then GetMethodUncached, then CacheMethod. But the list of methods resp. the corresponding cache may change in between any of them; it might also change as we iterate over all the methods!

We can fix all of these together as follows: First we store a copy of METHS_OPER(oper, n) in the first slot of the plist returned by CacheOper(oper, n). Then in DoOperationNArgs, as the very first thing, we call CacheOper(oper, n). From its return value we also extract the list of methods from there; this way, we know they'll stay consistent. (Perhaps'd we'd also compare against METHS_OPER(oper, n) here, and loop until they match? not yet sure)

Then we use this cache and method list throughout DoOperationNArgs, passing them to the three named subfunctions as needed. This way, we avoid any additional overhead (and even reduce the existing overhead a bit, I think). And we are sure to use a consistent list of methods.

The worst thing that can happen is that while DoOperationNArgs is running, the list of methods gets changed. In that case, we won't see that and finish with the old list of methods. But I'd argue that's better than the current behavior, were we could end up switching to the new list at an arbitrary point. I.e., if you install methods for an operation while running a computation which uses that operation, you'll experience pain no matter what, but at least it shouldn't be the crashy kind of pain.

@fingolfin
Copy link
Member

Implemented in PR #2604. Turns out it's even a bit easier than I thought, as the method cache is of course thread local, so reseting it from DoOperationNArgs is actually mostly trivial.

fingolfin added a commit to fingolfin/gap that referenced this issue Jul 3, 2018
stevelinton pushed a commit that referenced this issue Jul 23, 2018
@fingolfin fingolfin added the kind: bug Issues describing general bugs, and PRs fixing them label Mar 21, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
kind: bug: unexpected error Issues describing bugs in which computation unexpectedly encounters an error, and PRs fixing them kind: bug Issues describing general bugs, and PRs fixing them topic: HPC-GAP Issues and PRs related to HPC-GAP topic: kernel
Projects
None yet
Development

No branches or pull requests

2 participants