Unifying GAP4 and HPC GAP

Max Horn edited this page Jan 21, 2016 · 3 revisions

To see the concrete plans we derived from this document, please visit HPC GAP Plan for migration.

We had a meeting on November 27, 2015 in Aachen. Present were

  • Alice Niemeyer
  • Claus Fieker
  • Frank Lübeck
  • Jan De Beule
  • Max Horn
  • Mohamed Barakat

We discussed perceived and actual difficulties caused by having two GAP branches (master and hpcgap-default), both of which are currently under active development.

In the end, we arrived at the following plan.

Plan

We propose to incrementally reduce the differences between the two major GAP branches (master and hpcgap-default), thus slowly working towards unifying them, with the ultimate possibility of merging them into a single codebase, from which two versions of GAP can be built:

  1. One version virtually identical to the current GAP 4.x (from here on referred to as GAP4 mode), and
  2. one with HPC features (from here on referred to as HPC-GAP mode).

Whether and when the final merge happens depends on various factors, described below.

Note that the merge (and the alignment process before it) is not a goal in itself; rather, it is driven by a desire to achieve certain overarching goals. These goals should always be kept in mind while working towards the merge.

Goals

  1. Ensure that existing and future GAP4 code can be used in the foreseeable future, without being forced to make changes to it (so that people can keep using their existing code)
  2. Make it easier to apply changes and improvements to GAP which are relevant to both versions, GAP4 and HPC-GAP.
  3. Be able to make large changes, such as
    • whitespace and indentation cleanup
    • renaming and splitting source files
    • modularizing the library
    • improving the build system
    • ...

How to achieve this

As mentioned, the plan is to incrementally reduce and remove differences between the branches in a sensible fashion.

We collected some rules that should be followed in this process. Some concrete consequences of these are described later in this document.

  1. Until further notice, all changes should be done in such a way that GAP4 mode is not broken, and remains the default. Thus goal 2 is ensured. In particular:

    1. GAP4 emergency fixes may be committed immediately, even if they are (initially) incompatible with HPC-GAP.

    2. Conversely, HPC-GAP fixes must not break GAP4 mode; if they do, they need to be fixed or reverted ASAP.

    3. To allow people who are not familiar with HPC-GAP to work on GAP4, it is also OK to commit GAP4 specific changes without implementing the corresponding HPC-GAP functionality. We do, however, assume good faith on all sides, and hence ideally, all commits will be checked to at least pass the test suites in both GAP4 and HPC-GAP mode, and if there are issues in HPC-GAP mode, the full GAP team is given a chance to look at these and fix them before they are committed permanently. The easiest way to achieve that is to submit all non-emergency changes via pull requests, which allows us to automatically perform tests on them, and to review them. That said, the development of GAP4 should not be held back unduly by HPC-GAP concerns.

  2. Commits should follow established best practice.

    • For example, each commit should focus on a single (kind of) change, explained clearly in its commit message. Do not mix large amounts of whitespace cleanup with functional changes, and do not change functionality in multiple unrelated parts of the codebase in one commit.
    • This applies in particular to all commits which work towards unifying the two branches.
  3. HPC-GAP specific code should be isolated as much as possible. This makes it easier to understand the code even if one is not an expert in HPC-GAP. If done right, it should also increase the overall code quality of the HPC-GAP specific code.

  4. Some of the changes needed for HPC mode will change the behavior even in GAP4 mode, e.g. introducing uses of MakeImmutable. Hence they should be treated with utmost care, and discussed, so as to avoid impacting GAP4 users negatively (and if we get bug reports caused by this, we need to deal with them immediately). So, whenever HPC-GAP specific changes make objects immutable, please reason carefully about whether anybody may legitimately need to modify those objects.

Rules for the kernel

  • For the kernel C code, use #ifdef HPCGAP to clearly mark HPC-GAP specific parts, so that with a compiler option a "pure" GAP4 version can be compiled (this would in fact be the default).

  • Certain HPC-GAP functions may also be #defined as empty in GAP4 mode, so that they can be used without an #ifdef HPCGAP. However, then they should be clearly recognizable as HPC-GAP specific, e.g. by using HPC as a prefix.

  • HPC-GAP specific source and header files should be moved to a new src/hpc subdirectory.
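
The first two rules could look roughly like this in practice. This is only a minimal sketch: the names HPC_LockCounter, HPC_UnlockCounter, CounterLock and IncrementCounter are hypothetical and do not exist in the GAP kernel, and the use of a pthread mutex is an assumption made for illustration. Only the HPCGAP macro and the HPC prefix convention come from the rules above.

```c
#include <assert.h>
#include <limits.h>

#ifdef HPCGAP
/* HPC-GAP mode: the HPC-prefixed macros do real locking.
   (A pthread mutex is an assumption made for this sketch.) */
#include <pthread.h>
static pthread_mutex_t CounterLock = PTHREAD_MUTEX_INITIALIZER;
#define HPC_LockCounter()   pthread_mutex_lock(&CounterLock)
#define HPC_UnlockCounter() pthread_mutex_unlock(&CounterLock)
#else
/* GAP4 mode (the default): the same macros expand to nothing. */
#define HPC_LockCounter()   ((void)0)
#define HPC_UnlockCounter() ((void)0)
#endif

static long Counter = 0;

/* Shared code: no #ifdef is needed here, and the HPC prefix makes it
   obvious which lines only matter in HPC-GAP mode. */
long IncrementCounter(void)
{
    long result;
    HPC_LockCounter();
    assert(Counter < LONG_MAX);   /* toy overflow guard */
    result = ++Counter;
    HPC_UnlockCounter();
    return result;
}
```

Compiled without -DHPCGAP, this behaves like plain sequential code; the locking only appears when HPC-GAP mode is requested explicitly.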

Location of HPC-GAP specific code in the kernel

Most thread-related functionality is isolated in separate files, namely:

  • aobjects.{c,h} (atomic lists, records, etc.)
  • atomic.h (low-level concurrency primitives)
  • backtrace.c (optional, to trace crashes during automated tests)
  • ffdata.{c,h} (required to avoid race conditions in finite field impl.)
  • fibhash.h (used internally by various hash tables)
  • gapmpi.{c,h} (for MPI only)
  • objset.{c,h} (object-identity based hash sets/tables)
  • serialize.{c,h} (fast serialization of objects)
  • traverse.{c,h} (traverse data structures for copying, etc.)
  • thread.{c,h} (basic threading machinery)
  • threadapi.{c,h} (GAP-level functionality for threads)
  • tls.h, tlsconfig.h (thread-local storage)
  • systhread.h (placeholder file for abstracting thread primitives)
  • zmqgap.{c,h} (ZeroMQ interface)

Some existing kernel files have seen non-trivial rework, namely:

  • gvars.{c,h} (different implementation for global variables)
  • plist.c (to make retyping of lists threadsafe)
  • scanner.c (allows us to redirect stdin/stdout; may be useful for sequential GAP also)
  • gasman.{c,h} (different garbage collector and memory layout; already mostly #ifdefed)
  • weakptr.c (to accommodate the alternative GC; also already mostly #ifdefed)
  • objects.h (minimal changes, but they do change several TNUMs)

Most other changes should be relatively small, can be left in sequential GAP without adverse effects, or may be a good idea to have in sequential GAP also.

Rules for the library

  • HPC-GAP specific library files should be moved to a new lib/hpc subdirectory.

  • HPC-GAP specific extensions should be documented even in GAP4 mode, at least if they are used in the library, to enable people to understand the library code. For example, the atomic keyword should have an entry in the GAP4 manual, explaining that it does nothing in GAP4 mode, and is there to simplify writing code that is compatible with HPC-GAP.

  • On the HPC-GAP side, document and explain why certain uses of atomic, MakeReadonly, etc. are there in the first place (this is different from documenting their functionality). This allows people to learn, and to understand existing code using them.

  • Do not litter the library code with atomic keywords. Instead, try to identify larger design patterns, and address them in a way that limits HPC-GAP specifics to as little code as possible.

Example for avoiding atomic

Many instances of atomic in the library are there to deal with caches. For some examples, search for Z_MOD_NZ, FamiliesOfGeneralMappingsAndRanges or namesIndets.

The problem is that the HPC-GAP version of the cache access is much more complicated than the GAP4 version, and less intuitive. It is thus confusing to people who do not know about HPC-GAP, and might also easily get broken. Moreover, the HPC-GAP versions of the cache handlers are sometimes less efficient in single-threaded mode than the GAP4 versions, which is undesirable.

To resolve this particular example, we could provide a few generic functions that handle caching. These then could have different implementations for HPC-GAP and GAP4. This has multiple advantages: It makes the code easier to understand, it reduces code duplication, and makes it easier to improve cache handling across the board in the future.

A hypothetical caching API might look like this (this is of course just a rough sketch, to get the discussion started; presumably, the very first step would be to identify as many caches in the GAP code base as possible, to allow us to design a truly useful API).

cache :=  NewCache( IsInt, IsObject );  # a cache mapping ints to objects
...
# get an object with the given key from the cache;
# if the cache does not contain a value for that key,
# it invokes func(key) to create one.
obj := GetObjectFromCacheOrCreate( cache, key, func );

In the Z_MOD_NZ example, the latter invocation might look like this:

obj := GetObjectFromCacheOrCreate( Z_MOD_NZ, p,
  function(p)
    local F;

    # Get the family of element objects of our ring.
    F:= FFEFamily( p );

    # Make the domain.
    F:= FieldOverItselfByGenerators( [ ZmodnZObj( F, 1 ) ] );
    SetIsPrimeField( F, true );
    SetIsWholeFamily( F, false );
    return F;
  end);

Note that such an API would also be useful for package authors.
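
To make the intended get-or-create behaviour concrete, here is a minimal sketch of what a sequential (GAP4 mode) implementation might do. It is written in C purely for illustration; the real API would presumably live at the GAP level. The Cache struct, CACHE_CAPACITY, CreateFunc and the CountingCreate helper are all hypothetical, and only the name GetObjectFromCacheOrCreate is taken from the sketch above.

```c
#include <assert.h>
#include <stddef.h>

#define CACHE_CAPACITY 64   /* fixed size keeps the sketch short */

typedef void *(*CreateFunc)(long key);

typedef struct {
    long   keys[CACHE_CAPACITY];
    void  *values[CACHE_CAPACITY];
    size_t count;
} Cache;

/* Return the cached value for key; if there is none, call create(key),
   store the result and return it. Callers never touch the storage
   directly, so an HPC-GAP implementation could add its synchronization
   inside this one function. */
void *GetObjectFromCacheOrCreate(Cache *cache, long key, CreateFunc create)
{
    size_t i;
    assert(cache != NULL && create != NULL);
    for (i = 0; i < cache->count; i++) {
        if (cache->keys[i] == key)
            return cache->values[i];
    }
    if (cache->count == CACHE_CAPACITY)
        return create(key);       /* cache full: create without caching */
    cache->keys[cache->count] = key;
    cache->values[cache->count] = create(key);
    return cache->values[cache->count++];
}

/* Hypothetical creator used to observe the caching: it stores key * key
   in a static slot and counts how often it has been called. */
static int CreateCalls = 0;

static void *CountingCreate(long key)
{
    static long slots[CACHE_CAPACITY];
    long *slot = &slots[CreateCalls % CACHE_CAPACITY];
    *slot = key * key;
    CreateCalls++;
    return slot;
}
```

Looking up the same key twice with CountingCreate runs the creator only once. Every call site stays the same no matter how the cache is implemented, which is what allows the two modes to differ only inside the cache functions.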

The plan then is to design such an API, implement it in the HPC-GAP branch, and start using it extensively. Once it is somewhat stable, implement the same API in the GAP4 branch, and start using it there, too.

Once this is done, we can look at the remaining cases of atomic, identify further patterns, and deal with them suitably. It is hoped that in the end none, or only a small handful, remain that cannot sensibly be replaced by higher-level design patterns (e.g. because of performance issues, or because the code is simply too specific to generalize).

Aftermath

  • Once we have one branch, we can start with refactoring and cleaning up the code base without increasing the diffs between two major branches...

  • The first release from this new unified branch would probably be called 4.9 and by default compile GAP4 mode, not HPC-GAP mode. Users would have to explicitly enable HPC-GAP mode. The same would hold for subsequent releases.

  • If HPC-GAP ever becomes stable, we may revisit this whole discussion, and decide how to proceed. E.g. we might then make HPC-GAP the default or split into two branches again, or ... As stated, it could and would be a new discussion.

Closing remark

The name and version "GAP 5.0" should NOT be used in association with HPC-GAP, until and if it is decided that it really should be a full and proper successor of GAP4 mode.

This is, however, completely open at this point. Indeed, it could also hypothetically turn out in a few years that HPC-GAP is a failure, and we might then excise it from the shared code base.
