Features needed for full support of hierarchical modules #862

Closed
geimer opened this issue Feb 20, 2014 · 9 comments
@geimer
Contributor

geimer commented Feb 20, 2014

During the 5th hackathon at FZJ, we almost managed to get hierarchical modules working. The procedure is basically a three-step process, as follows:

  • Define a custom hierarchical module naming scheme, for example:

import os

from easybuild.tools.module_naming_scheme import ModuleNamingScheme

class HierarchicalModuleNamingScheme(ModuleNamingScheme):
    """Class implementing a custom hierarchical module naming scheme."""

    def det_full_module_name(self, ec):
        """
        Determine full module name from given easyconfig, according to a custom hierarchical module naming scheme.
        """

        # fetch required values
        name = ec['name']
        version = ec['version']
        version_suffix = ec['versionsuffix']
        tc_name = ec['toolchain']['name']
        tc_version = ec['toolchain']['version']

        modulename = os.path.join(name, version + version_suffix)

        # Toolchains should be in a separate directory
        if name in ('gompi', 'goolf'):
            return os.path.join('Toolchain', modulename)

        # Mapping toolchain name -> modulefile directories
        #    Compilers -> Base
        #    MPIs -> Compiler/<compiler_name>/<compiler_vers>
        #    the rest -> MPI/<toolchain_name>/<toolchain_vers>
        prefix_map = {'dummy': 'Base',
                      'GCC': os.path.join('Compiler', tc_name, tc_version),
                      'gompi': os.path.join('MPI', tc_name, tc_version),
                      'goolf': os.path.join('MPI', 'gompi', tc_version)}   # Map "higher level" toolchain to base toolchains

        return os.path.join(prefix_map[tc_name], modulename)

It would be nice to have the MPI module name and version as well as the module class available for this step (see #687).
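To make the resulting layout concrete, here is a minimal standalone sketch of the same mapping applied to a few example packages. It deliberately does not import EasyBuild: `hierarchical_name` is a hypothetical helper that takes plain arguments instead of an easyconfig object, and the OpenMPI version used below is an assumption chosen purely for illustration.

```python
import posixpath


def hierarchical_name(name, version, versionsuffix, tc_name, tc_version):
    """Standalone restatement of the mapping above (hypothetical helper, not EasyBuild API)."""
    modulename = posixpath.join(name, version + versionsuffix)

    # toolchain modules go into their own directory
    if name in ('gompi', 'goolf'):
        return posixpath.join('Toolchain', modulename)

    # toolchain name -> modulefile directory, same table as in the scheme above
    prefix_map = {
        'dummy': 'Base',
        'GCC': posixpath.join('Compiler', tc_name, tc_version),
        'gompi': posixpath.join('MPI', tc_name, tc_version),
        'goolf': posixpath.join('MPI', 'gompi', tc_version),
    }
    return posixpath.join(prefix_map[tc_name], modulename)


# (name, version, versionsuffix, toolchain name, toolchain version)
examples = [
    ('GCC', '4.8.1', '', 'dummy', 'dummy'),
    ('OpenMPI', '1.6.5', '', 'GCC', '4.8.1'),  # OpenMPI version is an assumption
    ('QuantumESPRESSO', '5.0.2', '-hybrid', 'goolf', '1.5.12-no-OFED'),
    ('goolf', '1.5.12-no-OFED', '', 'dummy', 'dummy'),
]
for args in examples:
    print(hierarchical_name(*args))
```

With these inputs the compiler ends up under `Base/`, the MPI library under `Compiler/GCC/4.8.1/`, the application under `MPI/gompi/1.5.12-no-OFED/`, and the toolchain module itself under `Toolchain/`.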

  • Enhance the compiler and MPI modulefiles to add the corresponding modulefile directories to MODULEPATH. This can be done by manually adding a line like the one shown below to the easyconfig file:
modextrapaths = {'MODULEPATH': ['../../../../modules/all/Compiler/%(name)s/%(version)s']}

This is the example for the compiler easyconfig file. The MPI easyconfig file actually needs an additional '../..', as the MPI build directory is specific to the compiler and the compiler version.

The '../' stuff is currently needed to go from the software installation directory to the top-level prefix. This definitely needs work for the case where the 'software' and 'modules' directories are not stored under the same prefix.
This step should be automated, and the paths cleaned up. In fact, the path depends on the custom module naming scheme defined above.
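As a rough idea of how this step might be automated, the relative path could be computed rather than hand-written. The sketch below is only an illustration under stated assumptions: `modulepath_extension` is a hypothetical helper (not part of EasyBuild), the prefix `/opt/easybuild` is made up, and the layout `software/<full module name>` is inferred from the number of '../' components mentioned above.

```python
import posixpath


def modulepath_extension(install_dir, modules_root, subdir):
    """Relative path from an installation directory to a modulefile subdirectory.

    Hypothetical helper: install_dir is where the software is installed,
    modules_root is the top-level modulefile directory (e.g. <prefix>/modules/all),
    subdir is the directory the naming scheme assigns to the next hierarchy level.
    """
    target = posixpath.join(modules_root, subdir)
    return posixpath.relpath(target, start=install_dir)


prefix = '/opt/easybuild'  # assumed prefix, for illustration only
modules_root = posixpath.join(prefix, 'modules', 'all')

# compiler installed as Base/GCC/4.8.1: four levels up to reach the prefix
print(modulepath_extension(posixpath.join(prefix, 'software', 'Base', 'GCC', '4.8.1'),
                           modules_root, 'Compiler/GCC/4.8.1'))

# MPI installed below Compiler/GCC/4.8.1: two more levels are needed
print(modulepath_extension(posixpath.join(prefix, 'software', 'Compiler', 'GCC', '4.8.1', 'OpenMPI', '1.6.5'),
                           modules_root, 'MPI/gompi/1.5.12-no-OFED'))
```

The first call reproduces the hand-written `'../../../../modules/all/Compiler/GCC/4.8.1'`. Because the result is derived from both directories, the same call would still yield a valid (if long) relative path when 'software' and 'modules' do not share a prefix, although an absolute path may be the simpler choice in that case.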

  • Since generated modulefiles include 'module load' commands using the full custom module name, the modulefiles currently have to be manually postprocessed (after each installation!):
find <easybuild_prefix>/modules/ -type f \
    | while read -r file; do
        # strip the modulefile directory prefix from the first match on each line
        sed -i -e 's@ Base/@ @' \
            -e 's@ Compiler/\([^/]*/\)\{2\}@ @' \
            -e 's@ Toolchain/@ @' \
            -e 's@ MPI/\([^/]*/\)\{2\}@ @' "$file"
    done

This removes the modulefile directory prefixes from all 'module load' or 'is-loaded' commands in the modulefiles. This is needed to make loading of dependency modules work.
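For readers who want to check what the substitutions do before running them over a whole module tree, here is a small Python sketch that applies equivalent regular expressions to single lines. `strip_hierarchy_prefix` is a hypothetical helper written for this illustration, and the sample lines are made up; like the sed expressions without the `g` flag, each pattern replaces only its first match per line.

```python
import re

# Python equivalents of the four sed expressions, applied in the same order
PREFIX_PATTERNS = [
    r' Base/',
    r' Compiler/(?:[^/]*/){2}',
    r' Toolchain/',
    r' MPI/(?:[^/]*/){2}',
]


def strip_hierarchy_prefix(line):
    """Remove the modulefile directory prefix from a 'module load'-style line."""
    for pattern in PREFIX_PATTERNS:
        line = re.sub(pattern, ' ', line, count=1)
    return line


samples = [
    'module load Base/GCC/4.8.1',
    'module load Compiler/GCC/4.8.1/OpenMPI/1.6.5',
    'module load MPI/gompi/1.5.12-no-OFED/QuantumESPRESSO/5.0.2-hybrid',
]
for sample in samples:
    print(strip_hierarchy_prefix(sample))
```

Each sample is reduced to the short name a user would type, e.g. `module load QuantumESPRESSO/5.0.2-hybrid` for the last line.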

NOTE: The C/Tcl versions of modules available on the web are somewhat broken when it comes to making 'module switch' and 'module purge' work in this setup (I have a patched version of the Tcl modules that works, which I'm happy to share). Lmod should be fine, though.

@geimer
Contributor Author

geimer commented Feb 21, 2014

The "meta"-easyconfig file below can be used as a test case. However, if you try to use it to build the whole dependency chain automatically, it will (most likely) fail as the generated modules need to be postprocessed (see above).

easyblock = "Toolchain"

name = 'foo'
version = '1.0'

homepage = '(none)'
description = """Hello world!"""

toolchain = {'name': 'goolf', 'version': '1.5.12-no-OFED'}

dependencies = [
    ('QuantumESPRESSO', '5.0.2', '-hybrid'),
    ('GCC', '4.8.1', '', True),
]

moduleclass = 'toolchain'

@fgeorgatos
Collaborator

Also, it proved to be very tedious to keep the dep resolution working correctly (with -no-OFED!),
due to reliance on exact filenames; error messages did not help, so that's an area to improve.

FYI: #863 is not a prerequisite for closing this issue, but it would be nice to have along with it.

@boegel
Member

boegel commented Feb 22, 2014

Excellent job on figuring this out and documenting the effort, thanks!

We'll try and look into fixing the issues you ran into soon, I'll keep you posted on progress.

@boegel
Member

boegel commented Feb 24, 2014

@geimer, @fgeorgatos: #110 has some comments from @rtmclay (the Lmod developer) on module naming.
With the hierarchical module naming scheme you used, you didn't take into account possible dependencies on e.g. OpenBLAS or FFTW. So maybe the modules for the actual software packages should still include the toolchain name, e.g. QuantumESPRESSO/5.0.2-goolf-1.4.10 rather than QuantumESPRESSO/5.0.2, to distinguish between modules built with a different BLAS/LAPACK lib but with the same compiler/MPI...

I'm not sure how to avoid this and still obtain 'clean' user-oriented module names like QuantumESPRESSO-5.0.2.
One thing I had in mind was to add one other level next to compiler and MPI, i.e. toolchain:

$ module load GCC/4.7.2
$ module load OpenMPI/1.6.4
# show toolchains that feature the selected compiler/MPI modules
$ module load goolf/1.4.10
# show software built with goolf/1.4.10
$ module av
QuantumESPRESSO/5.0.2

But, then the module swap on compiler or MPI would be broken, since goolf/1.4.10 would not be available for other compilers or MPI libraries (changing either of those implies changing the toolchain too).
Maybe we can think of something similar that wouldn't break swap, and still retain clean module names?

@geimer
Contributor Author

geimer commented Feb 24, 2014

The implemented module naming scheme was just a demonstrator and definitely needs to be improved, agreed. The fundamental issue here is that the current notion of a toolchain includes "everything". In the hierarchical scheme, however, one would need to strip off the compiler and MPI, as they are handled by the hierarchy already. That is, a three-level scheme like the following could work:

$ module load GCC/4.7.2
$ module load OpenMPI/1.6.4
# show "math pseudo-packages" that feature the selected compiler/MPI modules, e.g.
# "olf" including OpenBLAS, ScaLAPACK and FFTW
$ module load olf/1.4.10
# show software built with GCC, OpenMPI, OpenBLAS, ScaLAPACK and FFTW
$ module av
QuantumESPRESSO/5.0.2

Now, if the "olf" package also exists for other compiler/MPI combinations, switching the compiler or MPI would just work fine. In fact, such a setup was discussed by @fgeorgatos and myself while investigating this. However, there are still some open issues for which we don't have good answers yet:

  • If a user decides to load the OpenBLAS, ScaLAPACK and FFTW modules individually, how do you automatically load the (correct) "olf" module to make the next level of the hierarchy visible?
  • If a user does module swap OpenBLAS ATLAS, for example, how do you automatically unload "olf" and load (the correct) "alf"?

This kind of magic could potentially be implemented, but AFAICS would need some (more or less unmaintainable) logic inside the modulefiles requiring knowledge about the supported math pseudo-packages and their inter-dependencies. I doubt that anyone really wants this...
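To illustrate why such logic would be awkward to maintain, here is a minimal sketch of the kind of lookup it would require. Everything in it is hypothetical: the table `PSEUDO_PACKAGES`, the helper `pseudo_package_for`, and in particular the assumption that "alf" stands for ATLAS, ScaLAPACK and FFTW. A real implementation would also have to live inside the modulefiles and track versions.

```python
# Hypothetical table: which set of math libraries makes up which pseudo-package.
# Every new library combination would require another hand-maintained entry.
PSEUDO_PACKAGES = {
    frozenset(['OpenBLAS', 'ScaLAPACK', 'FFTW']): 'olf',
    frozenset(['ATLAS', 'ScaLAPACK', 'FFTW']): 'alf',  # composition of "alf" is an assumption
}


def pseudo_package_for(loaded_modules):
    """Return the pseudo-package whose libraries are all loaded, or None."""
    # drop the version part, e.g. 'GCC/4.7.2' -> 'GCC'
    loaded_names = set(module.split('/')[0] for module in loaded_modules)
    for libraries, pseudo_package in PSEUDO_PACKAGES.items():
        if libraries <= loaded_names:
            return pseudo_package
    return None


loaded = ['GCC/4.7.2', 'OpenMPI/1.6.4', 'OpenBLAS', 'ScaLAPACK', 'FFTW']
print(pseudo_package_for(loaded))

# emulate 'module swap OpenBLAS ATLAS'
swapped = ['ATLAS' if module == 'OpenBLAS' else module for module in loaded]
print(pseudo_package_for(swapped))
```

Even this toy version has to know every supported library combination up front, and it says nothing about which versions of "olf" or "alf" match the loaded libraries, which is where most of the maintenance burden would come from.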

@fgeorgatos
Collaborator

Hi,

First things first: here is an extract from an internal note about the hierarchical module namespace,
which I wrote to uni.lu colleagues who eagerly await the upcoming Lmod-friendly feature:

There are two ways we considered on how to implement it:
* per compiler (and then per mpi stack)
* per toolchain (and perhaps per HPCBIOS buildset right before it, preloaded)
Which case is best is site-specific and there are good arguments for either.

Namely, we had a lengthy discussion with Markus evaluating the different approaches, and the conclusion (at least from my side!) was that no approach is automatically better for every site; which one is really best is more a matter of operational organization and user-support style:

  • 2-level organization with compiler & MPI is the common approach, esp. @fzj and perhaps a few more large sites, whereby the whole combinatorial space of these 2 layers is provided (!)... or,
  • toolchain-based organization, which is the natural first pick for current EasyBuilders; you can make this appear as a genuine 2-level scheme, if you provide buildsets, in the following manner:
    • buildset <-> compiler ## think of eb with its monthly parameterization as a compiler
    • toolchain <-> MPI ## complexity of toolchains & MPI stacks share common aspects

Both approaches mentioned above are compatible with module swap and provide nice features!

> With the hierarchical module naming scheme you used, you didn't take into account possible dependencies on e.g. OpenBLAS or FFTW.

Somehow, there is an assumption by most big HPC sites that the compiler & MPI stacks are the only major concern and that's it. Of course, there are limits and caveats with this approach.

Indeed, schemes currently used by most HPC sites do not strive for total uniqueness, and even UNITE itself, which is fairly well designed, leaves the goolf vs goalf differentiation uncovered:
http://hpcbios.readthedocs.org/en/latest/HPCBIOS_2012-91.html

IMHO, nobody wants to have or maintain the 3rd layer of olf vs alf: it could be a nightmare
as regards the user support aspects. Yet going 2-level may defeat clarity/reproducibility.

> If a user decides to load the OpenBLAS, ScaLAPACK and FFTW modules individually, how do you automatically load the (correct) "olf" module to make the next level of the hierarchy visible?

This brings us to the "matrix" concept that @rtmclay has been considering for a while;
to annoy you a bit with the complexity of this, consider the following: ictce == goolf + fftw2.
And even that is perhaps not exactly true, given that the APIs are not 100% compatible!

@boegel, @geimer:
my preference here is that we take the Unix approach of trusting users and allowing them to shoot themselves in the foot: sites have probably already established practices about what their "better" approach is, therefore we should steer away from preaching to them about "the best" way.

With time (and when people start routinely handling 10^5 builds) it will become obvious to many which approaches work optimally, when and how…

F.

@fgeorgatos
Collaborator

i.e., I am proposing to go for the two implementations mentioned above, which may be the most popular:

  • compiler-mpi builds (and forget about olf vs alf debate for a while, until Lmod implements the matrix proposition)
  • toolchain and/or buildset-toolchain builds (ie. classic EasyBuild runs, as current userbase practices)

@boegel
Member

boegel commented Feb 27, 2014

@fgeorgatos: short response to this: we will (need to) figure out a way to support both approaches, and not tie down EB users to one or another... I think the key part here is the custom module naming schemes. I feel we just need a way of communicating to EasyBuild whether modules are organized in a flat vs hierarchical way, and I don't think how the hierarchy is constructed exactly matters too much (whether it's 2-level or deeper).
This is all preliminary though, until we actually dive into implementing full support for the hierarchical scheme like @geimer was working on (hopefully in the next couple of weeks).

@boegel
Member

boegel commented Jul 30, 2014

This can be closed now that #879 is merged in, which is included in EasyBuild v1.14.0.

@geimer: If you feel something is still missing in order to close this, do reopen it.

@boegel boegel closed this as completed Jul 30, 2014