add documentation on heterogeneous clusters (WIP) #448 (Open)

migueldiascosta wants to merge 4 commits into easybuilders:develop from migueldiascosta:heterogeneous_clusters

- 603dc50 add initial documentation on heterogeneous clusters (migueldiascosta)
- 83175b6 update documentation on heterogeneous clusters (migueldiascosta)
- 63d225f update documentation on heterogeneous clusters (migueldiascosta)
- f356180 mention that OpenBLAS can also be built for multiple architectures (migueldiascosta)

.. _heterogeneous_clusters:

Heterogeneous clusters
=======================================

This page provides an overview of the different ways in which EasyBuild can be set up on heterogeneous clusters.

Here, by "heterogeneous clusters" we mean clusters whose nodes support different instruction sets, either within
the same processor family (e.g. Intel "broadwell" and "skylake") or across families (e.g. Intel "skylake" and AMD "epyc").
A cluster can be heterogeneous in other ways too, e.g. nodes running different OS versions; some of the options covered
here also apply to those cases, but they are not discussed explicitly.

For some time now, new instruction sets have been the main way in which new processor architectures deliver their most
significant performance improvements. The most common example is the width and range of operations of the vector
extensions (e.g. SSE, AVX), so building software that takes advantage of them is crucial for HPC.

.. contents::
   :depth: 3
   :backlinks: none

.. _heterogenous_clusters_defaults:

Default behaviour of EasyBuild in heterogeneous clusters
--------------------------------------------------------

By default, EasyBuild optimizes builds for the CPU architecture of the build host, by instructing the compiler to
generate instructions for the highest instruction set supported by the build host's processor
(see :ref:`controlling_compiler_optimization_flags`).

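As a rough illustration of the difference between the default and ``--optarch=GENERIC``, the sketch below maps each case
to the x86_64 flags that, to our understanding, EasyBuild's GCC and Intel compiler definitions use (``-march=native`` and
``-xHost`` by default, ``-march=x86-64 -mtune=generic`` and ``-xSSE2`` for ``GENERIC``). The function ``arch_flags`` is
hypothetical and not part of EasyBuild; the flags should be checked against the documentation of the EasyBuild version in use.

.. code-block:: python

   # Hypothetical helper, not EasyBuild code: x86_64 architecture flags assumed
   # to be used by EasyBuild's GCC and Intel compiler definitions.
   DEFAULT_FLAGS = {"GCC": "-march=native", "Intel": "-xHost"}
   GENERIC_FLAGS = {"GCC": "-march=x86-64 -mtune=generic", "Intel": "-xSSE2"}


   def arch_flags(compiler_family, generic=False):
       """Return the flags for a default build (optimized for the build host)
       or, with generic=True, for a build with --optarch=GENERIC."""
       table = GENERIC_FLAGS if generic else DEFAULT_FLAGS
       return table[compiler_family]

Since ``-march=native`` and ``-xHost`` resolve to whatever the build host supports, the resulting binaries inherit the
build host's instruction set, which is what causes the problems described next.
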
In a heterogeneous cluster, this means that the software may not run on nodes that do not support the build host's
instruction set (it exits with an ``Illegal instruction`` error for software built with GNU toolchains, or with
``Please verify that both the operating system and the processor support X, Y and Z instructions`` for software built
with Intel toolchains), and that it will not be fully optimized on nodes that support more advanced instruction sets
than the build host. The first problem can be solved by building with ``--optarch=GENERIC``, but that makes the second
problem even worse.

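To find out in advance whether a node supports the instruction set extensions a build relies on, one can inspect the
CPU flags the kernel reports. The sketch below parses the ``flags`` lines of ``/proc/cpuinfo`` (Linux on x86 only) and
lists the required flags a node lacks; which flags a particular build actually needs (``avx2``, ``avx512f`` in the
example) is an assumption the reader has to supply for their own software.

.. code-block:: python

   def cpu_flags(cpuinfo_text):
       """Collect the CPU feature flags listed in /proc/cpuinfo text (x86 Linux)."""
       flags = set()
       for line in cpuinfo_text.splitlines():
           key, sep, value = line.partition(":")
           if sep and key.strip() == "flags":
               flags.update(value.split())
       return flags


   def missing_flags(required, cpuinfo_text):
       """Return, sorted, the required flags that this node does not report.

       A non-empty result means binaries relying on those instructions are
       likely to stop with an "Illegal instruction" error on this node.
       """
       return sorted(set(required) - cpu_flags(cpuinfo_text))

On a node with AVX2 but without AVX-512, ``missing_flags({"avx2", "avx512f"}, text)``, with ``text`` read from
``/proc/cpuinfo``, returns ``["avx512f"]``.
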
(With an Intel toolchain, the problem can be mitigated by generating multiple code paths with the ``-ax`` compiler
option in ``--optarch``, and by relying on MKL's automatic dispatch based on the instruction set of the node it runs on;
no such option is available (yet) in the GNU toolchains.)

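EasyBuild also accepts per-compiler values for ``--optarch``, separated by ``;`` (e.g.
``--optarch='Intel:xCORE-AVX2;GCC:march=haswell'``), which is one way to combine an Intel ``-ax`` setting with a GCC
one. The parser below is a hypothetical sketch that only illustrates how such a value breaks down per compiler; it is not
EasyBuild's implementation, and the Intel multi-code-path value used when testing it is an assumption to be checked
against the Intel compiler documentation.

.. code-block:: python

   def parse_optarch(value):
       """Split an --optarch value into {compiler: flags} (hypothetical sketch).

       A value without any 'Compiler:' prefix (e.g. 'GENERIC') applies to all
       compilers and is returned under the key '*'.
       """
       if ":" not in value:
           return {"*": value}
       settings = {}
       for part in value.split(";"):
           compiler, sep, flags = part.partition(":")
           if not sep:
               raise ValueError("missing 'Compiler:' prefix in %r" % part)
           settings[compiler.strip()] = flags.strip()
       return settings
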
The solution is then to build multiple copies of each piece of software, at least of those for which performance is
crucial. This is easily achieved by running EasyBuild on each type of node; the caveat is deciding where exactly to
install the copies for the different architectures, so that they can be loaded together with their dependencies and
used effectively across the cluster by all users.

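Running EasyBuild on each node type requires deciding, on each build host, which architecture directory to install
into. A minimal sketch, assuming hypothetical directory labels named after GCC ``-march`` targets and a deliberately
reduced list of required CPU flags per label (the real targets imply many more), picks the most capable label a node
supports:

.. code-block:: python

   import posixpath

   # Hypothetical labels, most capable first, each with a reduced set of CPU
   # flags (as reported in /proc/cpuinfo) that a node must have to use it.
   ARCH_LEVELS = [
       ("skylake-avx512", {"avx512f", "avx512bw", "avx512dq", "avx512vl", "avx2", "fma"}),
       ("haswell", {"avx2", "fma", "bmi2"}),
       ("sandybridge", {"avx"}),
       ("nehalem", {"sse4_2", "popcnt"}),
   ]


   def node_arch(flags, default="generic"):
       """Return the first (most capable) label whose required flags are all present."""
       available = set(flags)
       for label, required in ARCH_LEVELS:
           if required <= available:
               return label
       return default


   def install_prefix(root, flags):
       """Per-architecture installation prefix, e.g. /apps/easybuild/haswell."""
       return posixpath.join(root, node_arch(flags))

In this simplification, AMD nodes reporting the same flags are mapped to the same label; whether that is acceptable
depends on how the builds for that label were tuned.
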
.. _heterogeneous_clusters_visibility:

Visibility of architectures in heterogeneous clusters
-----------------------------------------------------
--

One way to distinguish between the many alternatives for using EasyBuild on a heterogeneous cluster is whether each
host only sees the software compiled for its own architecture (plus any software compiled for ``GENERIC``) or sees
everything.

By mounting the architecture-specific installation directories on the same mount point on every host, the configuration
becomes very similar to that of a homogeneous cluster, except that each (non-``GENERIC``) software package still needs
to be built for each architecture.

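With this approach, each node type could, for example, bind-mount its own architecture tree on a common path. The helper
below is a hypothetical sketch that produces such an ``/etc/fstab`` line using the standard Linux bind-mount syntax; the
store location ``/store/easybuild`` and the common path ``/apps/easybuild`` are assumptions. Because every node then sees
the same path, ``EASYBUILD_INSTALLPATH`` and ``MODULEPATH`` can be configured as on a homogeneous cluster.

.. code-block:: python

   import posixpath


   def bind_mount_entry(arch, store="/store/easybuild", mountpoint="/apps/easybuild"):
       """Return an /etc/fstab line that bind-mounts this node type's tree
       (<store>/<arch>) on the mount point shared by all node types."""
       source = posixpath.join(store, arch)
       return "%s %s none bind 0 0" % (source, mountpoint)
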
This can be more robust, since from the point of view of each node the cluster looks homogeneous. On the other hand, it
is less flexible, as there are situations where it can be useful to load software built for another instruction set
(usually a subset of the node's own).

To maximize visibility and flexibility, all architectures can be made visible, with the default architecture controlled
by an architecture environment variable used in (at least) ``EASYBUILD_INSTALLPATH`` and ``MODULEPATH``.

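A minimal sketch of this setup, assuming a hypothetical per-node variable ``CLUSTER_ARCH`` (set, e.g., from the node's
login profile) and a shared root such as ``/apps/easybuild``; ``modules/all`` is the default subdirectory in which
EasyBuild installs module files. Since all trees remain visible, a user can set ``CLUSTER_ARCH`` to another (usually
less capable) architecture to load software built for it.

.. code-block:: python

   import posixpath


   def arch_environment(root, environ, arch_var="CLUSTER_ARCH", default_arch="generic"):
       """Derive EASYBUILD_INSTALLPATH and MODULEPATH from an architecture variable.

       The architecture-specific module tree is prepended to any existing MODULEPATH.
       """
       arch = environ.get(arch_var, default_arch)
       installpath = posixpath.join(root, arch)
       modules = posixpath.join(installpath, "modules", "all")
       existing = environ.get("MODULEPATH", "")
       return {
           "EASYBUILD_INSTALLPATH": installpath,
           "MODULEPATH": (modules + ":" + existing) if existing else modules,
       }

For instance, the result of ``arch_environment("/apps/easybuild", os.environ)`` could be written out as ``export`` lines
by a login script.
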
.. _heterogenous_clusters_reducing_clutter:

Reducing clutter in heterogeneous clusters
------------------------------------------

With either of the two options above, separate paths can be used to keep ``GENERIC`` software that only needs to be
compiled once, or not at all (template libraries and software only available in binary form), apart from software that
needs to be optimized for each architecture.

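As an illustration, assuming hypothetical trees ``<root>/<arch>`` and ``<root>/generic`` that each contain EasyBuild's
default ``modules/all`` subdirectory, ``MODULEPATH`` can list the architecture-specific tree before the ``GENERIC`` one.
Module tools such as Lmod generally use the first match found in ``MODULEPATH`` order, so a module present in both trees
would be taken from the optimized one.

.. code-block:: python

   import posixpath


   def modulepath(root, arch, generic="generic"):
       """Build a MODULEPATH that lists <root>/<arch> before <root>/<generic>."""
       trees = [arch] if arch == generic else [arch, generic]
       return ":".join(posixpath.join(root, tree, "modules", "all") for tree in trees)
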
One specific case is the Hierarchical Module Naming Scheme (HMNS), since the packages at the ``Core`` level are good
candidates for a single ``GENERIC`` build, but this needs to be done manually. Since most of the modules there are
typically built as dependencies, this option implies separately building all ``Core`` software once with
``--optarch=GENERIC`` before building the applications that depend on it.

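The two steps could be scripted along the following lines. This hypothetical sketch only assembles the ``eb`` command
lines (``--optarch`` and ``--robot`` are existing EasyBuild options); the list of ``Core`` easyconfigs and of
applications is an assumption the reader has to supply, and the ``GENERIC`` ``Core`` modules must already be visible in
``MODULEPATH`` for the second step to reuse them.

.. code-block:: python

   def build_plan(core_easyconfigs, app_easyconfigs):
       """Return the eb invocations for a two-step build (hypothetical sketch):
       first all Core software with --optarch=GENERIC, then the applications,
       letting --robot resolve dependencies against the modules built in step 1."""
       return [
           ["eb", *core_easyconfigs, "--optarch=GENERIC"],
           ["eb", *app_easyconfigs, "--robot"],
       ]

Each command could then be run in order with ``subprocess.run(cmd, check=True)``, so that a failure in the first step
stops the second.
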
Alternatively, instead of using the hierarchy to decide what to build for a generic architecture, one could decide based
on the toolchain, e.g. by always using ``--optarch=GENERIC`` for software built with the ``GCCcore`` toolchain.

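A site policy of this kind could be expressed as a small function. The sketch below is hypothetical: the set of
toolchains treated as generic (here only ``GCCcore``) and the way its result would be passed to ``eb`` (e.g. as extra
arguments of a separate run) are assumptions.

.. code-block:: python

   # Toolchains whose software is built once, for a generic architecture
   # (hypothetical site policy; the text only mentions GCCcore).
   GENERIC_TOOLCHAINS = {"GCCcore"}


   def optarch_args(toolchain_name, host_optarch=None):
       """Return the extra eb arguments for software built with this toolchain.

       host_optarch=None keeps EasyBuild's default (optimize for the build host).
       """
       if toolchain_name in GENERIC_TOOLCHAINS:
           return ["--optarch=GENERIC"]
       if host_optarch is None:
           return []
       return ["--optarch=" + host_optarch]
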
...

Would it only be changing the mountpoint? We thought about directories like /common-path/easybuild// where architecture is the instruction set name of gcc (nehalem, broadwell, skylake, ...) to start with. On the nodes, one would change the profile.d to prepend the corresponding modules path to $MODULEPATH, so that module load would pick up the more optimized modules first and fall back to the other architectures, i.e. instruction sets, for other modules. For example, MODULEPATH=/<....>/skylake/modules/all:/<....>/nehalem/modules/all.

@henkela this clearly needs to be rewritten; my goal was precisely to describe those two approaches (a node only seeing its own arch, via mountpoints, or seeing everything but using only its own arch, via variables).
Using variables is of course much more flexible, but in many simple cases using mountpoints may be more robust and foolproof (from the point of view of the nodes, and except for the custom mountpoint, it's the same as a homogeneous cluster).