The PGMPITuneLib library is a tool that relies on self-consistent performance guidelines to automatically tune the performance of MPI libraries.
Performance guidelines require that specialized MPI collective functions are not slower than semantically equivalent implementations built from less-specialized functions, which we call mock-up versions. For example, MPI_Allgather should not be slower than the semantically equivalent call to MPI_Gather followed by an MPI_Bcast of the gathered results.
PGMPITuneLib is designed to transparently replace the default implementation of an MPI collective function with one of its mock-up implementations, if the corresponding performance guideline is violated.
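To make the notion of a mock-up concrete, the sketch below emulates the data movement of an allgather and of its gather-plus-broadcast mock-up serially, without any MPI calls. It is not code from PGMPITuneLib: the function names, the flat array layout (one row per emulated rank), and the choice of root are assumptions made for illustration only.

```c
#include <stddef.h>
#include <string.h>

/*
 * Serial emulation (no MPI). Assumed layout:
 *   send: p rows of `count` ints, row s is the block owned by rank s
 *   recv: p rows of `p * count` ints, row r is the receive buffer of rank r
 */

/* Reference semantics: every rank ends up with all blocks, ordered by rank. */
void emu_allgather(const int *send, int *recv, size_t p, size_t count)
{
    for (size_t r = 0; r < p; r++) {        /* receiving rank */
        for (size_t s = 0; s < p; s++) {    /* sending rank */
            memcpy(recv + r * p * count + s * count,
                   send + s * count, count * sizeof(int));
        }
    }
}

/* Mock-up: gather all blocks at `root`, then broadcast the root's buffer. */
void emu_allgather_as_gather_bcast(const int *send, int *recv,
                                   size_t p, size_t count, size_t root)
{
    int *root_buf = recv + root * p * count;

    /* Gather step: the root collects one block from every rank. */
    for (size_t s = 0; s < p; s++) {
        memcpy(root_buf + s * count, send + s * count, count * sizeof(int));
    }

    /* Bcast step: every other rank receives a copy of the root's buffer. */
    for (size_t r = 0; r < p; r++) {
        if (r != root) {
            memcpy(recv + r * p * count, root_buf, p * count * sizeof(int));
        }
    }
}
```

Both functions leave identical contents in every receive buffer, which is what "semantically equivalent" means here. Which of the two is faster on a real machine is exactly what the performance guideline checks.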
More details can be found in:
- Sascha Hunold and Alexandra Carpen-Amarie. 2018. Autotuning MPI Collectives using Performance Guidelines. In Proceedings of the International Conference on High Performance Computing in Asia-Pacific Region (HPC Asia 2018). Association for Computing Machinery, New York, NY, USA, 64–74. DOI:
- an MPI library
- CMake
The code can be built as follows:
cmake ./
PGMPITuneLib provides two different libraries:
- PGMPITuneCLI enables the user to select a specific mock-up function implementation for each MPI collective, and can be used to benchmark the performance of MPI applications
- PGMPITuneD uses performance profiles to automatically tune applications by redirecting MPI calls to the mock-up implementation that achieved the best performance
The user code has to be linked against the PGMPITuneCLI library and then the selected mock-up is transparently used instead of the default implementation.
- replace calls to MPI_Allgather with a semantically equivalent function that uses MPI_Gather followed by an MPI_Bcast to obtain the same results
mpicc *.c -o mympicode -lpgmpitunecli -lmpi
mpirun -np 2 ./mympicode --module=allgather=alg:allgather_as_gather_bcast
If the command-line arguments interfere with the parameter parsing of the actual binary, you can pass the options to the library through an environment variable called PGMPI_PARAMS
export PGMPI_PARAMS="--module=allgather=alg:allgather_as_gather_bcast"
mpirun -np 2 ./mympicode
The user code has to be linked against the PGMPITuneD library.
To inform the library which MPI collectives should be replaced with mock-up implementations, the user needs to provide the path to a directory containing performance profiles as a command-line argument to the application call.
A performance profile records the MPI collective name, the number of
processes for which the tuning was performed, and a list of message
size ranges for which the function should be replaced with a different
algorithm. An example is provided in
# test profile
MPI_Allgather # collective name
4 # profile for p=4 procs
1 allgather_as_allreduce
2 allgather_as_alltoall
3 allgather_as_gather_bcast
4 # nb of (msg size range + alg id)
16 16 1
32 32 2
64 128 1
1024 2048 3
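The range table at the end of the profile can be read as a lookup from message size to algorithm id. The following sketch shows one way such a lookup could work. It is not the library's implementation: the type and function names are hypothetical, the range bounds are assumed to be inclusive (suggested by entries such as "16 16 1"), and the return value 0 for "no match, keep the default implementation" is an assumption as well.

```c
#include <stddef.h>

/* Hypothetical marker: no range matched, keep the default MPI implementation. */
#define USE_DEFAULT_IMPL 0

/* One line of the range table: message sizes in [min_size, max_size]
 * (bounds assumed inclusive) are redirected to algorithm `alg_id`. */
typedef struct {
    size_t min_size;
    size_t max_size;
    int alg_id;
} size_range_t;

/* Return the algorithm id for `msg_size`, or USE_DEFAULT_IMPL if no
 * range of the profile covers it. */
int lookup_alg_id(const size_range_t *ranges, size_t n_ranges, size_t msg_size)
{
    for (size_t i = 0; i < n_ranges; i++) {
        if (msg_size >= ranges[i].min_size && msg_size <= ranges[i].max_size) {
            return ranges[i].alg_id;
        }
    }
    return USE_DEFAULT_IMPL;
}

/* The four ranges of the test profile shown above. */
static const size_range_t test_profile_ranges[] = {
    {16, 16, 1},      /* allgather_as_allreduce */
    {32, 32, 2},      /* allgather_as_alltoall */
    {64, 128, 1},     /* allgather_as_allreduce */
    {1024, 2048, 3},  /* allgather_as_gather_bcast */
};
```

With this table, a message size of 100 maps to algorithm 1 (allgather_as_allreduce), while a size of 17 falls into no range and keeps the default MPI_Allgather.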
- use the provided test profile to tune
mpicc *.c -o mympicode -lpgmpituned -lmpi
mpirun -np 2 ./mympicode --ppath=${PGMPITUNELIB_PATH}/test/perfmodels/models1
Similarly, you can also pass this option through an environment variable:
PGMPI_PARAMS="--ppath=${PGMPITUNELIB_PATH}/test/perfmodels/models1" mpirun -np 2 ./mympicode
Add the --config command-line argument to specify the path to a
configuration file. The configuration file should contain a list of
key-value pairs, one per line, separated by a single space character.
Comment lines (starting with #) are also accepted.
The configuration file is useful to modify the default amount of memory that can be used in the implementation of mock-up functions. If a mock-up requires more memory than the limit imposed by the configuration file, the default MPI collective will be used instead.
# Size limit for the additional data buffers used by mock-up functions
size_msg_buffer_bytes 100000
# Size limit for the additional counts arrays used by mock-up functions
size_int_buffer_bytes 10000
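As an illustration of the file format and of the fallback rule described above, the sketch below parses one configuration line at a time and checks whether a mock-up's memory demand stays within the configured limits. It is a minimal sketch, not the library's parser: the struct, the function names, and the decision to silently ignore unknown keys are assumptions. Only the two key names come from the example file.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical container for the two limits from the example file. */
typedef struct {
    unsigned long size_msg_buffer_bytes;
    unsigned long size_int_buffer_bytes;
} mockup_limits_t;

/* Parse a single "key value" line.
 * Returns 1 if a known key was stored, 0 if the line was ignored
 * (comment, blank, malformed, or unknown key). */
int parse_config_line(const char *line, mockup_limits_t *limits)
{
    char key[64];
    unsigned long value;

    if (line[0] == '#') {
        return 0;  /* comment line */
    }
    if (sscanf(line, "%63s %lu", key, &value) != 2) {
        return 0;  /* blank or malformed line */
    }
    if (strcmp(key, "size_msg_buffer_bytes") == 0) {
        limits->size_msg_buffer_bytes = value;
        return 1;
    }
    if (strcmp(key, "size_int_buffer_bytes") == 0) {
        limits->size_int_buffer_bytes = value;
        return 1;
    }
    return 0;  /* unknown key: ignored in this sketch */
}

/* Fallback rule: a mock-up may be used only if neither of its extra
 * buffers exceeds the configured limit; otherwise the default MPI
 * collective should be called. */
int mockup_fits(const mockup_limits_t *limits,
                unsigned long msg_bytes_needed,
                unsigned long int_bytes_needed)
{
    return msg_bytes_needed <= limits->size_msg_buffer_bytes
        && int_bytes_needed <= limits->size_int_buffer_bytes;
}
```

Feeding the four lines of the example file through parse_config_line yields limits of 100000 and 10000 bytes. A mock-up that needs 100001 bytes of message buffer would then be rejected by mockup_fits, and the default collective would be used instead.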