The Performance Instrumentation Refinement Automation framework PIRA approaches the time-consuming task of generating a reasonable performance measurement for an unknown code base when using Score-P. For more information please see our papers:
[PI18] | Jan-Patrick Lehr, Alexander HĂĽck, Christian Bischof. PIRA: performance instrumentation refinement automation. In 5th ACM SIGPLAN International Workshop on Artificial Intelligence and Empirical Methods for Software Engineering and Parallel Computing Systems (AI-SEPS), pages 1-10, ACM, 2018. |
[PI19] | Jan-Patrick Lehr, Alexandru Calotoiu, Christian Bischof, Felix Wolf. Automatic Instrumentation Refinement for Empirical Performance Modeling. In International Workshop on Programming and Performance Visualization Tools (ProTools), pages 40-47, IEEE, 2019. |
[PI21] | Peter Arzt, Yannic Fischler, Jan-Patrick Lehr, Christian Bischof. Automatic Low-Overhead Load-Imbalance Detection in MPI Applications. In Euro-Par 2021: Parallel Processing. Lecture Notes in Computer Science, vol 12820, pages 19-34, Springer, 2021. |
PIRA runs the following four phases (2-4 are iterated):
- Build the target application and perform a baseline measurement.
- Generate an initial performance instrumentation, based on the statement aggregation selection strategy (cf. this paper).
- Run the instrumented target application to generate a profile in the cubex format.
- Analyze the generated profile to find a new and improved instrumentation.
PIRA supports both compile-time and run-time filtering of functions, including runtime filtering of MPI functions through automatically generated wrappers. In compile-time filtering, only the desired functions are instrumented at compile-time, reducing the overall measurement influence significantly. In contrast, in runtime filtering, the compiler inserts instrumentation hooks into every function of the target application, and the filtering happens at runtime.
PIRA requires CMake (>=3.16), Clang/LLVM 10, Python 3, Qt5 and OpenMPI 4. It (or MetaCG) will further download (and build)
- MetaCG
- Modified Score-P 6.0
- Extra-P (version 3.0)
- LLNL's wrap
- bear (version 2.4.2)
- cxxopts (version 2.2.1)
- nlohmann json (version 3.10.5)
- spdlog (version 1.8.2)
If you want to build PIRA in an environment without internet access, please see the resources/build_submodules.sh
script, and adjust it to your needs.
Clone the PIRA repository.
$> git clone https://github.com/tudasc/pira
$> cd pira
Second, build the dependent submodules using the script provided, or pass values for the different options (see usage info via -h
for options available).
All downloaded and built files will be placed into external/src/
while all installations will be placed into external/install/
.
Specify the number of compile processes to be spawned for the compilation of PIRA's externals.
The script will download PIRA's dependencies in their default version.
These versions are also tested in our CI and are expected to work.
$> cd resources
$> ./build_submodules.sh -p <ncores>
As final step, the build script will write the install paths of the submodules into the file resources/setup_paths.sh
.
Should you wish to rebuild PIRA, please remember to git restore
this file.
We also provide an (experimental) Dockerfile
to build PIRA and try it.
$> podman build -t pira:master -f docker/Dockerfile .
When running inside the container, e.g., the integration tests, please invoke the scripts as follows.
$> cd resources
$> . setup_paths.sh
$> cd ../test/integration/GameOfLife # Example test case
# By default PIRA will look into $HOME/.local, which is not currently existent in the docker
# XDG_DATA_HOME signals where to put the profiling data PIRA generates
$> XDG_DATA_HOME=/tmp ./run.sh
For a full example how to use PIRA, checkout the run.sh
scripts in the /test/integration/*
folder.
A potentially good starting point is the GameOfLife
folder and its test case.
First, set up the required paths by sourcing the script in the resources
folder.
$> cd resources/
$> . setup_paths.sh
Then, you can run an example application of PIRA on a very simple implementation of Conway's Game of Life by using the provided run.sh
script in the ./test/integration/GameOfLife
folder.
$> cd ./test/integration/GameOfLife
$> ./run.sh
The script performs all steps required from the start, i.e., preparing all components for a new target code, to finally invoke PIRA. In the subsequent sections, the steps are explained in more detail. The steps are:
- Construct a whole-program call graph that contains the required meta information. This needs to be done for a target application whenever the code base changed.
- Implement the PIRA configuration. For the example, the integration tests provide example configurations.
- Implement the required PIRA functors. For the example, simple example functors are provided, which may also work for other applications.
- Invoke PIRA with the respective configuration.
config
Path to the config-file (required argument)--config-version [1,2]
Version of PIRA configuration, 2 is the default and encouraged; 1 is deprecated.--runtime-filter
Use run-time filtering, the default is false. In compile-time filtering, the functions are not instrumented at compile-time, reducing the overall measurement influence significantly, but requiring the target to be rebuilt in each iteration. In contrast, with runtime filtering, the compiler inserts instrumentation hooks in every function of the target application.--iterations [number]
Number of Pira iterations, the default value is 3.--repetitions [number]
Number of measurement repetitions, the default value is 3.--tape
Location where a (somewhat extensive) logging tape should be written to.--extrap-dir
The base directory where the Extra-p folder structure is placed.--extrap-prefix
Extra-P prefix, should be a sequence of characters.--version
Prints the version number of the PIRA installation--analysis-parameters
Path to configuration file containing analysis parameters for PGIS. Required for both Extra-P and LIDe mode.--slurm-config [path to slurm cfg file]
Enables running the target code on a slurm cluster. A slurm config file is needed. Please see this section for more information.
--hybrid-filter-iters [number]
Re-compile after [number] iterations, use runtime filtering in between.--export
Attaches the generated Extra-P models and data set sizes into the target's IPCG file.--export-runtime-only
Requires--export
; Attaches only the median runtime value of all repetitions to the functions. Only available when not using Extra-P.--load-imbalance-detection [path to cfg file]
Enables and configures the load imbalance detection mode. Please read this section for more information.
PIRA uses source-code information for constructing an initial instrumentation and deciding which functions to add to an instrumentation during the iterative refinement.
It provides a Clang-based call-graph tool that collects all required information and outputs the information in a .json
file.
You can find the cgcollector
tool in the subdirectory ./extern/src/metacg/cgcollector
.
PIRA requires the callgraph file to be in the MetaCG file format in version 2 (MetaCG v2).
More information on the CGCollector and its components can be found in the MetaCG documentation.
Applying the CGCollector usually happens in two steps.
-
At first,
cgc
is invoked on every source file in the project. e.g.:for f in $(find ./src -type f \( -iname "*.c" -o -iname "*.cpp" \) ); do cgc --metacg-format-version=2 $f done
-
The
.ipcg
-files created in step 1 are then merged to a general file usingcgmerge
.- Create an output file, solely containing the string
"null"
2. If your project contains more than onemain
function, please only merge the file with the correctmain
function.
echo "null" > $IPCG_FILENAME find ./src -name "*.ipcg" -exec cgmerge $IPCG_FILENAME $IPCG_FILENAME {} +
- Create an output file, solely containing the string
The final graph needs to be placed into the directory of the callgraph-analyzer. Since PGIS is currently used for the CG analysis, the generated whole program file is copied into the PGIS directory.
Currently, it is important that the file in the PGIS directory is named following the pattern item_flavor.mcg
. An item stands for a target application. More on the terms flavor and item in the next section.
# Assuming $PIRA holds the top-level PIRA directory
$> cp my-app.mcg $PIRA/extern/install/pgis/bin/item_flavor.mcg
The PIRA configuration contains all the required information for PIRA to run the automatic process.
The various directories that need to be specified in the configuration-file can either be absolute paths, or paths, relative to the execution path of pira. Paths may contain environment variables, e.g., $HOME
.
The examples are taken from the GameOfLife example in ./test/integration/GameOfLife
.
The user specifies: the directory in which to look for subsequently defined items, in the example, the directory is ./gol/serial_non_template
.
These directories are given aliases, which are dereferenced using the '%' sign.
An item in PIRA is a target application, built in a specific way, which is the reason for it being grouped in the configuration under builds.
{
"builds": {
"%gol": {
"items": {
"gol": {
...
}
}
}
}
"directories": {
"gol": "./gol/serial_non_template"
}
}
Every item specifies which analyzer should be used.
The default analyzer ships with PIRA, and the sources can be found in ./extern/src/metacg/pgis
or the installation in ./extern/install/pgis/bin
, respectively.
The analyzer is responsible for steering the instrumentation refinement, and is, therefore, an essential part of the PIRA framework.
The argmap field specifies the different arguments that are passed to the target application when running the performance experiments. How the arguments are passed to the target application is defined by different mappers. In the example, a linear mapper is used, which simply iterates the values of the parameter named size in the order given in the list.
"argmap": {
"mapper": "Linear",
"size": [50, 80, 110, 150, 300, 500]
}
The cubes field is the location where PIRA should store the obtained Score-P profiles. It will construct a directory tree in that location, so the user can, after PIRA finished, also easily invoke the Extra-P modeling tool by simply passing it the respective location, i.e., /tmp/pira in the example.
"cubes": "/tmp/pira"
The flavors field adds another level of possible distinction, as target applications can be built in different flavors. An example would be to specify different math libraries that the target application should link against.
Finally, the functors directory points PIRA to the location where it looks for the user-provided Python functions that ultimately tells PIRA how to build, run, and analyze the target application. In the example, PIRA is pointed to a directory called functors relative to the location of the configuration.
"flavors": [
"ct"
],
"functors": "./functors",
"mode": "CT"
The mode field, in this version of PIRA, is ignored.
As of now, the user needs to implement five different functors:
analyze_<ITEM>_<FLAVOR>.py
: invokes the analyzer.clean_<ITEM>_<FLAVOR>.py
: cleans the build directory.<ITEM>_<FLAVOR>.py
: build the instrumented version.no_instr_<ITEM>_<FLAVOR>.py
: builds the vanilla version.runner_<ITEM>_<FLAVOR>.py
: runs the target application.
Functors, generally, support two modes of invocation: active and passive.
The functor tells PIRA which mode it uses by setting the respective value to True
in the dictionary returned by the function get_method()
.
In active mode, the functor itself invokes the required commands, for example, to build the software.
When invoked, the functor is passed a **kwargs
parameter holding, for example, the current directory and an instance of a subprocess shell.
The passive mode solely returns the commands to execute, e.g. the string make
to invoke a simple Makefile at the item's top-level directory.
It is also passed a kwargs
parameter that holds specific information, like pre-defined values needed to add to CXXFLAGS or additional linker flags.
An example of a passive functor may be found in the examples
and test
directories.
Currently, all implemented functors use the passive mode.
PIRA passes the following keyword arguments to all functors. In addition, different PIRA components may pass additional arguments.
Important: We now ship our own Score-P version. Thus, it is no longer required to adjust compile commands in PIRA.
Check out the functors in test/integration/AMG2013
for example usages of the different information.
Currently, no information is passed to all functors
- ANALYZER_DIR: The directory in which the analyzer i.e. PGIS, is searched for.
- filter-file: The path to the generated white list filter file (to be passed to Score-p).
- util: Reference to a PIRA Utility object.
- args: The arguments passed to the target application as a list, i.e.,
[0]
accesses the first argument,[1]
the second, and so on. - LD_PRELOAD: The path to the
.so
file implementing the MPI wrapper functions (crucial for MPI filtering).
Additional parameters are required for some analysis modes. Specifically, PIRA LIDe (see below) and Extra-P modeling analysis require user provided parameters. Create a JSON file and provide its path to PIRA using the --analysis-parameters
-switch. The following example contains parameters for the Extra-P modeling mode. The available strategies to aggregate multiple Extra-P models (when a function is called in different contexts) are: FirstModel
, Sum
, Average
, Maximum
.
{
"Modeling": {
"extrapolationThreshold": 2.1,
"statementThreshold": 200,
"modelAggregationStrategy": "Sum"
}
}
For more details about the load imbalance detection feature, please refer to [PI21].
Provide the PIRA invocation with a path to a configuration file using the --load-imbalance-detection
-parameter. This JSON-file is required to have the following structure:
{
"metricType": "ImbalancePercentage",
"imbalanceThreshold": 0.05,
"relevanceThreshold": 0.05,
"contextStrategy": "None",
"contextStepCount": 5,
"childRelevanceStrategy": "RelativeToMain",
"childConstantThreshold": 1,
"childFraction": 0.001
}
- metricType: Metric which is used to quantify load imbalance in function runtimes. If unsure, use Imbalance Percentage. (Other experimental options: Efficiency, VariationCoeff)
- imbalanceThreshold: Minimum metric value to be rated as imbalanced. For Imbalance Percentage, 5% can be used a starting value.
- relevanceThreshold: Minimum runtime fraction a function is required to surpass in order to be rated as relevant.
- contextStrategy: Optional context handling strategy to expand instrumentation with further functions. Use MajorPathsToMain for profile creation and FindSynchronizationPoints for tracing experiments. Use None to disable context handling. (Other experimental option: MajorParentSteps with its suboption contextStepCount)
- childRelevanceStrategy: Strategy to calculate statement threshold for the iterative descent. If unsure, use RelativeToMain which will calculate the threshold as max(childConstantThreshold, childFraction * (main's inclusive statement count))
To run PIRA on a cluster with the SLURM workload manager, invoke it with the --slurm-config
flag. Give the path to your batch system configuration file with it. See the integration tests suffixed with _Slurm
(test/integration/*_Slurm/
). PIRA currently supports batch systems with the SLURM workload manager. PIRA supports the use of a module
-system, which may be found on slurm clusters.
The batch system configuration file is a JSON-file, which is structured as follows:
{
"general": {
"backend": "slurm",
"interface": "pyslurm",
"timings": "subprocess"
},
"module-loads": [
{
"name": "gcc",
"version": "8.5"
},
{
"name": "llvm",
"version": "10.0.0",
"depends-on": [
{
"name": "gcc",
"version": "8.5"
}
]
}
],
"batch-settings": {
"time_str": "00:10:00",
"mem_per_cpu": 3800,
"number_of_tasks": 1,
"partition": null,
"reservation": null,
"account": "your_account_here",
"cpus_per_task": 96
}
}
Breakdown:
general
section: Lets you choose in which way PIRA will execute your code on the batch system. Since every option in this section is optional, you can omit the whole section, if you are fine with using the defaults:backend
: What workload manager to use. Choices:slurm
, for the slurm workload manager. Default:slurm
, therefor optional.interface
: In which way PIRA should interact with your batch system manager. For the SLURM backend, these are:pyslurm
, to use PySlurm (this requires PySlurm to be installed, see this section;sbatch-wait
, to use standardsbatch
with the--wait
flag;os
, for standardsbatch
andsqueue
interaction calls. Default:pyslurm
, therefor optional.timings
: How timing of the target code should be done. Choices:subprocess
, for timing with a python wrapper andos.times
in a subprocess (exactly like PIRA does it if ran local);os
, to use/usr/bin/time
. Default:subprocess
, therefor optional.force-sequential
: Defaultfalse
. Set totrue
to force PIRA/your batch system to do all runs in sequential order (only one execution of the target code at a time). This means PIRA will take care that your batch system runs repetitions, as well as different jobs in scaling experiments in sequential order. If set to/left onfalse
PIRA will try to instruct the batch system to do as many executions as possible within every iteration in parallel.
module-loads
section: Currently not in use in PIRA, work in progress! Currently, you need to load all modules manually, before starting PIRA! Meant for which modules should be loaded with themodule
system. This requires one to be in place (commandsmodule load
andmodule purge
may be used by PIRA). If you do not have amodule
system in place, or do not want to use it, either omit this section completely, or set"module-loads": null
. To specify modules PIRA should load, specify a list of modules, as in the example above.- Each module have to have a
name
. - The
version
is optional, if not given, it will depend on what themodule
system loads as default module version. It is recommended to always give versions explicitly. depends-on
is also optional. Give a list of modules on which this module depends. These modules have to have aname
and can optional have aversion
given. The dependencies defined here are used by PIRA to determine the order in which modules should be loaded- If you do not give a version, it will only check that some version of this module was loaded. It is recommended to always give versions explicitly.
- If you do not specify this, or set it to
null
, it is assumed, this module has no dependencies. - Be aware that PIRA will crash with a corresponding error, if you define dependency cycles here.
- Note that you can care about dependency management on your own: If not given any
depends-on
PIRA will (try to) load the modules in exactly that order that was given in the config file.
- Each module have to have a
batch-setting
section: The actual hardware and job options for the batch system. Some options in the section are mandatory, you cannot omit this section.time
: Thesbatch --time
option, mandatory.mem-per-cpu
: Thesbatch --mem-per-cpu
option, mandatory.ntasks
: Thesbatch --ntasks
option, mandatory.- You can optionally further specify the following options:
partition
,reservation
,account
(all default tonull
=not given),cpus-per-task
(defaults to4
),exclusive
(defaults totrue
; not supported withpyslurm
), andcpu-freq
(defaults tonull
). - Note that there are some
sbatch
options you cannot define in this config. This is due to some options used by PIRA internally, for example the--array
option, to map the repetitions to job arrays.
How and which version of PySlurm to install on your cluster, highly depends on your SLURM version, and your SLURM installation. The installation and packaging solution for pyslurm with PIRA is work in progress. Refer to their README. You might try some of the following:
- If your SLURM version is XX.YY.*, you want to look for a release/brach of pyslurm with the version XX.YY.* (Note that * does not have to be identical).
- Determine where SLURM is set up, look for a
include
and alib
dir. - Build pyslurm with these options, e.g.
python3 setup.py build --slurm-lib=/opt/slurm/21.08.6/lib --slurm-inc=/opt/slurm/21.08.6/include
- Install it:
python3 setup.py install --slurm-lib=/opt/slurm/21.08.6/lib --slurm-inc=/opt/slurm/21.08.6/include
.