This repository contains the RedBench benchmark for the evaluation of test case reduction techniques. It accompanies the research paper "P. Kreutzer, T. Kunze, M. Philippsen: Test Case Reduction: A Framework, Benchmark, and Comparative Study" published at ICSME'21.
RedBench includes 321 fuzzer-generated C and SMT-LIB 2 test cases that trigger different bugs in real compilers (see below for some detailed statistics), as well as an automated execution environment based on Docker to evaluate each test program with the respective compiler.
In addition, this repository also contains helper scripts to reduce all benchmark programs with the reducers contained in our RedPEG framework. This allows the replication of the reduction results from our research paper (but this repository also includes the final results for convenience, see below).
The RedBench programs have been generated with the following compiler fuzzers from the scientific literature:
- Brummayer, R., Biere, A.: Fuzzing and Delta-Debugging SMT Solvers. In: SMT’09: International Workshop on Satisfiability Modulo Theories (Montreal, Canada, Aug. 2009), 1–5.
- Kreutzer, P., Kraus, S., Philippsen, M.: Language-Agnostic Generation of Compilable Test Programs. In: ICST’20: International Conference on Software Testing, Verification and Validation (Virtual, Oct. 2020), 39–50.
- Yang, X., Chen, Y., Eide, E., Regehr, J.: Finding and Understanding Bugs in C Compilers. In: PLDI’11: Programming Language Design and Implementation (San Jose, CA, Jun. 2011), 283–294.
If you want to cite RedBench, please cite our ICSME'21 research paper:
- Kreutzer, P., Kunze, T., Philippsen, M.: Test Case Reduction: A Framework, Benchmark, and Comparative Study. In: ICSME'21: International Conference on Software Maintenance and Evolution (Virtual, Luxembourg, Sep. 2021), 58–69.
RedBench currently contains 321 failure-inducing programs:
language | fuzzer | #progs | min. size | med. size | max. size |
---|---|---|---|---|---|
C | Csmith | 122 | 1.0 KiB | 113.7 KiB | 430.6 KiB |
C | *Smith | 128 | 3.0 KiB | 128.9 KiB | 910.5 KiB |
SMT-LIB 2 | FuzzSMT | 26 | 1.2 KiB | 4.5 KiB | 53.4 KiB |
SMT-LIB 2 | *Smith | 45 | 0.9 KiB | 12.2 KiB | 99.7 KiB |
When designing the benchmark, we made sure to include programs of varying size; please refer to the size distributions of the C test cases and the SMT-LIB 2 test cases for more details.
The RedBench programs trigger 110 different bugs in 19 different versions of 5 real compilers:
language | compiler | versions | #bugs | #progs |
---|---|---|---|---|
C | GCC | 4.0.0, 4.1.0, 4.2.0, 4.3.0, 4.4.0 | 47 | 134 |
C | LLVM | 1.9, 2.0, 2.1, 2.2 | 42 | 116 |
SMT-LIB 2 | Yices | 2.2.0, 2.3.0, 2.4.0, 2.6.0 | 7 | 22 |
SMT-LIB 2 | z3 | 4.4.0 | 2 | 7 |
SMT-LIB 2 | CVC4 | 1.4, 1.5, 1.6, 1.7, 1.8 | 12 | 42 |
To keep the benchmark diverse, we included at most 4 programs (of different size) per bug and fuzzer.
RedBench has been developed and tested on Debian 10. If you want to use RedBench for your own experiments, the following (Debian) packages are required:
- `docker.io`
- `python3-jinja2`
- `python3-yaml`
The instructions below assume that these packages have been installed.
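As a quick sanity check before running any of the steps below, the following sketch tests whether the three prerequisites appear to be available. It is not part of RedBench; the function name `check_prerequisites` is made up for illustration, and the mapping from Debian package to command or module name (`docker.io` to `docker`, `python3-jinja2` to `jinja2`, `python3-yaml` to `yaml`) reflects the usual Debian packaging.

```python
import importlib.util
import shutil


def check_prerequisites():
    """Return a dict that maps each requirement to True (found) or False."""
    return {
        # docker.io provides the `docker` command line client.
        "docker": shutil.which("docker") is not None,
        # python3-jinja2 provides the `jinja2` module.
        "jinja2": importlib.util.find_spec("jinja2") is not None,
        # python3-yaml provides the `yaml` module.
        "yaml": importlib.util.find_spec("yaml") is not None,
    }


if __name__ == "__main__":
    for name, found in check_prerequisites().items():
        print(f"{name}: {'found' if found else 'MISSING'}")
```

The check only looks for the command and the modules; it does not verify that the Docker daemon is running or that the current user may talk to it.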
This repository is structured as follows:
- The `tools/` subdirectory contains some command line tools that are either required for setting up RedBench or simplify its use, see below.
- The `docker/` subdirectory contains the Docker-based execution environment that includes the different compiler versions targeted by the RedBench programs. The instructions below explain how to build and use this environment.
- The `testsuite/` subdirectory contains the benchmark programs that RedBench consists of, separated by language. Each test case is augmented with some metadata that precisely describes the bug that it triggers. The actual test functions that check whether a program (or reduction candidate) triggers the bug are automatically generated based on this metadata. The instructions below explain how to generate the test functions (as well as several helper scripts to evaluate each test program in the respective Docker container). This directory also contains the reduction results from our comparative study.
- The `reduction/` subdirectory contains helper scripts for running the reducers in the RedPEG framework on the RedBench test cases, see below.
This repository contains some command line tools in the `tools/` subdirectory. These tools are
either required for setting up RedBench or simplify its use:
- `check_testcases.sh`: Checks for each test program in the given path whether it really triggers the specified bug in the respective compiler (the checks can be repeated multiple times to check for deviating results in case of non-deterministic bugs). This requires that the execution environment has been set up correctly (see below) and that the test functions have been generated (see below).
- `dq.py`: RedBench makes heavy use of YAML files (e.g., to store the metadata for each test case). The `dq.py` command line tool queries such YAML files for specific fields; its output is then further processed by other tools.
- `j2.py`: RedBench generates several files from Jinja2 templates (e.g., the `Dockerfile`s for the execution environment or the test functions). The `j2.py` command line tool expands such templates.
- `label_bugs.py`: Assigns ascending bug IDs for the different bugs; only needed when new programs/bugs are added to RedBench.
- `link.sh`: Creates symbolic links to further structure and categorize the test cases (see below).
- `remove_old_results.sh`: Removes all reduction results in the given path that are not marked as the `latest` results (see below).
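To make the idea of "querying a YAML file for a specific field" more concrete, here is a minimal sketch that looks up a field in already-parsed data. It does not reproduce the actual interface of `dq.py`; the dotted path syntax, the function name `query_field`, and the example metadata are all assumptions made for illustration.

```python
def query_field(data, path, default=None):
    """Look up a dotted path such as "compiler.name" in nested dicts/lists."""
    current = data
    for key in path.split("."):
        if isinstance(current, dict) and key in current:
            current = current[key]
        elif (isinstance(current, list) and key.isdigit()
              and int(key) < len(current)):
            current = current[int(key)]
        else:
            # The path does not exist in the data.
            return default
    return current


# Hypothetical metadata, shaped like a parsed YAML document.
metadata = {"compiler": {"name": "gcc", "versions": ["4.0.0", "4.1.0"]}}
print(query_field(metadata, "compiler.name"))            # gcc
print(query_field(metadata, "compiler.versions.1"))      # 4.1.0
print(query_field(metadata, "compiler.missing", "n/a"))  # n/a
```

With PyYAML installed, the `metadata` dictionary could come from `yaml.safe_load` applied to a `data.yml` file instead of being written inline.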
In a nutshell, setting up RedBench consists of two steps: (1) building the execution environment
and (2) generating the test functions (and helper scripts) of the test suite. The following
instructions give some more details on the execution environment and the test suite and we highly
recommend reading them, but if you really want to skip the details, simply type `make` in the root
directory of the RedBench repository. Alternatively, run `make docker` (to only build the execution
environment) or `make testsuite` (to only generate the test functions and helper scripts).
Warning: building the execution environment might take quite long (expect multiple hours).
As indicated above, RedBench provides a Docker-based execution environment that includes the
different compiler versions that the test programs target. The `docker/` subdirectory contains the
necessary files. Note: At first glance, the execution environment might seem somewhat
complicated, but we had extensibility and maintainability in mind when constructing it. To achieve
these goals, the execution environment uses Jinja2 templates that are expanded with data from YAML
files. The following instructions explain in more detail how everything works together.
Currently, there are three different groups of images: `base`, `c`, and `smt2`. The `base` images
provide a basic execution environment (e.g., they include an OpenJDK installation that is required
to run the RedPEG reducers); RedBench currently uses different Debian and Ubuntu versions for these
`base` images. The `c` and `smt2` images are built upon the `base` images and add the different
versions of the C and SMT-LIB 2 compilers that the RedBench test programs target.
The Jinja2 templates are contained in the `_templates` subdirectories. There are template files for
the different `Dockerfile`s (from which the actual Docker images are built) and for several helper
scripts (which are used for building the Docker images and for running the Docker containers).
YAML files describe the different versions that should be built and include additional information
that is required to build these versions. For example, the YAML files `base/debian/data.yml` and
`base/ubuntu/data.yml` contain the necessary information for the Debian and Ubuntu `base` images,
whereas the YAML file `c/gcc/data.yml` contains the data for the different versions of the GCC C
compiler.
From a technical point of view, building the Docker images for all versions of a compiler (or `base`
image) consists of the following steps:
- The `build_all.sh` and `build.sh` helper scripts are generated from Jinja2 templates.
- The `build_all.sh` script is executed. It reads the data from the YAML file for this compiler (or `base` image) and runs the generated `build.sh` script for each specified version.
- For each version, the `build.sh` script uses the additional information provided in the YAML file to generate a `Dockerfile` from a Jinja2 template. It then builds a Docker image from this `Dockerfile`. (Thus, each compiler version results in its own Docker image.)
  - Note: Each `Dockerfile` includes steps to download the respective compiler version from its official website; we added some checks that try to ensure the integrity of the downloads, but we are not responsible in any way for the downloaded files!
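The per-version expansion described in the steps above can be sketched as follows. This is only an illustration of the idea, not the actual RedBench build code: it uses `string.Template` from the Python standard library as a stand-in for Jinja2 so that it runs without extra packages, and the template text, the field names (`compiler`, `versions`, `version`, `base`), and the base image names are hypothetical.

```python
from string import Template

# Stand-in for a Jinja2 Dockerfile template (hypothetical content).
DOCKERFILE_TEMPLATE = Template(
    "FROM $base\n"
    "RUN ./install_compiler.sh $compiler $version\n"
)

# Stand-in for the parsed data.yml of one compiler (hypothetical fields).
DATA = {
    "compiler": "gcc",
    "versions": [
        {"version": "4.0.0", "base": "debian:example-a"},
        {"version": "4.1.0", "base": "debian:example-b"},
    ],
}


def render_dockerfiles(data, template=DOCKERFILE_TEMPLATE):
    """Return {version: Dockerfile text}, one entry per listed version."""
    rendered = {}
    for entry in data["versions"]:
        rendered[entry["version"]] = template.substitute(
            compiler=data["compiler"],
            version=entry["version"],
            base=entry["base"],
        )
    return rendered


if __name__ == "__main__":
    for version, text in render_dockerfiles(DATA).items():
        # A real build would write this text to a file and build an image
        # from it; this sketch only prints the result.
        print(f"# Dockerfile for {DATA['compiler']} {version}\n{text}")
```

Keeping the version list in a data file and the build recipe in a template means that adding a compiler version only requires a new data entry.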
To simplify this process, we provide `make` targets. Run `make docker` in the root directory of the
RedBench repository (or simply `make` in the `docker/` subdirectory) to build all versions of all
compilers. To only build all versions of a single compiler, run `make <language>/<compiler>/docker`
in the `docker/` subdirectory (e.g., run `make c/gcc/docker` to build all versions of the GCC C
compiler). Note: Before the different versions of a compiler can be built, all base images have to
be built (but the `Makefile` should handle these dependencies automatically).
Warning: building the execution environment might take quite long (expect multiple hours).
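For scripted setups, the `make` invocations described above can be assembled programmatically. The sketch below is a hypothetical convenience wrapper (the function names `docker_make_command` and `build_images` are not part of RedBench); it only builds the target names that are documented above and assumes that it is started from the root directory of the repository.

```python
import subprocess


def docker_make_command(language=None, compiler=None):
    """Build the make invocation that is run inside the docker/ subdirectory."""
    if language is None and compiler is None:
        return ["make"]  # builds all versions of all compilers
    if not language or not compiler:
        raise ValueError("language and compiler must be given together")
    return ["make", f"{language}/{compiler}/docker"]


def build_images(docker_dir="docker", language=None, compiler=None):
    """Run make in docker_dir; raises CalledProcessError if the build fails."""
    subprocess.run(docker_make_command(language, compiler),
                   cwd=docker_dir, check=True)


print(docker_make_command("c", "gcc"))  # ['make', 'c/gcc/docker']
```

Calling `build_images(language="c", compiler="gcc")` would then correspond to running `make c/gcc/docker` in the `docker/` subdirectory.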
When the Docker images have been built as described above, there are several helper scripts for running each compiler version in a Docker container:
- The `docker/run_container.sh` script is the most generic one and is meant for interactive use. It takes the compiler name and version as command line arguments. For example, `./run_container.sh gcc 4.0.0` starts a new Docker container for GCC 4.0.0 and spawns a new shell in it. This script also provides means for copying files to and from the container:
  - To copy files to the container, specify the source path on the host as third command line argument. Files that are copied to the container can be found in its `/data` directory.
  - To copy files from the container once it has finished, specify the target path on the host as fourth command line argument. Files that are contained in the container's `/output` or `/output_tmpfs` directory (the latter one uses a tmpfs) are copied to the host.
- Each compiler directory contains a script `run_compiler.sh` that starts a new Docker container for the given compiler version and runs it on the given program. For example, `./c/gcc/run_compiler.sh 4.0.0 <program>` runs GCC 4.0.0 on the given program (where `<program>` is the path to the input program on the host). All additional command line arguments that are passed to this script are passed on to the compiler running in the Docker container.
  - Note: Depending on your use case, you probably do not have to run these `run_compiler.sh` scripts manually. Each program of the test suite (see below) comes with a script `run_docker_exec.sh` that automatically runs the corresponding `run_compiler.sh` script with proper arguments.
- Each compiler directory also contains a script `run_test.sh` that runs a test function of the test suite in a new Docker container (it returns with the test function's exit code). Like the `run_compiler.sh` scripts, the `run_test.sh` scripts first take the compiler version as command line argument. Then, they either take a path to the directory of a test case or a pair of paths for a test function and an input program.
  - Note: Depending on your use case, you probably do not have to run these `run_test.sh` scripts manually. Each program of the test suite (see below) comes with a script `test_docker_exec.sh` that automatically runs the corresponding `run_test.sh` script with proper arguments.
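The positional arguments of the first two helper scripts can be summarized in a small sketch. The two functions below are hypothetical helpers that only assemble argument lists in the order described above; they do not start any container, and the paths assume that the scripts are called from within the `docker/` subdirectory.

```python
def run_container_argv(compiler, version, copy_in=None, copy_out=None):
    """Arguments for run_container.sh (called from the docker/ directory)."""
    if copy_out is not None and copy_in is None:
        # The target path is the fourth positional argument, so the third
        # one (the source path) has to be present as well.
        raise ValueError("copy_out requires copy_in")
    argv = ["./run_container.sh", compiler, version]
    if copy_in is not None:
        argv.append(str(copy_in))
    if copy_out is not None:
        argv.append(str(copy_out))
    return argv


def run_compiler_argv(language, compiler, version, program, *compiler_args):
    """Arguments for <language>/<compiler>/run_compiler.sh."""
    script = f"./{language}/{compiler}/run_compiler.sh"
    # Additional arguments are passed on to the compiler in the container.
    return [script, version, str(program), *compiler_args]


print(run_container_argv("gcc", "4.0.0"))
# ['./run_container.sh', 'gcc', '4.0.0']
print(run_compiler_argv("c", "gcc", "4.0.0", "prog.c", "-O2"))
# ['./c/gcc/run_compiler.sh', '4.0.0', 'prog.c', '-O2']
```

The resulting lists can be handed to `subprocess.run` with `cwd` set to the `docker/` subdirectory.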
Note: There are additional scripts for running reducers in a Docker container, see below.
The `testsuite/` subdirectory contains the benchmark programs that RedBench consists of, separated
by language. Each test case is located in its own subdirectory in `<language>/testcases` (where
`<language>` is either `c` for the C test cases or `smt2` for the SMT-LIB 2 test cases). Each test
case consists of the original (unreduced) program `prog.<language>` and a YAML file `data.yml` that
contains the metadata.
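The layout described above makes it easy to enumerate test cases from a script. The following sketch is a hypothetical helper (not part of RedBench) that lists all directories below `<language>/testcases` that contain both `prog.<language>` and `data.yml`; the default path `testsuite` assumes that it is run from the root directory of the repository.

```python
from pathlib import Path


def find_testcases(testsuite_dir, language):
    """Return the test case directories that contain both expected files."""
    root = Path(testsuite_dir) / language / "testcases"
    if not root.is_dir():
        return []
    found = []
    for candidate in sorted(root.iterdir()):
        has_program = (candidate / f"prog.{language}").is_file()
        has_metadata = (candidate / "data.yml").is_file()
        if candidate.is_dir() and has_program and has_metadata:
            found.append(candidate)
    return found


if __name__ == "__main__":
    for language in ("c", "smt2"):
        cases = find_testcases("testsuite", language)
        print(f"{language}: {len(cases)} test cases")
```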
As indicated above, the test functions that check whether a program (or reduction candidate)
triggers the bug in the compiler under test are automatically generated based on the metadata for
each test case. The test functions are generated from Jinja2 templates contained in the `_templates`
subdirectories. For example, the file `c/_templates/test.sh.j2` contains the template for the test
functions of the C test cases.
To generate the test functions, run `make testsuite` in the root directory of the RedBench
repository (or simply run `make` in the `testsuite` subdirectory). Note that the test suite can be
built without building the execution environment (but of course you cannot run the test functions in
the Docker containers without building the execution environment first).
When the test suite has been built, there are additional scripts for each test case:
- `test.sh`: This is the test function, which has to be executed in the proper Docker container. It takes the path to a program as command line argument and returns with exit code `1` if this program triggers the bug in the compiler under test (otherwise, it returns with exit code `0`).
- `test_docker_exec.sh`: This script is meant to be run on the host. It starts a new Docker container with the proper compiler version and executes the test function in it (it uses the `run_test.sh` scripts explained above). It returns with the same exit code as the test function in the container. Note: You can optionally pass a path to a program on the host as command line option; in this case, the test function is applied to this program instead of the original (unreduced) one (this might be handy for testing if a reduction result really still triggers the bug).
- `run_docker_exec.sh`: This script is also meant to be run on the host. It starts a new Docker container with the proper compiler version and runs it on the test program (it uses the `run_compiler.sh` scripts explained above). It returns with the same exit code as the compiler in the container.
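Because some bugs are non-deterministic, it can be useful to run a test function several times and compare the outcomes. The sketch below is a hypothetical helper, not a replacement for `tools/check_testcases.sh`: it calls `test_docker_exec.sh` repeatedly and collects the distinct exit codes without interpreting them (see the description of `test.sh` above for their meaning). The `run` parameter exists only so that the repetition logic can be tried without Docker.

```python
import subprocess
from pathlib import Path


def run_once(testcase_dir, candidate=None):
    """Run test_docker_exec.sh once and return its exit code."""
    argv = [str(Path(testcase_dir) / "test_docker_exec.sh")]
    if candidate is not None:
        # Optional: apply the test function to this program on the host
        # instead of the original (unreduced) program.
        argv.append(str(candidate))
    return subprocess.run(argv).returncode


def distinct_exit_codes(testcase_dir, candidate=None, repetitions=3,
                        run=run_once):
    """Repeat the check and return the set of observed exit codes."""
    return {run(testcase_dir, candidate) for _ in range(repetitions)}
```

If the returned set contains more than one exit code, the outcome deviated between runs, which hints at a non-deterministic bug.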
In addition, the `make` target also generates a directory structure with symbolic links to the test
case directories that sorts the test cases based on several criteria:
- `<language>/by_bug_id`: Sorts the test cases by the bug that they trigger; contains a subdirectory for each different bug.
- `<language>/by_compiler`: Sorts the test cases by compiler and compiler version.
- `<language>/by_generator`: Sorts the test cases by the fuzzer that generated them.
- `<language>/by_kind`: Sorts the test cases into crashes and wrong results.
- `<language>/by_size`: Sorts the test cases by size.
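The following sketch illustrates how such a link structure can be produced in principle. It is not the implementation of `tools/link.sh`; the function name `link_by_key`, the `key_of` callback, and the use of relative link targets are assumptions made for this example.

```python
import os
from pathlib import Path


def link_by_key(testcases_dir, target_dir, key_of):
    """Create target_dir/<key>/<name> symlinks for all test case directories.

    key_of maps a test case directory to the name of its group, e.g., the
    fuzzer that generated the program.
    """
    target_dir = Path(target_dir)
    links = []
    cases = sorted(p for p in Path(testcases_dir).iterdir() if p.is_dir())
    for case in cases:
        group = target_dir / key_of(case)
        group.mkdir(parents=True, exist_ok=True)
        link = group / case.name
        if not link.is_symlink():
            # Relative targets stay valid if the repository is moved.
            link.symlink_to(os.path.relpath(case, group),
                            target_is_directory=True)
        links.append(link)
    return links
```

In a real setting, `key_of` would read the grouping criterion from the metadata of the test case instead of deriving it from the directory name.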
We provide some helper scripts for running our RedPEG framework (which includes fine-tuned implementations of state-of-the-art test case reduction techniques) on the RedBench programs. This allows the replication of the reduction results from our research paper (but this repository also includes the final results for convenience, see below).
The RedBench repository contains the RedPEG framework as a submodule (located in
`reduction/RedPEG`); this submodule has to be set up correctly before the RedPEG reducers can be
run. To do so, simply run `make RedPEG` in the root directory of the RedBench repository (this
clones the RedPEG repository and builds the RedPEG framework).
Note: Of course, running the RedPEG reducers also requires that the execution environment and test suite have been built, see above.
To run one or more RedPEG reducers on one or more RedBench programs, run the `run_RedPEG.sh` script
in the `reduction/` subdirectory. It takes the following command line arguments:
- Required: Path to the directory that contains the programs that should be reduced. The script automatically determines all test cases in the given directory and reduces them one after another. Thus, the path can either point to a single test case directory (e.g., `testsuite/c/testcases/fold-const_c_8943_117K/`) or a directory that includes multiple test cases (this also supports symbolic links; e.g., provide `testsuite/c/by_generator/starsmith/` to reduce all C programs that have been generated with the *Smith compiler fuzzer).
- Optional: The name of the reduction run, which is used to determine the output path for the reduction results. After a (successful) reduction, the reduction results can be found in `testsuite/<language>/testcases/<test case>/reduction/<reduction name>`. Note that a reduction is skipped if the output path already exists. If this command line argument is not provided, the name of the reduction run is automatically set based on the current date and time.
- Optional: Names of the reducers that should be run (see the RedPEG repository for more details). If no reducers are given, the reducers from the comparative study in our research paper are executed.
Also note the following:
- The directory that contains the reduction results for each test case (i.e., `testsuite/<language>/testcases/<test case>/reduction/`) also contains a symbolic link `latest` that points to the latest reduction results and that is automatically updated after a (successful) reduction run. This symbolic link makes it possible to access the latest reduction results independent of their name.
- The `run_RedPEG.sh` script runs the RedPEG reducers with the `--cache` command line option to enable test outcome caching (see our research paper for more details).
- Under the hood, the `run_RedPEG.sh` script uses the generic `run_reducer.sh` script that can also be used for running other reducer implementations (see below).
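The handling of output paths and of the `latest` link can be sketched as follows. This is an illustration of the behavior described above, not the code of the actual scripts; the function names are hypothetical, and the timestamp format used for the default name is an assumption (the scripts may use a different one).

```python
from datetime import datetime
from pathlib import Path


def reduction_output_dir(testcase_dir, name=None):
    """Return <test case>/reduction/<reduction name> for a reduction run."""
    if name is None:
        # Assumed format; only "based on the current date and time" is known.
        name = datetime.now().strftime("%Y-%m-%d_%H-%M-%S")
    return Path(testcase_dir) / "reduction" / name


def should_skip(output_dir):
    """A reduction is skipped if its output path already exists."""
    return Path(output_dir).exists()


def update_latest(output_dir):
    """Point <test case>/reduction/latest at output_dir (relative link)."""
    output_dir = Path(output_dir)
    latest = output_dir.parent / "latest"
    if latest.is_symlink():
        latest.unlink()  # drop the link to the previous results
    latest.symlink_to(output_dir.name, target_is_directory=True)
    return latest
```

A reduction run named `my_run` for a test case directory `tc` would thus write to `tc/reduction/my_run`, and `tc/reduction/latest` would point to `my_run` afterwards.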
The script `run_reducer.sh` in the `reduction/` subdirectory is a generic helper script that can
execute (more or less) arbitrary reducer implementations in a Docker container of the execution
environment. It takes the following command line arguments (also see the `run_RedPEG.sh` script for
an example on how to use this script):
- Required: Path to a single test case directory (in contrast to the `run_RedPEG.sh` script, this script only handles one test case at a time). The test case directory is copied to `/data` in the Docker container.
- Required: Path to the directory that contains the reducer implementation. This directory is copied to `/reducer` in the Docker container.
- Required: Command line that should be executed in the Docker container to run the reducer implementation. The reducer should write its results to `/output` or `/output_tmpfs` (the latter uses a tmpfs); only the files in these directories are copied back to the host after the reducer has terminated (and only if the reducer has terminated successfully with exit code `0`).
- Optional: The name of the reduction run, which is used to determine the output path for the reduction results. After a (successful) reduction, the reduction results can be found in `testsuite/<language>/testcases/<test case>/reduction/<reduction name>`. Note that a reduction is skipped if the output path already exists. If this command line argument is not provided, the name of the reduction run is automatically set based on the current date and time.
Also note that this script sets the `latest` symbolic link after a successful reduction, see
above.
As indicated above, this repository also contains the reduction results from our comparative study that we presented in our research paper:
Whenever the benchmark is updated (i.e., when programs are added or removed) or new reduction
results are added, `make update` should be run in the root directory of the RedBench repository.
This ensures that all statistics are updated (including the ones in the `README.md`) and that the
plots for all new reduction results are generated.
Note: running `make update` might take a while.
RedBench is licensed under the terms of the MIT license (see LICENSE.mit).
The Python scripts contained in this repository make use of the following open-source projects (but they have to be installed manually, see above):
- Jinja (licensed under the terms of the BSD License)
- PyYAML (licensed under the terms of the MIT License)