Replication package for the paper "μPRL: a Mutation Testing Pipeline for Deep Reinforcement Learning based on Real Faults" ICSE 2025.


1. Setting up the environment

1.1 Download the docker image

If you do not have an NVIDIA GPU, type:


docker run -it -u ${UID} --rm --mount type=bind,source="$(pwd)",target=/home/muPRL --workdir /home/muPRL --name muPRL-container dockercontainervm/muprl-cpu:0.1.0

The command will download the container image dockercontainervm/muprl-cpu:0.1.0, which should be around 2.6 GB.

If you have an NVIDIA GPU, make sure to install the NVIDIA Container Toolkit and type:


docker run --gpus all -it -u ${UID} --rm --mount type=bind,source="$(pwd)",target=/home/muPRL --workdir /home/muPRL --name muPRL-container dockercontainervm/muprl-gpu:0.1.0

The command will download the container image dockercontainervm/muprl-gpu:0.1.0, which should be around 8 GB.

1.2 (Optional): Build the Docker image instead of step 1.1

If you do not have an NVIDIA GPU, type:


docker build -f Dockerfile_CPU -t muprl:latest .
docker run -it -u ${UID} --rm --mount type=bind,source="$(pwd)",target=/home/muPRL --workdir /home/muPRL --name muPRL-container muprl:latest

The image size should be around 2.6 GB.

If you have an NVIDIA GPU, make sure to install the NVIDIA Container Toolkit and type:


docker build -f Dockerfile_GPU -t muprl:latest .
docker run --gpus all -it -u ${UID} --rm --mount type=bind,source="$(pwd)",target=/home/muPRL --workdir /home/muPRL --name muPRL-container muprl:latest

The image size should be around 8 GB.

1.3 (Optional): Use the VS Code Dev Container instead of step 1.1

  • Download VS Code for your platform;
  • Install the Dev Containers extension;
  • In VS Code, open the Command Palette (Ctrl+Shift+P, or Cmd+Shift+P on macOS) and run the "Dev Containers: Open Folder in Container..." command;
  • You will be prompted with two options: a CPU dev container or a GPU dev container. Choose the one you want to run.

2. Mutation testing pipeline

An example with the CartPole environment (the testing machine has an NVIDIA Quadro GPU with 8 GB of memory). For simplicity, we execute only five runs (instead of ten) and sample three mutant configurations (instead of five):

2.1 Train original model


./train.sh --env-name CartPole-v1 --algo-name ppo --type original --num-runs 5

The previous command trains five CartPole models sequentially (the hyperparameters are in hyperparams/ppo.yml). To train in parallel, pass --parallelize true; if you are training on the GPU, prepend the command with export XLA_PYTHON_CLIENT_PREALLOCATE=false so that JAX allocates GPU memory dynamically (otherwise, by default, JAX preallocates 75% of the GPU memory for a single process, resulting in an OOM error). With --parallelize true it takes ~7 minutes.
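For reference, the same effect can be obtained from within Python; this is generic JAX behavior, not muPRL-specific code:

```python
import os

# By default, JAX preallocates a large fraction (~75%) of the GPU memory
# per process, so several parallel training processes quickly run out of
# memory. Setting this variable before importing jax switches JAX to
# on-demand allocation.
os.environ["XLA_PYTHON_CLIENT_PREALLOCATE"] = "false"

import jax  # the variable must be set before this import takes effect
```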

2.2 Train the gamma mutant


./train.sh --env-name CartPole-v1 --algo-name ppo --type mutant --mutant-name gamma --search-budget 3 --seed 0

Trains the gamma mutant (i.e., gamma values are sampled randomly 3 times) for the same number of runs and with the same configurations as the original. The seed makes the run repeatable (with seed 0 it should generate 0.9, 0.45, and 0.7). As before, the --parallelize true command-line argument parallelizes the five runs for each sampled mutant configuration. With --parallelize true it takes ~16 minutes.
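For intuition, seeded sampling from a discrete mutation space can be sketched as follows. This is an illustration only: the function and variable names are hypothetical, and muPRL's own sampler (which produces the 0.9, 0.45, 0.7 sequence mentioned above) may draw values differently.

```python
import random

# Discrete mutation search space for gamma (SDF); see Section 3.2 below.
GAMMA_SPACE = [0.45, 0.5, 0.6, 0.7, 0.8, 0.9, 0.95, 0.98,
               0.99, 0.995, 0.999, 0.9999]

def sample_mutant_configurations(space, search_budget, seed):
    """Draw `search_budget` distinct values from the mutation space.
    Seeding the generator is what makes the run repeatable."""
    rng = random.Random(seed)
    return rng.sample(space, k=search_budget)

print(sample_mutant_configurations(GAMMA_SPACE, search_budget=3, seed=0))
```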

2.3 Replay the training configurations


./test.sh --env-name CartPole-v1 --algo-name ppo --mode replay --mutant-name gamma --delete-mutants true

Replays the training configurations for both the original and the mutated agents (configurations that were already replayed are not repeated). It keeps only the mutant configuration that is killable and closest to the original value of the mutation operator (i.e., 0.9 in this example). It takes ~15 minutes.
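Conceptually, a mutant configuration is killable when the behavior of the mutated agents differs from that of the original agents to a statistically significant extent. A rough sketch of such a check, assuming per-run scores are available (the function name and the specific statistical test are illustrative; the exact procedure is described in the paper):

```python
from scipy.stats import mannwhitneyu

def is_killable(original_scores, mutant_scores, alpha=0.05):
    """Illustrative check: the mutant configuration counts as killed when
    the per-run scores of the mutated agents differ significantly from
    those of the original agents."""
    _, p_value = mannwhitneyu(original_scores, mutant_scores)
    return p_value < alpha

# Hypothetical per-run success rates over five runs each.
print(is_killable([0.98, 0.97, 0.99, 0.96, 0.98],
                  [0.60, 0.55, 0.70, 0.58, 0.65]))  # True
```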

2.4 Build weaker test generator


./test.sh --env-name CartPole-v1 --algo-name ppo --mode weak --mutant-name gamma --num-episodes 200 --seed 0

Samples 200 configurations at random and executes them (configurations that were already executed are not repeated). It takes ~5 minutes.


python -m test_generation.build_weaker_test_set --env-name CartPole-v1 --algo ppo --num-configurations-to-select 50

Selects 50 weaker configurations.
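The selection criterion below is purely illustrative (muPRL's actual criterion is implemented in test_generation.build_weaker_test_set and described in the paper); it shows one plausible way to obtain a deliberately weaker test set, namely keeping the configurations least likely to expose failures:

```python
def select_weaker_configurations(executed_configs, num_to_select=50):
    """Purely illustrative: rank executed configurations by the observed
    failure rate of the agents and keep the ones least likely to fail.
    `executed_configs` is a list of (configuration, failure_rate) pairs."""
    ranked = sorted(executed_configs, key=lambda pair: pair[1])
    return [config for config, _ in ranked[:num_to_select]]
```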


./test.sh --env-name CartPole-v1 --algo-name ppo --mode weaker --mutant-name gamma --num-episodes 50

Executes the weaker test set. It takes ~50 seconds.

2.5 Build strong test generator


./train_strong_test_generator.sh --env-name CartPole-v1 --algo-name ppo --seed 0 --num-epochs 500 --learning-rate 3e-4 \
    --batch-size 256 --early-stopping-patience 10 --num-layers 3 --testing-configurations true --mutant-name gamma

Trains the failure predictor (only CPU execution is supported). It takes ~30 seconds.
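At its core, the failure predictor is a classifier that maps an environment configuration to the probability that the trained agent fails on it. A minimal sketch of such a model, assuming a PyTorch backend and numeric configuration features (the actual architecture, input encoding, and training loop are defined in the repository; all names below are illustrative):

```python
import torch
import torch.nn as nn

class FailurePredictor(nn.Module):
    """Illustrative MLP: configuration features in, failure logit out.
    --num-layers 3 would correspond to three hidden layers here."""

    def __init__(self, input_dim, hidden_dim=64, num_layers=3):
        super().__init__()
        layers, dim = [], input_dim
        for _ in range(num_layers):
            layers += [nn.Linear(dim, hidden_dim), nn.ReLU()]
            dim = hidden_dim
        layers.append(nn.Linear(dim, 1))  # logit of P(failure)
        self.net = nn.Sequential(*layers)

    def forward(self, x):
        return self.net(x)

model = FailurePredictor(input_dim=4)  # e.g., 4 configuration features
optimizer = torch.optim.Adam(model.parameters(), lr=3e-4)
loss_fn = nn.BCEWithLogitsLoss()  # binary pass/fail labels per episode
```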


./test.sh --env-name CartPole-v1 --algo-name ppo --mode strong --mutant-name gamma --num-episodes 100 --seed 0 --num-layers 3 --testing-configurations true

Executes the strong test generator with the trained predictor for 100 episodes. It takes ~5 minutes.
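Conceptually, the strong test generator uses the trained predictor to focus the test budget on the configurations most likely to expose failures. A hedged sketch (function names are hypothetical; it reuses the FailurePredictor sketch above):

```python
import torch

def pick_strong_tests(model, candidate_configs, budget=100):
    """Illustrative: score each candidate configuration with the failure
    predictor and execute the `budget` configurations predicted most
    likely to make the agent fail."""
    with torch.no_grad():
        logits = model(torch.as_tensor(candidate_configs, dtype=torch.float32))
        failure_probs = torch.sigmoid(logits).squeeze(-1)
    ranking = torch.argsort(failure_probs, descending=True).tolist()
    return [candidate_configs[i] for i in ranking[:budget]]
```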

2.6 Analyze the results


python evaluate.py --env CartPole-v1 --algo ppo --triv 1 --ms 1 --sens 1 --num-runs 5

The logs are written to ./mutation_analysis_results/ppo/CartPole-v1. The sensitivity_weak_strong_weaker.csv file reports the mutation scores of the weaker and strong test generators (the mutation score of the weak test generator is only used as a reference). In this case, the mutation score of the weaker test generator is 0.4 and that of the strong test generator is 0.6; hence the sensitivity is (0.6 − 0.4) / 0.6 ≈ 0.333, as shown in the last row.
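Given the two mutation scores, the sensitivity reported in the CSV follows directly; a minimal sketch of the computation, consistent with the example values above (variable names are illustrative):

```python
def mutation_score(num_killed, num_mutants):
    """Fraction of (killable, non-trivial) mutants killed by a test set."""
    return num_killed / num_mutants

def sensitivity(ms_strong, ms_weaker):
    """Relative gain of the strong generator over the weaker one."""
    return (ms_strong - ms_weaker) / ms_strong

print(sensitivity(ms_strong=0.6, ms_weaker=0.4))  # 0.333...
```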

3. Mutation operators implemented in muPRL

3.1 List of mutation operators

| Mutation operator | PPO | DQN | SAC | TQC |
|---|---|---|---|---|
| ent_coef (SEC) | ✔️ | | | |
| exploration_final_eps (SMR) | | ✔️ | | |
| gamma (SDF) | ✔️ | ✔️ | ✔️ | ✔️ |
| learning_starts (SLS) | | ✔️ | ✔️ | ✔️ |
| n_steps (SNR) | ✔️ | | | |
| n_timesteps (NEI) | ✔️ | ✔️ | ✔️ | ✔️ |
| target_update_interval (SNU) | | ✔️ | | |
| tau (SPV) | | | ✔️ | ✔️ |

3.2 Search space

| Mutation operator | Hyperparameter search space | Mutation search space | Notes |
|---|---|---|---|
| ent_coef (SEC) | float, (0.00000001, 0.1) | float, (0.01, 0.2) | We changed the lower bound to reduce the search space, rounding to two floating point digits. |
| exploration_final_eps (SMR) | float, (0.0, 0.2) | float, (0.0, 1.0) | 1.0 is the theoretical maximum for this parameter. |
| gamma (SDF) | [0.9, 0.95, 0.98, 0.99, 0.995, 0.999, 0.9999] | [0.45, 0.5, 0.6, 0.7, 0.8, 0.9, 0.95, 0.98, 0.99, 0.995, 0.999, 0.9999] | 0.45 is 50% less than 0.9, the lowest value in the original space. The theoretical minimum is 0.0 (no discounting); the theoretical maximum is 1.0. |
| learning_starts (SLS) | [0, 1000, 5000, 10000, 20000] | [0.5, 1, 5, 10, 15, 20, 25, 30, 35, 40, 45, 50] | Values are percentages of the training time steps; the upper bound is 50% of the original training time steps. |
| n_steps (SNR) | [8, 16, 32, 64, 128, 256, 512, 1024, 2048] | [2, 4, 8, 16, 32, 64, 128, 256, 512, 1024, 2048, 3072] | 2 is 75% less than 8, the lowest value in the original space; 3072 is 50% more than 2048, the highest value in the original space. The theoretical minimum is 1. |
| n_timesteps (NEI) | N/A | [10 -- 95, step 5] | From 10% to 95% of the original training time steps, with a step size of 5%. |
| target_update_interval (SNU) | [1, 1000, 5000, 10000, 15000, 20000] | [0.5, 1, 5, 10, 15, 20, 25, 30] | Values are percentages of the training time steps; the upper bound is 30% of the original training time steps. |
| tau (SPV) | [0.001, 0.005, 0.01, 0.02, 0.05, 0.08] | [0.0005, 0.0008, 0.001, 0.005, 0.01, 0.02, 0.05, 0.08, 0.1, 0.12] | 0.12 is 50% more than 0.08 (the highest value in the original space); 0.0005 is 50% less than 0.001 (the lowest value in the original space). |
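To make the percentage-based spaces concrete, a sampled value for learning_starts or target_update_interval is interpreted as a percentage of the total training time steps. A small worked example (the 100,000-step training budget is hypothetical):

```python
n_timesteps = 100_000       # hypothetical training budget of the agent
sampled_percentage = 0.5    # value drawn from the learning_starts space

# 0.5% of 100,000 training steps: the agent starts learning after 500 steps.
learning_starts = int(n_timesteps * sampled_percentage / 100)
print(learning_starts)  # 500
```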

4. Trained agents

We share the trained agents for each environment, as well as the associated killable and non-trivial mutants, together with the results of the mutation analysis. The data are available on Figshare.
