This repository implements a set of reinforcement learning algorithms to run on OpenAI Gym environments. The algorithms implemented are:
- Relative Entropy Policy Search (REPS)
- Actor-Critic Relative Entropy Policy Search (ACREPS)
- Proximal Policy Optimization (PPO)
In practice, they were only tested on the following environments:
- Pendulum Swingup
- Double Cartpole
- Furuta Pendulum
- Ball Balancer
The last three are custom Gym environments implemented in the quanser_robots repository.
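For orientation, instantiating one of these environments follows the usual Gym pattern. The snippet below is a minimal sketch: the standard pendulum ships with Gym itself, while the quanser_robots environments are assumed to register themselves on import, and the environment ID shown in the comment is an illustrative placeholder rather than an ID used by this repository.

```python
import gym

# Standard Gym pendulum (available without extra packages).
env = gym.make("Pendulum-v0")

# The custom environments are assumed to be registered when the
# quanser_robots package is imported; the ID below is a placeholder.
# import quanser_robots
# env = gym.make("BallBalancerSim-v0")

obs = env.reset()
for _ in range(100):
    action = env.action_space.sample()  # random policy, just to exercise the env
    obs, reward, done, info = env.step(action)
    if done:
        obs = env.reset()
env.close()
```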
The repository is organized as follows:
- `run.py`: the main entry point for all training and evaluation tasks.
- `experiments.py`: convenience script to run multiple experiments with the same settings in parallel processes.
- `requirements.txt`: contains the requirements (except for `quanser_robots`, which needs to be installed manually).
- `/agents`
  - `acreps.py`: Actor-Critic Relative Entropy Policy Search
  - `reps.py`: Relative Entropy Policy Search
  - `ppo.py`: Proximal Policy Optimization
- `/common`: common code
- `/hyperparameters`: known good hyperparameters for the algorithms
- `/out`: contains all outputs generated by the training process
The main entry point of this repository is the `run.py` file.
It comes with a sophisticated command-line parser and
dedicated subparsers for each algorithm implemented in this repository.
The basic syntax is:

```
python run.py [general arguments] (ACREPS|REPS|PPO) [algorithm specific arguments]
```

More information on the required and optional arguments can be found by running `python run.py -h`, which returns:
```
usage: run.py [-h] --name NAME --env ENV [--robot] [--seed SEED] [--render]
              [--experiment] [--n_eval_traj N_EVAL_TRAJ] [--eval | --resume]
              {REPS,ACREPS,PPO} ...

Solve the different gym environments

optional arguments:
  -h, --help            show this help message and exit
  --name NAME           identifier to store experiment results
  --env ENV             name of the environment to be learned
  --robot               run the experiment using the real robot environment
  --seed SEED           seed for torch/numpy/gym to make experiments
                        reproducible
  --render              render the environment
  --experiment          whether this experiment was run via experiments.py
  --n_eval_traj N_EVAL_TRAJ
                        number of trajectories to run evaluation on, when
                        --eval is set.
  --eval                toggles evaluation mode
  --resume              resume training on an existing model by loading the
                        last checkpoint

subcommands:
  Algorithms to choose from

  {REPS,ACREPS,PPO}
```
For information on algorithm-specific arguments, the `-h` flag can also be used on the subcommands {ACREPS,REPS,PPO},
e.g. `python run.py REPS -h` for more information on training the REPS algorithm.
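For readers curious how such a two-level interface is typically wired up, the following is a minimal sketch of an argparse parser with per-algorithm subparsers. It is illustrative only and does not reproduce the actual parser in `run.py`; the argument set shown is a small, hand-picked subset.

```python
import argparse

# Minimal sketch of a top-level parser with per-algorithm subparsers.
# It mirrors the general structure described above, not the actual run.py code.
parser = argparse.ArgumentParser(description="Solve the different gym environments")
parser.add_argument("--name", required=True, help="identifier to store experiment results")
parser.add_argument("--env", required=True, help="name of the environment to be learned")
parser.add_argument("--resume", action="store_true", help="resume training from the last checkpoint")

subparsers = parser.add_subparsers(dest="algorithm", help="Algorithms to choose from")

reps = subparsers.add_parser("REPS")
reps.add_argument("--epsilon", type=float, default=0.1, help="KL constraint")
reps.add_argument("--n_fourier", type=int, default=100, help="number of Fourier features")

ppo = subparsers.add_parser("PPO")
ppo.add_argument("--n_epochs", type=int, default=100, help="number of training epochs")

args = parser.parse_args()
print(args.algorithm, vars(args))
```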
The most basic command for training REPS on the underactuated pendulum swingup would be:

```
python run.py --name reps_pendulum --env pendulum REPS
```

By default, a set of hyperparameters is loaded from `hyperparameters/[algorithm]/[environment].yaml`.
However, each of the algorithms' subcommands (REPS|ACREPS|PPO)
can take custom hyperparameters in the form of command-line arguments.
To list the hyperparameters available for a specific algorithm, pass the `-h`
flag to the algorithm's subcommand, e.g. `python run.py REPS -h` returns:
```
usage: run.py REPS [-h] [--n_epochs N_EPOCHS] [--n_steps N_STEPS]
                   [--epsilon EPSILON] [--gamma GAMMA] [--n_fourier N_FOURIER]
                   [--fourier_band FOURIER_BAND [FOURIER_BAND ...]]

optional arguments:
  -h, --help            show this help message and exit
  --n_epochs N_EPOCHS   number of training epochs
  --n_steps N_STEPS     number of environment steps per epoch
  --epsilon EPSILON     KL constraint.
  --gamma GAMMA         1 minus environment reset probability.
  --n_fourier N_FOURIER
                        number of fourier features.
  --fourier_band FOURIER_BAND [FOURIER_BAND ...]
                        number of fourier features.
```
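For intuition about `--n_fourier` and `--fourier_band`: REPS here relies on Fourier features of the state as its function approximator, and the band controls how the random frequencies are scaled per state dimension. The sketch below shows one common way random Fourier features are computed; it is an illustrative approximation under assumed conventions, not the repository's exact feature implementation.

```python
import numpy as np

def make_fourier_features(state_dim, n_fourier, fourier_band, seed=0):
    """Random Fourier feature map phi(s) = sin(W s + b).

    Illustrative sketch only: frequencies W are drawn from a Gaussian whose
    scale is set per state dimension by `fourier_band`, and phases b are
    uniform in [0, 2*pi). The repository's actual conventions may differ.
    """
    rng = np.random.RandomState(seed)
    band = np.broadcast_to(np.asarray(fourier_band, dtype=float), (state_dim,))
    W = rng.randn(n_fourier, state_dim) / band   # frequencies scaled by bandwidth
    b = rng.uniform(0.0, 2.0 * np.pi, size=n_fourier)

    def phi(state):
        return np.sin(W @ np.asarray(state, dtype=float) + b)

    return phi

# Example: 3-dimensional pendulum state, 200 features, per-dimension bandwidths.
phi = make_fourier_features(state_dim=3, n_fourier=200, fourier_band=[1.0, 1.0, 4.0])
features = phi([0.1, -0.2, 0.5])  # vector of length 200
```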
Training REPS on the pendulum with a custom gamma and more Fourier features is as easy as:

```
python run.py --name reps_pendulum --env pendulum REPS --gamma 0.9 --n_fourier 200
```
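Conceptually, this defaults-plus-overrides behaviour amounts to loading the YAML file and letting explicitly passed command-line values take precedence. The snippet below is a rough sketch of that idea; the exact file layout, directory casing, and function name are assumptions for illustration, not the repository's actual loading code (PyYAML is assumed to be available).

```python
import yaml

def load_hyperparameters(algorithm, environment, cli_overrides):
    """Load default hyperparameters from YAML and apply CLI overrides.

    Sketch only: the path pattern follows the convention described above
    (exact directory casing is an assumption), and `cli_overrides` is assumed
    to be a dict containing only explicitly set command-line arguments.
    """
    path = "hyperparameters/{}/{}.yaml".format(algorithm, environment)
    with open(path) as f:
        params = yaml.safe_load(f) or {}
    # Values passed on the command line win over the YAML defaults.
    params.update({k: v for k, v in cli_overrides.items() if v is not None})
    return params

# Hypothetical usage: defaults for REPS on the pendulum, with two overrides.
# params = load_hyperparameters("REPS", "pendulum", {"gamma": 0.9, "n_fourier": 200})
```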
Experiments can be stopped during the training process and resumed afterwards by using the `--resume` flag.

NOTE: `--resume` only loads the checkpoint and does not load the previously used hyperparameters! You have to supply these yourself.
So, to resume the previous example, just run:

```
python run.py --name reps_pendulum --env pendulum --resume REPS --gamma 0.9 --n_fourier 200
```
NOTE: If the run stopped by itself because the maximum number of epochs was reached, it will not
start again, since the termination criterion has already been met. To continue training,
you have to supply `--n_epochs`
with a higher number of epochs than have already been trained.
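For intuition, resuming is essentially a matter of reloading the last saved checkpoint and continuing the epoch loop from where it stopped. The sketch below illustrates that pattern with PyTorch; the file location, dictionary keys, and function names are assumptions for illustration, not the repository's actual checkpoint format.

```python
import os
import torch

# Illustrative checkpoint handling; the path and keys are assumptions.
checkpoint_path = "out/checkpoints/reps_pendulum.pt"  # hypothetical location

def save_checkpoint(model, optimizer, epoch):
    torch.save({
        "model_state": model.state_dict(),
        "optimizer_state": optimizer.state_dict(),
        "epoch": epoch,
    }, checkpoint_path)

def resume_training(model, optimizer, n_epochs):
    start_epoch = 0
    if os.path.exists(checkpoint_path):
        checkpoint = torch.load(checkpoint_path)
        model.load_state_dict(checkpoint["model_state"])
        optimizer.load_state_dict(checkpoint["optimizer_state"])
        start_epoch = checkpoint["epoch"] + 1
    # Hyperparameters are NOT stored in the checkpoint, matching the note above.
    for epoch in range(start_epoch, n_epochs):
        pass  # ... one training epoch ...
        save_checkpoint(model, optimizer, epoch)
```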
Trained models can be evaluated using the `--eval` flag.
For evaluation, the deterministic policy of the trained model is run for a number of
trajectories, which can be adjusted using the `--n_eval_traj` flag.
Evaluation reports the mean trajectory reward, its standard deviation, and the maximum trajectory reward. Evaluating the previously trained REPS model on 100 trajectories can be done by running:

```
python run.py --name reps_pendulum --env pendulum --eval --n_eval_traj 100 REPS
```
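The evaluation described above boils down to rolling out the deterministic policy for `n_eval_traj` episodes and aggregating the per-trajectory returns. The following is a minimal sketch of such a loop; `policy.mean_action` is a hypothetical interface used only for illustration, not the repository's actual API.

```python
import numpy as np

def evaluate(env, policy, n_eval_traj=100):
    """Roll out the deterministic policy and aggregate trajectory rewards.

    Sketch only: `policy.mean_action(obs)` is a hypothetical method returning
    the deterministic action for an observation.
    """
    returns = []
    for _ in range(n_eval_traj):
        obs, done, total = env.reset(), False, 0.0
        while not done:
            action = policy.mean_action(obs)
            obs, reward, done, _ = env.step(action)
            total += reward
        returns.append(total)
    returns = np.asarray(returns)
    return returns.mean(), returns.std(), returns.max()

# mean_r, std_r, max_r = evaluate(env, policy, n_eval_traj=100)
```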
To set up the real robot, one has to follow the instructions in the https://git.ias.informatik.tu-darmstadt.de/quanser/clients
repository.
Furthermore, in our code, the `--robot` flag has to be supplied to select the
real-robot variant of the gym environment.
Everything else remains the same no matter whether you run in simulation or on the real system.
Various scalar values that occur during the training process are saved to TensorBoard files automatically.
These files are located either in `out/summary/[experiment_name]`
or in `out/experiments/[experiment_name]/summary`,
depending on whether the experiment was invoked via `run.py`
or `experiments.py`. Usually, however, experiments will
be located in the first of the two locations.
To start a TensorBoard instance that loads all experiments invoked via `run.py`, just execute:

```
tensorboard --logdir=out/summary
```
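For reference, writing such scalars is typically done with a TensorBoard `SummaryWriter` pointed at the summary directory, roughly as sketched below. The use of `torch.utils.tensorboard`, the tag names, and the logged quantity are assumptions for illustration; the repository may use a different writer or naming scheme.

```python
from torch.utils.tensorboard import SummaryWriter

# Hypothetical sketch: log per-epoch scalars under out/summary/<experiment_name>.
writer = SummaryWriter(log_dir="out/summary/reps_pendulum")
for epoch in range(10):
    mean_reward = 0.0  # placeholder; would come from the training loop
    writer.add_scalar("reward/mean", mean_reward, global_step=epoch)
writer.close()
```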
This code was tested with Python 3.6.5.
All dependencies are listed in `requirements.txt`.
We suggest running everything from within a conda environment; the following steps describe the installation process using conda.
To manage different Python versions, we use conda to create virtual environments.
Install Miniconda from https://conda.io/en/latest/miniconda.html.
Next, run the following commands to create an empty environment with the name `rl-env`
and install all required dependencies within that environment.

```
conda create --name rl-env python=3.6.5
conda activate rl-env
pip install -r requirements.txt
```
Now you can run our code!