This repository implements a set of reinforcement learning algorithms to run on OpenAI Gym environments. The algorithms implemented are:
- Relative Entropy Policy Search (REPS)
- Actor-Critic Relative Entropy Policy Search (ACREPS)
- Proximal Policy Optimization (PPO)
In practice, they were only tested on the following environments:
- Pendulum Swingup
- Double Cartpole
- Furuta Pendulum
- Ball Balancer
The last three are custom Gym environments implemented in the quanser_robots repository.
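For orientation, instantiating one of these environments follows the usual Gym pattern. The snippet below is a minimal sketch: the standard pendulum ships with Gym itself, while the quanser_robots environments are assumed to register themselves on import, and the environment ID shown in the comment is an illustrative placeholder rather than an ID used by this repository.

```python
import gym

# Standard Gym pendulum (available without extra packages).
env = gym.make("Pendulum-v0")

# The custom environments are assumed to be registered when the
# quanser_robots package is imported; the ID below is a placeholder.
# import quanser_robots
# env = gym.make("BallBalancerSim-v0")

obs = env.reset()
for _ in range(100):
    action = env.action_space.sample()  # random policy, just to exercise the env
    obs, reward, done, info = env.step(action)
    if done:
        obs = env.reset()
env.close()
```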
The repository is organized as follows:
- `run.py`: the main entry point for all training and evaluation tasks.
- `experiments.py`: convenience script to run multiple experiments with the same settings in parallel processes.
- `requirements.txt`: contains the requirements (except for `quanser_robots`, which needs to be installed manually).
- `/agents`
  - `acreps.py`: Actor-Critic Relative Entropy Policy Search
  - `reps.py`: Relative Entropy Policy Search
  - `ppo.py`: Proximal Policy Optimization
- `/common`: common code
- `/hyperparameters`: known good hyperparameters for the algorithms
- `/out`: contains all outputs generated by the training process
The main entry point of this repository is the `run.py` file.
It comes with a sophisticated command-line parser and
dedicated subparsers for each algorithm implemented in this repository.
The basic syntax is:

```
python run.py [general arguments] (ACREPS|REPS|PPO) [algorithm specific arguments]
```

More information on the required and optional arguments can be found by running `python run.py -h`, which returns:
```
usage: run.py [-h] --name NAME --env ENV [--robot] [--seed SEED] [--render]
              [--experiment] [--n_eval_traj N_EVAL_TRAJ] [--eval | --resume]
              {REPS,ACREPS,PPO} ...

Solve the different gym environments

optional arguments:
  -h, --help            show this help message and exit
  --name NAME           identifier to store experiment results
  --env ENV             name of the environment to be learned
  --robot               run the experiment using the real robot environment
  --seed SEED           seed for torch/numpy/gym to make experiments
                        reproducible
  --render              render the environment
  --experiment          whether this experiment was run via experiments.py
  --n_eval_traj N_EVAL_TRAJ
                        number of trajectories to run evaluation on, when
                        --eval is set.
  --eval                toggles evaluation mode
  --resume              resume training on an existing model by loading the
                        last checkpoint

subcommands:
  Algorithms to choose from

  {REPS,ACREPS,PPO}
```
For information on algorithm-specific arguments, the `-h` flag can also be used on the subcommands {ACREPS,REPS,PPO},
e.g. `python run.py REPS -h` for more information on training the REPS algorithm.
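For readers curious how such a two-level interface is typically wired up, the following is a minimal sketch of an argparse parser with per-algorithm subparsers. It is illustrative only and does not reproduce the actual parser in `run.py`; the argument set shown is a small, hand-picked subset.

```python
import argparse

# Minimal sketch of a top-level parser with per-algorithm subparsers.
# It mirrors the general structure described above, not the actual run.py code.
parser = argparse.ArgumentParser(description="Solve the different gym environments")
parser.add_argument("--name", required=True, help="identifier to store experiment results")
parser.add_argument("--env", required=True, help="name of the environment to be learned")
parser.add_argument("--resume", action="store_true", help="resume training from the last checkpoint")

subparsers = parser.add_subparsers(dest="algorithm", help="Algorithms to choose from")

reps = subparsers.add_parser("REPS")
reps.add_argument("--epsilon", type=float, default=0.1, help="KL constraint")
reps.add_argument("--n_fourier", type=int, default=100, help="number of Fourier features")

ppo = subparsers.add_parser("PPO")
ppo.add_argument("--n_epochs", type=int, default=100, help="number of training epochs")

args = parser.parse_args()
print(args.algorithm, vars(args))
```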
The most basic command for training REPS on the underactuated pendulum swingup would be:

```
python run.py --name reps_pendulum --env pendulum REPS
```

By default, a set of hyperparameters is loaded from `hyperparameters/[algorithm]/[environment].yaml`.
However, each of the algorithms' subcommands (REPS|ACREPS|PPO)
can take custom hyperparameters in the form of command-line arguments.
To list the hyperparameters available for a specific algorithm, pass the `-h`
flag to the algorithm's subcommand, e.g. `python run.py REPS -h` returns:
```
usage: run.py REPS [-h] [--n_epochs N_EPOCHS] [--n_steps N_STEPS]
                   [--epsilon EPSILON] [--gamma GAMMA] [--n_fourier N_FOURIER]
                   [--fourier_band FOURIER_BAND [FOURIER_BAND ...]]

optional arguments:
  -h, --help            show this help message and exit
  --n_epochs N_EPOCHS   number of training epochs
  --n_steps N_STEPS     number of environment steps per epoch
  --epsilon EPSILON     KL constraint.
  --gamma GAMMA         1 minus environment reset probability.
  --n_fourier N_FOURIER
                        number of fourier features.
  --fourier_band FOURIER_BAND [FOURIER_BAND ...]
                        number of fourier features.
```
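For intuition about `--n_fourier` and `--fourier_band`: REPS here relies on Fourier features of the state as its function approximator, and the band controls how the random frequencies are scaled per state dimension. The sketch below shows one common way random Fourier features are computed; it is an illustrative approximation under assumed conventions, not the repository's exact feature implementation.

```python
import numpy as np

def make_fourier_features(state_dim, n_fourier, fourier_band, seed=0):
    """Random Fourier feature map phi(s) = sin(W s + b).

    Illustrative sketch only: frequencies W are drawn from a Gaussian whose
    scale is set per state dimension by `fourier_band`, and phases b are
    uniform in [0, 2*pi). The repository's actual conventions may differ.
    """
    rng = np.random.RandomState(seed)
    band = np.broadcast_to(np.asarray(fourier_band, dtype=float), (state_dim,))
    W = rng.randn(n_fourier, state_dim) / band   # frequencies scaled by bandwidth
    b = rng.uniform(0.0, 2.0 * np.pi, size=n_fourier)

    def phi(state):
        return np.sin(W @ np.asarray(state, dtype=float) + b)

    return phi

# Example: 3-dimensional pendulum state, 200 features, per-dimension bandwidths.
phi = make_fourier_features(state_dim=3, n_fourier=200, fourier_band=[1.0, 1.0, 4.0])
features = phi([0.1, -0.2, 0.5])  # vector of length 200
```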
Training REPS on the pendulum with a custom gamma and more Fourier features is as easy as:

```
python run.py --name reps_pendulum --env pendulum REPS --gamma 0.9 --n_fourier 200
```
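Conceptually, this defaults-plus-overrides behaviour amounts to loading the YAML file and letting explicitly passed command-line values take precedence. The snippet below is a rough sketch of that idea; the exact file layout, directory casing, and function name are assumptions for illustration, not the repository's actual loading code (PyYAML is assumed to be available).

```python
import yaml

def load_hyperparameters(algorithm, environment, cli_overrides):
    """Load default hyperparameters from YAML and apply CLI overrides.

    Sketch only: the path pattern follows the convention described above
    (exact directory casing is an assumption), and `cli_overrides` is assumed
    to be a dict containing only explicitly set command-line arguments.
    """
    path = "hyperparameters/{}/{}.yaml".format(algorithm, environment)
    with open(path) as f:
        params = yaml.safe_load(f) or {}
    # Values passed on the command line win over the YAML defaults.
    params.update({k: v for k, v in cli_overrides.items() if v is not None})
    return params

# Hypothetical usage: defaults for REPS on the pendulum, with two overrides.
# params = load_hyperparameters("REPS", "pendulum", {"gamma": 0.9, "n_fourier": 200})
```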
Experiments can be stopped during the training process and resumed afterwards by using the `--resume` flag.

NOTE: `--resume` only loads the checkpoint and does not load the previously used hyperparameters! You have to supply these yourself.
So, to resume the previous example, just run:

```
python run.py --name reps_pendulum --env pendulum --resume REPS --gamma 0.9 --n_fourier 200
```
NOTE: If the run stopped by itself because the maximum number of epochs was reached, it will not
start again, since the termination criterion has already been met. To continue training,
you have to supply `--n_epochs`
with a higher number of epochs than have already been trained.
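For intuition, resuming is essentially a matter of reloading the last saved checkpoint and continuing the epoch loop from where it stopped. The sketch below illustrates that pattern with PyTorch; the file location, dictionary keys, and function names are assumptions for illustration, not the repository's actual checkpoint format.

```python
import os
import torch

# Illustrative checkpoint handling; the path and keys are assumptions.
checkpoint_path = "out/checkpoints/reps_pendulum.pt"  # hypothetical location

def save_checkpoint(model, optimizer, epoch):
    torch.save({
        "model_state": model.state_dict(),
        "optimizer_state": optimizer.state_dict(),
        "epoch": epoch,
    }, checkpoint_path)

def resume_training(model, optimizer, n_epochs):
    start_epoch = 0
    if os.path.exists(checkpoint_path):
        checkpoint = torch.load(checkpoint_path)
        model.load_state_dict(checkpoint["model_state"])
        optimizer.load_state_dict(checkpoint["optimizer_state"])
        start_epoch = checkpoint["epoch"] + 1
    # Hyperparameters are NOT stored in the checkpoint, matching the note above.
    for epoch in range(start_epoch, n_epochs):
        pass  # ... one training epoch ...
        save_checkpoint(model, optimizer, epoch)
```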
Trained models can be evaluated using the `--eval` flag.
For evaluation, the deterministic policy of the trained model is run for a number of
trajectories, which can be adjusted using the `--n_eval_traj` flag.
Evaluation reports the mean trajectory reward, its standard deviation, and the maximum trajectory reward. Evaluating the previously trained REPS model on 100 trajectories can be done by running:

```
python run.py --name reps_pendulum --env pendulum --eval --n_eval_traj 100 REPS
```
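The evaluation described above boils down to rolling out the deterministic policy for `n_eval_traj` episodes and aggregating the per-trajectory returns. The following is a minimal sketch of such a loop; `policy.mean_action` is a hypothetical interface used only for illustration, not the repository's actual API.

```python
import numpy as np

def evaluate(env, policy, n_eval_traj=100):
    """Roll out the deterministic policy and aggregate trajectory rewards.

    Sketch only: `policy.mean_action(obs)` is a hypothetical method returning
    the deterministic action for an observation.
    """
    returns = []
    for _ in range(n_eval_traj):
        obs, done, total = env.reset(), False, 0.0
        while not done:
            action = policy.mean_action(obs)
            obs, reward, done, _ = env.step(action)
            total += reward
        returns.append(total)
    returns = np.asarray(returns)
    return returns.mean(), returns.std(), returns.max()

# mean_r, std_r, max_r = evaluate(env, policy, n_eval_traj=100)
```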
To set up the real robot, one has to follow the instructions in the https://git.ias.informatik.tu-darmstadt.de/quanser/clients
repository.
Furthermore, in our code, the `--robot` flag has to be supplied to select the
real-robot variant of the gym environment.
Everything else remains the same no matter whether you run in simulation or on the real system.
Various scalar values that occur during the training process are saved to TensorBoard files automatically.
These files are located either in `out/summary/[experiment_name]`
or in `out/experiments/[experiment_name]/summary`,
depending on whether the experiment was invoked via `run.py`
or `experiments.py`. Usually, however, experiments will
be located in the first of the two locations.
To start a TensorBoard instance that loads all experiments invoked via `run.py`, just execute:

```
tensorboard --logdir=out/summary
```
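For reference, writing such scalars is typically done with a TensorBoard `SummaryWriter` pointed at the summary directory, roughly as sketched below. The use of `torch.utils.tensorboard`, the tag names, and the logged quantity are assumptions for illustration; the repository may use a different writer or naming scheme.

```python
from torch.utils.tensorboard import SummaryWriter

# Hypothetical sketch: log per-epoch scalars under out/summary/<experiment_name>.
writer = SummaryWriter(log_dir="out/summary/reps_pendulum")
for epoch in range(10):
    mean_reward = 0.0  # placeholder; would come from the training loop
    writer.add_scalar("reward/mean", mean_reward, global_step=epoch)
writer.close()
```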
This code was tested with Python 3.6.5.
All dependencies are listed in `requirements.txt`.
We suggest running everything from within a conda environment; the following steps describe the installation process using conda.
To manage different Python versions, we use conda to create virtual environments.
Install Miniconda from https://conda.io/en/latest/miniconda.html.
Next, run the following commands to create an empty environment with the name `rl-env`
and install all required dependencies within that environment.

```
conda create --name rl-env python=3.6.5
conda activate rl-env
pip install -r requirements.txt
```
Now you can run our code!