This is a PyTorch implementation of the Proximal Policy Optimization (PPO) algorithm, designed for flexible modification and high performance in continuous control tasks.
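For readers new to PPO, the clipped surrogate objective from Schulman et al. (2017) can be sketched in plain Python as follows. This is an illustration only, not the code in this repository, which operates on PyTorch tensors; the function name `ppo_clip_loss` and the default `clip_param=0.2` are assumptions chosen for the example.

```python
import math


def ppo_clip_loss(new_log_probs, old_log_probs, advantages, clip_param=0.2):
    """Mean clipped surrogate loss (to be minimized) over a batch of samples.

    Hypothetical helper for illustration; all arguments are equal-length
    lists of floats, one entry per sampled action.
    """
    total = 0.0
    for new_lp, old_lp, adv in zip(new_log_probs, old_log_probs, advantages):
        # Probability ratio between the updated policy and the policy
        # that collected the data.
        ratio = math.exp(new_lp - old_lp)
        # Restrict the ratio to [1 - clip_param, 1 + clip_param].
        clipped = max(1.0 - clip_param, min(ratio, 1.0 + clip_param))
        # Take the pessimistic (smaller) of the two surrogate terms.
        total += min(ratio * adv, clipped * adv)
    # Negate because optimizers minimize, while the objective is maximized.
    return -total / len(advantages)


# Example call with two samples (values are made up).
loss = ppo_clip_loss([-0.9, -1.1], [-1.0, -1.0], [0.5, -0.3])
```

Clipping removes the incentive to move the policy far from the one that gathered the data in a single update, which is the main idea behind PPO's stability.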
This implementation depends on the following packages:
- pytorch 1.4.0
- tensorboard
- numpy
- tqdm
- gym
- baselines
- pybullet (optional)
You can install the necessary dependencies using the provided requirements.txt file:
$ pip install -r requirements.txt
For example, to train a PPO agent on the PyBullet Ant locomotion task using 12 processes, run:
$ python train.py --task-id=AntBulletEnv-v0 --num-processes=12 --num-env-steps=5000000
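As a rough sketch of how the three flags in the command above could be parsed, here is a minimal `argparse` example. Only the flag names come from the command; the function name `build_parser`, the defaults, and the help strings are assumptions, and the actual `train.py` may define more options or handle them differently.

```python
import argparse


def build_parser():
    # Hypothetical parser: the flag names match the command shown above,
    # but the defaults and help strings are illustrative assumptions.
    parser = argparse.ArgumentParser(
        description="Train a PPO agent (illustrative sketch)")
    parser.add_argument("--task-id", type=str, default="AntBulletEnv-v0",
                        help="gym environment id to train on")
    parser.add_argument("--num-processes", type=int, default=1,
                        help="number of parallel environment processes")
    parser.add_argument("--num-env-steps", type=int, default=1000000,
                        help="total number of environment steps")
    return parser


# Parse the same arguments as in the example command.
args = build_parser().parse_args([
    "--task-id=AntBulletEnv-v0",
    "--num-processes=12",
    "--num-env-steps=5000000",
])
print(args.task_id, args.num_processes, args.num_env_steps)
```

Note that `argparse` converts dashes in flag names to underscores, so `--num-env-steps` is read as `args.num_env_steps`.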
You can also monitor the training process and tune hyperparameters using TensorBoard:
$ tensorboard --logdir=log
Here is what it looks like:
reward | action |
---|---|
Training for 5M steps takes about half an hour on a six-core MacBook Pro.
HalfCheetahBulletEnv | AntBulletEnv |
---|---|
John Schulman, Philipp Moritz, Sergey Levine, Michael I. Jordan, and Pieter Abbeel. High-dimensional continuous control using generalized advantage estimation. CoRR, abs/1506.02438, 2015.
John Schulman, Filip Wolski, Prafulla Dhariwal, Alec Radford, and Oleg Klimov. Proximal policy optimization algorithms. arXiv preprint arXiv:1707.06347, 2017.