Example PPO implementation with ReLAx

This repository contains an implementation of proximal policy optimization (PPO) with ReLAx.
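The notebook's ReLAx-specific code is not reproduced here. For orientation, PPO's central idea is the clipped surrogate objective: the probability ratio between the new and old policy is clipped to [1 - ε, 1 + ε], so that a single update cannot move the policy too far. The NumPy sketch below computes the negated objective as a loss to minimize. The function name `ppo_clip_loss` and the default `clip_eps=0.2` are illustrative assumptions. They are not ReLAx's API, and they are not necessarily the settings used for this run.

```python
import numpy as np

def ppo_clip_loss(new_log_probs, old_log_probs, advantages, clip_eps=0.2):
    """Negated PPO clipped surrogate objective (hypothetical helper, not ReLAx API).

    new_log_probs: log pi_theta(a|s) under the policy being optimized
    old_log_probs: log pi_theta_old(a|s) under the policy that collected the data
    advantages:    advantage estimates for the same (s, a) pairs
    """
    new_log_probs = np.asarray(new_log_probs, dtype=float)
    old_log_probs = np.asarray(old_log_probs, dtype=float)
    advantages = np.asarray(advantages, dtype=float)

    # Probability ratio r = pi_theta / pi_theta_old, computed in log space for stability
    ratio = np.exp(new_log_probs - old_log_probs)

    # Unclipped and clipped surrogate terms
    unclipped = ratio * advantages
    clipped = np.clip(ratio, 1.0 - clip_eps, 1.0 + clip_eps) * advantages

    # Pessimistic (element-wise minimum) bound, averaged; negate to obtain a loss
    return -np.mean(np.minimum(unclipped, clipped))


# Usage: the first action's ratio is 2.0, so its gain is clipped to 1.2 * advantage
new = np.log([0.5, 0.3])
old = np.log([0.25, 0.3])
adv = np.array([1.0, -1.0])
print(ppo_clip_loss(new, old, adv))  # approximately -0.1
```

Taking the minimum of the clipped and unclipped terms makes the objective a lower bound on the unclipped one. The policy therefore gains nothing from pushing the ratio beyond the clip range in the direction the advantage favors.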

The PPO actor was trained on the Ant-v2 MuJoCo Gym environment for 4M environment steps.

The plot of average return vs. training step is shown below (batch_size=40000):

Resulting policy (video: ppo_run.mp4):
