nslyubaykin/relax_ppo_example

Example PPO implementation with ReLAx

This repository contains an implementation of proximal policy optimization (PPO) with ReLAx.
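For readers unfamiliar with PPO, its core is the clipped surrogate objective. The snippet below is a minimal, library-free sketch of that loss in plain Python. It does not use or reflect the ReLAx API; the function name `ppo_clip_loss` and the default `clip_eps=0.2` are illustrative assumptions, not values taken from this repository.

```python
import math


def ppo_clip_loss(new_log_probs, old_log_probs, advantages, clip_eps=0.2):
    """Clipped surrogate loss (to be minimized) over a batch of samples.

    Hypothetical helper for illustration only; not part of ReLAx.
    """
    terms = []
    for new_lp, old_lp, adv in zip(new_log_probs, old_log_probs, advantages):
        # Probability ratio between the updated policy and the data-collecting policy.
        ratio = math.exp(new_lp - old_lp)
        # Restrict the ratio to [1 - eps, 1 + eps].
        clipped_ratio = max(1.0 - clip_eps, min(ratio, 1.0 + clip_eps))
        # Take the pessimistic (smaller) of the two surrogate terms.
        terms.append(min(ratio * adv, clipped_ratio * adv))
    # Negate because the surrogate is maximized while optimizers minimize.
    return -sum(terms) / len(terms)


# Two samples: one action became more likely, the other less likely.
old = [math.log(0.5), math.log(0.5)]
new = [math.log(0.75), math.log(0.25)]
loss = ppo_clip_loss(new, old, advantages=[1.0, -1.0])
```

Here the first ratio (1.5) is clipped to 1.2 because the advantage is positive, while the second ratio (0.5) is clipped to 0.8, so the update gains nothing from moving the policy further than the clip range allows.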

The PPO actor was trained on the Ant-v2 MuJoCo Gym environment for 4M environment steps.

The graph of average return vs. training step is shown below (batch_size=40000):

[Figure: ppo_training]

Resulting policy:

[Video: ppo_run.mp4]