Parallel Sampling with ReLAx
Speeding Up PPO with Parallel Sampling
This repository contains an implementation of the PPO algorithm with sampling from parallel environments, built with the ReLAx package.
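For a rough picture of what parallel sampling looks like, here is a minimal Ray-based sketch (illustrative only, not the repository's actual code); the worker class, environment name, and rollout length are assumptions made for the example.

```python
# Minimal sketch of parallel rollout collection with Ray (illustrative only).
# Assumes the classic gym API (reset() -> obs, step() -> 4-tuple).
import gym
import ray

ray.init()


@ray.remote
class RolloutWorker:
    """Each worker owns its own environment and collects transitions independently."""

    def __init__(self, env_name: str):
        self.env = gym.make(env_name)

    def collect(self, n_steps: int):
        # A random policy keeps the sketch self-contained; real code would
        # load the latest policy weights before sampling.
        obs = self.env.reset()
        transitions = []
        for _ in range(n_steps):
            action = self.env.action_space.sample()
            next_obs, reward, done, _ = self.env.step(action)
            transitions.append((obs, action, reward, next_obs, done))
            obs = self.env.reset() if done else next_obs
        return transitions


# Launch several workers and gather their rollouts in parallel.
workers = [RolloutWorker.remote("CartPole-v1") for _ in range(4)]
rollouts = ray.get([w.collect.remote(1000) for w in workers])
batch = [t for rollout in rollouts for t in rollout]
print(f"Collected {len(batch)} transitions")
```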
The performance of single vs multi-thread sampling:
Parallel Sampling Takeaways:
- Avoid small batches. If tasks are very small, a Ray program can take longer than the equivalent serial Python program. Every task invocation has a non-trivial overhead (scheduling, inter-process communication, updating the system state), and for tiny tasks this overhead dominates the actual time it takes to execute the task (see the first sketch after this list).
- Sampling from a GPU-hosted policy is slower. If possible, run the sampling phase on CPU for maximum performance, then transfer the policy back to GPU for the training phase using the actor's .set_device() method (see the second sketch below). If the update phase is computationally cheap, it may even be justified to run everything on CPU.
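To make the first point concrete, the following timing sketch (illustrative only, not from the repository) compares dispatching many tiny Ray tasks against batching the same work into fewer, larger tasks:

```python
# Many tiny Ray tasks vs. the same work grouped into larger batched tasks.
import time
import ray

ray.init(ignore_reinit_error=True)


@ray.remote
def tiny_task(x):
    return x * x


@ray.remote
def batched_task(xs):
    return [x * x for x in xs]


n = 10_000

# Many tiny tasks: per-task scheduling / IPC overhead dominates.
start = time.time()
ray.get([tiny_task.remote(i) for i in range(n)])
print("tiny tasks:   ", time.time() - start)

# Same work in 10 larger tasks: the overhead is amortized.
start = time.time()
chunks = [list(range(i, i + 1000)) for i in range(0, n, 1000)]
ray.get([batched_task.remote(chunk) for chunk in chunks])
print("batched tasks:", time.time() - start)
```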
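For the second point, the pattern is sketched below in plain PyTorch. Only the `.set_device()` method is taken from the note above; the sketch approximates it with `nn.Module.to()` so it runs without ReLAx, and the network sizes and training loop are placeholders.

```python
# Self-contained PyTorch sketch of the "sample on CPU, update on GPU" pattern.
# ReLAx exposes the device switch via the actor's .set_device() method;
# here plain nn.Module.to(device) is used so the example runs on its own.
import torch
import torch.nn as nn

policy = nn.Sequential(nn.Linear(376, 64), nn.Tanh(), nn.Linear(64, 17))  # Humanoid-sized MLP
optimizer = torch.optim.Adam(policy.parameters(), lr=3e-4)
train_device = "cuda" if torch.cuda.is_available() else "cpu"

for iteration in range(3):
    # Sampling phase: keep the policy on CPU for cheap, parallel env interaction.
    policy.to("cpu")
    with torch.no_grad():
        obs = torch.randn(30_000, 376)   # stand-in for observations from the envs
        actions = policy(obs)            # real code would step the environments here

    # Update phase: move the policy to GPU only for the compute-heavy PPO step.
    policy.to(train_device)
    obs = obs.to(train_device)
    loss = policy(obs).pow(2).mean()     # stand-in for the PPO surrogate loss
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```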
PPO Humanoid Learning Curve:
Each x-axis step corresponds to a batch of 30k learning transitions.