Parallel Sampling with ReLAx
Speeding Up PPO with Parallel Sampling
This repository contains an implementation of the PPO algorithm with sampling from parallel environments, built with the ReLAx package.
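For a rough picture of what parallel sampling looks like, here is a minimal Ray-based sketch (illustrative only, not the repository's actual code); the worker class, environment name, and rollout length are assumptions made for the example.

```python
# Minimal sketch of parallel rollout collection with Ray (illustrative only).
# Assumes the classic gym API (reset() -> obs, step() -> 4-tuple).
import gym
import ray

ray.init()


@ray.remote
class RolloutWorker:
    """Each worker owns its own environment and collects transitions independently."""

    def __init__(self, env_name: str):
        self.env = gym.make(env_name)

    def collect(self, n_steps: int):
        # A random policy keeps the sketch self-contained; real code would
        # load the latest policy weights before sampling.
        obs = self.env.reset()
        transitions = []
        for _ in range(n_steps):
            action = self.env.action_space.sample()
            next_obs, reward, done, _ = self.env.step(action)
            transitions.append((obs, action, reward, next_obs, done))
            obs = self.env.reset() if done else next_obs
        return transitions


# Launch several workers and gather their rollouts in parallel.
workers = [RolloutWorker.remote("CartPole-v1") for _ in range(4)]
rollouts = ray.get([w.collect.remote(1000) for w in workers])
batch = [t for rollout in rollouts for t in rollout]
print(f"Collected {len(batch)} transitions")
```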
The performance of single vs multi-thread sampling:
Parallel Sampling Takeaways:
- Avoid small batches. If tasks are very small, a Ray program can take longer than the equivalent serial Python program. Every task invocation has a non-trivial overhead (scheduling, inter-process communication, updating the system state), and for tiny tasks this overhead dominates the actual time it takes to execute the task (see the first sketch after this list).
- Sampling from a GPU-hosted policy is slower. If possible, run the sampling phase on CPU for maximum performance, then transfer the policy back to GPU for the training phase using the actor's .set_device() method (see the second sketch below). If the update phase is computationally cheap, it may even be justified to run everything on CPU.
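To make the first point concrete, the following timing sketch (illustrative only, not from the repository) compares dispatching many tiny Ray tasks against batching the same work into fewer, larger tasks:

```python
# Many tiny Ray tasks vs. the same work grouped into larger batched tasks.
import time
import ray

ray.init(ignore_reinit_error=True)


@ray.remote
def tiny_task(x):
    return x * x


@ray.remote
def batched_task(xs):
    return [x * x for x in xs]


n = 10_000

# Many tiny tasks: per-task scheduling / IPC overhead dominates.
start = time.time()
ray.get([tiny_task.remote(i) for i in range(n)])
print("tiny tasks:   ", time.time() - start)

# Same work in 10 larger tasks: the overhead is amortized.
start = time.time()
chunks = [list(range(i, i + 1000)) for i in range(0, n, 1000)]
ray.get([batched_task.remote(chunk) for chunk in chunks])
print("batched tasks:", time.time() - start)
```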
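For the second point, the pattern is sketched below in plain PyTorch. Only the `.set_device()` method is taken from the note above; the sketch approximates it with `nn.Module.to()` so it runs without ReLAx, and the network sizes and training loop are placeholders.

```python
# Self-contained PyTorch sketch of the "sample on CPU, update on GPU" pattern.
# ReLAx exposes the device switch via the actor's .set_device() method;
# here plain nn.Module.to(device) is used so the example runs on its own.
import torch
import torch.nn as nn

policy = nn.Sequential(nn.Linear(376, 64), nn.Tanh(), nn.Linear(64, 17))  # Humanoid-sized MLP
optimizer = torch.optim.Adam(policy.parameters(), lr=3e-4)
train_device = "cuda" if torch.cuda.is_available() else "cpu"

for iteration in range(3):
    # Sampling phase: keep the policy on CPU for cheap, parallel env interaction.
    policy.to("cpu")
    with torch.no_grad():
        obs = torch.randn(30_000, 376)   # stand-in for observations from the envs
        actions = policy(obs)            # real code would step the environments here

    # Update phase: move the policy to GPU only for the compute-heavy PPO step.
    policy.to(train_device)
    obs = obs.to(train_device)
    loss = policy(obs).pow(2).mean()     # stand-in for the PPO surrogate loss
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```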
PPO Humanoid Learning Curve:
Each x-axis step corresponds to a batch of 30k learning transitions.