Multi-Step Bootstrapping with ReLAx

Example N-step TD3 implementation with ReLAx

Performance relative to vanilla 1-step TD3 is measured by averaging learning curves, collected on a separate evaluation environment, over 4 experiments with random environment seeds.
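As a rough illustration of that averaging step, the sketch below takes one learning curve per seed and returns their point-wise mean. It is not taken from ReLAx: the function name `average_curves` and the assumption that every run was evaluated at the same checkpoints are hypothetical.

```python
from statistics import mean


def average_curves(curves):
    """Point-wise mean of several learning curves (one list of returns per seed).

    Hypothetical helper, not part of ReLAx. Assumes every run was evaluated
    at the same training steps, so all curves have the same length.
    """
    if not curves:
        raise ValueError("at least one curve is required")
    length = len(curves[0])
    if any(len(curve) != length for curve in curves):
        raise ValueError("all curves must have the same number of points")
    # zip(*curves) groups the i-th evaluation of every seed together
    return [mean(points) for points in zip(*curves)]
```

For the setup described here, `curves` would hold 4 lists, one per random seed.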

The results are summarized in the following plot:

[Plot: n_step_vs_1_step]

The only difference in hyperparameter settings between N-step TD3 and vanilla TD3 is the use of multi-step bootstrapping. The averaged curves show a substantial advantage for the 3-step version, both in training speed and in asymptotic performance. This suggests that N-step TD is often the cheapest way to improve the performance of an RL actor. Note that the gain from N-step TD varies from task to task. For example, early experiments show that on MuJoCo's Ant-v2 environment the 3-step Bellman update performs worse than the 1-step version.
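To make the difference between the two targets concrete, here is a minimal sketch of an N-step TD target for a single transition window. It does not use the ReLAx API: `n_step_target` and its arguments are hypothetical names, and `bootstrap_value` stands in for whatever the critic side supplies at the N-th next state (in TD3 this is usually the smaller of the two target-critic estimates).

```python
def n_step_target(rewards, dones, bootstrap_value, gamma=0.99):
    """N-step TD target for one window of consecutive transitions.

    Hypothetical helper, not part of ReLAx.
    rewards:         [r_t, ..., r_{t+n-1}]
    dones:           matching terminal flags for each step
    bootstrap_value: critic estimate at s_{t+n} (assumed to be given)
    """
    target = 0.0
    discount = 1.0
    for reward, done in zip(rewards, dones):
        target += discount * reward
        discount *= gamma
        if done:
            # The episode ended inside the window: do not bootstrap
            return target
    # discount now equals gamma ** n
    return target + discount * bootstrap_value
```

With a window of length 1 this reduces to the usual 1-step target `r + gamma * Q`, so the same function covers both settings compared above.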

Resulting Policy

[Video: 3_step_td3.mp4]