[Feature Request] TD3 #18
Labels: new algo (New algorithm request or PR)
Comments

Implement the TD3 (Twin Delayed Deep Deterministic policy gradient) algorithm as presented here.
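For context, here is a minimal sketch of the three ingredients TD3 adds on top of DDPG, following the update rules from Fujimoto et al., 2018 ("Addressing Function Approximation Error in Actor-Critic Methods"): clipped double-Q learning with twin critics, target-policy smoothing, and delayed policy updates. Every name below (`actor`, `critic1`, `td3_update`, the hyperparameters) is illustrative, not a proposal for torchrl's eventual API:

```python
# Minimal TD3 update sketch: twin critics, target-policy smoothing,
# and delayed actor/target updates. All names are illustrative.
import copy
import torch
import torch.nn as nn

obs_dim, act_dim, max_action = 8, 2, 1.0

def mlp(in_dim, out_dim):
    return nn.Sequential(nn.Linear(in_dim, 256), nn.ReLU(),
                         nn.Linear(256, out_dim))

actor = mlp(obs_dim, act_dim)
critic1, critic2 = mlp(obs_dim + act_dim, 1), mlp(obs_dim + act_dim, 1)
actor_tgt = copy.deepcopy(actor)
critic1_tgt, critic2_tgt = copy.deepcopy(critic1), copy.deepcopy(critic2)
actor_opt = torch.optim.Adam(actor.parameters(), lr=3e-4)
critic_opt = torch.optim.Adam(
    list(critic1.parameters()) + list(critic2.parameters()), lr=3e-4)

gamma, tau, policy_noise, noise_clip, policy_delay = 0.99, 0.005, 0.2, 0.5, 2

def td3_update(step, obs, act, rew, next_obs, done):
    # --- critic update (every step) ---
    with torch.no_grad():
        # Target-policy smoothing: perturb the target action with clipped noise.
        noise = (torch.randn_like(act) * policy_noise).clamp(-noise_clip, noise_clip)
        next_act = (torch.tanh(actor_tgt(next_obs)) * max_action + noise)
        next_act = next_act.clamp(-max_action, max_action)
        # Clipped double-Q: take the minimum of the twin target critics.
        next_sa = torch.cat([next_obs, next_act], -1)
        q_next = torch.min(critic1_tgt(next_sa), critic2_tgt(next_sa))
        target = rew + gamma * (1 - done) * q_next
    sa = torch.cat([obs, act], -1)
    critic_loss = ((critic1(sa) - target) ** 2).mean() \
        + ((critic2(sa) - target) ** 2).mean()
    critic_opt.zero_grad()
    critic_loss.backward()
    critic_opt.step()

    # --- delayed actor and Polyak target updates (every `policy_delay` steps) ---
    if step % policy_delay == 0:
        pi = torch.tanh(actor(obs)) * max_action
        actor_loss = -critic1(torch.cat([obs, pi], -1)).mean()
        actor_opt.zero_grad()
        actor_loss.backward()
        actor_opt.step()
        for net, tgt in ((actor, actor_tgt), (critic1, critic1_tgt),
                         (critic2, critic2_tgt)):
            for p, p_t in zip(net.parameters(), tgt.parameters()):
                p_t.data.mul_(1 - tau).add_(tau * p.data)

# Example: one update on a random batch of 32 transitions.
B = 32
td3_update(step=0,
           obs=torch.randn(B, obs_dim), act=torch.rand(B, act_dim) * 2 - 1,
           rew=torch.randn(B, 1), next_obs=torch.randn(B, obs_dim),
           done=torch.zeros(B, 1))
```

The delayed-update counter and the twin-critic minimum are the parts that distinguish TD3 from plain DDPG; everything else (replay buffer, exploration noise, network sizes) is a free design choice left out of this sketch.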
vmoens pushed the same commit referencing this issue several times (to vmoens/rl on May 5 and May 10, 2023, and to this repository on Jun 19 and Jun 21, 2023), with the message:

* Draft porting of RL training with vanilla pg and gumbel trick
* WIP: working on rl ppo
* add env
* env updates
* Remove breakpoint used for debugging
* Training loop for PPO that doesn't crash
* Clean up dead code
* Change order of compilation / wrapping
* Reward training fixes
* Config changes for testing
* Pad sequences with zeros
* Make reward dir if not exist
* Remove redundant config
* Sort of working end-to-end pipeline
* Slightly more efficient env

Co-authored-by: Tom Begley <tomcbegley@gmail.com>