[Feature Request] TD3 #18
Labels: new algo (New algorithm request or PR)
Comments

Implement the TD3 (Twin Delayed Deep Deterministic policy gradient) algorithm as presented here.
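For context, here is a minimal sketch of the three ingredients TD3 adds on top of DDPG, following the update rules from Fujimoto et al., 2018 ("Addressing Function Approximation Error in Actor-Critic Methods"): clipped double-Q learning with twin critics, target-policy smoothing, and delayed policy updates. Every name below (`actor`, `critic1`, `td3_update`, the hyperparameters) is illustrative, not a proposal for torchrl's eventual API:

```python
# Minimal TD3 update sketch: twin critics, target-policy smoothing,
# and delayed actor/target updates. All names are illustrative.
import copy
import torch
import torch.nn as nn

obs_dim, act_dim, max_action = 8, 2, 1.0

def mlp(in_dim, out_dim):
    return nn.Sequential(nn.Linear(in_dim, 256), nn.ReLU(),
                         nn.Linear(256, out_dim))

actor = mlp(obs_dim, act_dim)
critic1, critic2 = mlp(obs_dim + act_dim, 1), mlp(obs_dim + act_dim, 1)
actor_tgt = copy.deepcopy(actor)
critic1_tgt, critic2_tgt = copy.deepcopy(critic1), copy.deepcopy(critic2)
actor_opt = torch.optim.Adam(actor.parameters(), lr=3e-4)
critic_opt = torch.optim.Adam(
    list(critic1.parameters()) + list(critic2.parameters()), lr=3e-4)

gamma, tau, policy_noise, noise_clip, policy_delay = 0.99, 0.005, 0.2, 0.5, 2

def td3_update(step, obs, act, rew, next_obs, done):
    # --- critic update (every step) ---
    with torch.no_grad():
        # Target-policy smoothing: perturb the target action with clipped noise.
        noise = (torch.randn_like(act) * policy_noise).clamp(-noise_clip, noise_clip)
        next_act = (torch.tanh(actor_tgt(next_obs)) * max_action + noise)
        next_act = next_act.clamp(-max_action, max_action)
        # Clipped double-Q: take the minimum of the twin target critics.
        next_sa = torch.cat([next_obs, next_act], -1)
        q_next = torch.min(critic1_tgt(next_sa), critic2_tgt(next_sa))
        target = rew + gamma * (1 - done) * q_next
    sa = torch.cat([obs, act], -1)
    critic_loss = ((critic1(sa) - target) ** 2).mean() \
        + ((critic2(sa) - target) ** 2).mean()
    critic_opt.zero_grad()
    critic_loss.backward()
    critic_opt.step()

    # --- delayed actor and Polyak target updates (every `policy_delay` steps) ---
    if step % policy_delay == 0:
        pi = torch.tanh(actor(obs)) * max_action
        actor_loss = -critic1(torch.cat([obs, pi], -1)).mean()
        actor_opt.zero_grad()
        actor_loss.backward()
        actor_opt.step()
        for net, tgt in ((actor, actor_tgt), (critic1, critic1_tgt),
                         (critic2, critic2_tgt)):
            for p, p_t in zip(net.parameters(), tgt.parameters()):
                p_t.data.mul_(1 - tau).add_(tau * p.data)

# Example: one update on a random batch of 32 transitions.
B = 32
td3_update(step=0,
           obs=torch.randn(B, obs_dim), act=torch.rand(B, act_dim) * 2 - 1,
           rew=torch.randn(B, 1), next_obs=torch.randn(B, obs_dim),
           done=torch.zeros(B, 1))
```

The delayed-update counter and the twin-critic minimum are the parts that distinguish TD3 from plain DDPG; everything else (replay buffer, exploration noise, network sizes) is a free design choice left out of this sketch.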
vmoens pushed the same commit referencing this issue several times (to vmoens/rl on May 5 and May 10, 2023, and to this repository on Jun 19 and Jun 21, 2023), with the message:

* Draft porting of RL training with vanilla pg and gumbel trick
* WIP: working on rl ppo
* add env
* env updates
* Remove breakpoint used for debugging
* Training loop for PPO that doesn't crash
* Clean up dead code
* Change order of compilation / wrapping
* Reward training fixes
* Config changes for testing
* Pad sequences with zeros
* Make reward dir if not exist
* Remove redundant config
* Sort of working end-to-end pipeline
* Slightly more efficient env

Co-authored-by: Tom Begley <tomcbegley@gmail.com>