[Feature Request] TD3 #18

Closed
vmoens opened this issue Feb 11, 2022 · 0 comments
vmoens commented Feb 11, 2022

Implement the TD3 algorithm as presented here.
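
For readers unfamiliar with the request, here is a minimal sketch of the three ingredients TD3 is generally known for: target policy smoothing, clipped double-Q targets, and delayed actor updates. It is not the implementation that landed in this repository. The function names (`td3_target`, `should_update_actor`), the scalar-action simplification, and the default values are assumptions chosen for illustration.

```python
import random


def clip(x, lo, hi):
    """Clamp a scalar to the closed interval [lo, hi]."""
    return max(lo, min(hi, x))


def td3_target(reward, done, next_obs, target_actor, target_q1, target_q2,
               gamma=0.99, noise_std=0.2, noise_clip=0.5,
               act_low=-1.0, act_high=1.0, rng=random):
    """Compute a TD3-style critic target for one transition (scalar action).

    target_actor, target_q1 and target_q2 are hypothetical callables standing
    in for the target networks; defaults are assumed, not taken from the issue.
    """
    # Target policy smoothing: add clipped Gaussian noise to the target action.
    noise = clip(rng.gauss(0.0, noise_std), -noise_clip, noise_clip)
    next_action = clip(target_actor(next_obs) + noise, act_low, act_high)
    # Clipped double-Q: use the smaller of the two target critic estimates.
    next_q = min(target_q1(next_obs, next_action),
                 target_q2(next_obs, next_action))
    # No bootstrapping past a terminal transition.
    return reward + gamma * (1.0 - float(done)) * next_q


def should_update_actor(step, policy_delay=2):
    """Delayed policy updates: refresh the actor once every policy_delay critic steps."""
    return step % policy_delay == 0
```

Both critics would be regressed toward the same target value, while the actor and the target networks are updated only on the steps where `should_update_actor` returns true.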

@vmoens added the "new algo" label (New algorithm request or PR) on Feb 11, 2022
@vmoens self-assigned this on Feb 11, 2022
@Benjamin-eecs changed the title from "TD3" to "[Feature Request] TD3" on Jul 21, 2022
vmoens pushed a commit to vmoens/rl that referenced this issue May 5, 2023
* Draft porting of RL training with vanilla pg and gumbel trick

* WIP: working on rl ppo

* add env

* env updates

* Remove breakpoint used for debugging

* Training loop for PPO that doesn't crash

* Clean up dead code

* Change order of compilation / wrapping

* Reward training fixes

* Config changes for testing

* Pad sequences with zeros

* Make reward dir if not exist

* Remove redundant config

* Sort of working end-to-end pipeline

* Slightly more efficient env

---------

Co-authored-by: Tom Begley <tomcbegley@gmail.com>
vmoens pushed a commit to vmoens/rl that referenced this issue May 5, 2023
vmoens pushed a commit to vmoens/rl that referenced this issue May 10, 2023
vmoens pushed a commit that referenced this issue Jun 19, 2023
vmoens pushed a commit that referenced this issue Jun 21, 2023
@vmoens closed this as completed on Feb 12, 2024