[RLlib] RLTrainer is all you need. #31490
Conversation
multi-gpus tests pass now
Signed-off-by: Kourosh Hakhamaneshi <kourosh@anyscale.com>
rllib/core/rl_trainer/rl_trainer.py (outdated diff)
# rerun make_optimizers to update the params and optimizer
self.make_optimizers()

def make_module(self) -> RLModule:
This returns a MultiAgentRLModule, so the return type hint should be updated.
LGTM. Comments and type hints are being fixed.
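For context, a sketch of what the corrected hint could look like; the import path and the surrounding class stub are assumptions for illustration only, not the actual file contents:

```python
from ray.rllib.core.rl_module.marl_module import MultiAgentRLModule  # assumed import path


class RLTrainer:
    # Signature sketch only: the return hint reflects that a MultiAgentRLModule,
    # not a single-agent RLModule, is built and returned.
    def make_module(self) -> MultiAgentRLModule:
        ...
```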
Signed-off-by: Kourosh Hakhamaneshi <kourosh@anyscale.com>
@@ -5,19 +5,13 @@
import unittest

import ray
You can ignore everything that is under optim, since these tests are removed from CI anyway.
Signed-off-by: Kourosh Hakhamaneshi <kourosh@anyscale.com>
Test failures do not seem relevant.
moved rl_optimizer logic into rl_trainer
Signed-off-by: Kourosh Hakhamaneshi <kourosh@anyscale.com>
Signed-off-by: tmynn <hovhannes.tamoyan@gmail.com>
Why are these changes needed?
The RLOptimizer seems to have become a shallow module that just adds to the complexity of the system. It is basically responsible for two things: 1) defining the framework optimizers, and 2) containing the loss logic. Defining the optimizer inside this module creates a lot of unnecessary complexity when it comes to multi-agent RLOptimizers; by moving this logic into RLTrainer, we can get rid of these complexities.
Since compute_loss is also a stateless function, we can easily move it to RLTrainer as well. Now, all users have to do is extend RLTrainer directly to customize the training phase of their algorithm. For example, BCOptimizer now becomes part of BCRLTrainer's implementation, where all I have to do is optionally override `_configure_optimizer()` and write `compute_loss`. `compute_loss` will be written as a multi-agent loss; this is where the first-class treatment of MARL comes into play. It makes very complicated MARL communication patterns possible, and also extremely easy to express.
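As a rough illustration of the intended usage pattern, here is a minimal sketch of such a BC-style trainer. The attribute names, method signatures, optimizer choice, and batch layout below are assumptions made for illustration, not the final RLTrainer API:

```python
import torch

# Import path as in this PR (rllib/core/rl_trainer/rl_trainer.py); it may change later.
from ray.rllib.core.rl_trainer.rl_trainer import RLTrainer


class BCRLTrainer(RLTrainer):
    """Hypothetical behavior-cloning trainer: optimizer setup and loss logic
    live here directly, instead of in a separate BCOptimizer/RLOptimizer."""

    def _configure_optimizer(self):
        # Assumed: `self.module` is the MultiAgentRLModule, keyed by module_id.
        # One Adam optimizer per sub-module, returned as a dict for illustration.
        return {
            module_id: torch.optim.Adam(module.parameters(), lr=1e-3)
            for module_id, module in self.module.items()
        }

    def compute_loss(self, fwd_out, batch):
        # Written directly as a multi-agent loss: iterate over the per-module
        # forward outputs and combine them into a total loss.
        losses = {}
        for module_id, module_out in fwd_out.items():
            action_dist = module_out["action_dist"]  # assumed key name
            logp = action_dist.logp(batch[module_id]["actions"])
            losses[module_id] = -torch.mean(logp)
        losses["total_loss"] = sum(losses.values())
        return losses
```

Because the loss sees all modules' outputs at once, cross-agent terms (e.g., shared critics or communication penalties) can be added in one place, without any per-agent RLOptimizer indirection.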
Related issue number

Checks
- I've signed off every commit (by using git commit -s) in this PR.
- I've run scripts/format.sh to lint the changes in this PR.