
[RLlib] RLTrainer is all you need. #31490

Merged (13 commits, Jan 6, 2023)

Conversation

kouroshHakha
Contributor

@kouroshHakha kouroshHakha commented Jan 6, 2023

Signed-off-by: Kourosh Hakhamaneshi kourosh@anyscale.com

Why are these changes needed?

The RLOptimizer has become a shallow module that only adds to the complexity of the system.
It is responsible for exactly two things: 1) defining framework-specific optimizers, and 2) containing the loss logic.

Defining the optimizer inside this module creates a lot of unnecessary complexity for multi-agent RLOptimizers. By moving this logic into RLTrainer we can get rid of it.

Since compute_loss is also a stateless function, we can easily move it to RLTrainer as well. Now, to customize the training phase of their algorithm, all users have to do is extend RLTrainer directly. For example, BCOptimizer now becomes part of BCRLTrainer's implementation, where all one has to do is optionally override _configure_optimizer() and write compute_loss. compute_loss is written as a multi-agent loss. This is where the first-class treatment of MARL comes into play: it makes very complicated MARL communication patterns possible, and also extremely easy to express.
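The pattern described above can be sketched in a few lines. This is an illustrative toy, not RLlib's actual implementation: the class and hook names (RLTrainer, _configure_optimizer, compute_loss) follow the description in this PR, but the method bodies, the dict-based batch format, and the BCRLTrainer learning-rate choice are assumptions made purely for the example.

```python
# Toy sketch of the design this PR describes: the trainer itself owns both
# optimizer construction and the loss, and compute_loss is written as a
# multi-agent loss over module IDs. Not RLlib's real API.

class RLTrainer:
    """Base trainer: owns optimizer construction and the loss logic."""

    def __init__(self, module_params):
        # module_params: {module_id: {param_name: value}}
        self.module_params = module_params
        self.optimizers = {}
        self.make_optimizers()

    def make_optimizers(self):
        # Rebuild per-module optimizer state. Re-run this whenever modules
        # are added or removed (the multi-agent case).
        for module_id in self.module_params:
            self.optimizers.setdefault(
                module_id, {"lr": self._configure_optimizer(module_id)}
            )

    def _configure_optimizer(self, module_id):
        # Default config; subclasses may override per module.
        return 0.01

    def compute_loss(self, batch):
        raise NotImplementedError


class BCRLTrainer(RLTrainer):
    """Behavior-cloning trainer: overrides only the two hooks."""

    def _configure_optimizer(self, module_id):
        return 0.001  # hypothetical BC-specific learning rate

    def compute_loss(self, batch):
        # Multi-agent loss: one scalar per module_id, plus a total.
        # batch: {module_id: [per-sample errors]}
        losses = {mid: sum(errs) / len(errs) for mid, errs in batch.items()}
        losses["total_loss"] = sum(losses.values())
        return losses
```

Because the trainer sees all module IDs at once in compute_loss, cross-agent terms (shared critics, communication penalties) can be added in one place instead of being threaded through per-agent optimizer objects.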

Related issue number

Checks

  • I've signed off every commit(by using the -s flag, i.e., git commit -s) in this PR.
  • I've run scripts/format.sh to lint the changes in this PR.
  • I've included any doc changes needed for https://docs.ray.io/en/master/.
  • I've made sure the tests are passing. Note that there might be a few flaky tests, see the recent failures at https://flakey-tests.ray.io/
  • Testing Strategy
    • Unit tests
    • Release tests
    • This PR is not tested :(

Multi-GPU tests pass now.

Signed-off-by: Kourosh Hakhamaneshi <kourosh@anyscale.com>
Signed-off-by: Kourosh Hakhamaneshi <kourosh@anyscale.com>
Signed-off-by: Kourosh Hakhamaneshi <kourosh@anyscale.com>
Signed-off-by: Kourosh Hakhamaneshi <kourosh@anyscale.com>
Signed-off-by: Kourosh Hakhamaneshi <kourosh@anyscale.com>
# rerun make_optimizers to update the params and optimizer
self.make_optimizers()

def make_module(self) -> RLModule:

returns a MultiAgentRLModule

@avnishn avnishn left a comment

LGTM; comments and type hints are being fixed.

@@ -5,19 +5,13 @@
import unittest

import ray

You can ignore everything under optim, since those tests are removed from CI anyway.

@gjoliver

gjoliver commented Jan 6, 2023

Test failures do not seem relevant.

@gjoliver gjoliver merged commit 4e234b7 into ray-project:master Jan 6, 2023
AmeerHajAli pushed a commit that referenced this pull request Jan 12, 2023
moved rl_optimizer logic into rl_trainer

Signed-off-by: Kourosh Hakhamaneshi <kourosh@anyscale.com>
tamohannes pushed a commit to ju2ez/ray that referenced this pull request Jan 25, 2023
moved rl_optimizer logic into rl_trainer

Signed-off-by: Kourosh Hakhamaneshi <kourosh@anyscale.com>
Signed-off-by: tmynn <hovhannes.tamoyan@gmail.com>