[RLlib] Fixes the recreation of optimizers when add_module is used #31511
Conversation
Signed-off-by: Kourosh Hakhamaneshi <kourosh@anyscale.com>
2. Multi-GPU tests pass now. Signed-off-by: Kourosh Hakhamaneshi <kourosh@anyscale.com>
Some comments about the placement of functions. Maybe we don't absolutely need to enforce private functions with double underscores?
    ) -> None:
        """Add a module to the trainer.

        Args:
            module_id: The id of the module to add.
            module_cls: The module class to add.
            module_kwargs: The config for the module.
            set_optimizer_fn: A function that takes in the module and returns a list of
It's complex behavior to find the first added optimizer in the optimizer_to_param dictionary. We should either have the user explicitly define a default for us, or pass one themselves.
This is going to create a lot of cognitive load for the user in figuring out which optimizer class was used to create the agent, unless the same optimizer is used for all the modules, which is very likely.
Any automated inference here would put a cognitive load on the user, and we would need to clarify that in the docstring. I think being explicit is good here as well: just ask the user to say what optimizer they want to use for the newly added module. That's also easy to specify. I'm going to go with being fully explicit here.
OK, fixed this by introducing an extra optional parameter called optimizer_cls. Users either provide optimizer_cls for the new parameters, or, for more flexibility, define a function that returns both the optimizers and their corresponding parameters.
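A minimal sketch of the resulting API shape (class and parameter names here are hypothetical stand-ins, not the actual RLlib signatures): the caller either passes an optimizer class for the new module's parameters, or a function returning (optimizer, params) pairs, and only the new module gets a fresh optimizer.

```python
from typing import Callable, Optional, Sequence, Tuple


class Optimizer:
    """Stand-in for a framework optimizer (hypothetical)."""

    def __init__(self, params):
        self.params = list(params)


class Module:
    """Stand-in for an RLModule (hypothetical)."""

    def __init__(self, params):
        self.params = params


class Trainer:
    def __init__(self):
        # Maps each optimizer to the refs of the params it owns.
        self._optim_to_param = {}

    def add_module(
        self,
        module: Module,
        optimizer_cls: Optional[type] = None,
        set_optimizer_fn: Optional[
            Callable[[Module], Sequence[Tuple[Optimizer, Sequence]]]
        ] = None,
    ) -> None:
        # Fully explicit: the caller must say how to optimize the new
        # module's parameters; nothing is inferred from existing optimizers.
        if set_optimizer_fn is None:
            if optimizer_cls is None:
                raise ValueError("Provide either optimizer_cls or set_optimizer_fn.")

            def set_optimizer_fn(m):
                return [(optimizer_cls(m.params), m.params)]

        # Only the NEW module gets a new optimizer; existing optimizers
        # (and their accumulated state) are left untouched.
        for optim, params in set_optimizer_fn(module):
            self._optim_to_param[optim] = [id(p) for p in params]


trainer = Trainer()
trainer.add_module(Module(params=[1.0, 2.0]), optimizer_cls=Optimizer)
assert len(trainer._optim_to_param) == 1
```

Adding a second module then adds a second optimizer entry without rebuilding the first one.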
rllib/core/rl_trainer/rl_trainer.py
Outdated
@@ -353,3 +406,48 @@ def _make_distributed(self) -> MultiAgentRLModule:
            The distributed module.
        """
        raise NotImplementedError

    def __get_param_ref(self, param: ParamType) -> Hashable:
Can we move this function to the subclass?
I get that it'll seldom be useful to users who are implementing, but I think we can agree that logic belonging to the torch RL trainer should live in the torch RL trainer, and the same for TensorFlow.
Indexing into and adding to self._params and self._param_to_optim is really easy to do if you have this function.
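The idea behind such a param-ref helper (a sketch under assumptions — the actual framework-specific implementations may differ) is to produce one hashable reference per parameter, so the bookkeeping dictionaries can be indexed uniformly even when the raw tensors themselves aren't reliably hashable:

```python
from typing import Dict, Hashable

# Stand-ins for the real type aliases (ParamType would be a
# torch.Tensor / tf.Variable in the actual trainers).
ParamType = object
ParamRef = Hashable


def get_param_ref(param: ParamType) -> ParamRef:
    # Use object identity as a stable, hashable reference to the
    # parameter; the same object always maps to the same ref.
    return id(param)


# The trainer-side bookkeeping becomes plain dict indexing.
params: Dict[ParamRef, ParamType] = {}
param_to_optim: Dict[ParamRef, str] = {}

w = object()  # pretend this is a weight tensor
ref = get_param_ref(w)
params[ref] = w
param_to_optim[ref] = "sgd_for_module_a"

# Looking the parameter up again via its ref returns the same object.
assert params[get_param_ref(w)] is w
```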
Oh OK, fair. 👍👍 So you want them to be just private, not name-mangled (double-underscore) private.
yep yep
ok this is also done.
I'd be interested to see an example of setting separate optimizers for the modules in the BCTfRLModule, using the new functions you added. Maybe we can try that out in the near future.
I have some minor comments; you can decide whether they are useful.
""" | ||
|
||
@abc.abstractmethod | ||
def _get_parameters(self, module: RLModule) -> Sequence[ParamType]: |
Consider making this a public API?
Yep. That sounds reasonable.
        # optimizers and adding or removing modules.
        self._optim_to_param: Dict[Optimizer, List[ParamRef]] = {}
        self._param_to_optim: Dict[ParamRef, Optimizer] = {}
        self._params: Dict[ParamRef, ParamType] = {}
I would create a namedtuple for (ParamType, Optimizer).
By the way, why do you need to know the mapping from param to optimizer?
Sometimes bookkeeping too much state is error prone. For example, since there shouldn't be too many params, keeping a flat list and looping over it is often an acceptable option.
The reason is that TensorFlow optimizers don't keep track of their parameters at construction time. You need to call optim.apply_gradients(...) for the first time to register params with an optim object. It's a TF-induced design requirement. Otherwise, just having the optimizer objects would suffice, as in torch.
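A pure-Python mock of the TF behavior described above (no real TensorFlow here; the class and names are illustrative only): the optimizer learns which params it owns only when apply is first called, so the trainer has to keep its own param-to-optimizer mapping from the start.

```python
class LazyOptimizer:
    """Mimics a TF optimizer: owns no params until first apply()."""

    def __init__(self):
        self.known_params = []  # empty at construction, unlike torch

    def apply(self, gradient, param):
        # Params are registered lazily, on first use.
        if param not in self.known_params:
            self.known_params.append(param)


optim = LazyOptimizer()
param_to_optim = {}            # trainer-side bookkeeping
param = "module_a/weights"     # hypothetical param ref
param_to_optim[param] = optim  # mapping known BEFORE any apply() call

assert optim.known_params == []       # the optimizer doesn't know yet
optim.apply(gradient=0.1, param=param)
assert optim.known_params == [param]  # registered on first apply
```

With torch-style optimizers, which receive their params at construction, the optimizer objects alone would carry this information.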
…31511) moved rl_optimizer logic into rl_trainer Signed-off-by: Kourosh Hakhamaneshi <kourosh@anyscale.com>
Why are these changes needed?
In RLTrainer, when we called add_module, we used to re-create the optimizer object, which would have wiped out the optimizer state of the modules that already existed under the trainer. This PR solves that.

Related issue number
Checks
- I've signed off every commit (by using the -s flag, i.e., git commit -s) in this PR.
- I've run scripts/format.sh to lint the changes in this PR.
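The behavior described under "Why are these changes needed?" can be illustrated with a simplified mock (not the real RLlib code): an SGD-with-momentum stand-in whose accumulated state would be lost if the trainer re-created every optimizer on each add_module call.

```python
class MomentumSGD:
    """Mock optimizer with internal state built up during training."""

    def __init__(self):
        self.velocity = 0.0

    def step(self, grad, lr=0.1, mu=0.9):
        # Momentum accumulates across steps: losing it changes training.
        self.velocity = mu * self.velocity + grad
        return lr * self.velocity


# One optimizer per module, keyed by module id.
optims = {"module_a": MomentumSGD()}
optims["module_a"].step(grad=1.0)          # training builds up state
assert optims["module_a"].velocity == 1.0

# Old behavior: add_module re-created ALL optimizers -> state wiped.
# Fixed behavior: create an optimizer only for the NEW module.
optims["module_b"] = MomentumSGD()
assert optims["module_a"].velocity == 1.0  # existing state preserved
```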