
[RLlib] Fixes the recreation of optimizers when add_module is used #31511

Merged: 22 commits, Jan 10, 2023

Conversation

kouroshHakha (Contributor):

Why are these changes needed?

In RLTrainer when we called add_module we used to re-create the optimizer object which would have wiped out the state of the modules that already exist under the trainer. This PR solves that.
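The fix described above can be sketched with a toy stand-in (class and method names here are illustrative, not RLlib's actual `RLTrainer` API): rebuilding every optimizer on `add_module` would discard accumulated state such as Adam moments, so instead only the new module's optimizer is created and registered alongside the existing ones.

```python
# Minimal sketch (hypothetical names, not RLlib's real API): add_module only
# creates an optimizer for the NEW module; existing optimizers (and their
# internal state, e.g. Adam moments) are left untouched.

class SketchTrainer:
    def __init__(self):
        self._modules = {}
        self._optim_to_param = {}   # optimizer -> list of param refs
        self._param_to_optim = {}   # param ref -> optimizer

    def add_module(self, module_id, module, make_optimizer):
        """make_optimizer(module) -> (optimizer, params) for this module only."""
        self._modules[module_id] = module
        optimizer, params = make_optimizer(module)
        refs = [id(p) for p in params]          # identity-based param refs
        self._optim_to_param[optimizer] = refs  # existing entries untouched
        for ref in refs:
            self._param_to_optim[ref] = optimizer
```

Adding a second module leaves the first module's optimizer (and thus its state) in place, which is the behavior this PR restores.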

Related issue number

Checks

  • I've signed off every commit (by using the -s flag, i.e., git commit -s) in this PR.
  • I've run scripts/format.sh to lint the changes in this PR.
  • I've included any doc changes needed for https://docs.ray.io/en/master/.
  • I've made sure the tests are passing. Note that there might be a few flaky tests, see the recent failures at https://flakey-tests.ray.io/
  • Testing Strategy
    • Unit tests
    • Release tests
    • This PR is not tested :(

Signed-off-by: Kourosh Hakhamaneshi <kourosh@anyscale.com>
Multi-GPU tests pass now.

@avnishn (Member) left a comment:

Some comments about the placement of functions. Maybe we don't absolutely need to enforce private functions by using double underscores?

) -> None:
"""Add a module to the trainer.

Args:
module_id: The id of the module to add.
module_cls: The module class to add.
module_kwargs: The config for the module.
set_optimizer_fn: A function that takes in the module and returns a list of
Member:

It's complex behavior to find the first added optimizer in the optimizer_to_param dictionary. We should either have the user explicitly define a default for us, or pass one themselves.

This is going to create a lot of cognitive load for the user in figuring out which optimizer class was used to create the agent, unless of course the same optimizer is used for all the modules, which is very likely.

Contributor Author:

Any automated inference here would put a cognitive load on the user, and we'd need to clarify that in the docstring. I think being explicit is good here as well: just ask the user to say which optimizer they want to use for the newly added module. That's also easy to specify. I'm going to go with being fully explicit here.

Contributor Author:

OK, fixed this by introducing an extra optional parameter called optimizer_cls. Users have to either provide optimizer_cls for the new parameters or, for more flexibility, define a function that returns both optimizers and their corresponding parameters.
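The two configuration paths described here can be sketched roughly as follows; this is a hypothetical standalone function with illustrative names, not RLlib's actual signature.

```python
# Hypothetical sketch of the two ways a caller can configure optimizers for a
# newly added module: a simple optimizer_cls path, or a fully flexible
# set_optimizer_fn path. Names are illustrative.

def configure_optimizers(module, set_optimizer_fn=None, optimizer_cls=None):
    """Return a list of (optimizer, params) pairs for `module`."""
    if set_optimizer_fn is not None:
        # Flexible path: the user returns optimizers and their params directly.
        return set_optimizer_fn(module)
    if optimizer_cls is not None:
        # Simple path: one optimizer of the given class over all of the
        # module's parameters.
        params = list(module.parameters())
        return [(optimizer_cls(params), params)]
    raise ValueError("Provide either set_optimizer_fn or optimizer_cls.")
```

The explicit `ValueError` reflects the "fully explicit" decision above: the trainer never silently infers an optimizer class from existing modules.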

@@ -353,3 +406,48 @@ def _make_distributed(self) -> MultiAgentRLModule:
The distributed module.
"""
raise NotImplementedError

def __get_param_ref(self, param: ParamType) -> Hashable:
Member:

Can we move this function to the subclass?

I get that it will seldom be useful to users who are implementing, but I think we can agree that logic belonging to the torch RL trainer should live in the torch RL trainer, and the same for TensorFlow.

Indexing into and adding to self._params and self._param_to_optim is really easy to do if you have this function.
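The helper under discussion can be sketched like this (hypothetical names): a "param ref" is just a hashable stand-in for a parameter, so parameters can be used as dict keys even when the framework's tensors are awkward to hash directly (TensorFlow exposes `tf.Variable.ref()` for exactly this purpose).

```python
# Sketch of a param-ref helper and the bookkeeping it enables (hypothetical
# names). A framework-specific trainer subclass could override get_param_ref,
# e.g. returning variable.ref() for TensorFlow variables.

def get_param_ref(param):
    # Identity-based reference; hashable regardless of the param's type.
    return id(param)

def register_param(params, param_to_optim, param, optimizer):
    """Index a param and its optimizer into the trainer's dicts by ref."""
    ref = get_param_ref(param)
    params[ref] = param
    param_to_optim[ref] = optimizer
```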

Contributor Author:

Oh, OK, fair. 👍👍 So you want them to be just private, not super-private.

Member:

yep yep

Contributor Author:

OK, this is also done.

@avnishn (Member) left a comment:

I'd be interested to see an example of setting separate optimizers for the modules in the BCTfRLModule, using the new functions you added. Maybe we can try that out in the near future.

@gjoliver (Member) left a comment:

I have some minor comments; you can decide whether they are useful.

"""

@abc.abstractmethod
def _get_parameters(self, module: RLModule) -> Sequence[ParamType]:
Member:

Consider making this a public API?

Contributor Author:

Yep. That sounds reasonable.

# optimizers and adding or removing modules.
self._optim_to_param: Dict[Optimizer, List[ParamRef]] = {}
self._param_to_optim: Dict[ParamRef, Optimizer] = {}
self._params: Dict[ParamRef, ParamType] = {}
Member:

I would create a named tuple for (ParamType, Optimizer).
By the way, why do you need to know the mapping from param to optimizer?
Sometimes bookkeeping too much state is error-prone. For example, since there shouldn't be too many params, keeping a flat list and looping over it is often an acceptable option.

Contributor Author:

The reason is that TensorFlow's optimizers don't keep track of their parameters at construction time. You need to call optim.apply(gradient, param) once to register params with an optim object; it's a TF-induced design requirement. Otherwise, just having the optimizer objects would suffice, as in torch.
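The TF constraint can be illustrated with a toy stand-in (no real TensorFlow here, and the names are made up): the optimizer only learns about a parameter when apply is first called, which is why the trainer has to keep its own param-to-optimizer map.

```python
# Toy illustration (not real TensorFlow): like a TF optimizer, this object
# does not know which params it owns until apply() is called, so the trainer
# must maintain its own param_ref -> optimizer bookkeeping.

class LazyOptimizer:
    def __init__(self):
        self.seen_params = set()

    def apply(self, gradient, param_ref):
        # The param becomes known to the optimizer only here, on first apply.
        self.seen_params.add(param_ref)

optim = LazyOptimizer()
# Trainer-side bookkeeping, kept independently of the optimizer itself:
param_to_optim = {"policy/w": optim, "policy/b": optim}
for ref in param_to_optim:
    param_to_optim[ref].apply(gradient=0.0, param_ref=ref)
```

In torch this extra map would be redundant, since a torch optimizer is constructed with its param groups up front.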

@gjoliver gjoliver merged commit 2cb9ace into ray-project:master Jan 10, 2023
AmeerHajAli pushed a commit that referenced this pull request Jan 12, 2023
…31511)

moved rl_optimizer logic into rl_trainer

Signed-off-by: Kourosh Hakhamaneshi <kourosh@anyscale.com>

4 participants