[RLlib] Fixes the recreation of optimizers when add_module is used #31511
Conversation
Signed-off-by: Kourosh Hakhamaneshi <kourosh@anyscale.com>
2. Multi-GPU tests pass now. Signed-off-by: Kourosh Hakhamaneshi <kourosh@anyscale.com>
Some comments about the placement of functions. Maybe we don't absolutely need to enforce private functions with double underscores?
    ) -> None:
        """Add a module to the trainer.

        Args:
            module_id: The id of the module to add.
            module_cls: The module class to add.
            module_kwargs: The config for the module.
            set_optimizer_fn: A function that takes in the module and returns a list of
It's complex behavior to find the first added optimizer in the optimizer_to_param dictionary. We should either have the user explicitly define a default for us, or pass one themselves.
This is going to create a lot of cognitive load for the user in figuring out which optimizer class was used to create the agent, unless the same optimizer is used for all the modules, which is very likely.
Any automated inference here would put a cognitive load on the user, and we would need to clarify that in the docstring. I think being explicit is good here as well: just ask the user to say what optimizer they want to use for the newly added module. That's also easy to specify. I'm going to go with being fully explicit here.
OK, fixed this by introducing an extra optional parameter called optimizer_cls. Users either provide optimizer_cls for the new parameters, or, for more flexibility, define a function that returns both the optimizers and their corresponding parameters.
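A minimal sketch of the resulting API shape (class and parameter names here are hypothetical stand-ins, not the actual RLlib signatures): the caller either passes an optimizer class for the new module's parameters, or a function returning (optimizer, params) pairs, and only the new module gets a fresh optimizer.

```python
from typing import Callable, Optional, Sequence, Tuple


class Optimizer:
    """Stand-in for a framework optimizer (hypothetical)."""

    def __init__(self, params):
        self.params = list(params)


class Module:
    """Stand-in for an RLModule (hypothetical)."""

    def __init__(self, params):
        self.params = params


class Trainer:
    def __init__(self):
        # Maps each optimizer to the refs of the params it owns.
        self._optim_to_param = {}

    def add_module(
        self,
        module: Module,
        optimizer_cls: Optional[type] = None,
        set_optimizer_fn: Optional[
            Callable[[Module], Sequence[Tuple[Optimizer, Sequence]]]
        ] = None,
    ) -> None:
        # Fully explicit: the caller must say how to optimize the new
        # module's parameters; nothing is inferred from existing optimizers.
        if set_optimizer_fn is None:
            if optimizer_cls is None:
                raise ValueError("Provide either optimizer_cls or set_optimizer_fn.")

            def set_optimizer_fn(m):
                return [(optimizer_cls(m.params), m.params)]

        # Only the NEW module gets a new optimizer; existing optimizers
        # (and their accumulated state) are left untouched.
        for optim, params in set_optimizer_fn(module):
            self._optim_to_param[optim] = [id(p) for p in params]


trainer = Trainer()
trainer.add_module(Module(params=[1.0, 2.0]), optimizer_cls=Optimizer)
assert len(trainer._optim_to_param) == 1
```

Adding a second module then adds a second optimizer entry without rebuilding the first one.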
rllib/core/rl_trainer/rl_trainer.py
Outdated
@@ -353,3 +406,48 @@ def _make_distributed(self) -> MultiAgentRLModule:
            The distributed module.
        """
        raise NotImplementedError

    def __get_param_ref(self, param: ParamType) -> Hashable:
Can we move this function to the subclass?
I get that it'll seldom be useful to users who are implementing, but I think we can agree that logic belonging to the torch RL trainer should live in the torch RL trainer, and the same for TensorFlow.
Indexing into and adding to self._params and self._param_to_optim is really easy to do if you have this function.
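The idea behind such a param-ref helper (a sketch under assumptions — the actual framework-specific implementations may differ) is to produce one hashable reference per parameter, so the bookkeeping dictionaries can be indexed uniformly even when the raw tensors themselves aren't reliably hashable:

```python
from typing import Dict, Hashable

# Stand-ins for the real type aliases (ParamType would be a
# torch.Tensor / tf.Variable in the actual trainers).
ParamType = object
ParamRef = Hashable


def get_param_ref(param: ParamType) -> ParamRef:
    # Use object identity as a stable, hashable reference to the
    # parameter; the same object always maps to the same ref.
    return id(param)


# The trainer-side bookkeeping becomes plain dict indexing.
params: Dict[ParamRef, ParamType] = {}
param_to_optim: Dict[ParamRef, str] = {}

w = object()  # pretend this is a weight tensor
ref = get_param_ref(w)
params[ref] = w
param_to_optim[ref] = "sgd_for_module_a"

# Looking the parameter up again via its ref returns the same object.
assert params[get_param_ref(w)] is w
```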
Oh OK, fair. 👍👍 So you want them to be just private, not name-mangled (double-underscore) private.
yep yep
ok this is also done.
I'd be interested to see an example of setting separate optimizers for the modules in the BCTfRLModule, using the new functions you added. Maybe we can try that out in the near future.
I have some minor comments; you can decide whether they are useful.
""" | ||
|
||
@abc.abstractmethod | ||
def _get_parameters(self, module: RLModule) -> Sequence[ParamType]: |
Consider making this a public API?
Yep. That sounds reasonable.
        # optimizers and adding or removing modules.
        self._optim_to_param: Dict[Optimizer, List[ParamRef]] = {}
        self._param_to_optim: Dict[ParamRef, Optimizer] = {}
        self._params: Dict[ParamRef, ParamType] = {}
I would create a namedtuple for (ParamType, Optimizer).
By the way, why do you need to know the mapping from param to optimizer?
Sometimes bookkeeping too much state is error prone. For example, since there shouldn't be too many params, keeping a flat list and looping over it is often an acceptable option.
The reason is that TensorFlow optimizers don't keep track of their parameters at construction time. You need to call optim.apply_gradients(...) for the first time to register params with an optim object. It's a TF-induced design requirement. Otherwise, just having the optimizer objects would suffice, as in torch.
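A pure-Python mock of the TF behavior described above (no real TensorFlow here; the class and names are illustrative only): the optimizer learns which params it owns only when apply is first called, so the trainer has to keep its own param-to-optimizer mapping from the start.

```python
class LazyOptimizer:
    """Mimics a TF optimizer: owns no params until first apply()."""

    def __init__(self):
        self.known_params = []  # empty at construction, unlike torch

    def apply(self, gradient, param):
        # Params are registered lazily, on first use.
        if param not in self.known_params:
            self.known_params.append(param)


optim = LazyOptimizer()
param_to_optim = {}            # trainer-side bookkeeping
param = "module_a/weights"     # hypothetical param ref
param_to_optim[param] = optim  # mapping known BEFORE any apply() call

assert optim.known_params == []       # the optimizer doesn't know yet
optim.apply(gradient=0.1, param=param)
assert optim.known_params == [param]  # registered on first apply
```

With torch-style optimizers, which receive their params at construction, the optimizer objects alone would carry this information.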
…31511) moved rl_optimizer logic into rl_trainer Signed-off-by: Kourosh Hakhamaneshi <kourosh@anyscale.com>
Why are these changes needed?
In RLTrainer, when we called add_module, we used to re-create the optimizer object, which would have wiped out the optimizer state of the modules that already existed under the trainer. This PR solves that.

Related issue number
Checks
- I've signed off every commit (by using the -s flag, i.e., git commit -s) in this PR.
- I've run scripts/format.sh to lint the changes in this PR.
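The behavior described under "Why are these changes needed?" can be illustrated with a simplified mock (not the real RLlib code): an SGD-with-momentum stand-in whose accumulated state would be lost if the trainer re-created every optimizer on each add_module call.

```python
class MomentumSGD:
    """Mock optimizer with internal state built up during training."""

    def __init__(self):
        self.velocity = 0.0

    def step(self, grad, lr=0.1, mu=0.9):
        # Momentum accumulates across steps: losing it changes training.
        self.velocity = mu * self.velocity + grad
        return lr * self.velocity


# One optimizer per module, keyed by module id.
optims = {"module_a": MomentumSGD()}
optims["module_a"].step(grad=1.0)          # training builds up state
assert optims["module_a"].velocity == 1.0

# Old behavior: add_module re-created ALL optimizers -> state wiped.
# Fixed behavior: create an optimizer only for the NEW module.
optims["module_b"] = MomentumSGD()
assert optims["module_a"].velocity == 1.0  # existing state preserved
```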