[draft] testing for moving precision plugin as property of TrainingTypePlugin #7805
Conversation
…oint_consolidate Update test_all_gather_grad.py
This reverts commit 9d4a2b8.
This reverts commit 0d23d75.
This reverts commit 70fe5da.
This reverts commit a9aae99.
This reverts commit ea74906.
This reverts commit bf70e43.
This reverts commit f172101.
This reverts commit 536c132.
This reverts commit 3a9fde9.
This reverts commit 7a369f4.
This reverts commit 8222dc9.
This reverts commit 6c095b2.
This reverts commit 250d0aa.
This reverts commit 8651d54.
This reverts commit dcdcd29.
Hello @shuyingsunshine21! Thanks for updating this PR. There are currently no PEP 8 issues detected in this Pull Request. Cheers! 🍻 Comment last updated at 2021-06-22 13:53:43 UTC
for more information, see https://pre-commit.ci
Some initial comments.
All in all, I am not sure I really like this. I see the convenience of having it, but I also think it leads in the wrong direction (back to monolithic class structures) and re-entangles a lot of the logic we want to keep separated from other parts.
This includes (but isn't limited to):
- No more strong separation of training and precision plugins (yes, I know they are still separate classes, but you no longer need a precision plugin; you can do everything in the training type plugin, and out of convenience and laziness people will do exactly that).
- Users have to care about both plugins even when implementing only the training type (at the very least they need to call the precision plugin's hooks).
- The accelerator as an orchestration class becomes basically useless.
So in total, this feels to me like we are heading back to fixed combinations of training and precision plugins (partly even incorporating the accelerator) with no clear responsibilities anymore.
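For orientation, the structure being debated looks roughly like this (a standalone, simplified sketch with placeholder classes, not the actual Lightning implementations):

from contextlib import contextmanager


class PrecisionPlugin:
    @contextmanager
    def train_step_context(self):
        # a real mixed-precision plugin would e.g. enter an autocast context here
        yield


class TrainingTypePlugin:
    def __init__(self, precision_plugin):
        # after this change, the training type plugin owns the precision plugin ...
        self.precision_plugin = precision_plugin

    @contextmanager
    def train_step_context(self):
        yield


class Accelerator:
    def __init__(self, training_type_plugin):
        # ... and the accelerator only reaches it through the training type plugin
        self.training_type_plugin = training_type_plugin

    def training_step(self, step_kwargs):
        with self.training_type_plugin.precision_plugin.train_step_context(), \
                self.training_type_plugin.train_step_context():
            pass  # run the LightningModule's training_step here

Nothing stops a custom TrainingTypePlugin from ignoring its precision_plugin entirely, which is the root of the concern about the separation eroding.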
@@ -201,7 +199,7 @@ def training_step(
        """
        step_kwargs = self.to_device(step_kwargs)

-       with self.precision_plugin.train_step_context(), self.training_type_plugin.train_step_context():
+       with self.training_type_plugin.precision_plugin.train_step_context(), self.training_type_plugin.train_step_context():
We basically have two different contexts here: one for forward + backward (the current train_step_context) and one for forward only.
I think we could reduce it to these two in this PR as well. From my perspective there is no difference between the forward pass in training and the forward pass in val/test/predict, so we should be able to combine them.
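Sketching that suggestion (the method names below are hypothetical, not existing Lightning hooks):

from contextlib import contextmanager


class MyPrecisionPlugin:
    @contextmanager
    def forward_backward_context(self):
        # e.g. enable autocast and prepare the grad scaler for training
        yield

    @contextmanager
    def forward_context(self):
        # e.g. enable autocast only; no scaler is needed without a backward pass
        yield


class MyTrainingTypePlugin:
    def __init__(self, precision_plugin):
        self.precision_plugin = precision_plugin

    @contextmanager
    def forward_backward_context(self):
        # used by the training step (forward + backward)
        with self.precision_plugin.forward_backward_context():
            yield

    @contextmanager
    def forward_context(self):
        # shared by validation, test and predict steps (forward only)
        with self.precision_plugin.forward_context():
            yield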
        training_type_plugin: TrainingTypePlugin,
    ) -> None:
        """
        Args:
            precision_plugin: the plugin to handle precision-specific parts
            training_type_plugin: the plugin to handle different training routines
        """
-       self.precision_plugin = precision_plugin
+       self.precision_plugin = training_type_plugin.precision_plugin
Do we even need a fixed reference here, then? I think replacing it with a property might be better, since that does not increase the reference count of the object.
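Something along these lines, purely for illustration:

class Accelerator:
    def __init__(self, training_type_plugin):
        self.training_type_plugin = training_type_plugin

    @property
    def precision_plugin(self):
        # always resolved through the training type plugin, so the accelerator
        # holds no additional reference to the precision plugin object
        return self.training_type_plugin.precision_plugin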
        self, optimizer: Optimizer, optimizer_idx: int, lambda_closure: Callable, **kwargs: Any
    ) -> None:
-       self.training_type_plugin.optimizer_step(optimizer, lambda_closure=lambda_closure, **kwargs)
+       self.training_type_plugin.optimizer_step(optimizer, opt_idx, lambda_closure, **kwargs)
The only downside here is that the training type plugin must make sure to call the precision plugin. This is something users have to take care of themselves when implementing their own training type plugin.
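For illustration, a custom training type plugin would then have to do something like this itself (hook names and signatures are simplified assumptions, not the exact Lightning API):

from torch.optim import Optimizer


class MyPrecisionPlugin:
    def pre_optimizer_step(self, optimizer, opt_idx):
        ...  # e.g. unscale/clip gradients in a real AMP plugin

    def post_optimizer_step(self, optimizer, opt_idx):
        ...  # e.g. update the grad scaler


class MyTrainingTypePlugin:
    def __init__(self, precision_plugin):
        self.precision_plugin = precision_plugin

    def optimizer_step(self, optimizer: Optimizer, opt_idx: int, lambda_closure, **kwargs):
        # the plugin author must remember both precision calls below
        self.precision_plugin.pre_optimizer_step(optimizer, opt_idx)
        optimizer.step(closure=lambda_closure, **kwargs)
        self.precision_plugin.post_optimizer_step(optimizer, opt_idx)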
@@ -28,7 +28,7 @@ def global_rank(self) -> int:
    def world_size(self) -> int:
        return self.num_nodes

-   def setup(self, model):
+   def setup_model(self, model):
I'd rather not rename these, since they aren't meant to set up only the model; they can actually set up arbitrary things.
@@ -54,9 +54,9 @@ def backward(
        self,
        model: 'pl.LightningModule',
        closure_loss: Tensor,
-       should_accumulate: bool,
Not sure, but I think the argument order should be preserved here to avoid breaking backward compatibility for positional args.
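A tiny illustration of the backward-compatibility concern (the signatures here are invented for the example, not copied from the code base):

def backward_before(model, closure_loss, should_accumulate, optimizer=None, opt_idx=None):
    """Old parameter order: callers may pass everything positionally."""


def backward_after(model, closure_loss, optimizer=None, opt_idx=None):
    """With `should_accumulate` removed from the middle, an old positional call
    like backward(model, loss, True, opt) no longer fails loudly: True binds to
    `optimizer` and opt binds to `opt_idx`."""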
@@ -54,7 +67,7 @@ def _reinit_optimizers_with_oss(self):
        optim_class = type(optimizer)
        zero_optimizer = OSS(params=optimizer.param_groups, optim=optim_class, **optimizer.defaults)
        if _FAIRSCALE_OSS_FP16_BROADCAST_AVAILABLE:
-           is_fp16 = self.lightning_module.trainer.precision == 16
+           is_fp16 = self.lightning_module.trainer.precision == "mixed"
wasn't that properly handled before?
Yeah, unfortunately it wasn't; I will split this small fix out separately.
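For reference, a check that is robust to both representations could look like this (an assumed helper, not part of this PR):

def _is_fp16(trainer) -> bool:
    # the Trainer may report precision as the integer 16 or as the string
    # "mixed", depending on the precision plugin in use, so accept both
    return str(trainer.precision) in ("16", "mixed")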
@@ -66,9 +66,8 @@ def model_to_device(self) -> None:
        self._model.to(self.root_device)

-   def setup(self, model: torch.nn.Module) -> torch.nn.Module:
+   def setup_model(self, model: 'pl.LightningModule') -> None:
same as above
@@ -111,9 +111,8 @@ def pre_dispatch(self):
        if self.debug:
            os.environ["PT_XLA_DEBUG"] = str(1)

-   def setup(self, model: Module) -> Module:
+   def setup_model(self, model: 'pl.LightningModule') -> None:
same as above
@@ -55,7 +55,7 @@ def distributed_sampler_kwargs(self):
        distributed_sampler_kwargs = dict(num_replicas=self.world_size, rank=self.global_rank)
        return distributed_sampler_kwargs

-   def setup(self, model):
+   def setup_model(self, model):
same as above
        return self._precision_plugin

    @precision_plugin.setter
    def precision_plugin(self, args: Dict[str, Any]) -> None:
To be honest, I don't like this. This should live in the accelerator connector: you perform the usual checks there and check whether this class implements some other logic, but ideally these would be determined beforehand (maybe as a staticmethod, since you don't seem to really need the state for the selection).
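A rough sketch of that suggestion (hypothetical connector class; the branching and constructor arguments are simplified and may not match the exact Lightning version used in this PR):

from pytorch_lightning.plugins import (
    ApexMixedPrecisionPlugin,
    NativeMixedPrecisionPlugin,
    PrecisionPlugin,
)


class MyAcceleratorConnector:

    @staticmethod
    def select_precision_plugin(precision, amp_type: str = "native", amp_level: str = "O2"):
        # purely illustrative: the real connector performs many more checks
        # (TPU, DeepSpeed, sharded, ...) before instantiating a plugin
        if precision == 32:
            return PrecisionPlugin()
        if precision == 16 and amp_type == "native":
            return NativeMixedPrecisionPlugin()
        if precision == 16 and amp_type == "apex":
            return ApexMixedPrecisionPlugin(amp_level)
        raise ValueError(f"Unsupported precision: {precision!r}")

Because the selection needs no instance state, a staticmethod (or module-level function) keeps it testable in isolation and out of the training type plugin's setter.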
Thanks @justusschock for the comments! I am also thinking about whether going back to one class like
I also feel that the separation between the current training type plugin and precision plugin is a bit vague. Even today, one can do everything in the training type plugin and use a dummy precision plugin. If we want a clear separation, what would be a good interface to ensure it?
I think this is actually intended: it avoids implementing one of them without considering the logic of the other, which could cause unexpected behavior. With the current design, the user has to take care of this at the accelerator level, though.
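One hypothetical interface for enforcing that separation (template-method style; the names below are illustrative, not something proposed in this PR):

class BaseTrainingTypePlugin:
    """The base class always invokes the precision hooks itself, so a custom
    subclass overrides only the protected implementation method and cannot
    accidentally skip the precision plugin."""

    def __init__(self, precision_plugin):
        self.precision_plugin = precision_plugin

    def optimizer_step(self, optimizer, opt_idx, lambda_closure, **kwargs):
        self.precision_plugin.pre_optimizer_step(optimizer, opt_idx)
        self._do_optimizer_step(optimizer, opt_idx, lambda_closure, **kwargs)
        self.precision_plugin.post_optimizer_step(optimizer, opt_idx)

    def _do_optimizer_step(self, optimizer, opt_idx, lambda_closure, **kwargs):
        # the only hook a custom training type plugin needs to override
        optimizer.step(closure=lambda_closure, **kwargs)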
This pull request has been automatically marked as stale because it has not had recent activity. It will be closed in 7 days if no further activity occurs. If you need further help see our docs: https://pytorch-lightning.readthedocs.io/en/latest/generated/CONTRIBUTING.html#pull-request or ask the assistance of a core contributor here or on Slack. Thank you for your contributions.
This pull request is going to be closed. Please feel free to reopen it or create a new one from the current master.
What does this PR do?
Fixes #7324
Before submitting
PR review
Anyone in the community is free to review the PR once the tests have passed.
Before you start reviewing, make sure you have read the Review guidelines.
Did you have fun?
Make sure you had fun coding 🙃