
[draft] testing for moving precision plugin as property of TrainingTypePlugin #7805

Closed

Conversation

@shuyingsunshine21 (Contributor) commented Jun 2, 2021

What does this PR do?

Fixes #7324

Before submitting

  • Was this discussed/approved via a GitHub issue? (not for typos and docs)
  • Did you read the contributor guideline, Pull Request section?
  • Did you make sure your PR does only one thing, instead of bundling different changes together?
  • Did you make sure to update the documentation with your changes? (if necessary)
  • Did you write any new necessary tests? (not for typos and docs)
  • Did you verify new and existing tests pass locally with your changes?
  • Did you update the CHANGELOG? (not for typos, docs, test updates, or internal minor changes/refactorings)

PR review

Anyone in the community is free to review the PR once the tests have passed.
Before you start reviewing, make sure you have read the Review guidelines. In short, see the following bullet-list:

  • Is this pull request ready for review? (if not, please submit in draft mode)
  • Check that all items from Before submitting are resolved
  • Make sure the title is self-explanatory and the description concisely explains the PR
  • Add labels and milestones (and optionally projects) to the PR so it can be classified

Did you have fun?

Make sure you had fun coding 🙃

Shuying Sun and others added 30 commits March 23, 2021 12:06
Summary:

Test Plan:

Reviewers:

Subscribers:

Tasks:

Tags:
…oint_consolidate

Update test_all_gather_grad.py
…1-checkpoint_consolidate"

This reverts commit c5053da, reversing
changes made to 0d23d75.
This reverts commit 70fe5da.
This reverts commit a9aae99.
@pep8speaks commented Jun 2, 2021

Hello @shuyingsunshine21! Thanks for updating this PR.

There are currently no PEP 8 issues detected in this Pull Request. Cheers! 🍻

Comment last updated at 2021-06-22 13:53:43 UTC

@justusschock (Member) left a comment

Some initial comments.

All in all, I am not sure if I really like this. I see the convenience of having this, but I also think it leads in the wrong direction (back to monolithic class structures) and again entangles a lot of the logic we want to keep separated from other parts.

This includes (but isn't limited to)

  • No more strong separation of the training and precision plugins (yes, I know they're still separate classes, but you don't need a precision plugin anymore; you can do everything in the training type plugin, and out of convenience and laziness people will do that).
  • Users have to care about both plugins even when only implementing the training type (at the very least they need to call the precision plugin's hooks).
  • The accelerator as an orchestration class becomes basically useless.

So in total this feels to me like we are heading back to fixed combinations of training and precision plugins (partly even incorporating the accelerator) with no clear responsibility anymore.
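Below is a minimal, purely illustrative sketch of the two orchestration styles under discussion; all class and method names are placeholders, not the actual Lightning API.

```python
class PrecisionPlugin:
    """Placeholder precision plugin."""

    def backward(self, loss):
        # e.g. scale the loss for AMP before the actual backward call
        return loss


class TrainingTypePlugin:
    """Placeholder training type plugin."""

    def reduce(self, loss):
        # e.g. all-reduce the loss across processes
        return loss


class OrchestratingAccelerator:
    """Current design: the accelerator calls both plugins explicitly."""

    def __init__(self, training_type, precision):
        self.training_type_plugin = training_type
        self.precision_plugin = precision

    def backward(self, loss):
        loss = self.precision_plugin.backward(loss)
        return self.training_type_plugin.reduce(loss)


class OwningTrainingTypePlugin(TrainingTypePlugin):
    """Proposed design: the training type plugin owns the precision plugin,
    so every custom training type must remember to call its hooks itself."""

    def __init__(self, precision):
        self.precision_plugin = precision

    def backward(self, loss):
        loss = self.precision_plugin.backward(loss)
        return self.reduce(loss)
```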

@@ -201,7 +199,7 @@ def training_step(
"""
step_kwargs = self.to_device(step_kwargs)

with self.precision_plugin.train_step_context(), self.training_type_plugin.train_step_context():
with self.training_type_plugin.precision_plugin.train_step_context(), self.training_type_plugin.train_step_context():
Member:

We basically have two different contexts here: One for forward + backward (current train_step_context) and one for forward only.

I think we could reduce it to just these two in this PR as well. From my perspective, there is no difference between the forward pass in training and the forward pass in val/test/predict, so we should be able to combine them.
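A small sketch of what the two remaining contexts could look like; the class and hook names are illustrative only, not the actual plugin interface.

```python
import contextlib


class SketchPrecisionPlugin:
    """Illustrative only: one context for plain forward passes and one for
    forward + backward, instead of a separate context per stage."""

    @contextlib.contextmanager
    def forward_context(self):
        # shared by training, validation, test and predict forward passes;
        # for native AMP this could wrap torch.cuda.amp.autocast()
        yield

    @contextlib.contextmanager
    def train_step_context(self):
        # forward + backward: reuse the forward context and layer any
        # backward-specific handling (e.g. grad scaling) on top
        with self.forward_context():
            yield
```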

training_type_plugin: TrainingTypePlugin,
) -> None:
"""
Args:
precision_plugin: the plugin to handle precision-specific parts
training_type_plugin: the plugin to handle different training routines
"""
self.precision_plugin = precision_plugin
self.precision_plugin = training_type_plugin.precision_plugin
Member:

Do we even need a fixed reference here then? I think replacing it with a property might be better, since that does not increase the reference count of the object.
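A minimal sketch of the property-based alternative suggested here (simplified, hypothetical constructor):

```python
class Accelerator:
    """Sketch: resolve the precision plugin through the training type plugin
    instead of storing a second reference on the accelerator."""

    def __init__(self, training_type_plugin):
        self.training_type_plugin = training_type_plugin

    @property
    def precision_plugin(self):
        # always delegates, so there is a single owning reference and the two
        # attributes can never get out of sync
        return self.training_type_plugin.precision_plugin
```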

self, optimizer: Optimizer, optimizer_idx: int, lambda_closure: Callable, **kwargs: Any
) -> None:
self.training_type_plugin.optimizer_step(optimizer, lambda_closure=lambda_closure, **kwargs)
self.training_type_plugin.optimizer_step(optimizer, opt_idx, lambda_closure, **kwargs)
Member:

The only downside here is that the training type plugin must make sure to call the precision plugin. This is something users have to take care of when implementing their own training type plugin.
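A rough sketch of that burden; the hook names (pre_optimizer_step / post_optimizer_step) and signatures are assumptions for illustration, not the exact plugin API.

```python
class CustomTrainingTypePlugin:
    """Hypothetical custom plugin: with the proposed design it has to invoke
    the precision plugin around the optimizer step itself."""

    def __init__(self, precision_plugin):
        self.precision_plugin = precision_plugin

    def optimizer_step(self, optimizer, opt_idx, lambda_closure, **kwargs):
        # forgetting either of these calls would silently break mixed precision
        self.precision_plugin.pre_optimizer_step(optimizer, opt_idx)
        optimizer.step(closure=lambda_closure, **kwargs)
        self.precision_plugin.post_optimizer_step(optimizer, opt_idx)
```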

@@ -28,7 +28,7 @@ def global_rank(self) -> int:
def world_size(self) -> int:
return self.num_nodes

def setup(self, model):
def setup_model(self, model):
Member:

I'd rather not rename these, since they aren't meant to set up only the model. They can actually set up arbitrary things.

@@ -54,9 +54,9 @@ def backward(
self,
model: 'pl.LightningModule',
closure_loss: Tensor,
should_accumulate: bool,
Member:

Not sure, but I think the order should be preserved here to avoid breaking backwards compatibility for positional args.

@@ -54,7 +67,7 @@ def _reinit_optimizers_with_oss(self):
optim_class = type(optimizer)
zero_optimizer = OSS(params=optimizer.param_groups, optim=optim_class, **optimizer.defaults)
if _FAIRSCALE_OSS_FP16_BROADCAST_AVAILABLE:
is_fp16 = self.lightning_module.trainer.precision == 16
is_fp16 = self.lightning_module.trainer.precision == "mixed"
Member:

Wasn't that properly handled before?

Contributor (Author):

Yeah, unfortunately not. I will separate this small fix out.
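For context, a defensive version of the check (illustrative only, not part of this PR) that accepts both representations the trainer's precision setting has used:

```python
def is_fp16(trainer) -> bool:
    # The precision attribute has appeared both as the integer 16 and as the
    # string "mixed" depending on how AMP is configured, so accept either
    # form rather than comparing against a single value.
    return getattr(trainer, "precision", 32) in (16, "16", "mixed")
```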

@@ -66,9 +66,8 @@ def model_to_device(self) -> None:

self._model.to(self.root_device)

def setup(self, model: torch.nn.Module) -> torch.nn.Module:
def setup_model(self, model: 'pl.LightningModule') -> None:
Member:

same as above

@@ -111,9 +111,8 @@ def pre_dispatch(self):
if self.debug:
os.environ["PT_XLA_DEBUG"] = str(1)

def setup(self, model: Module) -> Module:
def setup_model(self, model: 'pl.LightningModule') -> None:
Member:

same as above

@@ -55,7 +55,7 @@ def distributed_sampler_kwargs(self):
distributed_sampler_kwargs = dict(num_replicas=self.world_size, rank=self.global_rank)
return distributed_sampler_kwargs

def setup(self, model):
def setup_model(self, model):
Member:

same as above

return self._precision_plugin

@precision_plugin.setter
def precision_plugin(self, args: Dict[str, Any]) -> None:
Member:

TBH, I don't like this. This should live in the accelerator connector. You perform the usual checks there and check whether this class implements some other logic, but ideally these would be determined beforehand (maybe as a staticmethod, since you don't seem to really need the state for the selection).
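A sketch of the staticmethod-based selection suggested here; the class names and the selection logic are placeholders, not the real accelerator connector code.

```python
class PrecisionPlugin:
    """Placeholder stand-in for the real precision plugin classes."""


class NativeMixedPrecisionPlugin(PrecisionPlugin):
    pass


class SketchTrainingTypePlugin:
    """Illustrative only: expose the selection as a staticmethod so the
    accelerator connector can resolve the precision plugin up front,
    without instance state or a property setter."""

    @staticmethod
    def select_precision_plugin(precision, amp_type="native"):
        # a pure function of the requested settings
        if precision == 16 and amp_type == "native":
            return NativeMixedPrecisionPlugin()
        return PrecisionPlugin()
```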

@shuyingsunshine21 (Contributor, Author) commented:

Thanks @justusschock for the comments!

I am also wondering whether going back to one class like TrainingStrategyPlugin that controls everything is really a bad thing, given that interleaving calls across TrainingType and Precision handled by the accelerator might be problematic. That flow might cause some combinations of TrainingType and Precision to not work as expected.

No more strong separation of the training and precision plugins (yes, I know they're still separate classes, but you don't need a precision plugin anymore; you can do everything in the training type plugin, and out of convenience and laziness people will do that).

I also feel that the separation between the current training type plugin and precision plugin is a bit vague. One can actually do everything in the training type plugin and use a dummy precision plugin even with the current design. If we would like a clear separation, what would be a good interface to ensure that?

Users have to care about both plugins even when only implementing the training type (at the very least they need to call the precision plugin's hooks).

I am thinking this is actually intended, as it avoids implementing one of them without considering the logic of the other, which would cause unexpected behavior. With the current design, the user needs to care about this at the accelerator level, though.

@stale (bot) commented Jun 17, 2021

This pull request has been automatically marked as stale because it has not had recent activity. It will be closed in 7 days if no further activity occurs. If you need further help see our docs: https://pytorch-lightning.readthedocs.io/en/latest/generated/CONTRIBUTING.html#pull-request or ask the assistance of a core contributor here or on Slack. Thank you for your contributions.

stale bot added the "won't fix" (This will not be worked on) label on Jun 17, 2021
@stale (bot) commented Jun 22, 2021

This pull request is going to be closed. Please feel free to reopen it or create a new one from the current master.

stale bot closed this on Jun 22, 2021
@kaushikb11 reopened this on Jun 22, 2021
stale bot removed the "won't fix" (This will not be worked on) label on Jun 22, 2021
@stale (bot) commented Jul 6, 2021

This pull request has been automatically marked as stale because it has not had recent activity. It will be closed in 7 days if no further activity occurs. If you need further help see our docs: https://pytorch-lightning.readthedocs.io/en/latest/generated/CONTRIBUTING.html#pull-request or ask the assistance of a core contributor here or on Slack. Thank you for your contributions.

stale bot added the "won't fix" (This will not be worked on) label on Jul 6, 2021
@edenlightning removed the "won't fix" (This will not be worked on) label on Jul 8, 2021
@stale (bot) commented Jul 22, 2021

This pull request has been automatically marked as stale because it has not had recent activity. It will be closed in 7 days if no further activity occurs. If you need further help see our docs: https://pytorch-lightning.readthedocs.io/en/latest/generated/CONTRIBUTING.html#pull-request or ask the assistance of a core contributor here or on Slack. Thank you for your contributions.

stale bot added the "won't fix" (This will not be worked on) label on Jul 22, 2021
@awaelchli added this to the v1.5 milestone on Jul 22, 2021
stale bot removed the "won't fix" (This will not be worked on) label on Jul 22, 2021
@awaelchli closed this on Nov 1, 2021
Development

Successfully merging this pull request may close these issues.

Precision Plugins should be part of Training Type Plugins
7 participants