-
Notifications
You must be signed in to change notification settings - Fork 3.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Re-design call_hook
interface
#10575
Conversation
for more information, see https://pre-commit.ci
[RFC] @daniellepintz @awaelchli what's our plan for TraininerOptimizerMixin? TraininerOptimizerMixin is basically unrelated to trainer, the setup optimizer and lr logic can stay in Strategy. Strategy is the only place using setup optimizer and lr logic and it owns optimizers and lr. |
@four4fish |
@four4fish the plan for TrainerOptimizerMixin is to move all of its logic to Strategy, and remove it. I don't think we need to separate |
@daniellepintz after the refactor (include simplify spawning logic ), yeah i think so. Now we need pass trainer to setup, setup_optizer because call_hook in TrainerOptimizerMixin. Otherwise we only need trainer.fn.state. |
for more information, see https://pre-commit.ci
There are 12 failing tests remaining in the CI, and they are all passing locally for me so I am quite confused:
|
The CI is failing with this
was fairscale removed as a dependency? |
no, I didn't change anything with fairscale. maybe @awaelchli knows something since he has been working with DDP recently? |
Happy to help. Take the example of this test that fails:
This all occurs because this PR replaced My suggestion for resolution:
|
@awaelchli thank you SO much for your help! That makes total sense. Fixed!! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Epic!
fn = getattr(self.accelerator, hook_name) | ||
if not callable(fn): | ||
return None | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We discussed not setting the current_fx_name
for accelerator but since we do it manually in all places that call this method, might as well put it inside.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, I ended up having to change it because some tests were failing. Also at least this way we don't use the LM's trainer reference in TTP.py
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have some mind boggling problems in #10890. One question, why do these new call_xyz methods not reset the current_fx_name to its previous value? Was this forgotten or intentional?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
After discussion with @carmocca I learned that only LM hooks and callback hooks need to set current_fx_name, so I only included that in the _call_LM_hook
and _call_callback_hooks
, and not for _call_ttp_hook
and _call_accelerator_hook
. However, before this PR, all the places where we called the accelerator hook had a line like this:
https://github.com/PyTorchLightning/pytorch-lightning/blob/3d6262b7a91215e72019d720e742e6261e1636dc/pytorch_lightning/loops/epoch/evaluation_epoch_loop.py#L220
so I included self.lightning_module._current_fx_name = hook_name
in _call_accelerator_hook
and got rid of the individual lines like the above
Codecov Report
@@ Coverage Diff @@
## master #10575 +/- ##
========================================
- Coverage 92% 88% -4%
========================================
Files 177 177
Lines 16553 16484 -69
========================================
- Hits 15204 14522 -682
- Misses 1349 1962 +613 |
Merged, but @carmocca if you have any follow ups I can add them in subsequent PRs |
Just opened #10957 with what I think was missing, @awaelchli, for your PR, you'll have to set and unset in |
What does this PR do?
Fixes #8506
Replaces
call_hook
with 4 methods:call_callback_hooks
,call_lightning_module_hook
,call_accelerator_hook
andcall_ttp_hook
. The last two can hopefully be removed at some point in the future.Does your PR introduce any breaking changes? If yes, please list them.
Before submitting
PR review
Anyone in the community is welcome to review the PR.
Before you start reviewing make sure you have read Review guidelines. In short, see the following bullet-list:
Did you have fun?
Make sure you had fun coding 🙃