(5/n) Support 2D Parallelism in Lightning Trainer #19878
⚡ Required checks status: All passing 🟢
Groups summary:
🟢 pytorch_lightning: Tests workflow
🟢 pytorch_lightning: Azure GPU
🟢 pytorch_lightning: Benchmarks
🟢 fabric: Docs
🟢 pytorch_lightning: Docs
🟢 lightning_fabric: CPU workflow
🟢 lightning_fabric: Azure GPU
🟢 mypy
🟢 install
Thank you for your contribution! 💜
c4d3f75 to 2585763
2585763 to d806b64
Codecov Report
Attention: Patch coverage is
Additional details and impacted files
@@            Coverage Diff            @@
##           master   #19878    +/-   ##
=========================================
- Coverage      84%      59%    -25%
=========================================
  Files         425      421      -4
  Lines       35028    35139    +111
=========================================
- Hits        29319    20714   -8605
- Misses       5709    14425   +8716
This looks great
What does this PR do?
Ports the functionality added in #19846 to the Trainer. The same tests are adopted and rewritten for the Trainer semantics. Some tests were also taken from the Trainer FSDP Strategy test files.
To keep the PRs minimal, I'm not including checkpointing here and will submit it in a follow-up PR.
A concrete example of how to use the strategy is in #19879. In summary:
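For readers unfamiliar with the term, here is a rough, library-free sketch of what a 2D (data parallel x tensor parallel) layout means for process ranks. It is not the example from #19879 and does not use Lightning's API. It assumes a row-major mesh with tensor parallelism as the inner dimension, and the helper names (`mesh_coords`, `tensor_parallel_group`, `data_parallel_group`) are hypothetical, chosen only for illustration.

```python
# Illustrative only: maps a global rank onto a 2D mesh of shape
# (data_parallel_size, tensor_parallel_size), assuming row-major order.
# These helpers are hypothetical and not part of Lightning or PyTorch.


def mesh_coords(rank, data_parallel_size, tensor_parallel_size):
    """Return (data_parallel_index, tensor_parallel_index) for a global rank."""
    world_size = data_parallel_size * tensor_parallel_size
    if not 0 <= rank < world_size:
        raise ValueError(f"rank {rank} is outside world size {world_size}")
    return divmod(rank, tensor_parallel_size)


def tensor_parallel_group(rank, data_parallel_size, tensor_parallel_size):
    """Ranks in the same mesh row: they hold shards of one model replica."""
    dp_index, _ = mesh_coords(rank, data_parallel_size, tensor_parallel_size)
    start = dp_index * tensor_parallel_size
    return list(range(start, start + tensor_parallel_size))


def data_parallel_group(rank, data_parallel_size, tensor_parallel_size):
    """Ranks in the same mesh column: they hold the same shard of different replicas."""
    _, tp_index = mesh_coords(rank, data_parallel_size, tensor_parallel_size)
    return [i * tensor_parallel_size + tp_index for i in range(data_parallel_size)]


if __name__ == "__main__":
    # Four processes arranged as 2 data-parallel replicas x 2 tensor-parallel shards.
    for rank in range(4):
        print(
            rank,
            mesh_coords(rank, 2, 2),
            tensor_parallel_group(rank, 2, 2),
            data_parallel_group(rank, 2, 2),
        )
```

Under these assumptions, ranks in the same tensor-parallel group are contiguous, which tends to keep the communication-heavy dimension within a single node when ranks are assigned node by node.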
📚 Documentation preview 📚: https://pytorch-lightning--19878.org.readthedocs.build/en/19878/
cc @Borda @carmocca @justusschock @awaelchli