[Accelerator refactor] Move Accelerator into Strategy #10648
Comments
I would prefer keeping the Accelerator name. Other than that this looks good to me.
Definitely, 100% the Accelerator name should stay, and so should its core responsibilities. It should remain the main component for accessing hardware. For example, all the …
Even today, the accelerator doesn't handle moving the batch to the device, since the LightningModule's hooks can customize this. Moving the model to the device also sits closer to the strategy, IMO, than to the accelerator. I do think the accelerator implementations could be useful for storing specific properties like: …
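For illustration, here is a minimal sketch of what such a device-information-only Accelerator could look like. The method names (`auto_device_count`, `get_device_stats`) and the `CUDAAccelerator` subclass are assumptions made for this sketch, not a confirmed API:

```python
from abc import ABC, abstractmethod
from typing import Any, Dict

import torch


class Accelerator(ABC):
    """Sketch: an Accelerator that only exposes hardware information."""

    @staticmethod
    @abstractmethod
    def auto_device_count() -> int:
        """Number of devices of this type available on the machine."""

    @abstractmethod
    def get_device_stats(self, device: torch.device) -> Dict[str, Any]:
        """Hardware stats (e.g. memory usage) for the given device."""


class CUDAAccelerator(Accelerator):
    """Sketch of a CUDA-specific implementation."""

    @staticmethod
    def auto_device_count() -> int:
        return torch.cuda.device_count()

    def get_device_stats(self, device: torch.device) -> Dict[str, Any]:
        # torch.cuda.memory_stats returns a dict of allocator statistics
        return torch.cuda.memory_stats(device)
```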
@four4fish @awaelchli - Going through #11022, which component will now own the …
@ananthsub We discussed this offline earlier; it should be the strategy for now, since the strategy holds the parallel devices and these differ between strategies. For example, single-device and distributed strategies can use the same accelerator but have different root_device values.
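A rough sketch of why root_device belongs to the strategy rather than the accelerator. The class names and constructor signatures below are simplified assumptions for illustration, not the actual Lightning internals:

```python
from typing import List

import torch


class SingleDeviceStrategy:
    """Sketch: one process, one device."""

    def __init__(self, device: str) -> None:
        self._device = torch.device(device)

    @property
    def root_device(self) -> torch.device:
        return self._device


class DDPStrategy:
    """Sketch: one process per device; the root device depends on the
    local rank of the current process."""

    def __init__(self, parallel_devices: List[str], local_rank: int) -> None:
        self.parallel_devices = [torch.device(d) for d in parallel_devices]
        self.local_rank = local_rank

    @property
    def root_device(self) -> torch.device:
        return self.parallel_devices[self.local_rank]


# Same (GPU) accelerator in both cases, but a different root_device:
single = SingleDeviceStrategy("cuda:0")
ddp_rank_1 = DDPStrategy(["cuda:0", "cuda:1"], local_rank=1)
assert single.root_device != ddp_rank_1.root_device  # cuda:0 vs cuda:1
```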
Proposed refactor
Part 3 of the Accelerator and Plugin refactor (#10416)
Motivation
Moving towards a stable version.
After step 2, moving the Precision Plugin into the TTP ("Precision Plugins should be part of Training Type Plugins" #7324), the Accelerator is no longer the routing layer for strategy and precision: optimizer-related logic, steps, and hooks have all moved into the strategy.
The Accelerator now only holds device information, so the strategy should own the Accelerator. This reduces code complexity and improves code maintainability.
Pitch
Move the Accelerator into the TTP/strategy as a device plugin (similar to checkpoint_io), and update the logic in the accelerator connector, training, and loops accordingly.
[RFC] Should we have a new name for Accelerator?
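As a rough sketch of the ownership proposed in the pitch above, the strategy would compose the accelerator the same way it already composes checkpoint_io. The constructor signature below is an assumption for illustration only:

```python
class Strategy:
    """Sketch: the strategy owns the accelerator, checkpoint IO, and
    precision plugin, instead of the accelerator wrapping the strategy."""

    def __init__(self, accelerator=None, checkpoint_io=None, precision_plugin=None):
        self.accelerator = accelerator            # device information only
        self.checkpoint_io = checkpoint_io        # checkpoint save/load
        self.precision_plugin = precision_plugin  # precision/AMP logic
```

Under this layout, the accelerator connector would construct the accelerator first and pass it into the strategy, rather than routing through the accelerator to reach the strategy.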
Additional context
If you enjoy Lightning, check out our other projects! ⚡
Metrics: Machine learning metrics for distributed, scalable PyTorch applications.
Lite: enables pure PyTorch users to scale their existing code on any kind of device while retaining full control over their own loops and optimization logic.
Flash: The fastest way to get a Lightning baseline! A collection of tasks for fast prototyping, baselining, fine-tuning, and solving problems with deep learning.
Bolts: Pretrained SOTA Deep Learning models, callbacks, and more for research and production with PyTorch Lightning and PyTorch.
Lightning Transformers: Flexible interface for high-performance research using SOTA Transformers, leveraging PyTorch Lightning, Transformers, and Hydra.
cc @justusschock @awaelchli @akihironitta @kaushikb11 @ananthsub