[Accelerator refactor] Move Accelerator into Strategy #10648

four4fish · 2021-11-20T00:16:59Z

Proposed refactor

Part 3 of Accelerator and Plugin refactor #10416

Motivation

Moving towards a stable version.

After step 2 (Move Precision Plugin into TTP: "Precision Plugins should be part of Training Type Plugins" #7324),
the Accelerator is no longer the routing layer for strategy and precision; optimizer-related logic, steps, and hooks have all moved into the strategy.

The Accelerator now only holds device information, so the strategy should own the accelerator. This reduces code complexity and improves maintainability.

Pitch

Move the accelerator into the TTP/strategy as a device_plugin (similar to checkpoint_io), and update the logic in accelerator-connector, training, and loops accordingly.
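
A minimal sketch of the ownership change pitched above, under stated assumptions: the class names `AcceleratorSketch`, `StrategySketch`, and `CheckpointIOSketch` and the `describe` method are hypothetical and are not the actual Lightning classes or signatures. It only illustrates the strategy holding the accelerator as a component, the same way it holds checkpoint_io.

```python
from dataclasses import dataclass, field
from typing import List, Optional


class CheckpointIOSketch:
    """Hypothetical stand-in for a checkpoint_io plugin."""


@dataclass
class AcceleratorSketch:
    """Hypothetical accelerator that only carries device information."""

    device_type: str  # e.g. "cpu" or "gpu"
    device_ids: List[int] = field(default_factory=list)


@dataclass
class StrategySketch:
    """Hypothetical strategy that owns the accelerator as a component."""

    accelerator: AcceleratorSketch
    checkpoint_io: Optional[CheckpointIOSketch] = None

    def describe(self) -> str:
        # Callers (trainer, loops) talk to the strategy; the strategy
        # reads device information from the accelerator it owns.
        acc = self.accelerator
        return f"{type(self).__name__} on {acc.device_type}{acc.device_ids}"


strategy = StrategySketch(AcceleratorSketch("gpu", [0, 1]), CheckpointIOSketch())
print(strategy.describe())
```

With this shape, the trainer and loops would only need a reference to the strategy, and the accelerator becomes one more pluggable part of it.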

[RFC] Should we have a new name for Accelerator?

Additional context

cc @justusschock @awaelchli @akihironitta @kaushikb11 @ananthsub

ananthsub · 2021-11-20T01:34:04Z

> move accelerator into ttp/strategy as device_plugin (similar to checkpoint_io)

I would prefer keeping the Accelerator name. Other than that this looks good to me.

awaelchli · 2021-11-21T15:48:15Z

The Accelerator name should definitely, 100% stay, and so should its core responsibilities. It should remain the main component for accessing hardware. For example, all the *_to_device methods and similar ones currently on the strategy should eventually call into the Accelerator to perform the action, IMO. Does that make sense?

ananthsub · 2021-12-01T21:18:56Z

> For example, all the *_to_device methods and similar currently on the strategy should eventually call into the Accelerator to perform the action IMO. Does that make sense?

Even today, the accelerator doesn't handle moving the batch to device since the LightningModule's hooks can customize this. Moving the model to device also sits closer to the strategy IMO than the accelerator.

I do think the accelerator implementations could be useful for storing specific properties like:

- Whether to enable CUDA graphs (on the GPU/CUDA accelerator)
- Whether to enable intra-device inter-batch parallelism (for example, using CUDA streams & events: https://github.com/PyTorchLightning/pytorch-lightning/blob/619ef7a665244eab2c47c892f7d821360597da9e/pytorch_lightning/utilities/fetching.py#L334)
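
As a rough illustration of the two properties listed above, here is a minimal sketch of an accelerator-level options container. The names `CUDAAcceleratorOptions`, `enable_cuda_graphs`, `enable_stream_prefetch`, and `enabled_features` are assumptions made up for this example; nothing in the thread defines such an API.

```python
from dataclasses import asdict, dataclass
from typing import List


@dataclass(frozen=True)
class CUDAAcceleratorOptions:
    """Hypothetical hardware-specific switches stored on a CUDA accelerator."""

    enable_cuda_graphs: bool = False
    # Inter-batch parallelism on one device, e.g. via CUDA streams & events.
    enable_stream_prefetch: bool = False


def enabled_features(options: CUDAAcceleratorOptions) -> List[str]:
    """Return the names of the switches that are turned on."""
    return [name for name, is_on in asdict(options).items() if is_on]


options = CUDAAcceleratorOptions(enable_cuda_graphs=True)
print(enabled_features(options))
```

Keeping such switches on the accelerator rather than the strategy would tie them to the hardware they apply to, which is the point made above.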

ananthsub · 2021-12-10T18:23:48Z

@four4fish @awaelchli - Going through #11022, which component will now own the root_device, the strategy or the accelerator?

four4fish · 2021-12-10T18:26:25Z

> @four4fish @awaelchli - Going through #11022, which component will now own the root_device, the strategy or the accelerator?

@ananthsub We discussed this offline before: it should be the strategy for now, since root_device depends on the parallel devices and differs between strategies. For example, a single-device strategy and a distributed strategy on the same accelerator have different root_device values.
When we flatten the inheritance for strategies, we can revisit all device-related logic and move it around. Detailed discussion here: https://docs.google.com/document/d/1E5t8auWf5DrNHzutvMmrJC_KqBqVyZuYX0thra69Ad8/edit#heading=h.j849ae9ljzqb
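
To make the single-device vs. distributed point concrete, here is a minimal sketch, assuming hypothetical classes `SingleDeviceStrategySketch` and `DistributedStrategySketch` with devices represented as plain strings. It is not the Lightning implementation; it only shows why root_device is resolved per strategy even when the accelerator is the same.

```python
from typing import List


class SingleDeviceStrategySketch:
    """Hypothetical strategy that runs on exactly one device."""

    def __init__(self, device: str) -> None:
        self.device = device

    @property
    def root_device(self) -> str:
        # With a single device, the root device is that device.
        return self.device


class DistributedStrategySketch:
    """Hypothetical strategy with one process per parallel device."""

    def __init__(self, parallel_devices: List[str], local_rank: int) -> None:
        self.parallel_devices = parallel_devices
        self.local_rank = local_rank

    @property
    def root_device(self) -> str:
        # Each process picks the device that matches its local rank.
        return self.parallel_devices[self.local_rank]


single = SingleDeviceStrategySketch("cuda:0")
ddp_rank1 = DistributedStrategySketch(["cuda:0", "cuda:1"], local_rank=1)
print(single.root_device, ddp_rank1.root_device)
```

Both objects target the same kind of accelerator, yet their root_device differs, which is why the property sits on the strategy for now.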

four4fish added the refactor label Nov 20, 2021

This was referenced Nov 20, 2021

[Main Issue] Accelerator and Plugin refactor #10416

Closed

1/n Move Accelerator into strategy - move batch_to_device to strategy #10649

Merged

kaushikb11 mentioned this issue Nov 26, 2021

Resolve training type plugin when passed with Accelerator #10775

Closed

This was referenced Dec 1, 2021

Remove precision_plugin pre_dispatch() method #10884

Closed

2/n Move Accelerator into strategy - remove dispatch functions from Accelerator #10885

Merged

3/n Move Accelerator into strategy - move model_sharded_context() #10886

Merged

four4fish added accelerator plugin labels Dec 1, 2021

four4fish added this to the 1.6 milestone Dec 1, 2021

four4fish linked a pull request Dec 10, 2021 that will close this issue

3/n Move accelerator into Strategy #11022

Merged

12 tasks

four4fish closed this as completed in #11022 Dec 16, 2021
