ananthsub changed the title from "Move strategy-specific dataloader logic to the strategy class" to "Move strategy-specific dataloader logic to the strategies" on Feb 5, 2022
This is a great concept and I feel we should definitely explore it. However, I see some challenges in putting everything into this single method, because the logic is spread out and interleaved with other, non-strategy-related functions. To get to a proof of concept, one could attempt an intermediate refactor that factors the strategy-related code out and merges it into a single method inside the connector, to find out what the blockers are.
It would also be great if we could make the hook trainer-reference-free, to enable using it in Lite as well. In this sense, I would first explore the generic `def process_dataloader(self, dataloader, use: str) -> Union[DataLoader, Iterable]:` instead of the prefixed versions.
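As a sketch of what such a trainer-reference-free hook could look like (all class names and the `use` argument here are assumptions for illustration, not the actual Lightning API), the strategy receives only the dataloader plus a plain string describing how it will be used, so the same hook works from Lite, which has no Trainer to query:

```python
# Hypothetical sketch: a generic, trainer-reference-free process_dataloader hook.
# Names (Strategy, FakeDistributedStrategy, the `use` argument) are assumptions.
from typing import Iterable


class Strategy:
    def process_dataloader(self, dataloader: Iterable, use: str) -> Iterable:
        """Base implementation: return the dataloader unchanged."""
        return dataloader


class SingleProcessStrategy(Strategy):
    pass  # inherits the identity behaviour


class FakeDistributedStrategy(Strategy):
    """Toy override: shard the iterable by rank, as a distributed sampler would."""

    def __init__(self, rank: int, world_size: int) -> None:
        self.rank = rank
        self.world_size = world_size

    def process_dataloader(self, dataloader: Iterable, use: str) -> Iterable:
        if use == "predict":
            # e.g. a strategy could skip resharding for prediction
            return dataloader
        return [x for i, x in enumerate(dataloader) if i % self.world_size == self.rank]


data = list(range(8))
strategy = FakeDistributedStrategy(rank=1, world_size=2)
print(strategy.process_dataloader(data, use="train"))  # → [1, 3, 5, 7]
```

Because the hook only needs the dataloader and a string, neither the base class nor the overrides ever touch a Trainer reference.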
Proposed refactor
Motivation
Strategies today interact with dataloading, especially in distributed training. It makes sense for the strategy to directly handle this logic.
This would reduce and simplify interactions elsewhere in the trainer, in particular the chain of strategy state -> trainer properties -> data connector logic.
It would also remove the hacky patching that the IPU strategy does right now 😧 https://github.com/PyTorchLightning/pytorch-lightning/blob/cc43d07db1ab77385feff04c01f040c5cad805a9/pytorch_lightning/strategies/ipu.py#L125-L130
Pitch
The Strategy interface already offers a `process_dataloader` method: https://github.com/PyTorchLightning/pytorch-lightning/blob/cc43d07db1ab77385feff04c01f040c5cad805a9/pytorch_lightning/strategies/strategy.py#L369-L375
However, there's a ton of strategy-specific logic written in the trainer's data connector. For example:
- warnings specific to DDP spawn: https://github.com/PyTorchLightning/pytorch-lightning/blob/cc43d07db1ab77385feff04c01f040c5cad805a9/pytorch_lightning/trainer/connectors/data_connector.py#L278-L312
- a generic multiprocessing warning: https://github.com/PyTorchLightning/pytorch-lightning/blob/cc43d07db1ab77385feff04c01f040c5cad805a9/pytorch_lightning/trainer/connectors/data_connector.py#L314-L322
- the is_distributed flag, primarily used here within the trainer: https://github.com/PyTorchLightning/pytorch-lightning/blob/cc43d07db1ab77385feff04c01f040c5cad805a9/pytorch_lightning/trainer/connectors/data_connector.py#L324-L330
- the distributed sampler kwargs, which are strategy-specific: https://github.com/PyTorchLightning/pytorch-lightning/blob/cc43d07db1ab77385feff04c01f040c5cad805a9/pytorch_lightning/trainer/trainer.py#L2126-L2129
- an IPU check: https://github.com/PyTorchLightning/pytorch-lightning/blob/cc43d07db1ab77385feff04c01f040c5cad805a9/pytorch_lightning/trainer/connectors/data_connector.py#L364
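To illustrate how one of these checks could migrate, here is a hypothetical sketch (not the actual Lightning code; the class, the utility, and the warning text are all assumptions) of a strategy owning its dataloader warning, with the shared piece factored into a utility function that other strategies could reuse:

```python
# Hypothetical sketch: strategy-specific dataloader checks living in the
# strategy instead of the trainer's data connector. Names are assumptions.
import warnings
from typing import Iterable


def warn_if_zero_workers(dataloader, strategy_name: str) -> None:
    """Shared utility: warn when a dataloader is configured with num_workers=0."""
    if getattr(dataloader, "num_workers", None) == 0:
        warnings.warn(
            f"{strategy_name}: this dataloader has num_workers=0, "
            "which may bottleneck data loading."
        )


class DDPSpawnStrategy:
    def process_dataloader(self, dataloader: Iterable) -> Iterable:
        # The spawn-specific warning from the data connector, now local to the strategy.
        warn_if_zero_workers(dataloader, strategy_name="ddp_spawn")
        return dataloader


class _FakeLoader:
    """Stand-in for a torch DataLoader, exposing only the attribute we inspect."""
    num_workers = 0


with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    DDPSpawnStrategy().process_dataloader(_FakeLoader())
print(len(caught))  # → 1
```

The trainer then only ever calls `process_dataloader`; which checks run is entirely the strategy's business.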
In my opinion we could simplify this by moving the relevant logic into `strategy.process_dataloader` instead. Common logic can still be abstracted out into utility functions shared across the different strategy classes.
We could either:
- augment `process_dataloader` to accept more metadata, such as the trainer's running stage and whether the dataloader is being used for training/validation/test/predict, or
- offer multiple APIs that map to the DataLoader hooks: https://github.com/PyTorchLightning/pytorch-lightning/blob/cc43d07db1ab77385feff04c01f040c5cad805a9/pytorch_lightning/core/hooks.py#L406
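Side by side, the two options could be sketched like this (signatures and the `RunningStage` enum are assumptions for illustration, not the actual Lightning API):

```python
# Hypothetical sketch of the two API shapes discussed above. Names are assumptions.
from enum import Enum
from typing import Iterable


class RunningStage(str, Enum):
    TRAINING = "train"
    VALIDATING = "validate"
    TESTING = "test"
    PREDICTING = "predict"


class StrategyOptionA:
    """Option 1: a single hook that receives metadata about the loader's use."""

    def process_dataloader(self, dataloader: Iterable, stage: RunningStage) -> Iterable:
        return dataloader


class StrategyOptionB:
    """Option 2: prefixed hooks mirroring the *_dataloader hooks."""

    def process_train_dataloader(self, dataloader: Iterable) -> Iterable:
        return dataloader

    def process_val_dataloader(self, dataloader: Iterable) -> Iterable:
        return dataloader

    def process_test_dataloader(self, dataloader: Iterable) -> Iterable:
        return dataloader

    def process_predict_dataloader(self, dataloader: Iterable) -> Iterable:
        return dataloader
```

Option 1 keeps the surface area small (one override point per strategy), while option 2 makes the mapping to the existing dataloader hooks explicit at the cost of four methods.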
Over time, the Trainer flag `replace_sampler_ddp` would make much more sense on the specific distributed strategy constructors instead of on the Trainer.
Additional context
If you enjoy Lightning, check out our other projects! ⚡
Metrics: Machine learning metrics for distributed, scalable PyTorch applications.
Lite: enables pure PyTorch users to scale their existing code on any kind of device while retaining full control over their own loops and optimization logic.
Flash: The fastest way to get a Lightning baseline! A collection of tasks for fast prototyping, baselining, fine-tuning, and solving problems with deep learning.
Bolts: Pretrained SOTA Deep Learning models, callbacks, and more for research and production with PyTorch Lightning and PyTorch.
Lightning Transformers: Flexible interface for high-performance research using SOTA Transformers leveraging PyTorch Lightning, Transformers, and Hydra.
cc @justusschock @awaelchli @akihironitta @rohitgr7 @ninginthecloud @tchaton @Borda