
Move strategy-specific dataloader logic to the strategies #11756

Open
ananthsub opened this issue Feb 5, 2022 · 1 comment
Labels: data handling (generic data-related topic), design (includes a design discussion), refactor, strategy

ananthsub commented Feb 5, 2022

Proposed refactor

Motivation

Strategies today interact with dataloading, especially in distributed training. It makes sense for the strategy to directly handle this logic.

This would reduce and simplify interactions elsewhere in the Trainer, in particular the chain of strategy state -> trainer properties -> data connector logic.

It would also remove the hacky patching that the IPU strategy does right now 😧 https://github.com/PyTorchLightning/pytorch-lightning/blob/cc43d07db1ab77385feff04c01f040c5cad805a9/pytorch_lightning/strategies/ipu.py#L125-L130

Pitch

The Strategy interface already offers a process_dataloader method:
https://github.com/PyTorchLightning/pytorch-lightning/blob/cc43d07db1ab77385feff04c01f040c5cad805a9/pytorch_lightning/strategies/strategy.py#L369-L375
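
For reference, the hook is roughly a no-op by default (paraphrased from the linked source; the exact signature may vary between versions):

def process_dataloader(self, dataloader: Union[Iterable, DataLoader]) -> Union[Iterable, DataLoader]:
    # Wraps the dataloader if necessary; the base Strategy returns it unchanged.
    return dataloader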

However, there's a ton of strategy-specific logic written in the trainer's data connector:

For example, warnings with DDP spawn:
https://github.com/PyTorchLightning/pytorch-lightning/blob/cc43d07db1ab77385feff04c01f040c5cad805a9/pytorch_lightning/trainer/connectors/data_connector.py#L278-L312

A generic multiprocessing warning:
https://github.com/PyTorchLightning/pytorch-lightning/blob/cc43d07db1ab77385feff04c01f040c5cad805a9/pytorch_lightning/trainer/connectors/data_connector.py#L314-L322

The is_distributed flag is primarily used here within the trainer: https://github.com/PyTorchLightning/pytorch-lightning/blob/cc43d07db1ab77385feff04c01f040c5cad805a9/pytorch_lightning/trainer/connectors/data_connector.py#L324-L330

The distributed sampler kwargs are strategy-specific:
https://github.com/PyTorchLightning/pytorch-lightning/blob/cc43d07db1ab77385feff04c01f040c5cad805a9/pytorch_lightning/trainer/trainer.py#L2126-L2129

IPU check: https://github.com/PyTorchLightning/pytorch-lightning/blob/cc43d07db1ab77385feff04c01f040c5cad805a9/pytorch_lightning/trainer/connectors/data_connector.py#L364

In my opinion we could simplify this by moving relevant logic into strategy.process_dataloader instead. Common logic can still be abstracted out into utility functions to share across different strategy classes.

We could either:

  • augment process_dataloader to accept more metadata, such as the trainer's running stage and whether the dataloader is being used for training/validation/test/predict (see the sketch after these options)
def process_dataloader(self, dataloader: DataLoader, use: str) -> Union[DataLoader, Iterable]:

or offer multiple APIs that map to the DataLoader hooks: https://github.com/PyTorchLightning/pytorch-lightning/blob/cc43d07db1ab77385feff04c01f040c5cad805a9/pytorch_lightning/core/hooks.py#L406

def process_train_dataloader(self, dataloader):
def process_val_dataloader(self, dataloader):
def process_test_dataloader(self, dataloader):
def process_predict_dataloader(self, dataloader):
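
As a rough sketch of the first option (the "use" string values, the _rebuild_dataloader_with_sampler helper, and this exact override are hypothetical, not existing Lightning APIs), a distributed strategy could then own its sampler logic directly:

from typing import Iterable, Union

from torch.utils.data import DataLoader, DistributedSampler

from pytorch_lightning.strategies import DDPStrategy


class MyDDPStrategy(DDPStrategy):
    def process_dataloader(self, dataloader: DataLoader, use: str) -> Union[DataLoader, Iterable]:
        # "use" would be one of "train" / "val" / "test" / "predict" (hypothetical convention).
        # Inject a DistributedSampler for the training dataloader if the user has not provided one;
        # the strategy already knows its own distributed_sampler_kwargs (rank, num_replicas, ...).
        if use == "train" and not isinstance(dataloader.sampler, DistributedSampler):
            sampler = DistributedSampler(dataloader.dataset, **self.distributed_sampler_kwargs)
            dataloader = _rebuild_dataloader_with_sampler(dataloader, sampler)  # hypothetical shared utility
        return dataloader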

Over time, the Trainer flag replace_sampler_ddp would also make much more sense on the specific distributed strategy constructors than on the Trainer.
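
For illustration only (the replace_sampler constructor argument is hypothetical and does not exist today):

# today: a Trainer-level flag that only some strategies care about
trainer = Trainer(strategy="ddp", replace_sampler_ddp=False)

# proposed: the strategy that actually uses the flag owns it
trainer = Trainer(strategy=DDPStrategy(replace_sampler=False))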

Additional context


If you enjoy Lightning, check out our other projects! ⚡

  • Metrics: Machine learning metrics for distributed, scalable PyTorch applications.

  • Lite: Enables pure PyTorch users to scale their existing code on any kind of device while retaining full control over their own loops and optimization logic.

  • Flash: The fastest way to get a Lightning baseline! A collection of tasks for fast prototyping, baselining, fine-tuning, and solving problems with deep learning.

  • Bolts: Pretrained SOTA Deep Learning models, callbacks, and more for research and production with PyTorch Lightning and PyTorch.

  • Lightning Transformers: Flexible interface for high-performance research using SOTA Transformers leveraging PyTorch Lightning, Transformers, and Hydra.

cc @justusschock @awaelchli @akihironitta @rohitgr7 @ninginthecloud @tchaton @Borda


awaelchli commented Feb 5, 2022

This is a great concept and I feel we should definitely explore it. However, I see some challenges in putting everything into this single method, because the logic is spread out and interleaved with other, non-strategy-related functions. To get to a proof of concept, one could attempt an intermediate refactor that sequences out the strategy-related code and merges it into a single method inside the connector, to find out what the blockers are.

It would also be great if we could make it free of trainer references so the hook can be used in Lite as well. In this sense, I would first explore the generic def process_dataloader(self, dataloader: DataLoader, use: str) -> Union[DataLoader, Iterable]: instead of the prefixed versions.
