ananthsub changed the title from "Move strategy-specific dataloader logic to the strategy class" to "Move strategy-specific dataloader logic to the strategies" on Feb 5, 2022
This is a great concept and I feel we should definitely explore it. However, I see some challenges in putting everything into this single method, because the logic is spread out and interleaved with other, non-strategy-related functions. To get to a proof of concept, one could attempt an intermediate refactor that factors the strategy-related code out and merges it into a single method inside the connector, to find out what the blockers are.
It would also be great if we could make the hook trainer-reference-free, to enable using it in Lite as well. In this sense, I would first explore the generic `def process_dataloader(self, dataloader, use: str) -> Union[DataLoader, Iterable]:` instead of the prefixed versions.
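As a sketch of what such a trainer-reference-free hook could look like (all class names and the `use` argument here are assumptions for illustration, not the actual Lightning API), the strategy receives only the dataloader plus a plain string describing how it will be used, so the same hook works from Lite, which has no Trainer to query:

```python
# Hypothetical sketch: a generic, trainer-reference-free process_dataloader hook.
# Names (Strategy, FakeDistributedStrategy, the `use` argument) are assumptions.
from typing import Iterable


class Strategy:
    def process_dataloader(self, dataloader: Iterable, use: str) -> Iterable:
        """Base implementation: return the dataloader unchanged."""
        return dataloader


class SingleProcessStrategy(Strategy):
    pass  # inherits the identity behaviour


class FakeDistributedStrategy(Strategy):
    """Toy override: shard the iterable by rank, as a distributed sampler would."""

    def __init__(self, rank: int, world_size: int) -> None:
        self.rank = rank
        self.world_size = world_size

    def process_dataloader(self, dataloader: Iterable, use: str) -> Iterable:
        if use == "predict":
            # e.g. a strategy could skip resharding for prediction
            return dataloader
        return [x for i, x in enumerate(dataloader) if i % self.world_size == self.rank]


data = list(range(8))
strategy = FakeDistributedStrategy(rank=1, world_size=2)
print(strategy.process_dataloader(data, use="train"))  # → [1, 3, 5, 7]
```

Because the hook only needs the dataloader and a string, neither the base class nor the overrides ever touch a Trainer reference.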
Proposed refactor
Motivation
Strategies today interact with dataloading, especially in distributed training. It makes sense for the strategy to directly handle this logic.
This would reduce and simplify interactions elsewhere in the trainer, in particular the chain of strategy state -> trainer properties -> data connector logic.
It would also remove the hacky patching that the IPU strategy does right now 😧 https://github.com/PyTorchLightning/pytorch-lightning/blob/cc43d07db1ab77385feff04c01f040c5cad805a9/pytorch_lightning/strategies/ipu.py#L125-L130
Pitch
The Strategy interface already offers a `process_dataloader` method: https://github.com/PyTorchLightning/pytorch-lightning/blob/cc43d07db1ab77385feff04c01f040c5cad805a9/pytorch_lightning/strategies/strategy.py#L369-L375
However, there's a ton of strategy-specific logic written in the trainer's data connector. For example:
- warnings specific to DDP spawn: https://github.com/PyTorchLightning/pytorch-lightning/blob/cc43d07db1ab77385feff04c01f040c5cad805a9/pytorch_lightning/trainer/connectors/data_connector.py#L278-L312
- a generic multiprocessing warning: https://github.com/PyTorchLightning/pytorch-lightning/blob/cc43d07db1ab77385feff04c01f040c5cad805a9/pytorch_lightning/trainer/connectors/data_connector.py#L314-L322
- the is_distributed flag, primarily used here within the trainer: https://github.com/PyTorchLightning/pytorch-lightning/blob/cc43d07db1ab77385feff04c01f040c5cad805a9/pytorch_lightning/trainer/connectors/data_connector.py#L324-L330
- the distributed sampler kwargs, which are strategy-specific: https://github.com/PyTorchLightning/pytorch-lightning/blob/cc43d07db1ab77385feff04c01f040c5cad805a9/pytorch_lightning/trainer/trainer.py#L2126-L2129
- an IPU check: https://github.com/PyTorchLightning/pytorch-lightning/blob/cc43d07db1ab77385feff04c01f040c5cad805a9/pytorch_lightning/trainer/connectors/data_connector.py#L364
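To illustrate how one of these checks could migrate, here is a hypothetical sketch (not the actual Lightning code; the class, the utility, and the warning text are all assumptions) of a strategy owning its dataloader warning, with the shared piece factored into a utility function that other strategies could reuse:

```python
# Hypothetical sketch: strategy-specific dataloader checks living in the
# strategy instead of the trainer's data connector. Names are assumptions.
import warnings
from typing import Iterable


def warn_if_zero_workers(dataloader, strategy_name: str) -> None:
    """Shared utility: warn when a dataloader is configured with num_workers=0."""
    if getattr(dataloader, "num_workers", None) == 0:
        warnings.warn(
            f"{strategy_name}: this dataloader has num_workers=0, "
            "which may bottleneck data loading."
        )


class DDPSpawnStrategy:
    def process_dataloader(self, dataloader: Iterable) -> Iterable:
        # The spawn-specific warning from the data connector, now local to the strategy.
        warn_if_zero_workers(dataloader, strategy_name="ddp_spawn")
        return dataloader


class _FakeLoader:
    """Stand-in for a torch DataLoader, exposing only the attribute we inspect."""
    num_workers = 0


with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    DDPSpawnStrategy().process_dataloader(_FakeLoader())
print(len(caught))  # → 1
```

The trainer then only ever calls `process_dataloader`; which checks run is entirely the strategy's business.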
In my opinion we could simplify this by moving the relevant logic into `strategy.process_dataloader` instead. Common logic can still be abstracted out into utility functions shared across the different strategy classes.
We could either:
- augment `process_dataloader` to accept more metadata, such as the trainer's running stage and whether the dataloader is being used for training/validation/test/predict, or
- offer multiple APIs that map to the DataLoader hooks: https://github.com/PyTorchLightning/pytorch-lightning/blob/cc43d07db1ab77385feff04c01f040c5cad805a9/pytorch_lightning/core/hooks.py#L406
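Side by side, the two options could be sketched like this (signatures and the `RunningStage` enum are assumptions for illustration, not the actual Lightning API):

```python
# Hypothetical sketch of the two API shapes discussed above. Names are assumptions.
from enum import Enum
from typing import Iterable


class RunningStage(str, Enum):
    TRAINING = "train"
    VALIDATING = "validate"
    TESTING = "test"
    PREDICTING = "predict"


class StrategyOptionA:
    """Option 1: a single hook that receives metadata about the loader's use."""

    def process_dataloader(self, dataloader: Iterable, stage: RunningStage) -> Iterable:
        return dataloader


class StrategyOptionB:
    """Option 2: prefixed hooks mirroring the *_dataloader hooks."""

    def process_train_dataloader(self, dataloader: Iterable) -> Iterable:
        return dataloader

    def process_val_dataloader(self, dataloader: Iterable) -> Iterable:
        return dataloader

    def process_test_dataloader(self, dataloader: Iterable) -> Iterable:
        return dataloader

    def process_predict_dataloader(self, dataloader: Iterable) -> Iterable:
        return dataloader
```

Option 1 keeps the surface area small (one override point per strategy), while option 2 makes the mapping to the existing dataloader hooks explicit at the cost of four methods.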
Over time, the Trainer flag `replace_sampler_ddp` would make much more sense on the specific distributed strategy constructors instead of on the Trainer.
Additional context
If you enjoy Lightning, check out our other projects! ⚡
Metrics: Machine learning metrics for distributed, scalable PyTorch applications.
Lite: enables pure PyTorch users to scale their existing code on any kind of device while retaining full control over their own loops and optimization logic.
Flash: The fastest way to get a Lightning baseline! A collection of tasks for fast prototyping, baselining, fine-tuning, and solving problems with deep learning.
Bolts: Pretrained SOTA Deep Learning models, callbacks, and more for research and production with PyTorch Lightning and PyTorch.
Lightning Transformers: Flexible interface for high-performance research using SOTA Transformers leveraging PyTorch Lightning, Transformers, and Hydra.
cc @justusschock @awaelchli @akihironitta @rohitgr7 @ninginthecloud @tchaton @Borda