
Run big models with DDP/FSDP instead of torch.nn.DataParallel #683

Open
WenjieDu opened this issue Mar 26, 2025 · 0 comments
Labels
discussion · enhancement (New feature or request) · help wanted (Extra attention is needed) · new feature (Proposing to add a new feature)

Comments

WenjieDu (Owner) commented Mar 26, 2025

1. Feature description

Enable PyPOTS to train models on multiple GPUs with DDP (Distributed Data Parallel, https://pytorch.org/tutorials/intermediate/ddp_tutorial.html) or FSDP (Fully Sharded Data Parallel, https://pytorch.org/tutorials/intermediate/FSDP_tutorial.html).

2. Motivation

The current multi-GPU training in the PyPOTS framework, implemented with torch.nn.DataParallel, is not sufficient for training big models like Time-LLM (e.g. #675, Time-LLM easily OOMs even on short-length TS samples). We need more advanced features like DDP or FSDP.
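To illustrate what this feature would look like at the PyTorch level, here is a minimal single-process DDP sketch (gloo backend, world_size=1, on CPU so it runs anywhere). The `Linear` model and hyperparameters are placeholders, not PyPOTS code; the actual integration into the PyPOTS trainer is the open design question of this issue:

```python
import os

import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP


def ddp_demo_step() -> float:
    """Run one DDP forward/backward step in a single CPU process."""
    # Rendezvous settings normally provided by torchrun or another launcher.
    os.environ.setdefault("MASTER_ADDR", "127.0.0.1")
    os.environ.setdefault("MASTER_PORT", "29500")
    dist.init_process_group(backend="gloo", rank=0, world_size=1)
    try:
        # Placeholder model standing in for a PyPOTS model; the real
        # integration would wrap the model inside the trainer after
        # moving it to the rank's device.
        model = torch.nn.Linear(4, 2)
        ddp_model = DDP(model)  # gradients get all-reduced across ranks
        optimizer = torch.optim.SGD(ddp_model.parameters(), lr=0.1)

        x = torch.randn(8, 4)
        loss = ddp_model(x).pow(2).mean()
        loss.backward()  # DDP synchronizes gradients during backward
        optimizer.step()
        return loss.item()
    finally:
        dist.destroy_process_group()
```

With multiple GPUs, the same per-rank function would be launched once per device via torchrun (or torch.multiprocessing.spawn) with `backend="nccl"`, each rank pinned to one GPU and fed by a DistributedSampler. FSDP would swap the DDP wrapper for torch.distributed.fsdp.FullyShardedDataParallel, which additionally shards parameters and optimizer state across ranks, which is what makes models like Time-LLM fit in memory.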

3. Your contribution

I would like to lead or arrange the development task. Please leave comments below to start a discussion if you're interested; more comments will help prioritize this feature.

@WenjieDu WenjieDu added discussion enhancement New feature or request help wanted Extra attention is needed new feature Proposing to add a new feature labels Mar 26, 2025
@WenjieDu WenjieDu pinned this issue Mar 26, 2025