Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

FSDP is not integrated across code base #10178

Closed
janEbert opened this issue Aug 16, 2024 · 2 comments
Closed

FSDP is not integrated across code base #10178

janEbert opened this issue Aug 16, 2024 · 2 comments
Labels
bug Something isn't working

Comments

@janEbert
Copy link
Contributor

Since it seems like my comment in the FSDP PR for NeMo 2.0 was missed in the activities of code reviews, I wrote this issue to raise visibility for the current problems with FSDP integration in NeMo.

To summarize the original comment, FSDP support is really not well-integrated across the code base. Many scripts do not work with it because they only implement NLPDDPStrategy, checkpoint conversion was a huge undertaking to get working (PR pending), and some feature flags straight-up don't work or had bugs.

The original reason for the comment was desiring a better integration in NeMo 2.X since FSDP actually achieves better performance with default settings compared to 4D parallelism. There is also desire to help with this; however, because of past/current experiences with PRs getting ignored, my motivation to fix things in this public code without prior discussion is highly limited. I don't want to have to post redundant comments to even more PRs to prevent them from getting closed as "stale due to no activity".

@janEbert janEbert added the bug Something isn't working label Aug 16, 2024
@ericharper
Copy link
Collaborator

Sorry, we didn't reply to your comment, but we did read it.

We don't plan to make updates to FSDP for NeMo 1.0 but NeMo 2.0 FSDP support will continue to be maintained.

I'm going to go close this issue, if there's a specific bug or feature request with NeMo 2.0 + FSDP please file a new issue with details.

@janEbert
Copy link
Contributor Author

Thank you for the honest answer! I know it's a separate issue bit I wish you could address the PR reviewing issue as well. What are ways to actually get PRs looked at as a non-NVIDIA employee?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

No branches or pull requests

2 participants