Update zero_to_fp32.py - to support deepspeed_stage_1 #3936
Conversation
Since the deepspeed 1 checkpoint structure is identical to the deepspeed 2 one (AFAIK), we should just change the version check and add support accordingly.
@stas00, do you remember why stage 1 was excluded?
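The proposed change can be sketched as follows. This is a minimal illustration, not the actual code in `zero_to_fp32.py`: the function and helper names (`get_fp32_reconstruction_handler`, `merge_partitioned_optimizer_states`, `merge_param_shards`) are hypothetical, assuming only that the script dispatches on the ZeRO stage recorded in the checkpoint.

```python
# Hypothetical sketch of the stage check being relaxed: since stage 1
# checkpoints (reportedly) share the stage 2 layout, the stage 2 merge
# path can be reused for them.

def merge_partitioned_optimizer_states():
    """Placeholder for the stage 1/2 fp32 reconstruction path."""
    return "merged stage 1/2 optimizer partitions"

def merge_param_shards():
    """Placeholder for the stage 3 fp32 reconstruction path."""
    return "merged stage 3 parameter shards"

def get_fp32_reconstruction_handler(zero_stage: int):
    """Pick the checkpoint-merging routine for a given ZeRO stage."""
    if zero_stage in (1, 2):  # previously: zero_stage == 2
        return merge_partitioned_optimizer_states
    elif zero_stage == 3:
        return merge_param_shards
    raise ValueError(f"unsupported ZeRO stage: {zero_stage}")
```

The one-line difference is the `in (1, 2)` membership test where the check used to accept only stage 2; stage 3 keeps its separate path because it partitions parameters, not just optimizer states.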
I think at the time I developed it I didn't think anybody used it, or at least I didn't use it myself, so I didn't have a use case for it.
@PicoCreator, thanks for this PR.
Glad to see my wild guess work and be of use (I found a few issues on dependent projects that encountered this). I currently use deepspeed 1 to train small toy models (<=3B) as fast as possible and to test param/model architecture changes =) on consumer hardware (where the gpu-to-gpu communication of deepspeed 2+ is noticeable).
@PicoCreator, thanks for sharing your context and experience. We always appreciate hearing customer stories. I want to share a minor naming clarification regarding DeepSpeed versus ZeRO.
Since the deepspeed 1 checkpoint structure is identical to the deepspeed 2 one (AFAIK), we should just change the stage check and add support accordingly.
However, I am not 100% sure whether this is intentional by design or a coincidence in my use case, so someone with more knowledge of this topic might need to weigh in 🤔