do allgather only in shared optimizer states groups #4167

Merged
merged 4 commits into from
Aug 21, 2023

Conversation

inkcherry
Contributor

Note that there is only one way (BF16_Optimizer) to train with bf16 + pipeline parallelism.
Pipeline parallelism with bf16 (which uses the ZeRO-1 optimizer) calls this allgather by default, even when the dp size is 1.

Sometimes we only want to try mp/pp > 1 with dp = 1, and in that case the allgather costs unnecessary time.

@tjruwase
Contributor

@inkcherry, thanks for this PR. Note that this utility function is also used by ZeRO stage 1 & 2, so this change benefits more cases than just bf16_optimizer.
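
For readers skimming the thread, here is a minimal sketch of the idea being discussed: skip the allgather when the group that shares optimizer states has only one rank. This is not the code from this PR. `gather_partitions` and `CountingAllGather` are hypothetical names, and the collective is simulated in plain Python rather than calling any real communication backend.

```python
from typing import Callable, List, Sequence


def gather_partitions(
    local_partition: Sequence[float],
    group_size: int,
    all_gather_fn: Callable[[Sequence[float]], List[Sequence[float]]],
) -> List[Sequence[float]]:
    """Return the partitions of every rank in the group (hypothetical helper)."""
    # With a single rank in the group (e.g. dp = 1), the local partition
    # already is the full state, so the collective can be skipped.
    if group_size == 1:
        return [local_partition]
    return all_gather_fn(local_partition)


class CountingAllGather:
    """Stand-in for a real collective; records how often it is invoked."""

    def __init__(self, peers: List[Sequence[float]]) -> None:
        self.peers = peers  # partitions that the other ranks would contribute
        self.calls = 0

    def __call__(self, local: Sequence[float]) -> List[Sequence[float]]:
        self.calls += 1
        return [local] + list(self.peers)


fake_all_gather = CountingAllGather(peers=[[3.0, 4.0]])

# dp = 1: nothing to gather, the collective is never invoked
single = gather_partitions([1.0, 2.0], 1, fake_all_gather)
calls_after_single = fake_all_gather.calls

# dp = 2: the collective runs once
double = gather_partitions([1.0, 2.0], 2, fake_all_gather)
calls_after_double = fake_all_gather.calls
```

Because the early return sits in the shared helper rather than in one optimizer, every caller of the helper would get the same saving, which is the point made above about ZeRO stage 1 & 2.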

@loadams loadams enabled auto-merge August 17, 2023 18:04
@loadams loadams added this pull request to the merge queue Aug 21, 2023
Merged via the queue into microsoft:master with commit 7f3e82f Aug 21, 2023
16 checks passed
3 participants