Make sure deepspeed powered models are equivalent with their non deepspeed version #226

Open
thomasw21 opened this issue Jan 6, 2022 · 2 comments
Labels
Good First Issue Good for newcomers

Comments

@thomasw21
Member

thomasw21 commented Jan 6, 2022

@DanielHesslow has opened PR #212. It lets us evaluate Megatron-DeepSpeed models with the EAI harness directly in this repo, without first converting them to HF format.

The current issue is that we train models with DeepSpeed (*ModelPipe), but the evaluation script loads a model without DeepSpeed (*Model). The two implementations can therefore diverge. Ex: #222

We need a test that checks the outputs of both models are equal for an arbitrary configuration (regardless of whether we merge #212 or not).
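A minimal sketch of what such a test could look like, in plain Python so it runs without a GPU. Everything here is hypothetical: `assert_equivalent`, `reference_model` and `pipelined_model` are illustrative stand-ins, not names from this repo, and the tolerances are arbitrary. A real test would load the *Model and *ModelPipe variants from the same checkpoint and compare their logits, for example with `torch.testing.assert_close`.

```python
import math
from typing import Callable, Sequence

# A "model" here is any callable mapping a token sequence to a list of floats.
Model = Callable[[Sequence[int]], Sequence[float]]


def assert_equivalent(
    model_a: Model,
    model_b: Model,
    inputs: Sequence[Sequence[int]],
    rel_tol: float = 1e-5,
    abs_tol: float = 1e-6,
) -> None:
    """Run both models on every input and compare outputs elementwise."""
    for tokens in inputs:
        out_a = model_a(tokens)
        out_b = model_b(tokens)
        assert len(out_a) == len(out_b), f"length mismatch for input {tokens}"
        for i, (a, b) in enumerate(zip(out_a, out_b)):
            # Compare within a tolerance rather than bit for bit.
            assert math.isclose(a, b, rel_tol=rel_tol, abs_tol=abs_tol), (
                f"output {i} differs for input {tokens}: {a} vs {b}"
            )


# Hypothetical stand-ins for the non-DeepSpeed and DeepSpeed variants.
def reference_model(tokens: Sequence[int]) -> Sequence[float]:
    return [0.5 * t + 1.0 for t in tokens]


def pipelined_model(tokens: Sequence[int]) -> Sequence[float]:
    # Same computation, written with the operands in a different order.
    return [1.0 + t * 0.5 for t in tokens]


assert_equivalent(reference_model, pipelined_model, [[1, 2, 3], [10, 20]])
```

Passing the models in as callables keeps the comparison independent of how each one is built, so the same check could be parametrized over several configurations.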

cc @SaulLu @DanielHesslow @TevenLeScao

@thomasw21 thomasw21 added the Good First Issue Good for newcomers label Jan 6, 2022
@thomasw21 thomasw21 mentioned this issue Jan 6, 2022
@thomasw21 thomasw21 changed the title from "Make sure Deepspeed power models and equivalent with their non deepspeed version" to "Make sure deepspeed powered models are equivalent with their non deepspeed version" Jan 7, 2022
@philippmtk

Hi, thanks for providing these amazing Bloom models!
Just a quick question: has anyone verified that the DeepSpeed and Hugging Face checkpoints lead to identical outputs?

@mayank31398
Collaborator

Hi @philippmtk, unfortunately they don't lead to identical outputs.
Resharding checkpoints changes the order of operations, and that matters because floating-point arithmetic is not associative.
Please refer to this issue:
pytorch/pytorch#76232
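
To illustrate the non-associativity point, here is a small self-contained sketch in plain Python. It is not taken from the linked issue, and `sequential_sum` is a hypothetical helper. The second half mimics, very loosely, a reduction split into two shards versus one unsharded reduction.

```python
# Grouping the same three additions differently changes the result.
a, b, c = 0.1, 0.2, 0.3
left = (a + b) + c
right = a + (b + c)
print(left == right)  # False
print(left, right)    # 0.6000000000000001 0.6


def sequential_sum(values):
    """Add values left to right (hypothetical helper, no compensation)."""
    total = 0.0
    for v in values:
        total += v
    return total


# The same four numbers, summed in one pass versus as two shards
# whose partial sums are then combined.
values = [1e16, 1.0, -1e16, 1.0]
full = sequential_sum(values)
sharded = sequential_sum(values[:2]) + sequential_sum(values[2:])
print(full, sharded)  # 1.0 0.0
```

The values in the second example are chosen to make the effect extreme. In a real model the differences are usually small, which is why comparisons across differently sharded checkpoints typically use a tolerance rather than exact equality.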


3 participants