@DanielHesslow has opened PR #212. This allows us to evaluate Megatron-Deepspeed models using the EAI harness directly in this repo, without needing to convert models into HF format.
The current issue is that we train models using deepspeed (*ModelPipe) but the evaluation script loads a model without deepspeed (*Model). This creates issues where we might have discrepancies between the two. Ex: #222
We need a test making sure that the outputs of both models are equal given an arbitrary configuration (regardless of whether we merge #212 or not).
cc @SaulLu @DanielHesslow @TevenLeScao
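A minimal sketch of what such an equivalence test could look like (the `build_gpt_model` / `build_gpt_model_pipe` helpers below are hypothetical placeholders for however we end up constructing the two variants from the same arguments and checkpoint):

```python
import torch

def test_pipe_and_plain_model_outputs_match(args, checkpoint_path):
    """Hypothetical test: the deepspeed pipeline model (*ModelPipe) and the
    plain model (*Model) should produce (near-)identical logits given the
    same weights and the same inputs."""
    torch.manual_seed(0)

    # Hypothetical helpers: build each variant from the same config and
    # load the same checkpoint into both.
    plain_model = build_gpt_model(args, checkpoint_path).eval()
    pipe_model = build_gpt_model_pipe(args, checkpoint_path).eval()

    # Random token batch with the shape the models expect.
    tokens = torch.randint(0, args.vocab_size, (2, args.seq_length))

    with torch.no_grad():
        plain_logits = plain_model(tokens)
        pipe_logits = pipe_model(tokens)

    # Bitwise equality may be too strict once fp16 / reordered reductions are
    # involved; allclose with a small tolerance is a reasonable starting point.
    assert torch.allclose(plain_logits, pipe_logits, atol=1e-5, rtol=1e-5)
```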
thomasw21 changed the title from "Make sure Deepspeed power models and equivalent with their non deepspeed version" to "Make sure deepspeed powered models are equivalent with their non deepspeed version" on Jan 7, 2022
Hi, thanks for providing this amazing Bloom model!
Just a quick question: has anyone verified that the Deepspeed and Huggingface checkpoints lead to identical outputs?
Hi @philippmtk, unfortunately they don't lead to identical outputs.
Resharding the checkpoints changes the order of operations, and that is a problem with floating-point arithmetic: fp operations are not associative.
Please refer to this issue: pytorch/pytorch#76232
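As a quick illustration of the non-associativity (plain Python floats, nothing specific to this repo), the two groupings below round differently and are not bit-identical:

```python
a, b, c = 0.1, 0.2, 0.3

print((a + b) + c)                  # 0.6000000000000001
print(a + (b + c))                  # 0.6
print((a + b) + c == a + (b + c))   # False
```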