
mDeBERTa on HuggingFace hub does not seem to work #77

Closed
MoritzLaurer opened this issue Dec 4, 2021 · 18 comments · Fixed by huggingface/transformers#24116

Comments

@MoritzLaurer

MoritzLaurer commented Dec 4, 2021

I really like the DeBERTa-v3 models, and the monolingual models work very well for me. Weirdly enough, the multilingual model uploaded to the Hugging Face hub does not seem to work. I have code for training multilingual models on XNLI, and the training normally works well (e.g. no issue with microsoft/Multilingual-MiniLM-L12-H384), but when I apply the exact same code to mDeBERTa, the model does not seem to learn anything. I don't get an error message, but the training results look like this:
[Screenshot of training results, 2021-12-04: accuracy stuck at 0.3333, training loss 0 by epoch 2, NaN validation loss]

I've double-checked by running the exact same code on multilingual-minilm and the training works, which makes me think it's not an issue in the code (wrongly formatted input data or something like that) but that something went wrong when uploading mDeBERTa to the Hugging Face hub. An accuracy of exactly 0.3333 (random), a training loss of 0 at epoch 2, and a NaN validation loss maybe indicate that the data is running through the model but some parameters are not updating, or something like that?

My environment is Google Colab; transformers==4.12.5.
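
For reference, a minimal sketch of the kind of fp16 XNLI fine-tuning setup described above (the model name is the one from the hub; the dataset handling and hyperparameters are illustrative assumptions, not the exact code from this report):

# Sketch of fp16 XNLI fine-tuning with mDeBERTa; hyperparameters are illustrative.
from datasets import load_dataset
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)

model_name = "microsoft/mdeberta-v3-base"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name, num_labels=3)

dataset = load_dataset("xnli", "en")

def preprocess(batch):
    # NLI inputs are premise/hypothesis pairs with 3 labels.
    return tokenizer(batch["premise"], batch["hypothesis"],
                     truncation=True, max_length=128)

encoded = dataset.map(preprocess, batched=True)

args = TrainingArguments(
    output_dir="./mdeberta-xnli",
    per_device_train_batch_size=32,
    num_train_epochs=3,
    evaluation_strategy="epoch",
    fp16=True,  # mixed precision: this is the setting that triggers the behaviour above
)

trainer = Trainer(
    model=model,
    args=args,
    train_dataset=encoded["train"],
    eval_dataset=encoded["validation"],
    tokenizer=tokenizer,  # enables dynamic padding via DataCollatorWithPadding
)
trainer.train()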

@BigBird01
Contributor

For mDeBERTa, you need to use fp32. There is a fix in our official repo and we are going to port the fix to transformers soon.
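
In Trainer terms, this workaround amounts to simply not enabling mixed precision (a sketch; the argument names assume the standard TrainingArguments API):

from transformers import TrainingArguments

# Workaround until the fix is ported: keep mDeBERTa in full fp32,
# i.e. do not enable fp16 mixed-precision training.
args = TrainingArguments(
    output_dir="./mdeberta-xnli-fp32",
    per_device_train_batch_size=32,
    num_train_epochs=3,
    fp16=False,  # the default, shown explicitly for clarity
)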

@MoritzLaurer
Author

Cool, does this mean that after the fix I can use fp16 as well?

@MoritzLaurer
Author

Is there an update on this? I don't think an update was pushed to the Hugging Face hub: https://huggingface.co/microsoft/mdeberta-v3-base/commits/main

It would be great to be able to use it with fp16.

@jtomek

jtomek commented Mar 30, 2022

Have you figured it out, guys?

@abdullahmuaad9

ValueError: Tokenizer class DebertaV2Tokenizer does not exist or is not currently imported.
Any ideas? Please share.
Thanks in advance.

@barschiiii

@BigBird01 do you have any update on this by chance?

@jaideep11061982

@jtomek @abdullahmuaad9 @BigBird01 @MoritzLaurer @barschiiii
Any fix for this? I get the NaN with mDeBERTa.

@rfbr

rfbr commented Apr 4, 2023

Hello team!
Is there any update on this? @jtomek @abdullahmuaad9 @BigBird01 @MoritzLaurer @barschiiii @jaideep11061982
Thanks!

@abdullahmuaad9

abdullahmuaad9 commented Apr 5, 2023 via email

@rfbr

rfbr commented Apr 5, 2023

I pinged you just in case you were interested in the eventual answer from the Microsoft team on the possibility of using fp16 with mDeBERTa.

@rfbr

rfbr commented Apr 9, 2023

Hello there!
I have tracked the different modules to find where the under/overflows are happening. The DisentangledSelfAttention module is the culprit; replacing it with the implementation in this repo fixed the issue (I haven't spent the time to find the specific operation causing the NaN).
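
One way to do this kind of tracking (a sketch of the general technique, not necessarily the exact approach used here; add_nan_hooks is just an illustrative helper name) is to register forward hooks and report the first module whose output contains NaN or inf:

import torch

def add_nan_hooks(model):
    # Register a forward hook on every submodule; report any module whose
    # output contains NaN or inf so the over/underflow can be localised.
    def make_hook(name):
        def hook(module, inputs, output):
            tensors = output if isinstance(output, (tuple, list)) else (output,)
            for t in tensors:
                if torch.is_tensor(t) and not torch.isfinite(t).all():
                    print(f"Non-finite values in output of {name} ({module.__class__.__name__})")
        return hook
    for name, module in model.named_modules():
        module.register_forward_hook(make_hook(name))

# Usage: call add_nan_hooks(model) before running a forward pass under fp16.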

@sjrl

sjrl commented May 7, 2023

Hey @rfbr, I tried updating the DisentangledSelfAttention module in HF transformers with the one in this repo, but when fine-tuning on extractive QA (on SQuAD 2.0) with fp16 I was still getting NaN predictions. Do you have an example implementation in the transformers code I could look at?

Update: Actually, it seems I got it to work. It appears the key was calculating the scale like this (using the math library):

scale = 1/math.sqrt(query_layer.size(-1)*scale_factor)
attention_scores = torch.bmm(query_layer, key_layer.transpose(-1, -2)*scale)

instead of what's implemented in transformers:
https://github.com/huggingface/transformers/blob/ef42c2c487260c2a0111fa9d17f2507d84ddedea/src/transformers/models/deberta_v2/modeling_deberta_v2.py#L724-L725

scale = torch.sqrt(torch.tensor(query_layer.size(-1), dtype=torch.float) * scale_factor)
attention_scores = torch.bmm(query_layer, key_layer.transpose(-1, -2)) / scale.to(dtype=query_layer.dtype)

which uses only torch operations.

Could this be because we aren't calling something like detach in the transformers code? Or maybe it has to do with the order of operations (e.g. performing the division before the multiplication, as is done in this repo)?
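
If the order of operations is indeed the cause, the mechanism would be fp16 overflow: the raw attention logits can exceed the fp16 maximum (about 65504) before the division ever happens, whereas folding the scale into one operand first keeps the intermediates in range. A tiny sketch with made-up, exaggerated numbers:

import math
import torch

d, scale_factor = 64, 3
scale = math.sqrt(d * scale_factor)

# Made-up, exaggerated activations so the overflow is easy to see (fp16 max ~65504).
q = torch.full((d,), 120.0).half()
k = torch.full((d,), 90.0).half()

# transformers-style order: full dot product in fp16 first, divide afterwards.
# 64 * 120 * 90 = 691200 already overflows fp16 before the division happens.
divide_after = (q * k).sum() / scale
print(divide_after)   # inf

# this-repo-style order: fold 1/sqrt(d * scale_factor) into the key before the product,
# so the intermediate values never leave the fp16 range.
scale_first = (q * (k * (1 / scale))).sum()
print(scale_first)    # ~49880, finite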

@jplu

jplu commented May 30, 2023

Hey! Is there any update on this, @BigBird01? I'm using the latest version of transformers (4.29.2) and I'm still facing the same issue when using fp16. When will you port the fix?

Thanks.

@sjrl

sjrl commented May 30, 2023

Hey @jplu, I think I was able to port the changes into my forked branch of transformers here. If you'd just like to see the git diff so you can try the same, take a look here. I did this by comparing the implementation in this repo to the one in transformers.

Doing this I was able to get fp16 training working in transformers.

@jplu

jplu commented May 31, 2023

Hey @sjrl! Thanks a lot for sharing this. Indeed, I confirm that with your code I am able to train with fp16. Did you open a PR on the main repo? If not, it would be nice to have this fix integrated.

@sjrl

sjrl commented Jun 8, 2023

@jplu Just opened the PR! I took some time to find the minimal changes needed to get the fp16 training to work. Hopefully that will speed up the review process.

@jplu

jplu commented Jun 8, 2023

Awesome this seems perfect! Thanks a lot!

@jtomek

jtomek commented Jun 9, 2023

This is honestly perfect, @sjrl. What a clever way to solve the problem! 🤩
