mDeBERTa on HuggingFace hub does not seem to work #77
Comments
For mDeBERTa, you need to use fp32. There is a fix in our official repo and we are going to port the fix to transformers soon. |
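For readers looking for the interim workaround described above, here is a minimal sketch that keeps training in fp32 with the Hugging Face Trainer API. The model name is the hub checkpoint discussed in this thread; the output directory and label count are illustrative placeholders mirroring the XNLI setup described in this issue:
```python
from transformers import (
    AutoModelForSequenceClassification,
    AutoTokenizer,
    TrainingArguments,
)

model_name = "microsoft/mdeberta-v3-base"

tokenizer = AutoTokenizer.from_pretrained(model_name)
# from_pretrained loads the weights in fp32 by default, which is what we want here.
model = AutoModelForSequenceClassification.from_pretrained(model_name, num_labels=3)

training_args = TrainingArguments(
    output_dir="mdeberta-xnli",  # illustrative path
    # Keep mixed precision off until the fp16 fix is ported to transformers;
    # fp16=True is what triggers the NaN loss / random accuracy described in this issue.
    fp16=False,
)
```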
Cool, this means that after the fix I can use fp16 as well? |
Is there an update on this? I don't think an update was pushed to the huggingface hub: https://huggingface.co/microsoft/mdeberta-v3-base/commits/main It would be great to be able to use it with fp16. |
Have you figured it out, guys? |
ValueError: Tokenizer class DebertaV2Tokenizer does not exist or is not currently imported. |
@BigBird01 do you have any update on this by chance? |
Hello team! Is there any update on this? @jtomek @abdullahmuaad9 @BigBird01 @MoritzLaurer @barschiiii Thanks! |
Yes. Can you tell me which kind of update? Thank you. |
I pinged you just in case you were interested in a future answer from the Microsoft team about the possibility of using fp16 with mDeBERTa. |
Hello there! |
Hey @rfbr, I tried updating the DisentangledSelfAttention module in HF transformers with the one in this repo, but when fine-tuning on extractive QA (SQuAD 2.0) with fp16 I was still getting NaN predictions. Do you have an example implementation in the transformers code I could look at?
Update: Actually, it seems like I got it to work. The key appears to be calculating the attention scale the way it is done in DeBERTa/DeBERTa/deberta/disentangled_attention.py (lines 85 to 86 at commit 4d7fe0b) instead of what's implemented in transformers (https://github.com/huggingface/transformers/blob/ef42c2c487260c2a0111fa9d17f2507d84ddedea/src/transformers/models/deberta_v2/modeling_deberta_v2.py#L724-L725):
scale = torch.sqrt(torch.tensor(query_layer.size(-1), dtype=torch.float) * scale_factor)
attention_scores = torch.bmm(query_layer, key_layer.transpose(-1, -2)) / scale.to(dtype=query_layer.dtype)
which uses all torch functionality. Could this be because we aren't calling something like |
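To make the contrast easier to see, here is a side-by-side sketch. The first function reproduces the transformers snippet quoted above; the second is a paraphrase of the repo-style approach of scaling with a plain Python float before the matmul (the exact lines in disentangled_attention.py may differ slightly). The likely reason this matters under fp16 is that the raw q @ kᵀ product can overflow half precision before the division is applied:
```python
import math
import torch

def scores_transformers_style(query_layer, key_layer, scale_factor):
    # Scale is built as a torch tensor, cast back to the query dtype (fp16 under
    # mixed precision), and applied AFTER q @ k^T -- the raw product can already
    # overflow the fp16 range before the division happens.
    scale = torch.sqrt(torch.tensor(query_layer.size(-1), dtype=torch.float) * scale_factor)
    return torch.bmm(query_layer, key_layer.transpose(-1, -2)) / scale.to(dtype=query_layer.dtype)

def scores_repo_style(query_layer, key_layer, scale_factor):
    # Scale is a plain Python float applied to the query BEFORE the matmul,
    # so the intermediate products stay within fp16 range.
    scale = math.sqrt(query_layer.size(-1) * scale_factor)
    return torch.bmm(query_layer / scale, key_layer.transpose(-1, -2))
```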
Hey! Is there any update on this, @BigBird01? I'm using the latest version of transformers (4.29.2) and I'm still facing the same issue when using fp16. When will you port the fix? Thanks. |
Hey @jplu, I think I was able to port the changes into my forked branch of transformers here. If you'd just like to see the git diff so you can try the same, take a look here. I did this by comparing the implementation in this repo with the one in transformers. Doing so, I was able to get fp16 training working in transformers. |
Hey @sjrl! Thanks a lot for sharing this. I can confirm that with your code I'm able to train with fp16. Did you open a PR on the main repo? If not, it would be nice to have this fix integrated. |
@jplu Just opened the PR! I took some time to find the minimal changes needed to get the fp16 training to work. Hopefully that will speed up the review process. |
Awesome this seems perfect! Thanks a lot! |
This is honestly perfect, @sjrl. What a clever way to solve the problem! 🤩 |
I really like the DeBERTa-v3 models, and the monolingual models work very well for me. Weirdly enough, the multilingual model uploaded to the huggingface hub does not seem to work. I have code for training multilingual models on XNLI, and the training normally works well (e.g. no issue with microsoft/Multilingual-MiniLM-L12-H384), but when I apply the exact same code to mDeBERTa, the model does not seem to learn anything. I don't get an error message, but the training results look like this:
[Screenshot of training metrics: accuracy stuck at exactly 0.3333, training loss reaching 0 at epoch 2, NaN validation loss]
I've double-checked by running the exact same code on multilingual-MiniLM and the training works, which makes me think it's not an issue in the code (wrongly formatted input data or something like that), but rather that something went wrong when uploading mDeBERTa to the huggingface hub? An accuracy of exactly the random baseline (0.3333), a training loss of 0 at epoch 2, and a NaN validation loss maybe indicate that the data is running through the model, but some parameters are not updating or something like that?
My environment is Google Colab; transformers==4.12.5.
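As a hypothetical quick check before launching a full training run (assuming a CUDA GPU is available), one can run a single forward pass with the model cast to fp16 and look for NaN/inf logits; depending on the input this may or may not reproduce the overflow discussed above, but it is a cheap first test:
```python
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

name = "microsoft/mdeberta-v3-base"
tokenizer = AutoTokenizer.from_pretrained(name)
# Cast the whole model to half precision to mimic fp16 numerics.
model = AutoModelForSequenceClassification.from_pretrained(name).half().cuda().eval()

inputs = tokenizer(
    "This is a premise.", "This is a hypothesis.", return_tensors="pt"
).to("cuda")

with torch.no_grad():
    logits = model(**inputs).logits

print("NaN or inf in logits:", (~torch.isfinite(logits)).any().item())
```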