System Info
transformers version: 4.20.1
TPU: v2 and v3
Who can help?
@Rocketknight1
Information
Tasks
An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
My own task or dataset (give details below)
Reproduction
I tried to launch a script for a simple classification problem but got a "socket closed" error. I tried with DeBERTa small and base, so I doubt it is a memory error. Moreover, I tried on both Kaggle (TPU v3) and Colab (TPU v2). The same script with a RoBERTa base model works perfectly fine. The sequence length I used was 128.
I created the model using this:
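A minimal sketch of the setup described, assuming a TPUStrategy scope, TFAutoModelForSequenceClassification, and keras model.fit; the checkpoint name and hyperparameters are illustrative, not the exact script:

```python
import tensorflow as tf
from transformers import AutoTokenizer, TFAutoModelForSequenceClassification

# Connect to the TPU (applies to both Colab TPU v2 and Kaggle TPU v3).
resolver = tf.distribute.cluster_resolver.TPUClusterResolver()
tf.config.experimental_connect_to_cluster(resolver)
tf.tpu.experimental.initialize_tpu_system(resolver)
strategy = tf.distribute.TPUStrategy(resolver)

model_name = "microsoft/deberta-v3-small"  # illustrative; base reportedly fails the same way
max_len = 128

tokenizer = AutoTokenizer.from_pretrained(model_name)

# Build and compile the classifier inside the strategy scope.
with strategy.scope():
    model = TFAutoModelForSequenceClassification.from_pretrained(
        model_name, num_labels=2
    )
    model.compile(
        optimizer=tf.keras.optimizers.Adam(learning_rate=2e-5),
        loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
        metrics=["accuracy"],
    )

# model.fit(train_ds, validation_data=val_ds, epochs=3)  # this is where the "socket closed" error is reported
```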
It also seems that the Embedding layer is not compatible with bfloat16:
https://colab.research.google.com/drive/1T4GGCfYy7lAFrgapOtY0KBXPcnEPeTQz?usp=sharing
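For context, a minimal sketch of the kind of configuration that appears to trigger this, assuming the keras mixed_bfloat16 global policy; the checkpoint name is illustrative:

```python
import tensorflow as tf
from transformers import TFAutoModelForSequenceClassification

# Run layer computations in bfloat16 while keeping variables in float32.
tf.keras.mixed_precision.set_global_policy("mixed_bfloat16")

model = TFAutoModelForSequenceClassification.from_pretrained(
    "microsoft/deberta-v3-small", num_labels=2  # illustrative checkpoint
)

# Under this policy, a layer that hard-codes float32 tensors internally
# will hit a dtype mismatch; per the linked notebook, the DeBERTa
# embedding path appears to do so.
```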
Expected behavior
Regular training, as with RoBERTa. On GPU, the same script works and uses 3 or 4 GB of memory.
Hi @Shiro-LK, we're seeing other reports of issues with DeBERTa running slowly on TPU with TF - see #18239. I'm not sure what the cause of the "socket closed" error is, though - the other user got it to run, but just had a lot of slowdown on one of the layers.

@Rocketknight1 Thanks for the reply. Yes, I have just looked at it, but it does not seem to use the keras function model.fit, so I wonder if that's the issue.

This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed, please comment on this thread.
Please note that issues that do not follow the contributing guidelines are likely to be ignored.