WavLM returns empty hidden states when loaded directly to GPU #31970
Comments
cc @kamilakesbi
cc @ylacombe
I've been able to trace the issue back to the warning about weight_g/weight_v being missing when using WeightNorm. When [...] But when [...]:
Some weights of the model checkpoint at microsoft/wavlm-large were not used when initializing WavLMModel: ['encoder.pos_conv_embed.conv.weight_g', 'encoder.pos_conv_embed.conv.weight_v']
- This IS expected if you are initializing WavLMModel from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing WavLMModel from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Some weights of WavLMModel were not initialized from the model checkpoint at microsoft/wavlm-large and are newly initialized: ['encoder.pos_conv_embed.conv.parametrizations.weight.original0', 'encoder.pos_conv_embed.conv.parametrizations.weight.original1']
This is thus related to #26796 and to @kamilakesbi's #32194! I still have to figure out whether the latter fixes our issue. cc @eustlb for visibility
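To make the key mismatch in the warning concrete, here is a minimal sketch of how the legacy weight-norm names line up with the parametrization names. It assumes the PyTorch convention that `weight_g` (magnitude) corresponds to `parametrizations.weight.original0` and `weight_v` (direction) to `parametrizations.weight.original1`. The helper `remap_weight_norm_keys` is hypothetical, written only for illustration; it is not part of transformers or PyTorch and is not necessarily how either #26796 or #32194 handles the problem.

```python
# Hypothetical helper for illustration only; not a transformers or PyTorch API.
# Assumed mapping between legacy weight-norm names and parametrization names.
OLD_TO_NEW_SUFFIX = {
    "weight_g": "parametrizations.weight.original0",  # magnitude
    "weight_v": "parametrizations.weight.original1",  # direction
}


def remap_weight_norm_keys(state_dict):
    """Return a copy of state_dict with legacy weight-norm keys renamed."""
    remapped = {}
    for key, value in state_dict.items():
        # Split "a.b.weight_g" into prefix "a.b" and suffix "weight_g".
        prefix, _, suffix = key.rpartition(".")
        new_suffix = OLD_TO_NEW_SUFFIX.get(suffix)
        if new_suffix is not None:
            key = f"{prefix}.{new_suffix}" if prefix else new_suffix
        remapped[key] = value
    return remapped


# Placeholder values stand in for tensors; only the key names matter here.
legacy = {
    "encoder.pos_conv_embed.conv.weight_g": "g",
    "encoder.pos_conv_embed.conv.weight_v": "v",
    "encoder.pos_conv_embed.conv.bias": "b",
}
for name in remap_weight_norm_keys(legacy):
    print(name)
```

With the two keys from the warning as input, the renamed keys match the ones reported as "newly initialized", which is consistent with the checkpoint weights being skipped rather than absent.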
System Info
transformers version: 4.42.4
Who can help?
@sanchit-gandhi @Gant
Information
Tasks
examples folder (such as GLUE/SQuAD, ...)
Reproduction
The hidden states are NaN when the model is loaded directly to the GPU. They are correct when the model is run on the CPU, or when it is first loaded to the CPU and then moved to the GPU.
This issue can be reproduced using the following code taken from WavLM's Hugging Face documentation.
The above outputs a tensor with only NaNs. This does not occur if we load the model to the CPU first and then move it to the GPU (model.to("cuda:4")).
Expected behavior
The hidden states are not NaN when the model is loaded directly to the gpu.