Glow TTS Avg Loss Not Decreasing - Spanish LJSpeech Dataset #1750
Comments
Setting the
Added some logging to the exception to print the tensor with the NaN value (
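For reference, a minimal sketch of the kind of logging meant here (the helper name and the tensor names in the usage comments are hypothetical, not from the GlowTTS code):

```python
import torch

def log_if_not_finite(name: str, tensor: torch.Tensor) -> None:
    """Print the offending tensor before the exception is raised, so the bad batch can be inspected."""
    if not torch.isfinite(tensor).all():
        print(f"[non-finite] {name}: shape={tuple(tensor.shape)}")
        print(tensor)

# Example usage inside a training step (tensor names are hypothetical):
# log_if_not_finite("z", z)
# log_if_not_finite("log_det", log_det)
```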
@thorstenMueller reported a similar problem regarding the learning rate recently on Matrix with Tacotron2 DDC. Maybe there was a recent change to the coqui-ai Trainer which impacts the learning rate?
@lexkoro thank you!
When I compare the starting learning rates of a healthy (green) and an unhealthy (orange) model, they are the same. I think the log prints an incomplete value, which is just a formatting issue. What concerns me is the avg loss not changing for the problematic model.
BTW, @thorstenMueller, I was able to get rid of the warning you mentioned on the chat channel by replacing the glow_tts.py preprocess() line:
with:
PR for the warning fix.
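For context, a minimal sketch of what such a replacement can look like, assuming the warning in question is PyTorch's `__floordiv__` deprecation triggered by the tensor floor division in `preprocess()` (variable names follow the GlowTTS code; the values are made up):

```python
import torch

num_squeeze = 2
y_lengths = torch.tensor([431, 293, 512])  # made-up frame lengths

# Old form: tensor floor division emits a __floordiv__ deprecation warning
# on the PyTorch versions that deprecated it.
old = (y_lengths // num_squeeze) * num_squeeze

# Replacement with an explicit rounding mode; same values, no warning.
new = torch.div(y_lengths, num_squeeze, rounding_mode="floor") * num_squeeze

assert torch.equal(old, new)
```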
Is the issue "not decreasing" or "getting NaN"? I'm confused.
I am seeing both issues:
I encountered the first issue and started changing the parameters, thinking that's the general area affecting the loss calculation. Then I discovered the second issue. I believe there is some relation to #1683 when
For this LJSpeech Spanish dataset, I was debugging the NaN exception a bit more and found that both
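As an aside, one way to hunt for dataset-level causes of NaNs is to scan the wav files for non-finite, empty, or silent clips. A rough sketch (the directory path and the soundfile dependency are assumptions):

```python
import glob
import os

import numpy as np
import soundfile as sf  # assumed dependency for reading wavs

def scan_wavs(wav_dir: str) -> None:
    """Flag wav files that could produce NaNs downstream: non-finite samples, empty clips, or pure silence."""
    for path in sorted(glob.glob(os.path.join(wav_dir, "*.wav"))):
        audio, _sr = sf.read(path)
        if audio.size == 0:
            print(f"EMPTY      {path}")
        elif not np.isfinite(audio).all():
            print(f"NON-FINITE {path}")
        elif np.abs(audio).max() == 0:
            print(f"SILENT     {path}")

# scan_wavs("/path/to/dataset/wavs")  # hypothetical path
```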
Thanks for your analysis and fix PR. 👍
Just to post the solution to my
Thank you for posting this, @thorstenMueller! @erogol, would you have a moment to take a look at the loss staying constant? I suspect it could be a config mistake on my part related to the character set or phonemes. I don't think it's a dataset issue, since I believe the released Tacotron model for Spanish was trained on the same data.
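If the suspicion is a character-set/phoneme config mismatch, a quick sanity check is to compare the characters that actually occur in the metadata against the configured character set. A hypothetical sketch (the file path, column layout, and the Spanish character set shown are assumptions):

```python
from collections import Counter

def character_coverage(metadata_path: str, configured_chars: str) -> None:
    """Report characters present in the transcripts but missing from the configured character set."""
    counts = Counter()
    with open(metadata_path, encoding="utf-8") as f:
        for line in f:
            parts = line.rstrip("\n").split("|")
            if len(parts) >= 2:
                counts.update(parts[-1])  # transcript column in LJSpeech-style metadata
    missing = {c: n for c, n in counts.items() if c not in configured_chars}
    print("Characters in dataset but not in config:", missing)

# character_coverage("metadata.csv", "abcdefghijklmnopqrstuvwxyzáéíñóúü¿¡ .,!?-")  # assumed Spanish set
```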
I noticed that the training loss is increasing, causing the best model to remain the same and thus the eval loss to remain constant. Thinking of @thorstenMueller's comments, I looked into the same property
Here is what I have after making that change. It looks like the lr is being adjusted, but the model is still not learning.
@erogol, this could be a dataset issue; is there any other way to confirm this?
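For illustration, if the Trainer property in question is `scheduler_after_epoch`, a sketch of forcing per-step scheduler updates could look like this (whether this is the property meant above is an assumption):

```python
from TTS.tts.configs.glow_tts_config import GlowTTSConfig

# Assumption: with a NoamLR-style schedule, stepping the scheduler only once per
# epoch keeps the learning rate near its initial value for a very long time;
# stepping it every optimizer step restores the intended warm-up/decay curve.
config = GlowTTSConfig(
    lr=1e-3,
    scheduler_after_epoch=False,  # step the LR scheduler every training step
)
```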
I think it is a dataset issue. Here is a run 500 steps into training an Argentinian Spanish female voice, same model, same parameters (removed
@iprovalo I've tried the LJSpeech recipe with GlowTTS and could not replicate the "constant loss" issue. It might be about the dataset.
Describe the bug
When I train Glow TTS on the LJSpeech-format Spanish sets (angelina or victor) from AI Labs, the avg loss stays constant.
Victor (4k steps, large batch size, tried 32, 64, 128):
trainer_0_log.txt
config.txt
Angelina (229K steps, batch size 32):
trainer_0_log (1).txt
config.txt
To Reproduce
Train the model for up to 230K steps; avg_loss won't change.
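For reference, a sketch of the kind of training script this implies, based on the structure of the standard LJSpeech GlowTTS recipe; the dataset path, speaker folder, and Spanish phoneme settings are assumptions, not the original config:

```python
import os

from trainer import Trainer, TrainerArgs

from TTS.tts.configs.glow_tts_config import GlowTTSConfig
from TTS.tts.configs.shared_configs import BaseDatasetConfig
from TTS.tts.datasets import load_tts_samples
from TTS.tts.models.glow_tts import GlowTTS
from TTS.tts.utils.text.tokenizer import TTSTokenizer
from TTS.utils.audio import AudioProcessor

output_path = os.path.dirname(os.path.abspath(__file__))

# Dataset in LJSpeech layout (metadata.csv + wavs/); the path is an assumption.
dataset_config = BaseDatasetConfig(
    name="ljspeech",  # "formatter=" in newer TTS versions
    meta_file_train="metadata.csv",
    path=os.path.join(output_path, "es_dataset/victor/"),
)

config = GlowTTSConfig(
    batch_size=32,
    eval_batch_size=16,
    run_eval=True,
    epochs=1000,
    text_cleaner="phoneme_cleaners",
    use_phonemes=True,
    phoneme_language="es-es",  # Spanish phonemes; an assumption about the original config
    phoneme_cache_path=os.path.join(output_path, "phoneme_cache"),
    print_step=25,
    mixed_precision=True,
    output_path=output_path,
    datasets=[dataset_config],
)

ap = AudioProcessor.init_from_config(config)
tokenizer, config = TTSTokenizer.init_from_config(config)
train_samples, eval_samples = load_tts_samples(dataset_config, eval_split=True)

model = GlowTTS(config, ap, tokenizer, speaker_manager=None)
trainer = Trainer(
    TrainerArgs(), config, output_path,
    model=model, train_samples=train_samples, eval_samples=eval_samples,
)
trainer.fit()
```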
Expected behavior
No response
Logs
Environment
Additional context
No response