[Bug] No grad in YourTTS speaker encoder #2348
Comments
Nice catch, that's concerning. Thanks for reporting it. We'll look more into it, but it looks like you are right.
Nice catch. Indeed it is an issue. I will submit a PR to fix it.
Thank you @Edresson. Are you also planning to retrain the models and update the YourTTS paper, at least on arXiv? 😇
I think it is not worth it, because if we do that we will need to recompute the MOS and Sim-MOS. I'm thinking about updating the preprint and removing the Speaker Consistency Loss from the methodology. Given that the Speaker Consistency Loss had no effect on the results, the Speaker Consistency Loss experiments are equivalent to simply training the model for another 50k steps. In addition, I will try to retract this issue in the ICML published paper as well. Fortunately, it is a minor issue and the reported results are not affected (only the method description is wrong).
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions. You might also look at our discussion channels.
@Tomiinek Thanks so much for finding the bug and reporting it. I talked with all the authors and the final decision was to add an Erratum to the YourTTS GitHub repository and to the last page of the preprint. It is done :).
Describe the bug
Hello guys (CC: @Edresson @WeberJulian), when going through the YourTTS code & paper, I noticed that you are calculating the inputs for the speaker encoder with no grads:
TTS/TTS/encoder/models/resnet.py
Lines 153 to 200 in d46fbc2
I suspect that the speaker encoder is not producing any gradients, and the speaker consistency loss has no effect.
It looks like this happens:

- `torch_spec` is computed with no grads
- `loss.backward()` works as usual, but the speaker encoder does not contribute to the gradients flowing to the generator at all

Could you please check on that?
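To illustrate the effect described above, here is a minimal self-contained sketch (not the actual YourTTS code; the tensor names are placeholders): computing an intermediate tensor under `torch.no_grad()` detaches it from the autograd graph, so any loss built on top of it sends no gradients back to the tensors it was computed from.

```python
import torch

# Stand-in for the generator's output waveform/features
generator_out = torch.randn(4, 10, requires_grad=True)

# Buggy pattern: the spectrogram-like transform is computed with no grads,
# mirroring how torch_spec is built in the report above.
with torch.no_grad():
    spec = generator_out ** 2  # detached: no grad_fn is recorded

print(spec.requires_grad)  # False -> the graph to the generator is cut here

# Correct pattern: keep the transform inside the graph so the
# speaker-consistency-style loss can reach the generator.
spec_ok = generator_out ** 2
loss_ok = spec_ok.sum()
loss_ok.backward()
print(generator_out.grad is not None)  # True -> gradients reach the generator
```

In the full training loop the total loss contains other terms, so `loss.backward()` still runs without error; the detached branch just contributes nothing, which is why the bug is easy to miss.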
To Reproduce
Expected behavior
No response
Logs
No response
Environment
Additional context
No response