The trained models seem overfitted to their training sets? #7
Comments
Hi, we don't think this is caused by over-fitting, because the models actually work well for unseen speakers drawn from the same datasets, and the LibriTTS corpus should be large enough for common speech synthesis tasks. A possible workaround is to train the model with as much and as diverse data as possible, or to fine-tune it directly on data recorded under the same conditions as the audio you will be encoding/decoding.
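A minimal sketch of what fine-tuning on matched-condition data could look like. The `ToyCodec` below is only a stand-in for the real encoder/decoder, and the data, loss, and hyperparameters are placeholders, not this repo's actual training recipe:

```python
# Minimal fine-tuning sketch (PyTorch). ToyCodec is only a stand-in for the
# real codec; swap in the pretrained model and your own in-domain 24 kHz data.
import torch
import torch.nn as nn
import torch.nn.functional as F

class ToyCodec(nn.Module):
    def __init__(self):
        super().__init__()
        self.encoder = nn.Conv1d(1, 16, kernel_size=9, stride=4, padding=4)
        self.decoder = nn.ConvTranspose1d(16, 1, kernel_size=8, stride=4, padding=2)

    def forward(self, wav):
        return self.decoder(self.encoder(wav))

model = ToyCodec()                      # replace with the pretrained codec
opt = torch.optim.Adam(model.parameters(), lr=1e-4)

# in_domain_batches would be waveforms recorded in the same conditions as the
# audio you plan to encode/decode, shaped [batch, 1, samples]; dummy data here.
in_domain_batches = [torch.randn(4, 1, 24000) for _ in range(10)]

for wav in in_domain_batches:
    wav_hat = model(wav)
    loss = F.l1_loss(wav_hat, wav)      # simple waveform reconstruction loss
    opt.zero_grad()
    loss.backward()
    opt.step()
```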
Do you think this problem comes from the encoder or the decoder part?
I think it comes from both. [Decoder] [Encoder] Therefore, fine-tuning the whole model on the new data might achieve the best performance. However, we have shown that fine-tuning only the encoder can still do denoising well, so fine-tuning the encoder might be more important than fine-tuning the decoder.
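A hedged sketch of the encoder-only variant, continuing the stand-in model from the sketch above: freeze the decoder's parameters and only update the encoder. The `.encoder` / `.decoder` attribute names are assumptions about the model layout:

```python
# Encoder-only fine-tuning: keep the decoder fixed, update the encoder.
for p in model.decoder.parameters():
    p.requires_grad = False

opt = torch.optim.Adam(model.encoder.parameters(), lr=1e-4)

for wav in in_domain_batches:
    wav_hat = model(wav)
    loss = F.l1_loss(wav_hat, wav)
    opt.zero_grad()
    loss.backward()        # gradients still flow through the frozen decoder
    opt.step()             # but only the encoder weights are updated
```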
I compressed a few examples with the 24kHz libritts_v1 model and they sounded great at very low bitrates, but when I downsampled something from VCTK to 24kHz and pushed it through the same model, the quality suffered a lot. I've seen the same problem when testing on some clean speech extracted from a YouTube video. Since LibriTTS and VCTK are fairly small datasets, is it possible that the pretrained models are overfitted to them a bit too much?
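For context, downsampling to 24 kHz before encoding can be done along these lines (a minimal sketch using torchaudio; the file path is a placeholder, and any resampler such as sox or librosa would do):

```python
# Downsample a 48 kHz VCTK recording to 24 kHz before feeding it to the model.
import torchaudio
import torchaudio.functional as AF

wav, sr = torchaudio.load("p225_001.wav")              # VCTK audio is 48 kHz
wav_24k = AF.resample(wav, orig_freq=sr, new_freq=24000)
torchaudio.save("p225_001_24k.wav", wav_24k, 24000)
```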