Hi, do you think this sound quality is good enough to train a Tacotron model? I am getting bad synthesized audio like these after 40,000 iterations: [link]. Samples from training: [link] [link] [link] [link]
Your files are not accessible to me (and I presume others). There's a message saying they're blocked by the site owner.
However, even with access, I think it would be a challenging question to answer.
A significant factor will be the total quantity of audio you've got to train with (which you didn't mention). A rather more intangible aspect is how consistent the audio is: you can have plenty of good-quality audio, and yet if it's too varied and inconsistent it will be a struggle to train a model. Of course, confirming what's 'consistent enough' is going to be nigh on impossible. Are your transcriptions accurate, or are there potentially errors?
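As a rough way to answer the "how much audio" question, something like the following could be used. It is only a sketch: it assumes the dataset is a folder of uncompressed PCM `.wav` files (the standard-library `wave` module cannot read other encodings), and the function name `summarize_wav_folder` is made up for illustration. It also counts sample rates, since mixed rates are one easily checked kind of inconsistency.

```python
import wave
from collections import Counter
from pathlib import Path


def summarize_wav_folder(folder):
    """Return (total_seconds, sample_rate_counts) for all PCM .wav files under `folder`.

    Hypothetical helper: assumes uncompressed PCM WAV files only.
    """
    total_seconds = 0.0
    sample_rates = Counter()
    for path in sorted(Path(folder).rglob("*.wav")):
        with wave.open(str(path), "rb") as wav:
            rate = wav.getframerate()
            # Duration of one file = number of frames / frames per second.
            total_seconds += wav.getnframes() / rate
            sample_rates[rate] += 1
    return total_seconds, sample_rates
```

Calling `summarize_wav_folder("path/to/wavs")` (the path is a placeholder) and dividing the first value by 3600 gives the total in hours; more than one key in the second value means the clips do not all share a sample rate.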
Have you tried training with standard datasets? If so, how did you get on with them? How does the training you're doing with this data compare to that data? You often can't make assumptions across datasets but at least it would give you reassurance that you've got the basics working and you might be able to see how artificially degrading the quality of a known dataset impacts the ability to train it, until you get something approaching your set. Those are just some areas to think about.
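One possible way to degrade a known dataset artificially is to mix in white noise at a chosen signal-to-noise ratio. The sketch below is just an illustration of that idea, not a recommendation of a specific method: it assumes the audio is already loaded as a list of float samples in roughly [-1, 1], the name `add_white_noise` is hypothetical, and it does no clipping or resampling.

```python
import math
import random


def add_white_noise(samples, snr_db, seed=0):
    """Return a noisy copy of `samples` with Gaussian noise at about `snr_db` dB SNR.

    Hypothetical helper: `samples` is a non-empty sequence of floats.
    """
    if not samples:
        raise ValueError("samples must not be empty")
    rng = random.Random(seed)  # fixed seed so the degradation is repeatable
    signal_power = sum(s * s for s in samples) / len(samples)
    # SNR(dB) = 10 * log10(signal_power / noise_power), solved for noise_power.
    noise_power = signal_power / (10 ** (snr_db / 10))
    sigma = math.sqrt(noise_power)
    return [s + rng.gauss(0.0, sigma) for s in samples]
```

Lowering `snr_db` step by step (say 30, 20, 10) and retraining on each version would show roughly where training starts to fall apart, which you could then compare against your own recordings.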
>>> Thierno_Ibrahima_DIOP
[March 1, 2021, 2:40pm]
Hi, do you think this sound quality is good enough to train a Tacotron
model? I am getting bad synthesized audio like these after 40,000
iterations: [link]
Samples from training: [link]
[link]
[link]
[link]
Synthesized samples (during evaluation): [link]
[link]
[link]
Thanks for your help.
[This is an archived TTS discussion thread from discourse.mozilla.org/t/datset-quality]