Advice on Implementing Style Transfer with Emotion and Accents #3255
Unanswered
jd7jez asked this question in General Q&A
Replies: 1 comment
-
Interested in how you get on with this project. Please keep us posted. I have used it on a British accent, but unfortunately the output turns it into an American accent.
-
I stumbled upon Coqui the other day and was inspired to make it a part of my final project for my deep learning class at university. I saw in the project roadmap that Emotion and Style Adaptation are on the agenda but not yet completed, so I want to put some work into this and see what contributions I can make. My current goal is to train one of the existing models to clone a user's voice, and then let them synthesize speech from text rendered in a specified emotion or accent (such as anger, or a British accent) that was not necessarily present in the cloned sample.
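For the cloning half, my starting point is the stock zero-shot cloning flow for XTTS-v2, along the lines of the usage shown in the Coqui README; a minimal sketch of what I mean (the reference clip path is just a placeholder):

```python
# Baseline: zero-shot voice cloning with the released XTTS-v2 checkpoint,
# roughly as shown in the Coqui TTS README. No emotion/accent control yet;
# this is the behaviour I want to extend.
import torch
from TTS.api import TTS

device = "cuda" if torch.cuda.is_available() else "cpu"
tts = TTS("tts_models/multilingual/multi-dataset/xtts_v2").to(device)

tts.tts_to_file(
    text="This is a quick cloning test before any style fine-tuning.",
    speaker_wav="reference_speaker.wav",  # placeholder: short clip of the target speaker
    language="en",
    file_path="cloned_baseline.wav",
)
```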
My current plan is to use the XTTS-v2 model (it seems to be the most robust model I have found) and train the decoder on some new style while freezing the rest of the model; a rough sketch of the freezing step is below. My idea is that this would bias the model towards the phoneme pronunciation of the desired style while maintaining the original speaker's voice. I'm not sure this is the best way to accomplish the task, but it is the simplest and most intuitive approach I have come up with so far.
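To make that concrete, here is roughly how I imagine the freezing step in plain PyTorch. The checkpoint loading follows the manual-inference example from the docs, the paths are placeholders, and the decoder attribute name (`hifigan_decoder`) is my guess at how the Xtts class exposes it, so it may need adjusting against the source:

```python
# Rough sketch of "freeze everything except the decoder" in plain PyTorch.
# XttsConfig/Xtts loading follows the manual-inference example in the docs;
# checkpoint paths are placeholders, and model.hifigan_decoder is an assumed
# attribute name that may differ in the actual Xtts implementation.
import torch
from TTS.tts.configs.xtts_config import XttsConfig
from TTS.tts.models.xtts import Xtts

config = XttsConfig()
config.load_json("XTTS-v2/config.json")  # placeholder path to the downloaded checkpoint dir
model = Xtts.init_from_config(config)
model.load_checkpoint(config, checkpoint_dir="XTTS-v2/", eval=False)

# Freeze every parameter, then unfreeze only the decoder so that training on
# the style data updates just that part of the network.
for param in model.parameters():
    param.requires_grad = False
for param in model.hifigan_decoder.parameters():  # assumed attribute name
    param.requires_grad = True

trainable = [name for name, p in model.named_parameters() if p.requires_grad]
print(f"{len(trainable)} trainable parameter tensors, e.g. {trainable[:3]}")

# Only the unfrozen parameters are handed to the optimizer.
optimizer = torch.optim.AdamW(
    (p for p in model.parameters() if p.requires_grad), lr=1e-5
)
```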
I am very new to the TTS field and my knowledge is extremely limited, so I would appreciate any and all advice; I am still reading through many of the research papers linked across this repo in hopes of understanding more about implementation. I am also planning to implement this only for English for now, covering different English speaking styles.
Please comment if you have input on this idea; thanks in advance!