Questions related to MeloTTS #1193
Could you tell us how to get the input for bert from texts? Is there any C++ implementation for that?
In this code, you can get the bert value through the get_bert function.
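For context, a key step inside get_bert-style code in MeloTTS is aligning word-level BERT features to the phone sequence via a word2ph list. Below is a pure-Python sketch of that alignment; the function name and data layout are illustrative, not the actual MeloTTS API.

```python
def expand_word_features(word_feats, word2ph):
    """Repeat each word-level feature vector once per phone.

    word_feats: one feature vector per word (plain lists stand in for tensors).
    word2ph:    how many phones each word maps to.
    Returns one feature vector per phone, so the result lines up with the
    phone sequence fed to the acoustic model.
    """
    assert len(word_feats) == len(word2ph)
    phone_feats = []
    for feat, n_phones in zip(word_feats, word2ph):
        phone_feats.extend([feat] * n_phones)
    return phone_feats

# Toy example: word 0 maps to 2 phones, word 1 maps to 3 phones.
aligned = expand_word_features([[0.1, 0.2], [0.3, 0.4]], [2, 3])
print(len(aligned))  # 5
```

The same expansion would be done in C++ with a simple nested loop over the word2ph counts; the hard part is producing the BERT features themselves.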
In your code, bert and ja_bert are passed as model inputs in ModelWrapper. sherpa-onnx/scripts/melo-tts/export-onnx.py Line 172 in 963aaba
So, even though I specified input_names as below when exporting the ONNX model, there is no bert input in the resulting ONNX file.
Please have a look at this comment. That is the main obstacle. If you can fix it, then we can support bert.
Yes, I know that. I am asking whether you know of a C++ implementation for that, or whether it is possible to implement it in C++.
As far as I know, there is currently no C++ implementation of Korean BERT. I will try it and let you know.
By the way, the main issue is about the tokenizer.
Yes, I know that. sherpa-onnx/scripts/melo-tts/export-onnx.py Line 162 in 963aaba
In that case, supporting Korean models from MeloTTS in sherpa-onnx may be hard. Could you try it? We already have a Korean TTS model in sherpa-onnx.
I found this repo while trying to export MeloTTS models to ONNX. I already have a Korean TTS model trained with custom data. The Korean MeloTTS torch model is exported to ONNX for inference, so it is quite fast. As you mentioned earlier, the biggest question is indeed "How do we implement the bert torch model in C++?" Thank you for the reply.
I just added support for passing a callback from Swift to C. Please see #1218. Please play the samples received in the callback yourself, possibly in a separate thread; we don't have time to add that.
Please have a look at #1172. By the way, contributions to sherpa-onnx are highly appreciated. Hope that you can fix the issues by yourself.
@csukuangfj No problem. I actually made some contributions, but noticed the latest version fixes most of the issues I found. By the way, I just checked out MeloTTS, fine-tuned a model, and exported it to sherpa-onnx for Android. It's great. How can I help bring this to iOS? I'm not sure the SwiftUI TTS example accepts MeloTTS models.
Yes, it is already supported. In case you don't know how to do it, I just added an example for you.
@csukuangfj I have a single-speaker fine-tuned model (Melo). It works great, but when I convert it to sherpa-onnx and then use the provided zh_en *.fst and .dict files on Android, I get wrong synthesis. I assumed it would work since my model is English. How can I generate the *.fst and .dict files for my custom model? Or can we make it work by changing the configuration?
You don't need *.fst for English only models. Could you post the code about how you add the metadata?
Could you be more specific? What does "wrong" mean?
@csukuangfj thanks for the prompt response. "Wrong" here means unexpected output: wrong pronunciations. Sorry, but this is how I export (the default export script only exports Chinese+English):
Then in api.py I do:
Could you post some samples? Please also post the logs.
https://github.com/csukuangfj/onnxruntime-build/actions/runs/9184634501 You can see from the above link that we can successfully build a debug version of the static lib.
Custom model 1: English, news (African accent). Text: "things to look out for in the year 2020". .pth-generated wav: output.mov; ONNX-generated wav: generated.mov
Custom model 2: English, singing (US accent). Text: "next time won't you sing with me". .pth-generated wav: output.mov; ONNX-generated wav: generated.mov
I use sherpa-onnx but don't get logs. I was only trying out Melo on sherpa, so the models were not trained for long (training is not the issue, though). I hope you're able to spot the issue. Thanks.
@csukuangfj I can also share my model.pth and config.json files if that'd help. |
When you use ...
The result is still better than the ONNX one when I zero out the bert part.
Could you show the code for how you did that?
In api.py
Please share your solution if that is wrong.
Could you please post the complete code?
Could you change it to

bert, ja_bert, phones, tones, lang_ids = utils.get_text_for_tts_infer(t, language, self.hps, device, self.symbol_to_id)
bert.zero_()
ja_bert.zero_()
The result is a generated wav that sounds almost the same as the original .pth inference (without zeroing out), except for a few pronunciations that sound off. However, it's way better than the ONNX wavs above. Here is the output with bert zeroed out: eng.mov, output_sing.mov. I then tried:
Please compare the inputs to the model manually and see if they are the same. |
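One way to do that comparison (an illustrative helper, not part of sherpa-onnx): dump the tensors fed to the PyTorch model and to the ONNX session as dicts of arrays, then diff them name by name:

```python
import numpy as np


def diff_inputs(torch_inputs, onnx_inputs, atol=1e-5):
    """Return the names of inputs whose shape or values differ."""
    mismatched = []
    for name, a in torch_inputs.items():
        if name not in onnx_inputs:
            mismatched.append(name)
            continue
        a = np.asarray(a)
        b = np.asarray(onnx_inputs[name])
        if a.shape != b.shape or not np.allclose(a, b, atol=atol):
            mismatched.append(name)
    return mismatched


# Hypothetical dumps: identical phones but different bert features.
torch_in = {"phones": [1, 2, 3], "bert": [0.0, 0.5]}
onnx_in = {"phones": [1, 2, 3], "bert": [0.0, 0.0]}
print(diff_inputs(torch_in, onnx_in))  # ['bert']
```

If the inputs match but the outputs still differ, the discrepancy is inside the exported graph rather than in the text front end.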
Does anyone have a Google Colab notebook for this, to convert models? I need Japanese TTS voices.
Please see the linked example. It is for the Chinese+English MeloTTS model.
Is there one for English only? In the future, if there is a way to convert a standard English model from the official training script, can you share it here? Thanks.
Sorry, I only have this one. |
Please use onnxruntime 1.12.0. In the WeChat group, someone reported that running MeloTTS on GPU with onnxruntime 1.12.0 works without problems.
@csukuangfj any updates on getting the default MeloTTS models to work? |
Could you describe the issue you have? |
There is support for the Chinese+English MeloTTS model only. If one wants to use MeloTTS, they have to stick to the Chinese+English model.
Please adapt our current script. If you have any trouble, please post the error logs.
I already tried that above and it didn't work.
Please see it and find out why it didn't work for you. @nanaghartey
Thanks for this. I tried it out with the model you shared: https://github.com/k2-fsa/sherpa-onnx/releases/download/tts-models/vits-melo-tts-en.tar.bz2 I have this:
I use the same dict as the Chinese+English model since I don't have any other. I get this when I run the app:
The app hangs during startup with the logs below:
Please don't use files not included in the model directory you downloaded. That is, do not use the dict dir.
Everything you need is included in the model tar.bz2 file. Please see the comment in #1509 for usage.
@csukuangfj
The app loads all right, but when I enter text and tap Generate, I get this:
Are you using the latest master to build the libraries? How did you get the .so files?
For quick testing, I used the .so files in sherpa-onnx-1.10.30-arm64-v8a-zh_en-tts-vits-melo-tts-zh_en.apk from https://k2-fsa.github.io/sherpa/onnx/tts/apk.html. I then built the .so files myself and tested; it works now! Thanks a lot. By the way, in the export-onnx-en script I only changed:
to
That should be enough, right? It works, but I'm wondering if I need to change something else to improve pronunciation.
You can try to enable bert support. |
I hope you understand that support for the MeloTTS English model was added after 1.10.30, and you need to use the latest master to test it, not the code or libraries from 1.10.30.
Sure, I'll try that. Thanks.
I noticed some pronunciation differences: For example, "Google" is pronounced correctly using the original Melo TTS model. However, on the Sherpa ONNX-converted Melo TTS model, each letter is pronounced individually as G-O-O-G-L-E. |
First, we don't use G2P. Second, all words that can be pronounced are enumerated in the lexicon. Third, in case you have a word that is not in the lexicon, you can add it yourself.
@csukuangfj I'm well aware of this. The challenge is that I don't know which words users will generate, so I can't add them manually.
So is it possible to add words dynamically at runtime? Or is there a different approach, like Piper's, without manually adding words to the lexicon?
If you know the pronunciation of the words, then it is possible.
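If the pronunciations are known, new entries could be appended to the lexicon file before (re)creating the TTS object. A sketch, assuming the common one-entry-per-line "word phone1 phone2 ..." lexicon format used by sherpa-onnx VITS models (verify against your model's lexicon.txt, and note the phones in the comment are illustrative):

```python
def add_lexicon_entries(lexicon_path, entries):
    """Append word -> phones entries to a lexicon file.

    entries: dict mapping a word to its list of phones, e.g.
             {"google": ["G", "UW1", "G", "AH0", "L"]}  # illustrative phones
    Words are lower-cased, matching the usual lexicon convention.
    """
    with open(lexicon_path, "a", encoding="utf-8") as f:
        for word, phones in entries.items():
            f.write(f"{word.lower()} {' '.join(phones)}\n")
```

The TTS object would still need to be recreated (or the lexicon reloaded) for new entries to take effect, since the lexicon is read at initialization.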
Thank you for creating a great repository.
I wonder why there is no bert input when converting the PyTorch MeloTTS model to an ONNX model.
https://github.com/k2-fsa/sherpa-onnx/blob/963aaba82b01a425ae8dcf0fdcff6b073a45686f/scripts/melo-tts/export-onnx.py#L206C1-L235C6