Languages #5
Comments
I'm not planning to support languages other than English any time soon. The problem isn't so much the animation, but the voice recognition before it. Let me know how your game goes -- I'm looking forward to seeing Rhubarb in action! |
I have been looking at Papagayo, and someone made an add-on with support for 10 more languages (released under the GPL). http://www.lostmarble.com/forum/viewtopic.php?t=5056 Thank you for a great tool anyhow! |
Papagayo and Rhubarb work a bit differently. If I remember correctly, Papagayo requires perfect dialog. It converts the dialog into phones (that's what the plugin you linked does), then leaves it to the user to align the phones with the recording. After that, I believe Papagayo does a simple mapping to convert the phones into mouth shapes. For Rhubarb, on the other hand, converting the dialog text into phones is merely the first step. (This is where your plugin might help.) After that, Rhubarb performs actual voice recognition (guided by the dialog text, which is optional) and automatic alignment. These are the problematic steps, as they require acoustic and language models describing the target language. These models take months or even years to create. For more information, see this link on training acoustic models. Finally, Rhubarb applies a pipeline of transformation steps to convert the timed phones into animation. These steps are largely language-independent. |
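As a rough illustration of the "simple mapping" step described above, here is a minimal C++ sketch that turns timed phones into mouth-shape cues by table lookup. Everything in it is an assumption made for illustration: the `TimedPhone` and `MouthCue` types, the function `mapPhonesToShapes`, the shape names, and the tiny lookup table are hypothetical and are not taken from Papagayo or Rhubarb. It only shows the general idea and says nothing about recognition or alignment, which are the hard parts.

```cpp
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical types: a phone with its time span, and the resulting mouth cue.
struct TimedPhone { std::string phone; double start; double end; };
struct MouthCue { std::string shape; double start; double end; };

// Maps each timed phone to a mouth shape via a fixed table.
// The table entries and shape names are illustrative assumptions only.
std::vector<MouthCue> mapPhonesToShapes(const std::vector<TimedPhone>& phones) {
    static const std::unordered_map<std::string, std::string> shapeForPhone = {
        {"M", "closed"}, {"B", "closed"}, {"P", "closed"},
        {"AA", "open"}, {"AO", "rounded"}, {"IY", "wide"},
    };

    std::vector<MouthCue> cues;
    for (const TimedPhone& p : phones) {
        // Phones missing from the table fall back to a neutral "rest" shape.
        const auto it = shapeForPhone.find(p.phone);
        const std::string shape = it != shapeForPhone.end() ? it->second : "rest";

        // Merge with the previous cue if the shape repeats without a gap.
        if (!cues.empty() && cues.back().shape == shape && cues.back().end == p.start) {
            cues.back().end = p.end;
        } else {
            cues.push_back({shape, p.start, p.end});
        }
    }
    return cues;
}
```

A table like this is specific to one phone inventory, which is one reason the mapping alone does not make a tool multi-lingual.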
Thank you for your very well explained answer! |
We can add languages like Swedish if you are interested in it. |
Yes I'm very interested in that! |
Hi Nickolay, nice to see you here! :-) I'll think about what it would take to make Rhubarb multi-lingual. That may take some time, though; I'm currently rather busy making last-minute improvements for Thimbleweed Park. |
I spent some time reflecting on what it would take to support other languages at the same quality as English. Here's what I came up with:
Bottom line: I might tackle these things at some point in the future. But it will be a lot of work, so I'm not making any promises. Right now, this isn't my focus. |
Very interesting read, thanks for the summary |
@DanielSWolf Thanks for explaining. I am planning to work with phones too, and I can clearly see how they may affect the inclusion of other languages in the future. Is there any way I can contact you? |
@saurabhshri: I just PM'ed you. |
I did some more thinking on this topic and I've come up with a solution for multi-language support that should work well. However, I won't have time to implement this feature any time soon. Here's a rough sketch: Each supported language is modeled as a plugin. A plugin is a directory (or archive?) that can be placed into the Rhubarb directory, where Rhubarb will find it. A new command-line option allows you to specify the language. The default is A plugin contains the following:
A downloaded plugin should work as-is on any platform. If the code it contains were written in C/C++, we would need a platform-specific compilation step. So we should use an embedded scripting language instead. I'm thinking of Lua: It's easy to compile and integrate, very small, and well-documented. Each plugin is maintained in its own Git repo. A trivial build task converts it to an archive file for release. There are still a couple of rough edges:
This approach should cover all the problems I mentioned above:
|
Sounds nice. Actually, I think espeak is a good project to look at. It is not state of the art in synthesis, but its language support for preprocessing and g2p is very good. |
I'll have a look at it. Thank you! |
Hi @DanielSWolf, |
@PiOverFour Thanks for offering your help! I am still planning to support additional languages in the future. In fact, I feel that this is one of the key missing features right now. However, I'm currently re-thinking the technical implementation. Instead of relying solely on PocketSphinx, I'm thinking about adding support for other voice recognition services, such as the cloud services offered by Google, Microsoft, and IBM. They are not free, but they offer higher recognition rates than PocketSphinx for a large number of languages. It will still take a long time until I've finished the technical basis. When that time has come, I'll certainly get back to you if I need input from a native French speaker! |
Oh, that's cool to hear! Looking forward to it. |
@DanielSWolf |
Thanks, but that doesn't really solve my problem. My problem is not with Unicode per se, but with Unicode identifiers. Right now, I have an enum that looks like this: enum class Phone { AO, AA, IY, ... } It covers the basic US-English ARPAbet phonemes. In order to support multiple languages, I will have to represent the full IPA set in a similar fashion. Ideally, I'd like to do this: enum class Phone { ɸ, ɳ, ʔ, ... } The C++11 standard will let me do this, since these Unicode characters are valid within identifiers. But GCC won't (see above). Using X-SAMPA isn't an option; this just isn't valid C++: enum class Phone { p\, n`, ?, ... } But as I wrote above, I can easily circumvent the problem altogether by representing phonemes as strings, not enum values. This way, I can use the real IPA characters with any compiler. |
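To make the string-based alternative concrete, here is a minimal sketch of phones identified by their IPA spelling instead of by enum identifiers. The `Phone` struct, the function `phoneFromArpabet`, and the three-entry table are hypothetical and are not Rhubarb's actual code. The sketch assumes C++17 (for `std::optional`) and a source file that is saved as UTF-8 and read as UTF-8 by the compiler.

```cpp
#include <map>
#include <optional>
#include <string>

// Hypothetical type: a phone is identified by its IPA spelling (UTF-8 bytes).
// The IPA characters appear only in string data, never in identifiers.
struct Phone {
    std::string ipa;
    bool operator==(const Phone& other) const { return ipa == other.ipa; }
    bool operator<(const Phone& other) const { return ipa < other.ipa; }
};

// Converts an ARPAbet symbol to its IPA-based Phone.
// Only three illustrative entries are listed; a real table would be much larger.
inline std::optional<Phone> phoneFromArpabet(const std::string& symbol) {
    static const std::map<std::string, std::string> arpabetToIpa = {
        {"AO", "ɔ"}, {"AA", "ɑ"}, {"IY", "i"},
    };
    const auto it = arpabetToIpa.find(symbol);
    if (it == arpabetToIpa.end()) return std::nullopt;
    return Phone{it->second};
}
```

The trade-off is that a misspelled phone is no longer a compile-time error, as it would be with an enum, so unknown symbols have to be handled at run time, here through the empty `std::optional`.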
The latest versions of PocketSphinx can output phonemes instead of words - https://cmusphinx.github.io/wiki/phonemerecognition/ |
Funny that you mention that! Phonetic recognition is not a new feature in PocketSphinx. In fact, the very first version of Rhubarb used it. The problem was that the error rate with this model is rather high. I discovered that the error rate dropped significantly when recognizing words instead of phones. This, of course, only applies to English dialog. So right now, I'm in the process of adding optional phonetic recognition back into Rhubarb. This should give better results for languages other than English. For details, see this thread, starting at the linked comment. This is only a temporary solution. In the long run, I still plan to implement full (word-based) recognition for languages other than English. |
@DanielSWolf Woah, that's great! Can I have access to the source of the customized version of Rhubarb mentioned here? - https://forums.thimbleweedpark.com/t/thimbleweed-park-italian-fan-dub-project-official-thread-tm/2102/361 |
I'll push a branch as soon as I get a chance. Might be a few days, though. |
Great! Looking forward to it. ^__^ |
I've created a new issue (#45) for phonetic recognition so that this issue can focus on true multi-language support. |
We need lip sync in Russian and Chinese. We make our own TTS, so we don't need to recognize audio. Is it possible to do this with the current version of the software? |
Out of the box, Rhubarb only comes with two recognition modes. If you are prepared to make source code changes, you could implement your own |
Hi Daniel! |
Hi @kapamees! The short answer is: No, out of the box, Rhubarb cannot animate without a sound file. Modifying it to work on dialog alone would require moderate programming skills in C++. If you're interested in making these modifications yourself and need some guidance, feel free to create a new issue. |
@DanielSWolf Thank you, no worries! I'm no good at it ;) |
Wow, this looks great!
Just wondering about language support. Do you have any plans to support languages other than English?
I'm planning to do lip sync for my game in Unity, which has animations done in Spine (Esoteric Software).