English mfa dictionary and corresponding G2P model #24

Open
vivian556123 opened this issue Sep 14, 2023 · 2 comments

Comments

@vivian556123

Hi, I want to use the MFA pretrained English_mfa acoustic model and dictionary for alignment. I also want to use the same dictionary for G2P (text to phonemes). What is the corresponding G2P model for converting text into phonemes? I want to use it for TTS inference.

Thanks a lot!

@mmcauliffe
Member

For US English, the G2P model here: https://mfa-models.readthedocs.io/en/latest/g2p/English/English%20%28US%29%20MFA%20G2P%20model%20v2_0_0a.html was trained on the US pronunciation dictionary here: https://mfa-models.readthedocs.io/en/latest/dictionary/English/English%20%28US%29%20MFA%20dictionary%20v2_0_0a.html. However, do note that the dictionaries and G2P models are optimized for recognizing variation within and across dialects, so I don't know how applicable they would be to a TTS system, which I would imagine benefits from less variation.

@iamanigeeit

iamanigeeit commented Feb 8, 2024

@vivian556123 If you want to generate pronunciations from Python instead of running in batches, you can do the following:

from montreal_forced_aligner.g2p.generator import PyniniGenerator
from montreal_forced_aligner.models import G2PModel, ModelManager
language = "english_us_mfa"

# If you haven't downloaded the model
# manager = ModelManager()
# manager.download_model("g2p", language)

model_path = G2PModel.get_pretrained_path(language)
g2p = PyniniGenerator(g2p_model_path=model_path, num_pronunciations=1)
g2p.setup()

Then call g2p.rewriter:

>>> g2p.rewriter('my time')
['m aj tʰ aj m', 'm ɑ tʰ aj m', 'm ə tʰ aj m', 'mʲ i tʰ aj m']

However, I think there is little point in using the MFA G2P model for this, as the results are not sorted in order of likelihood. In fact, it seems that simply mapping every word to its most common pronunciation is both more accurate and faster. I would recommend using a different G2P library, as long as the phonemes are compatible (e.g. ARPAbet). For example: https://pypi.org/project/g2p-en/
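
The "most common pronunciation" approach mentioned above can be sketched as a plain lookup. This is only a minimal illustration, assuming the lexicon is already in memory as a mapping from each word to (pronunciation, probability) pairs. The `LEXICON` contents, the probabilities, and the helper names are hypothetical and are not part of MFA or g2p-en.

```python
from typing import Dict, List, Tuple

# Hypothetical toy lexicon: word -> list of (pronunciation, probability).
# The probabilities are made up for illustration; a real lexicon would be
# loaded from a pronunciation dictionary file.
LEXICON: Dict[str, List[Tuple[str, float]]] = {
    "my": [("m aj", 0.9), ("m ə", 0.1)],
    "time": [("tʰ aj m", 1.0)],
}


def most_common_pronunciation(
    word: str,
    lexicon: Dict[str, List[Tuple[str, float]]] = LEXICON,
    unk: str = "<unk>",
) -> str:
    """Return the highest-probability pronunciation, or `unk` if the word is missing."""
    candidates = lexicon.get(word.lower())
    if not candidates:
        return unk
    # Pick the (pronunciation, probability) pair with the largest probability.
    return max(candidates, key=lambda pair: pair[1])[0]


def text_to_phonemes(
    text: str,
    lexicon: Dict[str, List[Tuple[str, float]]] = LEXICON,
) -> List[str]:
    """Split on whitespace and look up each word independently."""
    return [most_common_pronunciation(word, lexicon) for word in text.split()]
```

With this toy lexicon, `text_to_phonemes("my time")` returns `['m aj', 'tʰ aj m']`. Words missing from the lexicon come back as `<unk>` here; in practice they could be handed to a G2P model as a fallback.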
