Filtering out very low quality and novelty voices #1

Open
HadrienGardeur opened this issue Aug 2, 2024 · 0 comments

As part of our work on voice selection, one of the very first things that we need to do is to filter out very low quality and novelty voices.

Very low quality voices are broken down into two groups:

  • Eloquence voices on Apple devices
  • and eSpeak voices on Chrome OS

Eloquence voices suffer from the usual Apple approach, which means that they can show up under two different name/voiceURI values when calling getVoices():

  • "Name"
  • or "Name (Language (Country))" where language and country are localized based on country settings
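
To illustrate the two forms, here is a minimal sketch of a matcher that accepts either the bare name or the name followed by a parenthesized suffix. The function names `escapeRegExp` and `matchesEloquenceName` are hypothetical, and the voice names in the usage note are only illustrative. The pattern does not try to parse the localized language and country; it simply accepts whatever sits between the outer parentheses.

```typescript
// Escape characters that have a special meaning in a regular expression.
function escapeRegExp(text: string): string {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Returns true when `candidate` is exactly `baseName`, or `baseName`
// followed by a space and a parenthesized suffix such as
// " (English (United States))". The suffix content is not validated.
function matchesEloquenceName(baseName: string, candidate: string): boolean {
  const pattern = new RegExp(`^${escapeRegExp(baseName)}( \\(.+\\))?$`);
  return pattern.test(candidate);
}
```

Under these assumptions, `matchesEloquenceName("Eddy", "Eddy (English (United States))")` and `matchesEloquenceName("Eddy", "Eddy")` both return true, while a name that merely starts with the same letters, such as "Eddystone", is rejected.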

It's also worth pointing out that for each of these Apple voices, we'll usually find 14 different matches across as many languages. These languages are all identified using language and otherLanguages in the JSON file documenting these voices.

eSpeak voices are much more straightforward and matching them based on their name is enough.

For novelty voices, the situation is a little different. They're also localized by Apple, but not in a way that can be easily predicted. For example, "Bad News" gets translated into "Mauvaises nouvelles" in French and "Malas noticias" in Spanish.

In the JSON, these translations are listed under altNames while the English names are in name.
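
As a rough sketch of how such an entry could be typed, assuming that altNames and otherLanguages are arrays of strings and that language is a single string (the issue does not spell out the exact schema, so the interface `VoiceEntry` and the helper `knownNames` are hypothetical):

```typescript
// Hypothetical shape of one entry in the JSON file; field types are assumed.
interface VoiceEntry {
  name: string;              // English name, e.g. "Bad News"
  altNames?: string[];       // localized names of the same voice
  language?: string;         // assumed to be a single language tag
  otherLanguages?: string[]; // assumed to be a list of language tags
}

// Collect every name under which the voice may be reported.
function knownNames(entry: VoiceEntry): string[] {
  return [entry.name, ...(entry.altNames ?? [])];
}

// Example entry built from the translations mentioned above.
const badNews: VoiceEntry = {
  name: "Bad News",
  altNames: ["Mauvaises nouvelles", "Malas noticias"],
};
```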

In order to filter out these very low quality voices and novelty voices, we can follow these steps:

  • obtain all voices from the Web Speech API using getVoices()
  • iterate through the voices from these two lists (very low quality voices and novelty voices)
  • try to find one or more matches for each voice based on name, using a regular expression that accepts any characters after the string extracted from name; if any results are found, remove them from the list of voices and skip to the next voice in the list
  • try to find an exact match for each voice based on the values in altNames; if any result is found, remove it from the list of voices and skip to the next voice in the list
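
The steps above could be sketched roughly as follows. This is only an illustration under a few assumptions: the names `VoiceLike`, `ExcludedVoice`, `escapeRegExp` and `filterVoices` are hypothetical, altNames is assumed to be an array of strings, and the function works on any object with a `name` property so that it can be tried outside a browser.

```typescript
// Minimal shape shared with SpeechSynthesisVoice; only `name` is used here.
interface VoiceLike {
  name: string;
}

// Hypothetical shape of an entry from the two JSON lists.
interface ExcludedVoice {
  name: string;
  altNames?: string[];
}

// Escape characters that have a special meaning in a regular expression.
function escapeRegExp(text: string): string {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

function filterVoices<T extends VoiceLike>(voices: T[], excluded: ExcludedVoice[]): T[] {
  let remaining = [...voices];
  for (const entry of excluded) {
    // First pass: the name followed by any characters.
    const byName = new RegExp(`^${escapeRegExp(entry.name)}.*$`);
    const afterName = remaining.filter((voice) => !byName.test(voice.name));
    if (afterName.length < remaining.length) {
      remaining = afterName;
      continue; // matches were removed, move on to the next entry
    }
    // Second pass: exact match against the localized names.
    const altNames = new Set(entry.altNames ?? []);
    remaining = remaining.filter((voice) => !altNames.has(voice.name));
  }
  return remaining;
}
```

In a browser, this could be called as `filterVoices(speechSynthesis.getVoices(), [...lowQualityVoices, ...noveltyVoices])`, where the two arrays stand for the contents of the JSON lists. Note that a prefix-style pattern also removes any unrelated voice whose name happens to start with the same string, which may or may not be acceptable.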