Standardized output for voices() #2

Open
HadrienGardeur opened this issue Aug 2, 2024 · 0 comments

Once very low quality and novelty voices have been filtered out, the next step is to work on a standardized output for the voices() method:

  • label using the label documented in recommended voices with a fallback on the name returned by getVoices()
  • voiceURI using the same property returned by getVoices()
  • language using the lang value returned by getVoices()
  • gender as documented in recommended voices, this property is omitted if this information is missing
  • age as documented in recommended voices, this property is omitted if this information is missing
  • offlineAvailability using a boolean based on localService, as returned by getVoices()
  • quality based on the best available quality that can be detected, this property is omitted if this information is missing
  • pitchControl using a boolean which defaults to true if it's undocumented
  • recommendedPitch using the pitch value documented in recommended voices, this property is omitted if this information is missing
  • recommendedRate using the rate value documented in recommended voices, this property is omitted if this information is missing
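
As a rough illustration of the list above, the output could be typed and assembled along these lines. This is only a sketch: `RawVoice`, `RecommendedVoice`, `StandardVoice` and `toStandardVoice` are hypothetical names, the shape of a recommended voice entry is assumed, and `quality` is simply copied from that entry instead of being detected.

```typescript
// Subset of what getVoices() returns for each voice.
interface RawVoice {
  name: string;
  voiceURI: string;
  lang: string;
  localService: boolean;
}

// Assumed shape of an entry in the recommended voices list (hypothetical).
interface RecommendedVoice {
  name: string;
  label?: string;
  gender?: string;
  age?: string;
  quality?: string;
  pitchControl?: boolean;
  pitch?: number;
  rate?: number;
}

// Standardized output, one property per bullet in the list above.
interface StandardVoice {
  label: string;
  voiceURI: string;
  language: string;
  gender?: string;
  age?: string;
  offlineAvailability: boolean;
  quality?: string;
  pitchControl: boolean;
  recommendedPitch?: number;
  recommendedRate?: number;
}

function toStandardVoice(raw: RawVoice, rec?: RecommendedVoice): StandardVoice {
  const out: StandardVoice = {
    // Documented label first, name from getVoices() as a fallback.
    label: rec !== undefined && rec.label !== undefined ? rec.label : raw.name,
    voiceURI: raw.voiceURI,
    language: raw.lang,
    offlineAvailability: raw.localService,
    // Defaults to true when undocumented.
    pitchControl:
      rec !== undefined && rec.pitchControl !== undefined ? rec.pitchControl : true,
  };
  if (rec !== undefined) {
    // Optional properties are omitted, not set to undefined, when missing.
    if (rec.gender !== undefined) out.gender = rec.gender;
    if (rec.age !== undefined) out.age = rec.age;
    if (rec.quality !== undefined) out.quality = rec.quality;
    if (rec.pitch !== undefined) out.recommendedPitch = rec.pitch;
    if (rec.rate !== undefined) out.recommendedRate = rec.rate;
  }
  return out;
}
```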

In order to produce this output, we'll need to work with:

For most of the output, this is fairly straightforward, with the value coming either from our filtered list of available voices on the system or from the list of recommended voices. But a few points are trickier and worth pointing out:

  • the value for language
  • and matching Apple and Android voices

As documented, there are inconsistencies in the values returned for language:

These inconsistencies need to be handled in our code to make sure that we always return a valid BCP-47 language tag instead.
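
A minimal sketch of such a normalization step follows. It assumes the inconsistencies are limited to underscore separators and irregular casing, which is only a guess about what needs handling; `normalizeLanguageTag` is a hypothetical helper that fixes separators and casing but does not validate the tag.

```typescript
function normalizeLanguageTag(lang: string): string {
  const parts = lang
    .trim()
    .replace(/_/g, "-") // "en_US" style separators become hyphens
    .split("-")
    .filter((part) => part.length > 0);

  return parts
    .map((part, index) => {
      // Primary language subtag: lowercase ("FR" -> "fr").
      if (index === 0) return part.toLowerCase();
      // Four-letter script subtag: title case ("hant" -> "Hant").
      if (/^[A-Za-z]{4}$/.test(part)) {
        return part.charAt(0).toUpperCase() + part.slice(1).toLowerCase();
      }
      // Two-letter region subtag: uppercase ("us" -> "US").
      if (/^[A-Za-z]{2}$/.test(part)) return part.toUpperCase();
      // Numeric regions, variants and anything else: lowercase.
      return part.toLowerCase();
    })
    .join("-");
}
```

Casing is not significant in BCP-47, so the casing rules here are only about producing a consistent, conventional form.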

Matching Apple and Android voices is also more difficult than it should be:

The usual approach for matching voices should be based simply on name.

For Android, if we detect the presence of localizedName with the value set to android then a special rule should be applied:

  • in addition to name, we should also match against the following pattern {language} {region} where {language} and {region} are localized based on the system settings
  • these locales are available as separate JSON files, using language we can identify the language and region and fetch the right values for all available locales
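
The Android rule above could be sketched as follows. The structure of the locale JSON files is not described here, so `LocaleNames` is an assumed shape and `androidCandidateNames` is a hypothetical helper; it returns every name that a voice could be listed under for a given language tag.

```typescript
// Assumed content of one locale file: localized display names keyed by subtag.
interface LocaleNames {
  languages: { [code: string]: string };
  regions: { [code: string]: string };
}

function androidCandidateNames(
  name: string,
  language: string,
  locales: LocaleNames[]
): string[] {
  const subtags = language.replace(/_/g, "-").split("-");
  const lang = (subtags[0] || "").toLowerCase();

  // First subtag after the language that looks like a region (2 letters or 3 digits).
  let region: string | undefined;
  for (const subtag of subtags.slice(1)) {
    if (/^([A-Za-z]{2}|[0-9]{3})$/.test(subtag)) {
      region = subtag.toUpperCase();
      break;
    }
  }

  // The plain name is always a candidate.
  const candidates: string[] = [name];
  if (region === undefined) return candidates;

  // Add "{language} {region}" once per locale that knows both names.
  for (const locale of locales) {
    const localizedLanguage = locale.languages[lang];
    const localizedRegion = locale.regions[region];
    if (localizedLanguage !== undefined && localizedRegion !== undefined) {
      const candidate = localizedLanguage + " " + localizedRegion;
      if (candidates.indexOf(candidate) === -1) candidates.push(candidate);
    }
  }
  return candidates;
}
```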

For Apple devices, this is even more complicated. Apple voices can have various quality variants, but we don't want to list two or three of them in our voices() output, which means that we need to keep the best one and drop the other variants from the list.

In order to do that, we can rely on:

  • quality to list all the variants potentially available
  • and the locales to know how these various quality variants are translated

Overall, we'll encounter three different cases for Apple voices:

  • Name
  • Name ({quality}) where quality is localized
  • Name ({language} ({region})) where language and region are localized

In order to detect the best voices we can follow an algorithm where the first match is added to our output and all subsequent matches are removed from the list:

  • Look for the Name ({quality}) variant, checking for the highest quality variant first and then going down. Since we don't know the system language (this can't be predicted reliably based on the languages returned by the browser), we must look for all known translations of a quality variant (for example, the highest quality variant is "Premium" in English vs "de qualité" in French).
  • Then look for Name using a regular expression that allows for characters afterwards. We need to be careful with this one, as it can result in multiple matches that should all be processed.
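
One possible reading of these two steps, as a sketch only. `matchAppleVoice` is a hypothetical helper and `qualityLabels` is an assumed table of translated quality labels ordered from highest to lowest quality. In step 1 it keeps the best quality variant and drops the plain name and the other known quality variants; in step 2 it hands every regular expression match back to the caller, since how those should be processed is left open here.

```typescript
interface NamedVoice {
  name: string;
}

function escapeRegExp(text: string): string {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

function matchAppleVoice<T extends NamedVoice>(
  baseName: string,
  voices: T[],
  qualityLabels: string[][]
): { matched: T[]; remaining: T[] } {
  // Every "Name ({quality})" spelling, best quality first, all translations.
  const variantNames: string[] = [];
  for (const translations of qualityLabels) {
    for (const label of translations) {
      variantNames.push(baseName + " (" + label + ")");
    }
  }

  // Step 1: the first quality variant found wins, lower variants are dropped.
  for (const wanted of variantNames) {
    const hits = voices.filter((voice) => voice.name === wanted);
    if (hits.length > 0) {
      return {
        matched: hits.slice(0, 1),
        remaining: voices.filter(
          (voice) =>
            voice.name !== baseName && variantNames.indexOf(voice.name) === -1
        ),
      };
    }
  }

  // Step 2: plain name, optionally followed by other characters.
  const pattern = new RegExp("^" + escapeRegExp(baseName) + "(\\s.+)?$");
  return {
    matched: voices.filter((voice) => pattern.test(voice.name)),
    remaining: voices.filter((voice) => !pattern.test(voice.name)),
  };
}
```

For example, with the labels `[["Premium", "de qualité"], ["Enhanced"]]` (the second tier is an illustrative value), a list containing "Samantha", "Samantha (Enhanced)" and "Samantha (Premium)" would keep only "Samantha (Premium)".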

By default, this output should also order voices based on:

  • quality, followed by voices that lack this criterion (which means implementing the method for ordering by quality)
  • the languages supported by the browser (navigator.languages, only taking into account the language part) should be listed above other languages (which means implementing the method for ordering by language as a secondary criterion)
  • and finally the preferred regions (still based on navigator.languages, only taking into account the region part) should be displayed above the other ones for each language, and use the default region if this information is missing (which means implementing the method for ordering by region as a tertiary criterion)
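
The three criteria above could be combined in a single comparator, roughly as sketched below. `sortVoices` and `QUALITY_ORDER` are hypothetical names, the quality values are assumed, and the list that would come from navigator.languages is passed in as a parameter so the function stays testable outside a browser. The fallback to a default region is not covered by this sketch.

```typescript
interface SortableVoice {
  language: string;
  quality?: string;
}

// Assumed quality values, best first.
const QUALITY_ORDER: string[] = ["veryHigh", "high", "normal", "low", "veryLow"];

// Position in the list, or one past the end when the value is absent.
function rankOf(list: string[], value: string): number {
  const index = list.indexOf(value);
  return index === -1 ? list.length : index;
}

function sortVoices<T extends SortableVoice>(voices: T[], preferred: string[]): T[] {
  const languages: string[] = []; // language parts, in order of preference
  const regionTags: string[] = []; // full tags that carry a region
  for (const tag of preferred) {
    const parts = tag.toLowerCase().split("-");
    const language = parts[0] || "";
    if (languages.indexOf(language) === -1) languages.push(language);
    if (parts.length > 1) regionTags.push(tag.toLowerCase());
  }

  const qualityRank = (voice: T): number =>
    voice.quality === undefined
      ? QUALITY_ORDER.length // voices without quality go last
      : rankOf(QUALITY_ORDER, voice.quality);
  const languageOf = (voice: T): string =>
    voice.language.toLowerCase().split("-")[0] || "";

  // Copy before sorting so the input list is left untouched.
  return voices.slice().sort((a, b) => {
    const byQuality = qualityRank(a) - qualityRank(b);
    if (byQuality !== 0) return byQuality;
    const byLanguage =
      rankOf(languages, languageOf(a)) - rankOf(languages, languageOf(b));
    if (byLanguage !== 0) return byLanguage;
    return (
      rankOf(regionTags, a.language.toLowerCase()) -
      rankOf(regionTags, b.language.toLowerCase())
    );
  });
}
```

In a browser this would be called as `sortVoices(list, navigator.languages.slice())`.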

This way, the first voice listed by voices() should be a good default option since it will combine all key criteria. We should also provide a helper to return that voice (defaultVoice()?).

Projects
Status: In review