Standardized output for `voices()` #2

HadrienGardeur · 2024-08-02T08:41:57Z

Once very low quality and novelty voices have been filtered out, the next step is to work on a standardized output for the voices() method:

- label: the label documented in recommended voices, with a fallback to the name returned by getVoices()
- voiceURI: the same property returned by getVoices()
- language: the lang value returned by getVoices()
- gender: as documented in recommended voices; this property is omitted if this information is missing
- age: as documented in recommended voices; this property is omitted if this information is missing
- offlineAvailability: a boolean based on localService, as returned by getVoices()
- quality: the best available quality that can be detected; this property is omitted if this information is missing
- pitchControl: a boolean that defaults to true if it's undocumented
- recommendedPitch: the pitch value documented in recommended voices; this property is omitted if this information is missing
- recommendedRate: the rate value documented in recommended voices; this property is omitted if this information is missing
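A minimal TypeScript sketch of this output shape and of how one entry could be built, assuming the system voice exposes the standard SpeechSynthesisVoice fields (name, voiceURI, lang, localService). The RecommendedVoice interface, its field names and the toStandardVoice() helper are hypothetical and only model what is listed above.

```typescript
// Fields of the Web Speech API's SpeechSynthesisVoice that the mapping relies on.
interface SystemVoice {
  name: string;
  voiceURI: string;
  lang: string;
  localService: boolean;
}

// Hypothetical subset of an entry in the recommended voices list.
interface RecommendedVoice {
  label?: string;
  gender?: string;
  age?: string;
  pitchControl?: boolean;
  pitch?: number;
  rate?: number;
}

// Standardized entry returned by voices(); optional fields are omitted when unknown.
interface StandardVoice {
  label: string;
  voiceURI: string;
  language: string;
  gender?: string;
  age?: string;
  offlineAvailability: boolean;
  quality?: string;
  pitchControl: boolean;
  recommendedPitch?: number;
  recommendedRate?: number;
}

// Hypothetical helper: builds one standardized entry.
function toStandardVoice(
  system: SystemVoice,
  recommended?: RecommendedVoice,
  detectedQuality?: string, // best quality detected for this voice, if any
): StandardVoice {
  const voice: StandardVoice = {
    label: recommended?.label ?? system.name, // fall back on the system name
    voiceURI: system.voiceURI,
    language: system.lang,
    offlineAvailability: system.localService,
    pitchControl: recommended?.pitchControl ?? true, // true when undocumented
  };
  // Only set optional properties when the information exists, so that
  // missing data is omitted instead of being present as undefined.
  if (detectedQuality !== undefined) voice.quality = detectedQuality;
  if (recommended) {
    if (recommended.gender !== undefined) voice.gender = recommended.gender;
    if (recommended.age !== undefined) voice.age = recommended.age;
    if (recommended.pitch !== undefined) voice.recommendedPitch = recommended.pitch;
    if (recommended.rate !== undefined) voice.recommendedRate = recommended.rate;
  }
  return voice;
}
```

Setting optional properties conditionally (rather than assigning undefined) keeps "omitted" literal: `"gender" in voice` is false when the information is missing.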

In order to produce this output, we'll need to work with:

- our list of available voices on the system, once very low quality and novelty voices have been filtered out
- the full list of recommended voices
- along with the locales for Apple devices and Android

For most of the output, this is fairly straightforward, with each value coming either from our filtered list of available voices on the system or from the list of recommended voices. But there are a few tricks worth pointing out:

- the value for language
- matching Apple and Android voices

As documented, there are inconsistencies in the values returned for language:

These inconsistencies need to be handled in our code to make sure that we always return a valid BCP-47 language tag instead.
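A minimal sketch of such a normalization step. The specific inconsistencies are not listed here, so this assumes the common ones are underscores used as separators ("en_US") and non-canonical letter case ("EN-us"); normalizeLanguageTag() is a hypothetical name and the function does not validate subtags against the IANA registry.

```typescript
// Normalizes a `lang` value from getVoices() into a well-formed BCP-47 tag.
// Assumption: the inconsistencies are underscore separators ("en_US") and
// non-canonical letter case ("EN-us"); other problems are not handled here.
function normalizeLanguageTag(lang: string): string {
  let inExtension = false; // true once a singleton such as "u" or "x" is seen
  return lang
    .trim()
    .replace(/_/g, "-")
    .split("-")
    .filter((subtag) => subtag.length > 0)
    .map((subtag, index) => {
      if (subtag.length === 1) inExtension = true;
      // Primary language and everything after a singleton: lowercase.
      if (index === 0 || inExtension) return subtag.toLowerCase();
      // Script subtag (4 letters): title case, e.g. "Hant".
      if (/^[a-z]{4}$/i.test(subtag)) {
        return subtag.charAt(0).toUpperCase() + subtag.slice(1).toLowerCase();
      }
      // Region subtag (2 letters): uppercase, e.g. "US". Numeric regions such
      // as "419" fall through to the lowercase branch, which leaves them as-is.
      if (/^[a-z]{2}$/i.test(subtag)) return subtag.toUpperCase();
      // Variants and any other subtags: lowercase.
      return subtag.toLowerCase();
    })
    .join("-");
}
```

For example, `normalizeLanguageTag("zh_hant_tw")` yields `"zh-Hant-TW"`, following the case conventions recommended by BCP 47.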

Matching Apple and Android voices is also more difficult than it should be:

By default, voices should be matched simply on name.

For Android, if we detect that localizedName is present with its value set to android, a special rule should be applied:

- in addition to name, we should also match against the pattern {language} {region}, where {language} and {region} are localized based on the system settings
- these locales are available as separate JSON files; using language, we can identify the language and region and fetch the right values for all available locales
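A rough sketch of the Android rule. The structure of the locale JSON files is not described here, so the LocaleNames shape (a map of language codes and a map of region codes to localized names) is an assumption, as are the helper names; language is assumed to be a plain language-region tag such as "en-US".

```typescript
// Hypothetical shape of one locale file: localized names for language and
// region codes. The real JSON files may be structured differently.
interface LocaleNames {
  languages: Record<string, string>;
  regions: Record<string, string>;
}

// Builds every "{language} {region}" string Android might use as a voice name,
// one per available locale, since the system locale can't be known in advance.
function androidLocalizedNames(language: string, locales: readonly LocaleNames[]): string[] {
  const [lang, region] = language.split("-"); // assumes "language-region"
  if (!lang || !region) return [];
  const names = new Set<string>();
  for (const locale of locales) {
    const localizedLanguage = locale.languages[lang];
    const localizedRegion = locale.regions[region];
    if (localizedLanguage && localizedRegion) {
      names.add(`${localizedLanguage} ${localizedRegion}`);
    }
  }
  return Array.from(names);
}

// A system voice matches a recommended Android voice either by name or by
// one of the localized "{language} {region}" names.
function matchesAndroidVoice(
  systemName: string,
  recommendedName: string,
  language: string,
  locales: readonly LocaleNames[],
): boolean {
  return systemName === recommendedName || androidLocalizedNames(language, locales).includes(systemName);
}
```

Checking the candidate against every locale (instead of guessing one) mirrors the point made for Apple voices: the system language can't be predicted reliably from the browser.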

For Apple devices, this is even more complicated. Apple voices can have various quality variants, but we don't want to list two or three of them in our voices() output, which means that we need to keep the best one and drop the other variants from the list.

In order to do that, we can rely on:

- quality, to list all the variants potentially available
- the locales, to know how these various quality variants are translated

Overall, we'll encounter three different cases for Apple voices:

- Name
- Name ({quality}), where quality is localized
- Name ({language} ({region})), where language and region are localized

To detect the best variant, we can follow an algorithm where the first match is added to our output and all subsequent matches are removed from the list:

1. Look up the Name ({quality}) variant, checking for the highest quality variant first and then working down. Since we don't know the system language (it can't be predicted reliably from the languages returned by the browser), we must look for all known translations of a quality variant (for example, the highest quality variant is "Premium" in English vs "de qualité" in French).
2. Then look up Name using a regular expression that allows for trailing characters. We need to be careful with this one, as it can result in multiple matches that should all be processed.
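A minimal sketch of these two steps for a single voice name, assuming the quality labels have already been collected from the locale files into a list ordered from highest to lowest quality. pickBestAppleVariant() and its parameters are hypothetical; the labels in the usage below ("Premium", "Enhanced", "de qualité") are only illustrative.

```typescript
// Escapes characters that have a special meaning in regular expressions.
function escapeRegExp(text: string): string {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

interface AppleSelection {
  best?: string;       // the variant to keep in the voices() output
  remaining: string[]; // system voice names with every variant of baseName removed
}

// qualityLabels: one entry per quality level, highest first, each listing all
// known translations of that level (e.g. ["Premium", "de qualité"]).
function pickBestAppleVariant(
  baseName: string,
  systemNames: readonly string[],
  qualityLabels: readonly (readonly string[])[],
): AppleSelection {
  let best: string | undefined;

  // Step 1: "Name ({quality})", from the highest quality down, in any language.
  for (const translations of qualityLabels) {
    best = systemNames.find((name) =>
      translations.some((label) => name === `${baseName} (${label})`),
    );
    if (best !== undefined) break;
  }

  // Step 2: "Name" optionally followed by more text, which also catches
  // "Name ({language} ({region}))". Requiring whitespace or the end of the
  // string after the name keeps "Ava" from matching "Avalon".
  const variantPattern = new RegExp(`^${escapeRegExp(baseName)}(\\s|$)`);
  const variants = systemNames.filter((name) => variantPattern.test(name));
  if (best === undefined) best = variants[0];

  // Every variant, including the one kept, is dropped from the remaining list.
  const remaining = systemNames.filter((name) => !variantPattern.test(name));
  return best === undefined ? { remaining } : { best, remaining };
}
```

With `["Samantha", "Samantha (Enhanced)", "Daniel"]` and labels `[["Premium", "de qualité"], ["Enhanced"]]`, `pickBestAppleVariant("Samantha", …)` keeps "Samantha (Enhanced)" and leaves only "Daniel". The whitespace-or-end boundary is one way of handling the multiple-match risk mentioned above; a plain prefix match would be looser.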

By default, this output should also order voices based on:

- quality, followed by voices that lack this criterion (which means implementing the method for ordering by quality)
- the languages supported by the browser (navigator.languages, only taking into account the language part) should be listed above other languages (which means implementing the method for ordering by language as a secondary criterion)
- and finally the preferred regions (still based on navigator.languages, only taking into account the region part) should be displayed above the other ones for each language, using the default region if this information is missing (which means implementing the method for ordering by region as a tertiary criterion)

This way, the first voice listed by voices() should be a good default option, since it combines all key criteria. We should also provide a helper to return that voice (defaultVoice()?).
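A sketch of this ordering as a comparator, plus a possible defaultVoice() helper. The quality scale in QUALITY_ORDER and the defaultRegions map are assumptions (neither is defined in this issue), and navigator.languages is passed in as a plain parameter so the code can run outside a browser.

```typescript
// Minimal shape needed for ordering; in practice this would be the standardized entry.
interface SortableVoice {
  language: string; // BCP-47 tag, e.g. "en-US"
  quality?: string;
}

// Hypothetical quality scale, best first.
const QUALITY_ORDER = ["veryHigh", "high", "normal", "low", "veryLow"];

// Unknown positions (-1) sort after every known one.
const rank = (index: number): number => (index === -1 ? Number.MAX_SAFE_INTEGER : index);
const languagePart = (tag: string): string => tag.toLowerCase().replace(/[-_].*$/, "");
// First subtag that looks like a region ("US", "419"), if any.
const regionPart = (tag: string): string | undefined =>
  tag.split(/[-_]/).slice(1).find((s) => /^([a-z]{2}|\d{3})$/i.test(s))?.toUpperCase();

function compareVoices(
  preferredLanguages: readonly string[], // e.g. navigator.languages
  defaultRegions: Readonly<Partial<Record<string, string>>>, // hypothetical, e.g. { fr: "FR" }
): (a: SortableVoice, b: SortableVoice) => number {
  const languages = preferredLanguages.map(languagePart);
  // Preferred regions per language, using the default region when
  // navigator.languages only provides a bare language such as "fr".
  const regions = new Map<string, string[]>();
  for (const tag of preferredLanguages) {
    const language = languagePart(tag);
    const region = regionPart(tag) ?? defaultRegions[language];
    if (region === undefined) continue;
    regions.set(language, [...(regions.get(language) ?? []), region]);
  }
  const regionRank = (v: SortableVoice): number => {
    const region = regionPart(v.language);
    const preferred = regions.get(languagePart(v.language)) ?? [];
    return rank(region === undefined ? -1 : preferred.indexOf(region));
  };
  // Quality first, then language, then region.
  return (a, b) =>
    rank(QUALITY_ORDER.indexOf(a.quality ?? "")) - rank(QUALITY_ORDER.indexOf(b.quality ?? "")) ||
    rank(languages.indexOf(languagePart(a.language))) - rank(languages.indexOf(languagePart(b.language))) ||
    regionRank(a) - regionRank(b);
}

// Hypothetical defaultVoice() helper: the first voice in this order.
function defaultVoice<T extends SortableVoice>(
  voices: readonly T[],
  preferredLanguages: readonly string[],
  defaultRegions: Readonly<Partial<Record<string, string>>>,
): T | undefined {
  return voices.slice().sort(compareVoices(preferredLanguages, defaultRegions))[0];
}
```

Because quality is the primary key, a high-quality voice in a non-preferred language still sorts above a low-quality voice in a preferred one; swapping the first two terms of the comparator would change that if needed.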


HadrienGardeur added the voice-selection label Aug 2, 2024

HadrienGardeur assigned panaC Aug 2, 2024

HadrienGardeur mentioned this issue Aug 2, 2024: Standardized output for languages() #3 (Open)

HadrienGardeur added this to Readium Speech Aug 4, 2024

HadrienGardeur moved this to Todo in Readium Speech Aug 4, 2024

This was referenced Aug 6, 2024:

- New approach for localizing Apple and Android voice names HadrienGardeur/web-speech-recommended-voices#40 (Merged)
- Sorting voices #6 (Open)

panaC mentioned this issue Aug 23, 2024: Initial support for listing voices and languages #7 (Merged)
