diff --git a/docs/guides/index.md b/docs/guides/index.md new file mode 100644 index 0000000000..7b9c1a2526 --- /dev/null +++ b/docs/guides/index.md @@ -0,0 +1,8 @@ +# User guides + +* [Extracting the content of a publication](content.md) +* [Supporting PDF documents](pdf.md) +* [Configuring the Navigator](navigator-preferences.md) +* [Font families in the EPUB navigator](epub-fonts.md) +* [Media Navigator](media-navigator.md) +* [Text-to-speech](tts.md) \ No newline at end of file diff --git a/docs/guides/media-navigator.md b/docs/guides/media-navigator.md new file mode 100644 index 0000000000..e259b5dddd --- /dev/null +++ b/docs/guides/media-navigator.md @@ -0,0 +1,212 @@ +# Media Navigator + +A `MediaNavigator` implementation can play media-based reading orders, such as audiobooks, text-to-speech renditions, and Media overlays. It enables you to reuse your UI, media controls, and logic related to media playback. + +## Controlling the playback + +A media navigator provides the API you need to pause or resume playback. + +```kotlin +navigator.pause() +check(!navigator.playback.value.playWhenReady) + +navigator.play() +check(navigator.playback.value.playWhenReady) +``` + +## Observing the playback changes + +You can observe the changes in the playback with the `navigator.playback` flow property. + +`playWhenReady` indicates whether the media is playing or will start playing once the required conditions are met (e.g. buffering). You will typically use this to change the icon of a play/pause button. + +The `state` property gives more information about the status of the playback: + +* `Ready` when the media is ready to be played if `playWhenReady` is true. +* `Ended` after reaching the end of the reading order items. +* `Buffering` if the navigator cannot play because the buffer is starved. +* `Error` when an error prevents playback. + +By combining `playWhenReady` and `state`, you can determine whether the media is actually playing: `playWhenReady && state == Ready`. 
+ +Finally, you can use the `index` property to know which `navigator.readingOrder` item is set to be played. + +```kotlin +navigator.playback + .onEach { playback -> + playPauseButton.toggle(playback.playWhenReady) + + val playingItem = navigator.readingOrder.items[playback.index] + + if (playback.state is MediaNavigator.State.Error) { + // Alert + } + } + .launchIn(scope) +``` + +`MediaNavigator` implementations may provide additional playback properties. + +## Specializations of `MediaNavigator` + +### Audio Navigator + +The `AudioNavigator` interface is a specialized version of `MediaNavigator` for publications based on pre-recorded audio resources, such as audiobooks. It provides additional time-based APIs and properties. + +```kotlin +audioNavigator.playback + .onEach { playback -> + print("At duration ${playback.offset} in the resource, buffered ${playback.buffered}") + } + .launchIn(scope) + +// Jump to a particular duration offset in the resource item at index 4. +audioNavigator.seek(index = 4, offset = 5.seconds) +``` + +### Text-aware Media Navigator + +`TextAwareMediaNavigator` specializes `MediaNavigator` for media-based resources that are synchronized with text utterances, such as sentences. It offers additional APIs and properties to determine which utterances are playing. This interface is helpful for a text-to-speech or a Media overlays navigator. + +```kotlin +textAwareNavigator.playback + .onEach { playback -> + print("Playing the range ${playback.range} in text ${playback.utterance}") + } + .launchIn(scope) + +// Get additional context by observing the location instead of the playback. +textAwareNavigator.location + .onEach { location -> + // Highlight the portion of text being played. + visualNavigator.applyDecorations( + listOf(Decoration( + locator = location.utteranceLocator, + style = Decoration.Style.Highlight(tint = Color.RED) + )), + "highlight" + ) + } + .launchIn(scope) + +// Skip the current utterance. 
+if (textAwareNavigator.hasNextUtterance()) { + textAwareNavigator.goToNextUtterance() +} +``` + +## Background playback and media notification + +The Readium Kotlin toolkit provides implementations of `MediaNavigator` powered by Jetpack media3. This allows for continuous playback in the background and displaying Media-style notifications with playback controls. + +To accomplish this, you must create your own `MediaSessionService`. Get acquainted with [the concept behind media3](https://developer.android.com/guide/topics/media/media3) first. + +### Configuration + +Add the following [Jetpack media3](https://developer.android.com/jetpack/androidx/releases/media3) dependencies to your `build.gradle`, after checking for the latest version. + +```groovy +dependencies { + implementation "androidx.media3:media3-common:1.0.2" + implementation "androidx.media3:media3-session:1.0.2" + implementation "androidx.media3:media3-exoplayer:1.0.2" +} +``` + +### Add the `MediaSessionService` + +Create a new implementation of `MediaSessionService` in your application. For an example, take a look at `MediaService` in the Test App. You can access the media3 `Player` from the navigator with `navigator.asMedia3Player()`. + +Don't forget to declare this new service in your `AndroidManifest.xml`. + +```xml + + + + + + ... + + + + + + + + + + + + +``` + +### Customizing the notification metadata + +By default, the navigators will use the publication's metadata to display playback information in the Media-style notification. If you want to customize this, for example by retrieving metadata from your database, you can provide a custom `MediaMetadataFactory` implementation when creating the navigator. + +Here's an example for the `AndroidTtsNavigator`. 
+ +```kotlin +val navigatorFactory = AndroidTtsNavigatorFactory( + application, publication, + metadataProvider = { pub -> + DatabaseMediaMetadataFactory( + context = application, + scope = application, + bookId = bookId, + trackCount = pub.readingOrder.size + ) + } +) + +/** + * Factory of media3 metadata for the local publication with given [bookId]. + */ +class DatabaseMediaMetadataFactory( + private val context: Context, + scope: CoroutineScope, + private val bookId: Int, + private val trackCount: Int +) : MediaMetadataFactory { + + private class Metadata( + val title: String, + val author: String, + val cover: ByteArray + ) + + private val metadata: Deferred<Metadata?> = scope.async { + Database.getInstance(context).bookDao().get(bookId)?.let { book -> + Metadata( + title = book.title, + author = book.author, + // Byte arrays cross process boundaries and should be kept small + cover = book.cover.scaleToFit(400, 400).toPng() + ) + } + } + + override suspend fun publicationMetadata(): MediaMetadata = + builder()?.build() ?: MediaMetadata.EMPTY + + override suspend fun resourceMetadata(index: Int): MediaMetadata = + builder()?.setTrackNumber(index)?.build() ?: MediaMetadata.EMPTY + + private suspend fun builder(): MediaMetadata.Builder? { + val metadata = metadata.await() ?: return null + + return MediaMetadata.Builder() + .setTitle(metadata.title) + .setTotalTrackCount(trackCount) + .setArtist(metadata.author) + // We can't yet directly use a `content://` or `file://` URI with `setArtworkUri`. + // See https://github.com/androidx/media/issues/271 + .setArtworkData(metadata.cover, PICTURE_TYPE_FRONT_COVER) + } +} +``` diff --git a/docs/guides/tts.md b/docs/guides/tts.md index 3b7b80b471..31b7a81928 100644 --- a/docs/guides/tts.md +++ b/docs/guides/tts.md @@ -1,182 +1,161 @@ # Text-to-speech -:warning: The API described in this guide will be changed in the next version of the Kotlin toolkit to support background TTS playback and media notifications. 
It is recommended that you wait before integrating it in your app. - -Text-to-speech can be used to read aloud a publication using a synthetic voice. The Readium toolkit ships with a TTS implementation based on the native [Android TTS engine](https://developer.android.com/reference/android/speech/tts/TextToSpeech), but it is opened for extension if you want to use a different TTS engine. +Text-to-speech can read aloud a publication using a synthetic voice. The Readium toolkit includes an implementation based on the [Android TTS engine](https://developer.android.com/reference/android/speech/tts/TextToSpeech), but it can be extended to use a different TTS engine. ## Glossary -* **engine** – a TTS engine takes an utterance and transforms it into audio using a synthetic voice -* **rate** - speech speed of a synthetic voice -* **tokenizer** - algorithm splitting the publication text content into individual utterances, usually by sentences * **utterance** - a single piece of text played by a TTS engine, such as a sentence -* **voice** – a synthetic voice is used by a TTS engine to speak a text using rules pertaining to the voice's language and region - -## Reading a publication aloud - -Apps targeting Android 11 that use text-to-speech should declare INTENT_ACTION_TTS_SERVICE in the queries elements of their manifest. - -To read a publication, you need to create an instance of `PublicationSpeechSynthesizer`. It orchestrates the rendition of a publication by iterating through its content, splitting it into individual utterances using a `ContentTokenizer`, then using a `TtsEngine` to read them aloud. Not all publications can be read using TTS, therefore the constructor returns a nullable object. You can also check whether a publication can be played beforehand using `PublicationSpeechSynthesizer.canSpeak(publication)`. 
+* **tokenizer** - algorithm splitting the publication text content into individual utterances, usually by sentences +* **engine** – a TTS engine takes an utterance and transforms it into audio using a synthetic voice +* **voice** – a synthetic voice is used by a TTS engine to speak a text in a way suitable for the language and region -```kotlin -val synthesizer = PublicationSpeechSynthesizer( - publication = publication, - config = PublicationSpeechSynthesizer.Configuration( - rateMultiplier = 1.25 - ), - listener = object : PublicationSpeechSynthesizer.Listener { ... } -) -``` +## Getting started -Then, begin the playback from a given starting `Locator`. When missing, the playback will start from the beginning of the publication. +:warning: Apps targeting Android 11 (API level 30) or higher that use the native text-to-speech engine must declare `INTENT_ACTION_TTS_SERVICE` in the `queries` element of their manifest. -```kotlin -synthesizer.start() +```xml +<queries> + <intent> + <action android:name="android.intent.action.TTS_SERVICE" /> + </intent> +</queries> ``` -You should now hear the TTS engine speak the utterances from the beginning. `PublicationSpeechSynthesizer` provides the APIs necessary to control the playback from the app: - -* `stop()` - stops the playback ; requires start to be called again -* `pause()` - interrupts the playback temporarily -* `resume()` - resumes the playback where it was paused -* `pauseOrResume()` - toggles the pause -* `previous()` - skips to the previous utterance -* `next()` - skips to the next utterance - -Look at `TtsControls` in the Test App for an example of a view calling these APIs. - -:warning: Once you are done with the synthesizer, you should call `close()` to release held resources. +The text-to-speech feature is implemented as a standalone `Navigator`, which can render any publication with a [Content Service](content.md), such as an EPUB. This means you don't need an `EpubNavigatorFragment` open to read the publication; you can use the TTS navigator in the background. 
-## Observing the playback state +To get a new instance of `TtsNavigator`, first create an `AndroidTtsNavigatorFactory` to use the default Android TTS engine. -The `PublicationSpeechSynthesizer` should be the single source of truth to represent the playback state in your user interface. You can observe the `synthesizer.state` property to keep your user interface synchronized with the playback. The possible states are: - -* `Stopped` when idle and waiting for a call to `start()`. -* `Paused(utterance: Utterance)` when interrupted while playing `utterance`. -* `Playing(utterance: Utterance, range: Locator?)` when speaking `utterance`. This state is updated repeatedly while the utterance is spoken, updating the `range` property with the portion of utterance being played (usually the current word). +```kotlin +val factory = AndroidTtsNavigatorFactory(application, publication) + ?: throw Exception("This publication cannot be played with the TTS navigator") -When pairing the `PublicationSpeechSynthesizer` with a `Navigator`, you can use the `utterance.locator` and `range` properties to highlight spoken utterances and turn pages automatically. +val navigator = factory.createNavigator() +navigator.play() +``` -## Configuring the TTS +`TtsNavigator` implements `MediaNavigator`, so you can use all the APIs available for media-based playback. Check out the [dedicated user guide](media-navigator.md) to learn how to control `TtsNavigator` and observe playback notifications. -The `PublicationSpeechSynthesizer` offers some options to configure the TTS engine. Note that the support of each configuration option depends on the TTS engine used. +## Configuring the Android TTS navigator -Update the configuration by setting it directly. The configuration is not applied right away but for the next utterance. +The `AndroidTtsNavigator` implements [`Configurable`](navigator-preferences.md) and provides various settings to customize the text-to-speech experience. 
```kotlin -synthesizer.setConfig(synthesizer.config.copy( - defaultLanguage = Language(Locale.FRENCH) +navigator.submitPreferences(AndroidTtsPreferences( + language = Language("fr"), + pitch = 0.8f, + speed = 1.5f )) ``` -To keep your settings user interface up to date when the configuration changes, observe the `PublicationSpeechSynthesizer.config` property. Look at `TtsControls` in the Test App for an example of a TTS settings screen. +A `PreferencesEditor` is available to help you construct your user interface and modify the preferences. -### Default language +```kotlin +val factory = AndroidTtsNavigatorFactory(application, publication) + ?: throw Exception("This publication cannot be played with the TTS navigator") -The language used by the synthesizer is important, as it determines which TTS voices are used and the rules to tokenize the publication text content. +val navigator = factory.createNavigator() -By default, `PublicationSpeechSynthesizer` will use any language explicitly set on a text element (e.g. with `lang="fr"` in HTML) and fall back on the global language declared in the publication manifest. You can override the fallback language with `Configuration.defaultLanguage` which is useful when the publication language is incorrect or missing. +val editor = factory.createPreferencesEditor(preferences) +editor.pitch.increment() +navigator.submitPreferences(editor.preferences) +``` -### Speech rate +### Language preference -The `rateMultiplier` configuration sets the speech speed as a multiplier, 1.0 being the normal speed. The available range depends on the TTS engine and can be queried with `synthesizer.rateMultiplierRange`. +The language set in the preferences determines the default voice used and how the publication text content is tokenized – i.e. split into utterances. 
-```kotlin -PublicationSpeechSynthesizer.Configuration( - rateMultiplier = multiplier.coerceIn(synthesizer.rateMultiplierRange) -) -``` +By default, the TTS navigator uses any language explicitly set on a text element (e.g. `lang="fr"` in HTML) and, if none is set, it falls back on the language declared in the publication manifest. Providing an explicit language preference is useful when the publication language is incorrect or missing. -### Voice +### Voices preference -The `voice` setting can be used to change the synthetic voice used by the engine. To get the available list, use `synthesizer.availableVoices`. Note that the available voices can change during runtime, observe `availableVoices` to keep your interface up to date. +The Android TTS engine supports multiple voices. To allow users to choose their preferred voice for each language, they are stored as a dictionary `Map` in `AndroidTtsPreferences`. -To restore a user-selected voice, persist the unique voice identifier returned by `voice.id`. +Use the `voices` property of the `AndroidTtsNavigator` instance to get the full list of available voices. -Users do not expect to see all available voices at all time, as they depend on the selected language. You can group the voices by their language and filter them by the selected language using the following snippet. +Users don't expect to see all available voices at once, as they depend on the selected language. To get an `EnumPreference` based on the current `language` preference, you can use the following snippet. ```kotlin -// Supported voices grouped by their language. -val voicesByLanguage: Flow>> = - synthesizer.availableVoices - .map { voices -> voices.groupBy { it.language } } - -// Supported voices for the language selected in the configuration. 
-val voicesForSelectedLanguage: Flow> = - combine( - synthesizer.config.map { it.defaultLanguage }, - voicesByLanguage, - ) { language, voices -> - language - ?.let { voices[it] } - ?.sortedBy { it.name ?: it.id } - ?: emptyList() +// We remove the region to show all the voices for a given language, no matter the region (e.g. Canada, France). +val currentLanguage = editor.language.effectiveValue?.removeRegion() + +val voice: EnumPreference = editor.voices + .map( + from = { voices -> + currentLanguage?.let { voices[it] } + }, + to = { voice -> + currentLanguage + ?.let { editor.voices.value.orEmpty().update(it, voice) } + ?: editor.voices.value.orEmpty() + } + ) + .withSupportedValues( + navigator.voices + .filter { it.language.removeRegion() == currentLanguage } + .map { it.id } + ) + +fun <K, V> Map<K, V>.update(key: K, value: V?): Map<K, V> = + buildMap { + putAll(this@update) + if (value == null) { + remove(key) + } else { + put(key, value) + } } ``` -## Installing missing voice data +#### Installing missing voice data :point_up: This only applies if you use the default `AndroidTtsEngine`. -Sometimes the device does not have access to all the data required by a selected voice, in which case the user needs to download it manually. You can catch the `TtsEngine.Exception.LanguageSupportIncomplete` error and call `synthesizer.engine.requestInstallMissingVoice()` to start the system voice download activity. +If the device lacks the data necessary for the chosen voice, the user needs to manually download it. To do so, call the `AndroidTtsEngine.requestInstallVoice()` helper when the `AndroidTtsEngine.Error.LanguageMissingData` error occurs. This will launch the system voice download activity. 
```kotlin -val synthesizer = PublicationSpeechSynthesizer(context, publication) - -synthesizer.listener = object : PublicationSpeechSynthesizer.Listener { - override fun onUtteranceError( utterance: PublicationSpeechSynthesizer.Utterance, error: PublicationSpeechSynthesizer.Exception) { - handle(error) - } - - override fun onError(error: PublicationSpeechSynthesizer.Exception) { - handle(error) - } - - private fun handle(error: PublicationSpeechSynthesizer.Exception) { - when (error) { - is PublicationSpeechSynthesizer.Exception.Engine -> - when (val err = error.error) { - is TtsEngine.Exception.LanguageSupportIncomplete -> { - synthesizer.engine.requestInstallMissingVoice(context) - } - - else -> { - ... - } - } +navigator.playback + .onEach { playback -> + (playback?.state as? TtsNavigator.State.Error.EngineError<*>) + ?.let { it.error as? AndroidTtsEngine.Error.LanguageMissingData } + ?.let { error -> + Timber.e("Missing data for language ${error.language}") + AndroidTtsEngine.requestInstallVoice(context) } - } } + .launchIn(viewModelScope) ``` -## Synchronizing the TTS with a Navigator +## Synchronizing the TTS navigator with a visual navigator + +`TtsNavigator` is a standalone navigator that can be used to play a publication in the background. However, most apps prefer to display the publication while it is being read aloud. To do this, you can open the publication with a visual navigator (e.g. `EpubNavigatorFragment`) alongside the `TtsNavigator`. Then, synchronize the progression between the two navigators and use the Decorator API to highlight the spoken utterances. -While `PublicationSpeechSynthesizer` is completely independent from `Navigator` and can be used to play a publication in the background, most apps prefer to render the publication while it is being read aloud. The `Locator` core model is used as a means to synchronize the synthesizer with the navigator. +For concrete examples, take a look at `TtsViewModel` in the Test App. 
### Starting the TTS from the visible page -`PublicationSpeechSynthesizer.start()` takes a starting `Locator` for parameter. You can use it to begin the playback from the currently visible page in a `VisualNavigator` using `firstVisibleElementLocator()`. +To start the TTS from the currently visible page, you can use the `VisualNavigator.firstVisibleElementLocator()` API to feed the initial locator of the `TtsNavigator`. ```kotlin -val start = (navigator as? VisualNavigator)?.firstVisibleElementLocator() -synthesizer.start(fromLocator = start) +val ttsNavigator = ttsNavigatorFactory.createNavigator( + initialLocator = (navigator as? VisualNavigator)?.firstVisibleElementLocator() +) ``` ### Highlighting the currently spoken utterance -If you want to highlight or underline the current utterance on the page, you can apply a `Decoration` on the utterance locator with a `DecorableNavigator`. +To highlight the current utterance on the page, you can apply a `Decoration` on the utterance locator if the visual navigator implements `DecorableNavigator`. ```kotlin -val navigator: DecorableNavigator +val visualNavigator: DecorableNavigator -synthesizer.state - .map { (it as? State.Playing)?.utterance } +ttsNavigator.location + .map { it.utteranceLocator } .distinctUntilChanged() - .onEach { utterance -> + .onEach { locator -> navigator.applyDecorations(listOf( Decoration( id = "tts-utterance", - locator = utterance.locator, + locator = locator, style = Decoration.Style.Highlight(tint = Color.RED) ) ), group = "tts") @@ -186,47 +165,48 @@ synthesizer.state ### Turning pages automatically -You can use the same technique as described above to automatically synchronize the `Navigator` with the played utterance, using `navigator.go(utterance.locator)`. +To keep the visual navigator in sync with the utterance being played, observe the navigator's current `location` as described above and use `navigator.go(location.utteranceLocator)`. 
+ +However, this won't turn pages in the middle of an utterance, which can be irritating when speaking a lengthy sentence that spans two pages. To tackle this issue, you can use `location.tokenLocator` when available. It is updated continuously as the engine speaks each word of an utterance. + +Jumping to the token locator for every word can significantly reduce performance. To address this, throttle the flow with a helper such as [`throttleLatest`](https://github.com/Kotlin/kotlinx.coroutines/issues/1107#issuecomment-1083076517). -However, this will not turn pages mid-utterance, which can be annoying when speaking a long sentence spanning two pages. To address this, you can go to the `State.Playing.range` locator instead, which is updated regularly while speaking each word of an utterance. Note that jumping to the `range` locator for every word can severely impact performances. To alleviate this, you can throttle the flow using [`throttleLatest`](https://github.com/Kotlin/kotlinx.coroutines/issues/1107#issuecomment-1083076517). ```kotlin -synthesizer.state - .filterIsInstance() - .map { it.range ?: it.utterance.locator } +ttsNavigator.location .throttleLatest(1.seconds) + .map { it.tokenLocator ?: it.utteranceLocator } + .distinctUntilChanged() .onEach { locator -> navigator.go(locator, animated = false) } .launchIn(scope) ``` -## Using a custom utterance tokenizer +## Advanced customizations + +### Utterance tokenizer -By default, the `PublicationSpeechSynthesizer` will split the publication text into sentences to create the utterances. You can customize this for finer or coarser utterances using a different tokenizer. +By default, the `TtsNavigator` splits the publication text into sentences, but you can supply your own tokenizer to customize how the text is divided. 
-For example, this will speak the content word-by-word: +For example, this will speak the content word by word: ```kotlin -val synthesizer = PublicationSpeechSynthesizer(context, publication, +val navigatorFactory = TtsNavigatorFactory( + application, publication, tokenizerFactory = { language -> - TextContentTokenizer( - defaultLanguage = language, - unit = TextUnit.Word - ) + DefaultTextContentTokenizer(unit = TextUnit.Word, language = language) } ) ``` -For completely custom tokenizing or to improve the existing tokenizers, you can implement your own `ContentTokenizer`. +### Custom TTS engine -## Using a custom TTS engine - -`PublicationSpeechSynthesizer` can be used with any TTS engine, provided they implement the `TtsEngine` interface. Take a look at `AndroidTtsEngine` for an example implementation. +`TtsNavigator` is compatible with any TTS engine if you provide an adapter implementing the `TtsEngine` interface. For an example, take a look at `AndroidTtsEngine`. ```kotlin -val synthesizer = PublicationSpeechSynthesizer(publication, - engineFactory = { listener -> MyCustomEngine(listener) } +val navigatorFactory = TtsNavigatorFactory( + application, publication, + engineProvider = MyEngineProvider() ) ``` - diff --git a/readium/navigator/src/main/java/org/readium/r2/navigator/media3/api/TextAwareMediaNavigator.kt b/readium/navigator/src/main/java/org/readium/r2/navigator/media3/api/TextAwareMediaNavigator.kt index d76d3b5d31..b095743d7c 100644 --- a/readium/navigator/src/main/java/org/readium/r2/navigator/media3/api/TextAwareMediaNavigator.kt +++ b/readium/navigator/src/main/java/org/readium/r2/navigator/media3/api/TextAwareMediaNavigator.kt @@ -108,14 +108,14 @@ interface TextAwareMediaNavigator< * * Does nothing if the current utterance is the first one. */ - fun previousUtterance() + fun goToPreviousUtterance() /** * Jumps to the next utterance. * * Does nothing if the current utterance is the last one. 
*/ - fun nextUtterance() + fun goToNextUtterance() /** * Whether the current utterance has a previous one or is the first one. diff --git a/readium/navigator/src/main/java/org/readium/r2/navigator/media3/syncmedia/GuidedAudioNavigator.kt b/readium/navigator/src/main/java/org/readium/r2/navigator/media3/syncmedia/GuidedAudioNavigator.kt index 9563d835d3..ea66c750f9 100644 --- a/readium/navigator/src/main/java/org/readium/r2/navigator/media3/syncmedia/GuidedAudioNavigator.kt +++ b/readium/navigator/src/main/java/org/readium/r2/navigator/media3/syncmedia/GuidedAudioNavigator.kt @@ -102,11 +102,11 @@ class GuidedAudioNavigator, player.go(locator) } - override fun previousUtterance() { + override fun goToPreviousUtterance() { player.previousUtterance() } - override fun nextUtterance() { + override fun goToNextUtterance() { player.nextUtterance() } diff --git a/readium/navigator/src/main/java/org/readium/r2/navigator/media3/tts/TtsNavigatorFactory.kt b/readium/navigator/src/main/java/org/readium/r2/navigator/media3/tts/TtsNavigatorFactory.kt index 48201ee251..8cc477d0b8 100644 --- a/readium/navigator/src/main/java/org/readium/r2/navigator/media3/tts/TtsNavigatorFactory.kt +++ b/readium/navigator/src/main/java/org/readium/r2/navigator/media3/tts/TtsNavigatorFactory.kt @@ -40,15 +40,13 @@ class TtsNavigatorFactory, tokenizerFactory: (language: Language?) -> TextTokenizer = defaultTokenizerFactory, metadataProvider: MediaMetadataProvider = defaultMediaMetadataProvider, defaults: AndroidTtsDefaults = AndroidTtsDefaults(), - voiceSelector: (Language?, Set) -> AndroidTtsEngine.Voice? = defaultVoiceSelector, - listener: AndroidTtsEngine.Listener? = null + voiceSelector: (Language?, Set) -> AndroidTtsEngine.Voice? = defaultVoiceSelector ): AndroidTtsNavigatorFactory? 
{ val engineProvider = AndroidTtsEngineProvider( context = application, defaults = defaults, - voiceSelector = voiceSelector, - listener = listener + voiceSelector = voiceSelector ) return createNavigatorFactory( @@ -132,7 +130,6 @@ class TtsNavigatorFactory, ) } - fun createTtsPreferencesEditor( - currentPreferences: P, - ): E = ttsEngineProvider.createPreferencesEditor(publication, currentPreferences) + fun createPreferencesEditor(preferences: P): E = + ttsEngineProvider.createPreferencesEditor(publication, preferences) } diff --git a/readium/navigator/src/main/java/org/readium/r2/navigator/media3/tts/android/AndroidTtsEngine.kt b/readium/navigator/src/main/java/org/readium/r2/navigator/media3/tts/android/AndroidTtsEngine.kt index 244a7e8c08..26747f1500 100644 --- a/readium/navigator/src/main/java/org/readium/r2/navigator/media3/tts/android/AndroidTtsEngine.kt +++ b/readium/navigator/src/main/java/org/readium/r2/navigator/media3/tts/android/AndroidTtsEngine.kt @@ -41,7 +41,6 @@ class AndroidTtsEngine private constructor( engine: TextToSpeech, private val settingsResolver: SettingsResolver, private val voiceSelector: VoiceSelector, - private val listener: Listener?, override val voices: Set, initialPreferences: AndroidTtsPreferences ) : TtsEngine): Voice? } - class Error(code: Int) : TtsEngine.Error { + sealed class Error : TtsEngine.Error { - val kind: Kind = - Kind.getOrDefault(code) + /** Denotes a generic operation failure. */ + object Unknown : Error() + + /** Denotes a failure caused by an invalid request. */ + object InvalidRequest : Error() + + /** Denotes a failure caused by a network connectivity problems. */ + object Network : Error() + + /** Denotes a failure caused by network timeout. */ + object NetworkTimeout : Error() + + /** Denotes a failure caused by an unfinished download of the voice data. */ + object NotInstalledYet : Error() + + /** Denotes a failure related to the output (audio device or a file). 
*/ + object Output : Error() + + /** Denotes a failure of a TTS service. */ + object Service : Error() + + /** Denotes a failure of a TTS engine to synthesize the given input. */ + object Synthesis : Error() + + /** + * Denotes the language data is missing. + * + * You can open the Android settings to install the missing data with: + * AndroidTtsEngine.requestInstallVoice(context) + */ + data class LanguageMissingData(val language: Language) : Error() /** * Android's TTS error code. * See https://developer.android.com/reference/android/speech/tts/TextToSpeech#ERROR */ - enum class Kind(val code: Int) { - /** Denotes a generic operation failure. */ - Unknown(-1), - /** Denotes a failure caused by an invalid request. */ - InvalidRequest(-8), - /** Denotes a failure caused by a network connectivity problems. */ - Network(-6), - /** Denotes a failure caused by network timeout. */ - NetworkTimeout(-7), - /** Denotes a failure caused by an unfinished download of the voice data. */ - NotInstalledYet(-9), - /** Denotes a failure related to the output (audio device or a file). */ - Output(-5), - /** Denotes a failure of a TTS service. */ - Service(-4), - /** Denotes a failure of a TTS engine to synthesize the given input. 
*/ - Synthesis(-3); - - companion object { - - fun getOrDefault(key: Int): Kind = - values() - .firstOrNull { it.code == key } - ?: Unknown - } + companion object { + internal fun fromNativeError(code: Int): Error = + when (code) { + ERROR_INVALID_REQUEST -> InvalidRequest + ERROR_NETWORK -> Network + ERROR_NETWORK_TIMEOUT -> NetworkTimeout + ERROR_NOT_INSTALLED_YET -> NotInstalledYet + ERROR_OUTPUT -> Output + ERROR_SERVICE -> Service + ERROR_SYNTHESIS -> Synthesis + else -> Unknown + } } } @@ -205,13 +218,6 @@ class AndroidTtsEngine private constructor( } } - interface Listener { - - fun onMissingData(language: Language) - - fun onLanguageNotSupported(language: Language) - } - private data class Request( val id: TtsEngine.RequestId, val text: String, @@ -283,8 +289,7 @@ class AndroidTtsEngine private constructor( tryReconnect(request) } is State.EngineAvailable -> { - val result = doSpeak(stateNow.engine, request) - if (result == ERROR) { + if (!doSpeak(stateNow.engine, request)) { cleanEngine(stateNow.engine) tryReconnect(request) } @@ -333,9 +338,9 @@ class AndroidTtsEngine private constructor( private fun doSpeak( engine: TextToSpeech, request: Request - ): Int { - engine.setupVoice(settings.value, request.language, voices) - return engine.speak(request.text, QUEUE_ADD, null, request.id.value) + ): Boolean { + return engine.setupVoice(settings.value, request.id, request.language, voices) + && (engine.speak(request.text, QUEUE_ADD, null, request.id.value) == SUCCESS) } private fun setupListener(engine: TextToSpeech) { @@ -362,7 +367,7 @@ class AndroidTtsEngine private constructor( private fun onReconnectionFailed() { val previousState = state as State.WaitingForService - val error = Error(Error.Kind.Service.code) + val error = Error.Service state = State.Error(error) for (request in previousState.pendingRequests) { @@ -391,16 +396,23 @@ class AndroidTtsEngine private constructor( private fun TextToSpeech.setupVoice( settings: AndroidTtsSettings, + id: 
TtsEngine.RequestId, utteranceLanguage: Language?, voices: Set<Voice> - ) { - val language = utteranceLanguage + ): Boolean { + var language = utteranceLanguage .takeUnless { settings.overrideContentLanguage } ?: settings.language + utteranceListener?.onError(id, Error.LanguageMissingData(language)) + return false + when (isLanguageAvailable(language.locale)) { - LANG_MISSING_DATA -> listener?.onMissingData(language) - LANG_NOT_SUPPORTED -> listener?.onLanguageNotSupported(language) + LANG_MISSING_DATA -> { + utteranceListener?.onError(id, Error.LanguageMissingData(language)) + return false + } + LANG_NOT_SUPPORTED -> language = Language(defaultVoice.locale) } val preferredVoiceWithRegion = @@ -422,12 +434,14 @@ class AndroidTtsEngine private constructor( voice ?.let { this.voice = it } ?: run { this.language = language.locale } + + return true } private fun TextToSpeech.voiceForName(name: String) = voices.firstOrNull { it.name == name } - class UtteranceListener( + private class UtteranceListener( private val listener: TtsEngine.Listener?
) : UtteranceProgressListener() { override fun onStart(utteranceId: String) { @@ -457,7 +471,7 @@ class AndroidTtsEngine private constructor( override fun onError(utteranceId: String, errorCode: Int) { listener?.onError( TtsEngine.RequestId(utteranceId), - Error(errorCode) + Error.fromNativeError(errorCode) ) } diff --git a/readium/navigator/src/main/java/org/readium/r2/navigator/media3/tts/android/AndroidTtsEngineProvider.kt b/readium/navigator/src/main/java/org/readium/r2/navigator/media3/tts/android/AndroidTtsEngineProvider.kt index 6b27277e1a..cc01c650e6 100644 --- a/readium/navigator/src/main/java/org/readium/r2/navigator/media3/tts/android/AndroidTtsEngineProvider.kt +++ b/readium/navigator/src/main/java/org/readium/r2/navigator/media3/tts/android/AndroidTtsEngineProvider.kt @@ -20,7 +20,6 @@ import org.readium.r2.shared.publication.Publication class AndroidTtsEngineProvider( private val context: Context, private val defaults: AndroidTtsDefaults = AndroidTtsDefaults(), - private val listener: AndroidTtsEngine.Listener? 
= null, private val voiceSelector: AndroidTtsEngine.VoiceSelector = AndroidTtsEngine.VoiceSelector { _, _ -> null } ) : TtsEngineProvider { @@ -36,7 +35,6 @@ class AndroidTtsEngineProvider( context, settingsResolver, voiceSelector, - listener, initialPreferences ) } @@ -73,26 +71,24 @@ class AndroidTtsEngineProvider( } override fun mapEngineError(error: AndroidTtsEngine.Error): PlaybackException { - val errorCode = when (error.kind) { - AndroidTtsEngine.Error.Kind.Unknown -> + val errorCode = when (error) { + AndroidTtsEngine.Error.Unknown -> ERROR_CODE_UNSPECIFIED - AndroidTtsEngine.Error.Kind.InvalidRequest -> + AndroidTtsEngine.Error.InvalidRequest -> ERROR_CODE_IO_BAD_HTTP_STATUS - AndroidTtsEngine.Error.Kind.Network -> + AndroidTtsEngine.Error.Network -> ERROR_CODE_IO_NETWORK_CONNECTION_FAILED - AndroidTtsEngine.Error.Kind.NetworkTimeout -> + AndroidTtsEngine.Error.NetworkTimeout -> ERROR_CODE_IO_NETWORK_CONNECTION_TIMEOUT - AndroidTtsEngine.Error.Kind.NotInstalledYet -> - ERROR_CODE_UNSPECIFIED - AndroidTtsEngine.Error.Kind.Output -> - ERROR_CODE_UNSPECIFIED - AndroidTtsEngine.Error.Kind.Service -> - ERROR_CODE_UNSPECIFIED - AndroidTtsEngine.Error.Kind.Synthesis -> + AndroidTtsEngine.Error.Output, + AndroidTtsEngine.Error.Service, + AndroidTtsEngine.Error.Synthesis, + is AndroidTtsEngine.Error.LanguageMissingData, + AndroidTtsEngine.Error.NotInstalledYet -> ERROR_CODE_UNSPECIFIED } - val message = "Android TTS engine error: ${error.kind.code}" + val message = "Android TTS engine error: ${error.javaClass.simpleName}" return PlaybackException(message, null, errorCode) } diff --git a/readium/shared/src/main/java/org/readium/r2/shared/publication/services/content/iterators/HtmlResourceContentIterator.kt b/readium/shared/src/main/java/org/readium/r2/shared/publication/services/content/iterators/HtmlResourceContentIterator.kt index 7590eadb7c..3dd2a118bc 100644 --- 
a/readium/shared/src/main/java/org/readium/r2/shared/publication/services/content/iterators/HtmlResourceContentIterator.kt +++ b/readium/shared/src/main/java/org/readium/r2/shared/publication/services/content/iterators/HtmlResourceContentIterator.kt @@ -319,8 +319,9 @@ class HtmlResourceContentIterator internal constructor( currentLanguage = language } - rawTextAcc += Parser.unescapeEntities(node.wholeText, false) - appendNormalisedText(node) + val text = Parser.unescapeEntities(node.wholeText, false) + rawTextAcc += text + appendNormalisedText(text) } else if (node is Element) { if (node.isBlock) { assert(breadcrumbs.last() == node) @@ -330,8 +331,7 @@ class HtmlResourceContentIterator internal constructor( } } - private fun appendNormalisedText(textNode: TextNode) { - val text = Parser.unescapeEntities(textNode.wholeText, false) + private fun appendNormalisedText(text: String) { StringUtil.appendNormalisedWhitespace(textAcc, text, lastCharIsWhitespace()) } diff --git a/test-app/src/main/java/org/readium/r2/testapp/reader/ReaderRepository.kt b/test-app/src/main/java/org/readium/r2/testapp/reader/ReaderRepository.kt index fc77b39293..8216a5b201 100644 --- a/test-app/src/main/java/org/readium/r2/testapp/reader/ReaderRepository.kt +++ b/test-app/src/main/java/org/readium/r2/testapp/reader/ReaderRepository.kt @@ -183,7 +183,10 @@ class ReaderRepository( ): TtsInitData? 
{ val preferencesManager = AndroidTtsPreferencesManagerFactory(preferencesDataStore) .createPreferenceManager(bookId) - val navigatorFactory = TtsNavigatorFactory(application, publication) ?: return null + val navigatorFactory = TtsNavigatorFactory( + application, + publication + ) ?: return null return TtsInitData(mediaServiceFacade, navigatorFactory, preferencesManager) } diff --git a/test-app/src/main/java/org/readium/r2/testapp/reader/VisualReaderFragment.kt b/test-app/src/main/java/org/readium/r2/testapp/reader/VisualReaderFragment.kt index b88905de3c..ab47c3da90 100644 --- a/test-app/src/main/java/org/readium/r2/testapp/reader/VisualReaderFragment.kt +++ b/test-app/src/main/java/org/readium/r2/testapp/reader/VisualReaderFragment.kt @@ -83,13 +83,6 @@ abstract class VisualReaderFragment : BaseReaderFragment(), VisualNavigator.List */ private var disableTouches by mutableStateOf(false) - /** - * When true, the fragment won't save progression. - * This is useful in the case where the TTS is on and a service is saving progression - * in background. - */ - private var preventProgressionSaving: Boolean = false - override fun onViewCreated(view: View, savedInstanceState: Bundle?) 
{ super.onViewCreated(view, savedInstanceState) @@ -148,11 +141,7 @@ abstract class VisualReaderFragment : BaseReaderFragment(), VisualNavigator.List viewLifecycleOwner.lifecycleScope.launch { viewLifecycleOwner.repeatOnLifecycle(Lifecycle.State.STARTED) { navigator.currentLocator - .onEach { - if (!preventProgressionSaving) { - model.saveProgression(it) - } - } + .onEach { model.saveProgression(it) } .launchIn(this) setupHighlights(this) @@ -231,12 +220,6 @@ abstract class VisualReaderFragment : BaseReaderFragment(), VisualNavigator.List } .launchIn(scope) } - - showControls - .onEach { showControls -> - preventProgressionSaving = showControls - } - .launchIn(scope) } } diff --git a/test-app/src/main/java/org/readium/r2/testapp/reader/tts/TtsViewModel.kt b/test-app/src/main/java/org/readium/r2/testapp/reader/tts/TtsViewModel.kt index 5d130ab097..34d3db97c7 100644 --- a/test-app/src/main/java/org/readium/r2/testapp/reader/tts/TtsViewModel.kt +++ b/test-app/src/main/java/org/readium/r2/testapp/reader/tts/TtsViewModel.kt @@ -105,7 +105,7 @@ class TtsViewModel private constructor( bookId = bookId, preferencesManager = preferencesManager ) { preferences -> - val baseEditor = ttsNavigatorFactory.createTtsPreferencesEditor(preferences) + val baseEditor = ttsNavigatorFactory.createPreferencesEditor(preferences) val voices = navigatorNow?.voices.orEmpty() TtsPreferencesEditor(baseEditor, voices) } @@ -148,7 +148,8 @@ class TtsViewModel private constructor( is MediaNavigator.State.Ready -> {} is MediaNavigator.State.Buffering -> {} } - }.launchIn(viewModelScope) + } + .launchIn(viewModelScope) preferencesManager.preferences .onEach { navigatorNow?.submitPreferences(it) } @@ -213,24 +214,26 @@ class TtsViewModel private constructor( } private fun onPlaybackError(error: TtsNavigator.State.Error) { - val exception = when (error) { + val event = when (error) { is TtsNavigator.State.Error.ContentError -> { Timber.e(error.exception) - UserException(R.string.tts_error_other, cause 
= error.exception) + Event.OnError(UserException(R.string.tts_error_other, cause = error.exception)) } is TtsNavigator.State.Error.EngineError<*> -> { - val kind = (error.error as AndroidTtsEngine.Error).kind - when (kind) { - AndroidTtsEngine.Error.Kind.Network -> - UserException(R.string.tts_error_network) + val engineError = (error.error as AndroidTtsEngine.Error) + when (engineError) { + is AndroidTtsEngine.Error.LanguageMissingData -> + Event.OnMissingVoiceData(engineError.language) + AndroidTtsEngine.Error.Network -> + Event.OnError(UserException(R.string.tts_error_network)) else -> - UserException(R.string.tts_error_other) - }.also { Timber.e(it, "Error type: ${kind.name}") } + Event.OnError(UserException(R.string.tts_error_other)) + }.also { Timber.e("Error type: $error") } } } viewModelScope.launch { - _events.send(Event.OnError(exception)) + _events.send(event) } } } diff --git a/test-app/src/main/res/values/strings.xml b/test-app/src/main/res/values/strings.xml index db8d60656a..692f8e8f39 100644 --- a/test-app/src/main/res/values/strings.xml +++ b/test-app/src/main/res/values/strings.xml @@ -200,7 +200,6 @@ Language Failed to initialize the TTS engine - The language %s is not supported The language %s requires additional data. Do you want to download it? A networking error occurred A TTS error occurred