Add user normalization dictionary #165

Merged: 3 commits, Mar 27, 2024
44 changes: 31 additions & 13 deletions Readme.md
@@ -1,33 +1,51 @@
# Símarómur

This project provides an Icelandic TTS application for the Android TTS service.
This project provides an Icelandic TTS application for the Android TTS service. The current state
of the project is *production-ready*.

The app is available on the [Google Play Store](https://play.google.com/store/apps/details?id=com.grammatek.simaromur).

## Voices

Símarómur provides access to neural network [on-device voices](https://github.com/grammatek/simaromur_voices)
that are bundled via assets.

Handling of device-local voices started originally based on [Flite TTS Engine For Android](https://github.com/happyalu/Flite-TTS-Engine-for-Android),
but we changed the code considerably and started with a clean slate instead of forking the project.
We replaced many deprecated APIs and also use current TTS Service Android APIs. We also use CMake
for integrating the C++ part instead of ndk-build and adapted the JNI part to be compatible with 64-bit
platforms.
Currently, there is one male voice available, named **Steinn**. This voice is not only highly intelligible
but also possesses a pleasant and engaging tone, making it a versatile, general-purpose option that
sets the standard for Icelandic on-device text-to-speech (TTS) technology. It is well-suited for
reading both short and lengthy texts, providing a consistent listening experience.

### New since version 2.x
FLite voices and the former neural network voices are deprecated. FLite voices are now obsolete,
and we use neural network voices exclusively. The FLite project is barely maintained, and
the runtime performance of the neural network voices is rapidly closing in on that of the FLite voices.
We can achieve 25x realtime speed with the neural network voices on a Pixel 6 phone.
We are currently developing a multi-speaker model that will include a female voice, slated for
future release.

The neural network model is based on [VITS](https://github.com/jaywalnut310/vits) and trained via
[Piper TTS](https://github.com/rhasspy/piper)
## User Normalization Dictionary

Users can add normalization entries to accommodate alternative pronunciations of words or tokens.
These alternative pronunciations take precedence over the built-in normalization rules, applying
the specified replacements for any such terms found in the text being read.

To simplify usage, replacements can be made at the grapheme level without the need to understand or
use regular expression syntax. Users can immediately hear how the entered term and its replacement
sound with the current voice by using play buttons.

By default, the user normalization dictionary starts empty. At present, importing or exporting the
dictionary is not supported.
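
The precedence rule described above can be pictured with a minimal Java sketch. This is not the
app's implementation: the class `UserNormDictSketch`, its method names, and the choice to match
whole whitespace-separated tokens are assumptions made purely for illustration. User entries are
applied by literal string comparison, before any built-in normalization would run.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

// Hypothetical sketch, not Símarómur's actual code.
final class UserNormDictSketch {
    // Maps a term to its replacement; each term can appear only once.
    private final Map<String, String> entries = new LinkedHashMap<>();

    void put(String term, String replacement) {
        entries.put(term, replacement);
    }

    // Replaces every whitespace-separated token that exactly equals a term.
    // Terms are compared with String equality; the only regex here is the
    // internal whitespace splitter, which users never have to write.
    String apply(String text) {
        StringJoiner out = new StringJoiner(" ");
        for (String token : text.trim().split("\\s+")) {
            out.add(entries.getOrDefault(token, token));
        }
        return out.toString();
    }
}
```

In this sketch, the output of `apply` would then be handed to the built-in normalization rules, so
a user entry always wins over a built-in rule for the same token.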

## Text Normalization & G2P

Icelandic text normalization is performed before the text enters G2P.
Local voice G2P is [rule-based](https://github.com/grammatek/g2p-thrax) and is implemented using the C++
frameworks Thrax & OpenFST, which are accessed via JNI.

## New since version 2.x
FLite voices and the former neural network voices are deprecated. FLite voices are now obsolete,
and we use neural network voices exclusively. The FLite project is barely maintained, and
the runtime performance of the neural network voices is rapidly closing in on that of the FLite voices.
We can achieve 25x realtime speed with the neural network model on a Pixel 6 phone.

The neural network model is based on [VITS](https://github.com/jaywalnut310/vits) and trained via
[Piper TTS](https://github.com/rhasspy/piper).

## Build Prerequisites

This project uses our versions of [OpenFST](https://github.com/grammatek/openfst) &
16 changes: 16 additions & 0 deletions app/build.gradle
@@ -232,6 +232,7 @@ dependencies {
testImplementation 'com.google.truth:truth:1.1.3'

// implementation 'androidx.appcompat:appcompat:1.3.0'
implementation 'androidx.recyclerview:recyclerview:1.3.2'
implementation 'com.google.android.material:material:1.11.0'
implementation 'androidx.constraintlayout:constraintlayout:2.1.4'
testImplementation 'org.robolectric:robolectric:4.9'
@@ -242,9 +243,24 @@ dependencies {
implementation "androidx.room:room-runtime:$room_version"
annotationProcessor "androidx.room:room-compiler:$room_version"

// optional - Kotlin Extensions and Coroutines support for Room
implementation("androidx.room:room-ktx:$room_version")

// optional - RxJava2 support for Room
implementation("androidx.room:room-rxjava2:$room_version")

// optional - RxJava3 support for Room
implementation("androidx.room:room-rxjava3:$room_version")

// optional - Guava support for Room, including Optional and ListenableFuture
implementation("androidx.room:room-guava:$room_version")

// optional - Test helpers
testImplementation "androidx.room:room-testing:$room_version"

// optional - Paging 3 Integration
implementation("androidx.room:room-paging:$room_version")

// retrofit
implementation 'com.google.code.gson:gson:2.10'
implementation 'com.squareup.retrofit2:retrofit:2.9.0'
225 changes: 225 additions & 0 deletions app/schemas/com.grammatek.simaromur.db.ApplicationDb/9.json
@@ -0,0 +1,225 @@
{
"formatVersion": 1,
"database": {
"version": 9,
"identityHash": "37c14f434910e65cf2a913a3a95eb959",
"entities": [
{
"tableName": "voice_table",
"createSql": "CREATE TABLE IF NOT EXISTS `${TABLE_NAME}` (`voiceId` INTEGER PRIMARY KEY AUTOINCREMENT NOT NULL, `name` TEXT NOT NULL, `gender` TEXT NOT NULL, `internal_name` TEXT NOT NULL, `language_code` TEXT NOT NULL, `language_name` TEXT NOT NULL, `variant` TEXT NOT NULL, `type` TEXT, `update_time` TEXT, `download_time` TEXT, `url` TEXT, `download_path` TEXT, `version` TEXT, `md5_sum` TEXT, `local_size` INTEGER NOT NULL)",
"fields": [
{
"fieldPath": "voiceId",
"columnName": "voiceId",
"affinity": "INTEGER",
"notNull": true
},
{
"fieldPath": "name",
"columnName": "name",
"affinity": "TEXT",
"notNull": true
},
{
"fieldPath": "gender",
"columnName": "gender",
"affinity": "TEXT",
"notNull": true
},
{
"fieldPath": "internalName",
"columnName": "internal_name",
"affinity": "TEXT",
"notNull": true
},
{
"fieldPath": "languageCode",
"columnName": "language_code",
"affinity": "TEXT",
"notNull": true
},
{
"fieldPath": "languageName",
"columnName": "language_name",
"affinity": "TEXT",
"notNull": true
},
{
"fieldPath": "variant",
"columnName": "variant",
"affinity": "TEXT",
"notNull": true
},
{
"fieldPath": "type",
"columnName": "type",
"affinity": "TEXT",
"notNull": false
},
{
"fieldPath": "updateTime",
"columnName": "update_time",
"affinity": "TEXT",
"notNull": false
},
{
"fieldPath": "downloadTime",
"columnName": "download_time",
"affinity": "TEXT",
"notNull": false
},
{
"fieldPath": "url",
"columnName": "url",
"affinity": "TEXT",
"notNull": false
},
{
"fieldPath": "downloadPath",
"columnName": "download_path",
"affinity": "TEXT",
"notNull": false
},
{
"fieldPath": "version",
"columnName": "version",
"affinity": "TEXT",
"notNull": false
},
{
"fieldPath": "md5Sum",
"columnName": "md5_sum",
"affinity": "TEXT",
"notNull": false
},
{
"fieldPath": "size",
"columnName": "local_size",
"affinity": "INTEGER",
"notNull": true
}
],
"primaryKey": {
"autoGenerate": true,
"columnNames": [
"voiceId"
]
},
"indices": [
{
"name": "index_voice_table_internal_name_gender_language_code_type",
"unique": true,
"columnNames": [
"internal_name",
"gender",
"language_code",
"type"
],
"orders": [],
"createSql": "CREATE UNIQUE INDEX IF NOT EXISTS `index_voice_table_internal_name_gender_language_code_type` ON `${TABLE_NAME}` (`internal_name`, `gender`, `language_code`, `type`)"
}
],
"foreignKeys": []
},
{
"tableName": "app_data_table",
"createSql": "CREATE TABLE IF NOT EXISTS `${TABLE_NAME}` (`appDataId` INTEGER PRIMARY KEY AUTOINCREMENT NOT NULL, `schema_version` TEXT NOT NULL, `current_voice_id` INTEGER NOT NULL, `voice_list_update_time` TEXT, `privacy_info_dialog_accepted` INTEGER NOT NULL DEFAULT 0, `crash_lytics_user_consent_accepted` INTEGER NOT NULL DEFAULT 0)",
"fields": [
{
"fieldPath": "appDataId",
"columnName": "appDataId",
"affinity": "INTEGER",
"notNull": true
},
{
"fieldPath": "schemaVersion",
"columnName": "schema_version",
"affinity": "TEXT",
"notNull": true
},
{
"fieldPath": "currentVoiceId",
"columnName": "current_voice_id",
"affinity": "INTEGER",
"notNull": true
},
{
"fieldPath": "voiceListUpdateTime",
"columnName": "voice_list_update_time",
"affinity": "TEXT",
"notNull": false
},
{
"fieldPath": "privacyInfoDialogAccepted",
"columnName": "privacy_info_dialog_accepted",
"affinity": "INTEGER",
"notNull": true,
"defaultValue": "0"
},
{
"fieldPath": "crashLyticsUserConsentGiven",
"columnName": "crash_lytics_user_consent_accepted",
"affinity": "INTEGER",
"notNull": true,
"defaultValue": "0"
}
],
"primaryKey": {
"autoGenerate": true,
"columnNames": [
"appDataId"
]
},
"indices": [],
"foreignKeys": []
},
{
"tableName": "norm_dict_table",
"createSql": "CREATE TABLE IF NOT EXISTS `${TABLE_NAME}` (`id` INTEGER PRIMARY KEY AUTOINCREMENT NOT NULL, `term` TEXT, `replacement` TEXT)",
"fields": [
{
"fieldPath": "id",
"columnName": "id",
"affinity": "INTEGER",
"notNull": true
},
{
"fieldPath": "term",
"columnName": "term",
"affinity": "TEXT",
"notNull": false
},
{
"fieldPath": "replacement",
"columnName": "replacement",
"affinity": "TEXT",
"notNull": false
}
],
"primaryKey": {
"autoGenerate": true,
"columnNames": [
"id"
]
},
"indices": [
{
"name": "index_norm_dict_table_term",
"unique": true,
"columnNames": [
"term"
],
"orders": [],
"createSql": "CREATE UNIQUE INDEX IF NOT EXISTS `index_norm_dict_table_term` ON `${TABLE_NAME}` (`term`)"
}
],
"foreignKeys": []
}
],
"views": [],
"setupQueries": [
"CREATE TABLE IF NOT EXISTS room_master_table (id INTEGER PRIMARY KEY,identity_hash TEXT)",
"INSERT OR REPLACE INTO room_master_table (id,identity_hash) VALUES(42, '37c14f434910e65cf2a913a3a95eb959')"
]
}
}
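
The `norm_dict_table` entity above has an auto-generated `id` and a unique index on `term`. As a
rough illustration of what those two constraints mean for callers, the following plain-Java sketch
models them in memory. It does not use Room. The class `NormDictTableModel` and its methods are
hypothetical, and rejecting a duplicate term by returning -1 is an assumption, since the conflict
strategy of the app's DAO is not shown in this diff. The schema also allows a NULL `term`, which
this sketch deliberately does not model.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Objects;

// Hypothetical in-memory model of norm_dict_table; not the app's Room code.
final class NormDictTableModel {
    // One row with the three columns from the schema: id, term, replacement.
    static final class Row {
        final long id;
        final String term;
        final String replacement;

        Row(long id, String term, String replacement) {
            this.id = id;
            this.term = term;
            this.replacement = replacement;
        }
    }

    // Keyed by term, which models the unique index on `term`.
    private final Map<String, Row> rowsByTerm = new LinkedHashMap<>();
    // Models the auto-generated primary key.
    private long nextId = 1;

    // Returns the new row id, or -1 if the term already exists (assumed policy).
    long insert(String term, String replacement) {
        Objects.requireNonNull(term, "NULL terms are not modeled in this sketch");
        if (rowsByTerm.containsKey(term)) {
            return -1;
        }
        long id = nextId++;
        rowsByTerm.put(term, new Row(id, term, replacement));
        return id;
    }

    // Returns the stored replacement, or null if the term is unknown.
    String replacementFor(String term) {
        Row row = rowsByTerm.get(term);
        return row == null ? null : row.replacement;
    }

    int size() {
        return rowsByTerm.size();
    }
}
```

Keying the map by `term` mirrors the unique index: a second insert for the same term cannot create
a second row, so each term resolves to at most one replacement.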
13 changes: 13 additions & 0 deletions app/src/main/AndroidManifest.xml
@@ -66,17 +66,30 @@
android:configChanges="orientation"
android:parentActivityName=".TTSManager"
android:exported="true"
android:launchMode="standard"
android:theme="@style/AppTheme">
<intent-filter>
<action android:name="android.speech.tts.engine.INSTALL_TTS_DATA" />
<category android:name="android.intent.category.DEFAULT" />
</intent-filter>
</activity>
<activity
android:name=".NormDictListView"
android:configChanges="orientation"
android:parentActivityName=".TTSManager"
android:exported="true"
android:theme="@style/AppTheme">
</activity>
<activity
android:name=".VoiceInfo"
android:theme="@style/AppTheme"
android:parentActivityName=".VoiceManager"
/>
<activity
android:name=".NormDictInfo"
android:theme="@style/AppTheme"
android:parentActivityName=".NormDictListView"
/>
<activity
android:name=".CheckSimVoices"
android:exported="true"
2 changes: 1 addition & 1 deletion app/src/main/cpp/g2p/G2P.cpp
@@ -49,7 +49,7 @@ void G2P::Initialize()
try
{
std::set_new_handler(FailedNewHandler);
char* argv0 = "G2P";
char* argv0 = (char*) "G2P";
char** argv = &argv0;
int argc = 1;
SET_FLAGS(argv[0], &argc, &argv, true);