Goruut

A tokenizer, text cleaner, and IPA phonemizer/dephonemizer/transphonemizer for several human languages.

Try it online

You can try this software live at hashtron.cloud or on Hugging Face.

Features

Phone set: IPA
Supported languages: 136
Processing speed: fast
Phone tokens: yes
Syllable tokens: no
Word tokens: yes
Punctuation preservation: yes
Stressed phones: yes
Tonal phones: yes, 5 tones (˥ ˦ ˧ ˨ ˩)
Tie: no

Installation

go install github.com/neurlang/goruut/cmd/goruut@latest

Docker Compose installation

Clone the repository, then run this command in its root directory:

sudo docker compose up -d --force-recreate --build

Supported Languages

Afrikaans
Albanian
Amharic
Arabic
Aragonese
Armenian
Assamese
Azerbaijani
Bashkir
Basque
Belarusian
Bengali
Bengali Dhaka
Bengali Rahr
Bishnupriya Manipuri
Bosnian
Bulgarian
Burmese
Catalan
Cebuano
Chechen
Cherokee
Chichewa
Chinese Mandarin
Chuvash
Croatian
Czech
Danish
Dutch
Dzongkha
English
English American
English British
Esperanto
Estonian
Farsi
Finnish
French
Gaelic Irish
Gaelic Scottish
Galician
Georgian
German
Greek
Greenlandic
Guarani
Gujarati
Haitian Creole
Hausa
Hawaiian
Hebrew
Hindi
Hungarian
Icelandic
Ido
Indonesian
Interlingua
Isan
Italian
Jamaican
Japanese
Javanese
Kannada
Kazakh
Khmer Central
Kiche
Konkani
Korean
Kurdish
Kyrgyz
Lang Belta
Lao
Latgalian
Latin Classical
Latin Ecclesiastical
Latvian
Lingua Franca Nova
Lithuanian
Lojban
Lule Saami
Luxembourgish
Macedonian
Malayalam
Malay Arab
Malay Latin
Maltese
Maori
Marathi
Mongolian
Nahuatl Central
Nahuatl Classical
Nahuatl Mecayapan
Nahuatl Tetelcingo
Nepali
Nogai
Norwegian
Oromo
Papiamento
Pashto
Polish
Portuguese
Punjabi
Quechua
Quenya
Romanian
Russian
Serbian
Setswana
Shan Tai Yai
Sindarin
Sindhi
Sinhala
Slovak
Slovenian
Spanish
Swahili
Swedish
Tagalog
Tamil
Tatar
Telugu
Thai
Tibetan
Turkish
Turkmen
Ukrainian
Urdu
Uyghur
Uzbek
Vietnamese Central
Vietnamese Northern
Vietnamese Southern
Welsh North
Welsh South
Yoruba
Zulu

The goal of supporting all of voice2json's languages has been met. Contributions of further languages are welcome if you have the necessary data.

Listening to the generated speech

There are currently 3 target languages (IPA flavors):

IPA - Copy the output into ipa-reader.xyz and pick the matching language voice
Espeak - Copy the output into espeak. For example, for Czech: espeak -v cs "[[ru:Zovi: ku:n^]]"
Antvaset - Copy the output into antvaset.com and pick the matching language voice

Dependencies

See the go.mod file for an up-to-date list of dependencies. The minimum supported Go version is 1.19 (the project uses type parameters).

Numbers, Dates, and More

Numbers, dates, and similar tokens are unsupported. Please spell them out in words.

Command-Line Usage

To start, launch the server using the example config (in the configs directory):

./goruut -configfile configs/config.json

This launches the server on the HTTP port specified in the config file. The log output should show that port:

INFO[0000] Binding port: 18080

Then you can run queries:

POST http://127.0.0.1:18080/tts/phonemize/sentence

{
	"Language": "Czech",
	"Sentence": "jsem supr"
}

Output should be:

{
	"Words": [
		{
			"Linguistic": "jsem",
			"Phonetic": "jsɛm"
		},
		{
			"Linguistic": "supr",
			"Phonetic": "supr"
		}
	]
}

Intended Audience

goruut is useful for transforming raw text into phonetic pronunciations, similar to phonemizer. Unlike phonemizer, goruut looks up words in a pre-built lexicon (pronunciation dictionary) or guesses word pronunciations with a pre-trained transformer-based grapheme-to-phoneme model.
