Model
Gentle uses Kaldi to recognize speech in your audio and align it with text.
The speech recognition model that is packaged with Gentle is based on the Kaldi fisher_english_v8 model (built by Dan Povey).
The acoustic model is a multi-splice deep neural network. It was trained on over 4000 hours of 8 kHz (telephone bandwidth) conversational speech audio from the Fisher English corpus.
A new bigram language model is built every time you run Gentle to fit the words contained in your transcript.
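To make the idea of a transcript-specific bigram model concrete, here is a minimal sketch in plain Python. It is not Gentle's implementation (Gentle builds its language model with Kaldi tooling); the function names `tokenize` and `bigram_probabilities`, the `<s>`/`</s>` boundary markers, and the unsmoothed maximum-likelihood estimate are all assumptions chosen for illustration.

```python
import re
from collections import Counter, defaultdict


def tokenize(transcript):
    """Lowercase the transcript and keep runs of letters and apostrophes."""
    return re.findall(r"[a-z']+", transcript.lower())


def bigram_probabilities(transcript):
    """Estimate P(next | previous) from the transcript alone.

    Illustrative only: unsmoothed maximum-likelihood estimates, with
    hypothetical <s> and </s> markers for the start and end of the text.
    """
    words = ["<s>"] + tokenize(transcript) + ["</s>"]
    pair_counts = Counter(zip(words, words[1:]))
    context_counts = Counter(words[:-1])  # every word that has a successor
    probs = defaultdict(dict)
    for (prev, nxt), count in pair_counts.items():
        probs[prev][nxt] = count / context_counts[prev]
    return dict(probs)


model = bigram_probabilities("the cat sat on the mat")
print(model["the"])
```

Because the model only contains words and word pairs that occur in the transcript, the recognizer is strongly biased toward hearing the text you supplied, which is what makes this approach suitable for alignment rather than open-ended transcription.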
After recognition, the speech is split into phonemes using a version of The CMU Pronouncing Dictionary. The phoneme set is based on ARPAbet.
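The lookup from words to ARPAbet phonemes can be sketched as follows. This is an illustrative parser for the CMU Pronouncing Dictionary's plain-text layout, not Gentle's own loader; the embedded excerpt is a tiny hand-copied sample, and `parse_cmudict` and `strip_stress` are hypothetical helper names.

```python
import re

# Tiny sample in the CMUdict plain-text layout (word, then phonemes).
# Digits on vowels are stress markers; "(1)" marks an alternate pronunciation.
CMUDICT_EXCERPT = """\
;;; comment lines start with three semicolons
HELLO  HH AH0 L OW1
HELLO(1)  HH EH0 L OW1
WORLD  W ER1 L D
"""


def parse_cmudict(text):
    """Map each lowercase word to a list of pronunciations (lists of phonemes)."""
    entries = {}
    for line in text.splitlines():
        if not line.strip() or line.startswith(";;;"):
            continue
        head, *phones = line.split()
        word = re.sub(r"\(\d+\)$", "", head).lower()  # drop the variant suffix
        entries.setdefault(word, []).append(phones)
    return entries


def strip_stress(phones):
    """Remove the trailing stress digit (0, 1 or 2) from each phoneme."""
    return [p.rstrip("012") for p in phones]


lexicon = parse_cmudict(CMUDICT_EXCERPT)
for word in ["hello", "world"]:
    print(word, strip_stress(lexicon[word][0]))
```

Words that are missing from the dictionary have no entry in `lexicon`, so a real pipeline needs some fallback for out-of-vocabulary words; this sketch leaves that case out.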
We do not yet support alignment using other acoustic models, or alignment in languages other than English. However, we would like to! In future versions it may be possible to swap out the model for one better suited to your domain or trained on another language.