kate-egorova/ASR-hybrid-decoding


An updated version of this toolkit now lives on our lab's GitHub.

ASR-hybrid-decoding

This is an extension of the Kaldi speech recognition toolkit that decodes speech with hybrid word and phoneme graphs. The output is a mix of in-vocabulary words and phoneme sequences. This kind of decoding suits systems for which only a small dictionary is available, and the phoneme sequences can be used later to recover out-of-vocabulary (OOV) words.

Theory:

A brief description of the hybrid decoding system can be found in a paper; it generally follows the approach of an earlier paper.

Requirements:

For this to work, you need the Kaldi speech recognition toolkit installed.

This extension of Kaldi has been tested on the following databases:

  1. LibriSpeech
  2. Wall Street Journal

How to run:

First run the standard Kaldi recipe for your database; hybrid decoding, as presented here, runs on top of its output. The file structure of this repository mirrors the Kaldi file structure, so it is enough to copy the scripts from this repository into the corresponding folders of your Kaldi installation. After that, run the run_hybrid_decoding.sh script, which builds the hybrid decoding graph and performs the decoding.
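
The copy step above can be sketched as a small shell helper. This is only an illustration under stated assumptions: the function name `overlay_scripts` and all paths in the comments are hypothetical, and the location of run_hybrid_decoding.sh inside the recipe folder depends on the database you use.

```shell
#!/bin/sh
# Sketch: overlay this repository's files onto an existing Kaldi checkout.
# The helper name and every path used here are assumptions; adjust as needed.

# Copy every file under $1 into $2, keeping the relative directory layout
# and skipping git metadata. Existing files with the same path are overwritten.
overlay_scripts() {
  src="$1"
  dst="$2"
  [ -d "$src" ] || { echo "source directory not found: $src" >&2; return 1; }
  [ -d "$dst" ] || { echo "Kaldi directory not found: $dst" >&2; return 1; }
  (cd "$src" && find . -type f ! -path './.git/*') | while IFS= read -r rel; do
    mkdir -p "$dst/$(dirname "$rel")"
    cp "$src/$rel" "$dst/$rel"
  done
}

# Example with hypothetical paths:
#   overlay_scripts "$HOME/ASR-hybrid-decoding" "$HOME/kaldi"
#   cd "$HOME/kaldi/egs/librispeech/s5"   # recipe folder is an assumption
#   ./run_hybrid_decoding.sh
```

Before running run_hybrid_decoding.sh, check the script itself for any paths or variables it expects to be set; none are assumed here.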

LibriSpeech setup:

OOV_list_1000.txt contains a selection of 1000 words that serve as OOVs for this database. For a detailed description of how they were chosen, see subsection 3.1 of the paper.
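
As an illustration of how such a list could be used to simulate OOVs, the sketch below drops the listed words from a pronunciation lexicon. It is not taken from this repository's scripts. It assumes that OOV_list_1000.txt holds one word per line and that the lexicon uses the usual Kaldi layout of a word followed by its phonemes on each line; the function name `remove_oov_entries` and the file names in the comments are hypothetical.

```shell
#!/bin/sh
# Sketch: print the lexicon without the entries whose word is in the OOV list.
# Assumed formats: one word per line in the list; "WORD PH1 PH2 ..." in the lexicon.

remove_oov_entries() {
  # $1 = OOV word list, $2 = lexicon file; the filtered lexicon goes to stdout.
  awk -v list="$1" '
    BEGIN {
      # Load the OOV words into a lookup table before reading the lexicon.
      while ((getline line < list) > 0) {
        split(line, fields, " ")
        oov[fields[1]] = 1
      }
    }
    # Keep only the lexicon lines whose first field is not an OOV word.
    !($1 in oov)
  ' "$2"
}

# Example with hypothetical file names:
#   remove_oov_entries OOV_list_1000.txt lexicon.txt > lexicon_small.txt
```

Matching is case-sensitive, so the word list and the lexicon need to use the same casing.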
