Training and development
In this section we explain how to install the training and development tools for CMUSphinx. Unlike casual use of pocketsphinx, advanced tasks such as training and adapting acoustic models require downloading and compiling the CMUSphinx code base.
For the following steps we need to download and compile three CMUSphinx tools: sphinxbase, sphinxtrain, and pocketsphinx. The most up-to-date versions live on GitHub, so simply clone the three repositories.
git clone https://github.com/cmusphinx/sphinxbase
git clone https://github.com/cmusphinx/sphinxtrain
git clone https://github.com/cmusphinx/pocketsphinx
First, make sure the build requirements are installed:
sudo apt-get install pkg-config autoconf make automake libtool bison python3-dev
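Before continuing, it can help to verify that the prerequisites are actually on your PATH. A minimal sketch (note that the libtool package installs the libtoolize command, and python3-dev only provides headers, so it cannot be checked this way):

```shell
# Check the build prerequisites from the apt-get line above.
for tool in pkg-config autoconf automake libtoolize bison make; do
  command -v "$tool" >/dev/null 2>&1 || echo "missing: $tool"
done
```

If nothing is printed, all the checked tools were found.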
Then, step by step, we will compile the tools in the order sphinxbase, sphinxtrain, pocketsphinx. The order only matters for sphinxbase, which must be built and installed first because the other two depend on it.
Below are the compilation commands for a local installation, hence the use of a prefix. If you prefer a system-wide install, drop the prefix and run make install as a superuser.
sphinxbase
./autogen.sh --prefix=$HOME/sphinx/local
make
make check
make install
sphinxtrain
./autogen.sh --prefix=$HOME/sphinx/local --with-sphinxbase=<your-sphinxbase-dir>
make
make check
make install
pocketsphinx
./autogen.sh --prefix=$HOME/sphinx/local --with-sphinxbase=<your-sphinxbase-dir>
make
make check
make install
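The three per-package blocks above can be combined into a single script. This is only a sketch: it assumes the three repositories were cloned side by side in the current directory and uses the same local prefix as above.

```shell
#!/bin/sh
# Build sphinxbase, sphinxtrain and pocketsphinx in order, into a local prefix.
# Assumes the three repos sit side by side in the current directory.
PREFIX="$HOME/sphinx/local"
SRC="$PWD"
for pkg in sphinxbase sphinxtrain pocketsphinx; do
  cd "$SRC/$pkg" || exit 1
  extra=""
  # sphinxtrain and pocketsphinx need to know where sphinxbase lives.
  [ "$pkg" != "sphinxbase" ] && extra="--with-sphinxbase=$SRC/sphinxbase"
  ./autogen.sh --prefix="$PREFIX" $extra || exit 1
  make && make check && make install || exit 1
done
```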
If you compiled with a prefix, as a final step you need to add these paths to your ~/.profile. Just use a text editor to add the following lines to ~/.profile, and don't forget to run source ~/.profile in your active terminal session.
export PATH=$HOME/sphinx/local/bin:$PATH
export LD_LIBRARY_PATH=$HOME/sphinx/local/lib:$LD_LIBRARY_PATH
export PKG_CONFIG_PATH=$HOME/sphinx/local/lib/pkgconfig:$PKG_CONFIG_PATH
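Once the exports are in place, you can sanity-check the installation. A sketch, assuming the local prefix above (sphinxbase and pocketsphinx each install a pkg-config file under the prefix):

```shell
# After `source ~/.profile`, these should all succeed for a working install.
command -v sphinx_fe                 # feature-extraction tool from sphinxbase
pkg-config --modversion sphinxbase   # prints the installed sphinxbase version
pkg-config --modversion pocketsphinx
```

If any of these fail, re-check that the exported paths match your prefix.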
For Mac OS X there is an easy route for downloading and compiling the code base: all of the compilation steps explained above are encapsulated in custom Homebrew formulas. For an in-depth explanation and the scripts themselves, see the repository of the GitHub user watsonbox.
One of the more interesting capabilities of speech tools is creating specific models for specific situations or speakers. Starting from a generic acoustic model, we can adapt it to a specific user's voice, a sound condition (such as a consistently noisy background), or particular recording hardware. You can find a very in-depth tutorial on the CMUSphinx website, and you can follow it easily using our models and additional resources, which include audio files for training, their transcripts, and the path reference files, ready for execution.
In order to follow the steps of the tutorial you will need these executables from your CMUSphinx installation: sphinx_fe, for feature extraction; pocketsphinx_mdef_convert, in case your mdef (model definition) file is in binary format; and bw, to collect statistics from the adaptation data.
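As a rough sketch of how these fit together, here is the shape of the commands from the CMUSphinx adaptation tutorial. The file names (arctic20.fileids, arctic20.transcription, cmudict-en-us.dict) and the en-us model directory are the tutorial's sample data; the exact flag values depend on your own model and recordings, and some models need additional model-specific flags (e.g. -svspec for the en-us PTM model).

```shell
# 1. Extract MFCC features from the adaptation recordings, using the
#    acoustic model's own feature parameters.
sphinx_fe -argfile en-us/feat.params -samprate 16000 -c arctic20.fileids \
    -di . -do . -ei wav -eo mfc -mswav yes

# 2. Convert the binary model definition file to text, if needed.
pocketsphinx_mdef_convert -text en-us/mdef en-us/mdef.txt

# 3. Collect observation statistics from the adaptation data.
bw -hmmdir en-us -moddeffn en-us/mdef.txt -ts2cbfn .ptm. \
   -feat 1s_c_d_dd -cmn current -agc none \
   -dictfn cmudict-en-us.dict -ctlfn arctic20.fileids \
   -lsnfn arctic20.transcription -accumdir .
```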
The actual adaptation tools are mllr_solve and map_adapt, which implement the two methods: Maximum Likelihood Linear Regression (MLLR) and Maximum A Posteriori (MAP) adaptation. MLLR creates a transformation file that is applied during decoding at run time, whereas MAP actually modifies the parameters of the acoustic model. Quoting the CMUSphinx tutorial:
"MLLR is a cheap adaptation method that is suitable when the amount of data is limited. It’s a good idea to use MLLR for online adaptation. MLLR works best for a continuous model. Its effect for semi-continuous models is very limited since semi-continuous models mostly rely on mixture weights. If you want the best accuracy you can combine MLLR adaptation with MAP adaptation below. On the other hand, because MAP requires a lot of adaptation data it is not really practical to use it for continuous models. For continuous models MLLR is more reasonable."