I followed a few different repositories and some youtube videos to build the beginning of a speech recognition program
(from pocketsphinx-ruby)
This gem depends on Pocketsphinx (libpocketsphinx), and Sphinxbase (libsphinxbase and libsphinxad). The current stable versions (0.8) are from late 2012 and are now outdated. Build them manually from source, or on OSX the latest development (potentially unstable) versions can be installed using Homebrew as follows (more information here).
Add the Homebrew tap:
$ brew tap watsonbox/cmu-sphinx
You'll see some warnings as these formulae conflict with those in the main reponitory, but that's fine.
Install the libraries:
$ brew install --HEAD watsonbox/cmu-sphinx/cmu-sphinxbase
$ brew install --HEAD watsonbox/cmu-sphinx/cmu-pocketsphinx
Download sphinxtrain-5prealpha.tar.gz and unzip in the same directory.
cd sphinxtrain-5prealpha
./configure
make
make install
cd ..
(partially followed this tutorial) Create a new speech folder
cd assets
mkdir speech
cd assets/speech
cmusphinx-en-us-ptm-5.2.tar.gz
Rename this unzipped folder to en-us-download
Next create some test reate test files. I recorded myself reading a poem and split each audio stanza into a seperate numbered .wav
file, then created plaintext files for File IDs and transcriptions
- Numbered audio files RECORDINGS AT SAMPLE RATE OF 16 KHZ (16000) AND IN MONO WITH SINGLE CHANNEL.
- A plaintext File IDs file named [audiofile].fileids
- A transcription file named [audiofile].transcription
The files should look like
1.wav
2.wav
.....
12.wav
audio.fileids
audio.transcription
[audioname].fileids shoud be formatted like:
1
2
3
4
5
6
7
8
9
10
11
12
[audiofile].transcription should be formatted with <s></s>
tags surrounding each line and a (#)
with the corresponding audio file name in the parentheses. NO CAPITAL LETTERS OR SYMBOLS:
<s> there is a place where the sidewalk ends </s> (1)
<s> and before the street begins </s> (2)
<s> and there the grass grows soft and white </s> (3)
<s> and there the sun burns crimson bright </s> (4)
<s> and there the moon bird rests from his flight </s> (5)
<s> to cool in the peppermint wind </s> (6)
<s> let us leave this place where the smoke blows black </s> (7)
<s> and the dark street winds and bends </s> (8)
<s> past the pits where the asphalt flowers grow </s> (9)
<s> we shall walk with a walk that is measured and slow </s> (10)
<s> and watch where the chalk-white arrows go </s> (11)
<s> to the place where the sidewalk ends </s> (12)
Then copy more dependencies
cp -a /usr/local/share/pocketsphinx/model/en-us/cmudict-en-us.dict .
cp -a /usr/local/share/pocketsphinx/model/en-us/en-us.lm.bin .
Then create .mfc
files out of each .wav
file
sphinx_fe -argfile en-us-download/feat.params -samprate 16000 -c test.fileids -di . -do . -ei wav -eo mfc -mswav yes
Copy over some Sphinx Train files
cp /usr/local/libexec/sphinxtrain/bw .
cp /usr/local/libexec/sphinxtrain/map_adapt .
cp /usr/local/libexec/sphinxtrain/mk_s2sendump .
Then run
./bw -hmmdir en-us-download -moddeffn en-us-download/mdef -ts2cbfn .ptm. -feat 1s_c_d_dd -svspec 0-12/13-25/26-38 -cmn current -agc none -dictfn cmudict-en-us.dict -ctlfn test.fileids -lsnfn test.transcription -accumdir .
cp -a en-us-download en-us-jj
./map_adapt -moddeffn en-us-download/mdef -ts2cbfn .ptm. -meanfn en-us-download/means -varfn en-us-download/variances -mixwfn en-us-download/mixture_weights -tmatfn en-us-download/transition_matrices -accumdir . -mapmeanfn en-us-jj/means -mapvarfn en-us-jj/variances -mapmixwfn en-us-jj/mixture_weights -maptmatfn en-us-jj/transition_matrices
pocketsphinx_continuous -hmm en-us-jj -lm en-us.lm.bin -dict cmudict-en-us.dict -infile audio.wav > audio.txt
I wrote a Ruby program to automate this process, Readme on that coming soon