The Buckeye Pronunciation Dictionary is a data-driven English pronunciation dictionary, suitable for use in speech recognition systems and other applications that use phonological information about English words. It is comparable to CMUDict, but is derived from a large-scale speech corpus rather than from annotator intuitions.
The dictionary consists of four columns separated by tabs:
- Word
- Phonological transcription, derived from Arpabet
- Number of occurrences in corpus
- Mean length of utterance
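
The four-column layout above can be read with a few lines of Python. This is a minimal sketch, not an official loader. It assumes that there is no header row, that phones in the transcription are separated by spaces, that the occurrence count is an integer, and that the mean length is a decimal number. The names `Entry` and `parse_dictionary` are hypothetical, and the sample row is made up for illustration rather than taken from the dictionary.

```python
import io
from typing import Iterable, List, NamedTuple


class Entry(NamedTuple):
    word: str
    phones: List[str]   # assumed: phones are space-separated in the file
    count: int          # number of occurrences in the corpus
    mean_length: float  # mean length value; units are not assumed here


def parse_dictionary(lines: Iterable[str]) -> List[Entry]:
    """Parse tab-separated lines into Entry records, skipping malformed lines."""
    entries = []
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) != 4:
            continue  # blank or malformed line
        word, transcription, count, mean_length = fields
        entries.append(
            Entry(word, transcription.split(), int(count), float(mean_length))
        )
    return entries


# Made-up sample row, for illustration only.
sample = io.StringIO("cat\tk ae t\t12\t0.25\n")
entries = parse_dictionary(sample)
print(entries[0].word, entries[0].phones)
```

To read a real copy of the dictionary, pass an open file object instead of the `io.StringIO` sample, for example `parse_dictionary(open(path, encoding="utf-8"))`, where `path` is wherever the file is stored locally.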