- model.py: RNN-T joint model
- model2012.py: Graves 2012 model
- train_rnnt.py: RNN-T training script
- train_ctc.py: CTC acoustic model training script
- eval.py: RNN-T and CTC decoding
- DataLoader.py: Kaldi feature loader
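The two model files appear to correspond to the two ways of combining the transcription (acoustic) and prediction (label) networks described in the referenced papers. The NumPy sketch below illustrates that difference only; it is not taken from model.py or model2012.py, and all function names, weight names, and shapes are assumptions. In the 2012 formulation the two networks' per-label scores are added and normalized; in the 2013 formulation their hidden vectors pass through a shared tanh layer before the output projection.

```python
import numpy as np

def log_softmax(x, axis=-1):
    # Numerically stable log-softmax over the label axis.
    x = x - x.max(axis=axis, keepdims=True)
    return x - np.log(np.exp(x).sum(axis=axis, keepdims=True))

def joint_2012(f, g):
    """Graves 2012 style: add per-label scores, then normalize.

    f: (T, K) transcription scores, g: (U, K) prediction scores.
    Returns (T, U, K) log-probabilities.
    """
    return log_softmax(f[:, None, :] + g[None, :, :])

def joint_2013(f, g, w_f, w_g, b_h, w_out, b_out):
    """Graves 2013 style: combine hidden vectors in a tanh layer.

    f: (T, F) and g: (U, G) are hidden vectors; w_f: (F, H), w_g: (G, H),
    w_out: (H, K). Returns (T, U, K) log-probabilities.
    """
    h = np.tanh((f @ w_f)[:, None, :] + (g @ w_g)[None, :, :] + b_h)
    return log_softmax(h @ w_out + b_out)

# Hypothetical sizes: T frames, U label positions, K output labels.
rng = np.random.default_rng(0)
T, U, K, F, G, H = 5, 3, 4, 6, 7, 8
lp_2012 = joint_2012(rng.normal(size=(T, K)), rng.normal(size=(U, K)))
lp_2013 = joint_2013(
    rng.normal(size=(T, F)), rng.normal(size=(U, G)),
    rng.normal(size=(F, H)), rng.normal(size=(G, H)), np.zeros(H),
    rng.normal(size=(H, K)), np.zeros(K),
)
```

Both functions return one label distribution per (frame, label position) pair, which is the lattice that a transducer loss consumes.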
- Extract features: link the Kaldi TIMIT example directories (`local`, `steps`, `utils`), execute `run.sh` to extract 40-dimensional fbank features, then run `feature_transform.sh` to get the 123-dimensional features described in Graves 2013
- Train CTC acoustic model
python train_ctc.py --lr 1e-3 --bi --dropout 0.5 --out exp/ctc_bi_lr1e-3 --schedule
- Train RNNT joint model
python train_rnnt.py --lr 4e-4 --bi --dropout 0.5 --out exp/rnnt_bi_lr4e-4 --schedule
- Decode
python eval.py <path to best model> [--ctc] --bi
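The 123-dimensional features from the feature extraction step follow Graves 2013: 40 filterbank coefficients plus energy (41 values), together with their first and second temporal derivatives, giving 41 × 3 = 123. The sketch below shows that arithmetic with a common regression-based delta formula in NumPy. It is an illustration only; the function name `add_deltas` and the window size are assumptions, and feature_transform.sh may compute the derivatives differently (for example with Kaldi tools).

```python
import numpy as np

def add_deltas(feats, window=2):
    """Append delta and delta-delta features.

    feats: (T, D) static features. Returns (T, 3 * D).
    The window size of 2 is an assumed, commonly used default.
    """
    def delta(x):
        num_frames = x.shape[0]
        # Repeat the first and last frame so edge frames have neighbours.
        padded = np.pad(x, ((window, window), (0, 0)), mode="edge")
        denom = 2 * sum(n * n for n in range(1, window + 1))
        out = np.zeros_like(x, dtype=float)
        for n in range(1, window + 1):
            ahead = padded[window + n : window + n + num_frames]
            behind = padded[window - n : window - n + num_frames]
            out += n * (ahead - behind)
        return out / denom

    d1 = delta(feats)
    d2 = delta(d1)
    return np.concatenate([feats, d1, d2], axis=1)

# Toy input: 10 frames of 41 static values that rise by 1 per frame.
static = np.arange(10, dtype=float)[:, None] * np.ones((1, 41))
full = add_deltas(static)
print(full.shape)
```

On this linear ramp the first derivative is 1 and the second derivative is 0 for frames away from the edges, which is a quick sanity check for the formula.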
Model | PER (%) |
---|---|
CTC | 21.38 |
RNN-T | 20.59 |
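PER in the table is the phoneme error rate. As a reminder of how such a number is usually computed, the sketch below takes the edit distance (substitutions, insertions, deletions) between each reference and decoded phone sequence and divides the total by the number of reference phones. This is a generic illustration, not the scoring code in eval.py; the function names and the toy phone labels are hypothetical.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two label sequences."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        cur = [i]
        for j, h in enumerate(hyp, start=1):
            cur.append(min(
                prev[j] + 1,             # deletion
                cur[j - 1] + 1,          # insertion
                prev[j - 1] + (r != h),  # substitution or match
            ))
        prev = cur
    return prev[-1]

def phoneme_error_rate(refs, hyps):
    """Total edit distance over total reference length, in percent."""
    errors = sum(edit_distance(r, h) for r, h in zip(refs, hyps))
    total = sum(len(r) for r in refs)
    return 100.0 * errors / total

# Toy example: one substitution (ae -> ah) and one deletion (final sil).
refs = ["sil k ae t sil".split()]
hyps = ["sil k ah t".split()]
per = phoneme_error_rate(refs, hyps)
```

Here two errors against five reference phones give a PER of 40%.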
- Python 3.6
- PyTorch >= 0.4
- numpy 1.14
- warp-transducer
- RNN Transducer (Graves 2012): Sequence Transduction with Recurrent Neural Networks
- RNNT joint (Graves 2013): Speech Recognition with Deep Recurrent Neural Networks
- [PyTorch End-to-End Models for ASR](https://github.com/awni/speech)
- [A Fast Sequence Transducer GPU Implementation with PyTorch Bindings](https://github.com/HawkAaron/warp-transducer/tree/add_network_accelerate)