Spoken word recognition using CTC LSTMs
- Create a virtual environment:
python -m venv venv
- Install the required packages:
./venv/bin/pip install -r requirements.txt
- Train the model:
./venv/bin/python main.py train
(takes a few hours and needs around 20 GB of disk space and 5 GB of memory)
- or download my pre-trained model (25 epochs, not good) from
here and move it to
target/model-final.ckpt
- Test the final model:
./venv/bin/python main.py test
- Infer text from flac:
./venv/bin/python main.py infer audio.flac
- This is a proof-of-concept
- Does not use CUDA, but GPU support should be easy to add
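
For readers unfamiliar with CTC, the inference step typically ends by turning per-frame label scores into text. The sketch below shows greedy CTC decoding in plain Python: pick the best label for each frame, merge consecutive repeats, then drop the blank label. It is an illustration only and not necessarily how main.py decodes; the function name `ctc_greedy_decode`, the blank index `0`, and the toy alphabet are all assumptions made for this example.

```python
from itertools import groupby

BLANK = 0  # assumed index of the CTC blank label (hypothetical choice)


def ctc_greedy_decode(frame_scores, alphabet, blank=BLANK):
    """Greedy CTC decoding: best label per frame, merge repeats, drop blanks."""
    # Pick the highest-scoring label index for every time step.
    best_path = [
        max(range(len(scores)), key=scores.__getitem__)
        for scores in frame_scores
    ]
    # Merge runs of identical consecutive labels into a single label.
    collapsed = [label for label, _ in groupby(best_path)]
    # Remove blanks and map the remaining indices to characters.
    return "".join(alphabet[label] for label in collapsed if label != blank)


# Toy input: 5 frames over a 3-symbol alphabet (index 0 is the blank).
ALPHABET = ["-", "a", "b"]
frames = [
    [0.1, 0.8, 0.1],    # a
    [0.1, 0.7, 0.2],    # a (repeat, merged with the previous frame)
    [0.9, 0.05, 0.05],  # blank (separates the two a's)
    [0.2, 0.7, 0.1],    # a
    [0.1, 0.1, 0.8],    # b
]
print(ctc_greedy_decode(frames, ALPHABET))  # prints "aab"
```

The blank frame between the two runs of `a` is what lets CTC emit a doubled letter; without it the repeats would collapse into a single `a`.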