velociburner/asr-tagalog

asr-tagalog

Project for COSI 136a ASR

Requirements

Command line: ffmpeg, sox/soxi

Python: requires Python 3.9+. Install the dependencies with:

pip install -r requirements.txt

Resampling

Resample all mp3 files in a directory to wav files:

./resample.sh <DIR>
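
The contents of resample.sh are not shown here, so the following is only a rough Python sketch of what an mp3-to-wav resampling step could look like when driven through ffmpeg. The 16 kHz mono target is an assumption (it matches the input rate Whisper expects), and the names ffmpeg_command and resample_dir are hypothetical, not part of this repository.

```python
from pathlib import Path
import subprocess


def ffmpeg_command(src: Path, out_dir: Path, sample_rate: int = 16000) -> list[str]:
    """Build an ffmpeg command that converts one mp3 file to a mono wav file.

    The 16 kHz default is an assumption, not a value taken from resample.sh.
    """
    dst = out_dir / (src.stem + ".wav")
    return [
        "ffmpeg", "-y",          # overwrite existing output without asking
        "-i", str(src),          # input file
        "-ar", str(sample_rate), # output sample rate in Hz
        "-ac", "1",              # downmix to a single channel
        str(dst),
    ]


def resample_dir(in_dir: str, out_dir: str = "resampled") -> None:
    """Convert every .mp3 in in_dir to a .wav in out_dir (requires ffmpeg on PATH)."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    for src in sorted(Path(in_dir).glob("*.mp3")):
        subprocess.run(ffmpeg_command(src, out), check=True)
```

Calling resample_dir("recordings") would write one wav file per mp3 into resampled/, failing loudly if ffmpeg returns a non-zero exit code.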

Check the total length of the resampled files:

soxi resampled/*.wav | tail -n1
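
If soxi is not available, the same total can be approximated with the Python standard library. This is a minimal sketch that assumes the resampled files are uncompressed PCM wav files (the only kind the wave module reads); total_seconds is a hypothetical helper, not a script in this repository.

```python
import wave
from pathlib import Path


def total_seconds(wav_dir: str) -> float:
    """Sum the durations, in seconds, of all .wav files directly inside wav_dir."""
    total = 0.0
    for path in sorted(Path(wav_dir).glob("*.wav")):
        with wave.open(str(path), "rb") as wav:
            # duration = number of frames / frames per second
            total += wav.getnframes() / wav.getframerate()
    return total


if __name__ == "__main__":
    seconds = total_seconds("resampled")
    print(f"{seconds / 3600:.2f} hours ({seconds:.1f} s)")
```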

Split corpus

Split directory of parallel .TextGrid and .wav files into short segments to use in a model:

usage: split_corpus.py [-h] [--max-seconds MAX_SECONDS] indir outdir

positional arguments:
  indir                 Directory of parallel .TextGrid and .wav files to load
  outdir                Directory to write segmented parallel .txt and .wav files

options:
  -h, --help            show this help message and exit
  --max-seconds MAX_SECONDS
                        Maximum duration in seconds of segmented audio files
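
To illustrate what --max-seconds controls, here is a hedged sketch of one way to group annotated intervals into segments no longer than a given duration. It is not the logic of split_corpus.py, which is not shown here: the greedy merging strategy, the (start, end, text) tuple layout, and the name group_intervals are all assumptions, and TextGrid parsing is left out entirely.

```python
from typing import List, Tuple

# One annotated interval: (start time in s, end time in s, transcript text)
Interval = Tuple[float, float, str]


def group_intervals(intervals: List[Interval], max_seconds: float) -> List[Interval]:
    """Greedily merge consecutive non-empty intervals into segments.

    A segment grows until adding the next interval would make it longer than
    max_seconds. A single interval that is already longer than max_seconds is
    kept as its own segment rather than being cut.
    """
    segments: List[Interval] = []
    current = None
    for start, end, text in intervals:
        text = text.strip()
        if not text:
            continue  # skip silence / unlabeled intervals
        if current is not None and end - current[0] <= max_seconds:
            # Extend the open segment to cover this interval.
            current = (current[0], end, current[2] + " " + text)
        else:
            if current is not None:
                segments.append(current)
            current = (start, end, text)
    if current is not None:
        segments.append(current)
    return segments
```

Each returned tuple gives the time span to cut from the wav file and the text to write to the matching .txt file.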

Statistics

Calculate statistics on the train, dev, and test splits (type/token counts, OOV rate):

usage: corpus_stats.py [-h] [--train TRAIN] [--dev DEV] [--test TEST]

options:
  -h, --help     show this help message and exit
  --train TRAIN  Directory for train partition containing .txt files
  --dev DEV      Directory for dev partition containing .txt files
  --test TEST    Directory for test partition containing .txt files
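
For reference, the statistics named above can be computed roughly as follows. This is a sketch under stated assumptions, not the code of corpus_stats.py: tokens are taken to be lowercased, whitespace-separated words, and the OOV rate is taken to be the fraction of evaluation tokens that never occur in the training partition. The function names are hypothetical.

```python
from pathlib import Path
from typing import List, Tuple


def read_tokens(directory: str) -> List[str]:
    """Read every .txt file in a directory into one flat list of tokens."""
    tokens: List[str] = []
    for path in sorted(Path(directory).glob("*.txt")):
        # Assumption: lowercase + whitespace split is the tokenization.
        tokens.extend(path.read_text(encoding="utf-8").lower().split())
    return tokens


def type_token_counts(tokens: List[str]) -> Tuple[int, int]:
    """Return (number of distinct word types, total number of tokens)."""
    return len(set(tokens)), len(tokens)


def oov_rate(train_tokens: List[str], eval_tokens: List[str]) -> float:
    """Fraction of eval tokens whose word type is absent from the training data."""
    if not eval_tokens:
        return 0.0
    vocab = set(train_tokens)
    unseen = sum(1 for tok in eval_tokens if tok not in vocab)
    return unseen / len(eval_tokens)
```

Counting OOV over tokens rather than over types is a design choice: it weights each unseen word by how often it actually appears in the dev or test text.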

Training a model

Follow the instructions in train.ipynb to fine-tune a pre-trained Whisper model on the newly created data and evaluate the results. Note: the notebook has only been tested in Google Colab on a T4 GPU, so it may not run on other platforms or hardware, including CPU-only machines.
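
The evaluation metric used in train.ipynb is not stated here. Word error rate (WER) is a common choice for ASR, so as an illustration only, here is a minimal pure-Python sketch of it. The function name and the whitespace tokenization are assumptions, and the notebook may well rely on a library instead.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = word-level edit distance (sub + del + ins) / reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference prefix processed
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # deleting all i reference words to match an empty hypothesis
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # deletion of a reference word
                curr[j - 1] + 1,     # insertion of a hypothesis word
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1] / len(ref)
```

A WER of 0.0 means the transcript matches the reference exactly; values above 1.0 are possible when the hypothesis contains many inserted words.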
