CS 753 ASR project
- Run `bash download.sh` to prepare the VCC2018 dataset.
- Run `analyzer.py` to extract features and write them into binary files. (This takes a few minutes.)
- Run `build.py` to record some stats, such as spectral extrema and pitch.
- To train a VAWGAN, for example, run
```bash
python main.py \
  --model VAWGAN \
  --trainer VAWGANTrainer \
  --architecture architecture-vawgan-vcc2016.json
```
- You can find your models in `./logdir/train/[timestamp]`.
- To convert the voice, run
```bash
python convert.py \
  --src VCC2SF1 \
  --trg VCC2TM1 \
  --model VAWGAN \
  --checkpoint logdir/train/[timestamp]/[model.ckpt-[id]] \
  --file_pattern "./dataset/vcc2018/bin/Training Set/{}/[0-9]*.bin"
```
*Please fill in the timestamp and model id.*
- You can find the converted wav files in `./logdir/output/[timestamp]`.
- If you want to convert all the voices, run
```bash
./convert_all.sh \
  --model VAWGAN \
  --checkpoint logdir/train/[timestamp]/[model.ckpt-[id]] \
  --output_dir [directory to store converted audio]
```
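The `[timestamp]` and `[model.ckpt-[id]]` placeholders above must be filled in by hand. A small helper can locate the most recent checkpoint instead — a sketch only: `latest_ckpt` is a hypothetical function, assuming the `logdir/train/[timestamp]` layout shown above and the usual TensorFlow `model.ckpt-<step>` file naming.

```shell
# latest_ckpt [ROOT]: print the newest training run's highest-numbered
# checkpoint prefix under ROOT (default: logdir/train).
latest_ckpt() {
  local root=${1:-logdir/train}
  local run ckpt
  run=$(ls -t "$root" | head -n 1)      # most recently modified run directory
  ckpt=$(ls "$root/$run" \
    | grep -o 'model\.ckpt-[0-9]*' \
    | sort -u -t- -k2 -n \
    | tail -n 1)                        # highest checkpoint step number
  echo "$root/$run/$ckpt"
}
```

It could then be used as `--checkpoint "$(latest_ckpt)"` in the commands above.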
- Ensure you have `w_prob_dict.pkl` and `w_vec_dict.pkl` in the `data` directory.
- For `w_prob_dict.pkl` you have two options: either use `get_word_prob_from_corpus`, which requires a corpus as input (we used WikiText), or obtain a csv file with unigram probabilities (we mentioned the source in the report: http://norvig.com/ngrams/) and use the function `get_w_prob_from_csv`.
- For `w_vec_dict.pkl`, initialize a `Sentence_Embedding` object and then call the function `prune_word_vec`. This keeps only those embeddings that are present in the transcriptions, since parsing the whole fastText data takes far more time (and RAM).
- All pickle files are shared here: https://drive.google.com/drive/folders/1FWGGEQ9wTUewBDFq5ssT4BP4cMyt8lh1
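The unigram-probability computation behind `w_prob_dict.pkl` can be sketched as below. This assumes the pickle simply maps each word to its corpus probability; `word_probs` is a hypothetical stand-in, not the repository's `get_word_prob_from_corpus`.

```python
import pickle
from collections import Counter

def word_probs(tokens):
    """Map each word to its unigram probability (count / total count)."""
    counts = Counter(tokens)
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()}

# Toy corpus; a real run would tokenize WikiText instead.
tokens = "the cat sat on the mat".split()
probs = word_probs(tokens)

with open("w_prob_dict.pkl", "wb") as f:
    pickle.dump(probs, f)
```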
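Likewise, the pruning that produces `w_vec_dict.pkl` amounts to intersecting the pretrained vocabulary with the words actually seen in the transcriptions. A sketch with toy vectors — `prune_word_vecs` is a hypothetical stand-in for the class method `prune_word_vec`:

```python
import pickle

def prune_word_vecs(word_vecs, transcriptions):
    """Keep only the embeddings of words that occur in the transcriptions,
    so the full fastText vocabulary never has to stay in memory."""
    vocab = {w for line in transcriptions for w in line.split()}
    return {w: v for w, v in word_vecs.items() if w in vocab}

# Toy vectors; real ones would be loaded from the fastText data.
word_vecs = {"hello": [0.1, 0.2], "world": [0.3, 0.4], "unused": [0.5, 0.6]}
pruned = prune_word_vecs(word_vecs, ["hello world", "hello again"])

with open("w_vec_dict.pkl", "wb") as f:
    pickle.dump(pruned, f)
```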
- Download the dataset using `bash download.sh`.
- Run `python sentence_embedding.py`. This should create `sent_emb.pkl` inside the `data` directory.
- Run `analyzer.py` to extract features and store them along with the sentence embeddings.
- Run `build.py` to find statistics about the features.
- To train with sentence embedding, run
```bash
python main.py \
  --model VAWGAN_S \
  --trainer VAWGAN_S \
  --architecture architecture-vawgan-sent.json
```
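A common baseline for sentence embeddings is simply averaging the (pruned) word vectors of each transcription; the project's actual `Sentence_Embedding` class may do something more elaborate, so this is only an illustrative sketch with a hypothetical helper name.

```python
def avg_sentence_embedding(sentence, word_vecs):
    """Average the word vectors of a sentence, skipping unknown words.
    Returns None if no word in the sentence has a vector."""
    vecs = [word_vecs[w] for w in sentence.split() if w in word_vecs]
    if not vecs:
        return None
    dim = len(vecs[0])
    return [sum(v[i] for v in vecs) / len(vecs) for i in range(dim)]

# Toy vectors for illustration.
word_vecs = {"hello": [1.0, 0.0], "world": [0.0, 1.0]}
emb = avg_sentence_embedding("hello world", word_vecs)  # [0.5, 0.5]
```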
- For conversion, run
```bash
python convert.py \
  --src VCC2SF1 \
  --trg VCC2TM1 \
  --model VAWGAN_S \
  --checkpoint logdir/train/[timestamp]/[model.ckpt-[id]] \
  --file_pattern "./dataset/vcc2018/bin/Training Set/{}/[0-9]*.bin"
```
or
```bash
./convert_all.sh \
  --model VAWGAN_S \
  --checkpoint logdir/train/[timestamp]/[model.ckpt-[id]] \
  --output_dir [directory to store converted audio]
```
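In both conversion commands, `--file_pattern` contains a `{}` placeholder; `convert.py` presumably substitutes the speaker id (e.g. the `--src` speaker) and globs the result. The mechanics, sketched:

```python
from glob import glob

pattern = "./dataset/vcc2018/bin/Training Set/{}/[0-9]*.bin"

# Substitute a speaker id into the pattern, then glob for that
# speaker's binary feature files.
src_pattern = pattern.format("VCC2SF1")
files = glob(src_pattern)  # empty unless the dataset has been prepared
```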