Simply download LibriSpeech from OpenSLR and unzip it. Fill in the path in the config file for self-supervised learning with the path to the unzipped LibriSpeech.
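
As a quick sanity check before editing the config, a short script like the one below can confirm the unzipped LibriSpeech root looks as expected; the root path and the list of splits to check are placeholders, not values from this repo.

```python
# Sanity-check the unzipped LibriSpeech directory (path and split names are placeholders).
from pathlib import Path

librispeech_root = Path("/path/to/LibriSpeech")
expected_splits = ["train-clean-100", "dev-clean", "test-clean"]

for split in expected_splits:
    split_dir = librispeech_root / split
    n_flac = len(list(split_dir.rglob("*.flac"))) if split_dir.is_dir() else 0
    status = "found" if split_dir.is_dir() else "MISSING"
    print(f"{split}: {status} ({n_flac} flac files)")
```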
- Download the WSJ dataset (requires an LDC license)
- Download and compile sph2pipe_v2.5 to read the WSJ dataset

```bash
wget http://www.openslr.org/resources/3/sph2pipe_v2.5.tar.gz
tar xzf sph2pipe_v2.5.tar.gz
cd sph2pipe_v2.5; gcc -o sph2pipe *.c -lm
```
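
Once compiled, the binary converts WSJ's SPHERE-format audio (`.wv1`) to wav; a minimal sketch of calling it from Python is below. The `-f wav` flag is standard sph2pipe usage, but the input/output paths are placeholders.

```python
# Convert one WSJ SPHERE file to wav with the compiled sph2pipe (paths are placeholders).
import subprocess

subprocess.run(
    ["sph2pipe_v2.5/sph2pipe", "-f", "wav",
     "/path/to/downloaded/wsj/example_utterance.wv1",
     "/tmp/example_utterance.wav"],
    check=True,
)
```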
- Refactor WSJ (generate wav files and place them all together) with

```bash
python refactor_wsj.py --wsj_root /path/to/downloaded/wsj/ \
                       --dest /path/to/store/new/wsj/
```
- (For phone classification only.) For each utterance, please use Kaldi to obtain the forced alignment and store the corresponding phone index sequence with `torch.save` at `/path/to/store/new/wsj/phn/fileid.pt` (or `fileid_nocrop.pt` for the `dev93` split), where `fileid.wav` can be found at `/path/to/store/new/wsj/wav/` after the previous step. Last, copy the lists of `fileid`s of the different splits to the refactored WSJ dataset for later use with

```bash
cp -r phn_split/ /path/to/store/new/wsj/wav/meta/
```
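
As an illustration of the expected on-disk format for this step, the sketch below saves one utterance's phone index sequence with `torch.save`. Obtaining the per-frame phone indices from the Kaldi forced alignment (e.g. with `ali-to-phones`) is up to you; the `fileid` and the index values here are placeholders.

```python
# Save a phone index sequence for one utterance (fileid and indices are placeholders).
import torch
from pathlib import Path

phn_dir = Path("/path/to/store/new/wsj/phn")
phn_dir.mkdir(parents=True, exist_ok=True)

fileid = "some_fileid"                               # must match fileid.wav under wav/
phone_indices = torch.tensor([3, 3, 3, 17, 17, 42])  # frame-level phone ids from the Kaldi alignment
torch.save(phone_indices, phn_dir / f"{fileid}.pt")  # use f"{fileid}_nocrop.pt" for the dev93 split
```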
- (For speaker classification only.) The lists of `fileid` & `speaker` pairs used in the different splits are stored at `spk/`. Copy them to the refactored WSJ dataset for later use with

```bash
cp -r spk_split/ /path/to/store/new/wsj/wav/spk/
```
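
If you want to double-check the copied lists against the refactored wavs, a sketch like the one below can help; it assumes, purely for illustration, that each line of a split file starts with the `fileid` (the actual file names and format are defined by the files under `spk/`).

```python
# Verify that every fileid in a speaker split file has a wav (file name/format assumed).
from pathlib import Path

wsj_root = Path("/path/to/store/new/wsj")
split_file = wsj_root / "wav" / "spk" / "train.txt"   # hypothetical split file name

missing = []
for line in split_file.read_text().splitlines():
    if not line.strip():
        continue
    fileid = line.split()[0]                          # assumes the line starts with the fileid
    if not (wsj_root / "wav" / f"{fileid}.wav").is_file():
        missing.append(fileid)

print(f"{len(missing)} fileids without a wav file")
```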
- Modify the `path` in the config file for downstream tasks to `/path/to/store/new/wsj/`