Training Files:
To train a new siamese model, the following files are required:
-> The individual aligned files used for training.
These files should be numeric, with each line holding the features
extracted for one frame (e.g. MFCCs, filterbanks, posteriors, etc.).
These files should be in /data_mfccs/ or a similar directory.
-> The pairs file.
This file lists the pairs used for training a specific model. Each pair
should be in the format "phoneme1,0,phoneme2,0" if both phonemes are the
same consonant (positive pair), or "phoneme1,0,phoneme2,1" if they
correspond to different consonants (negative pair). The label is computed
as the difference between the two numbers; this can be changed in the
siamese.py code.
These files should be in the directory /individual_phonemes/
-> The reference files.
These files contain an index of the phonemes used during training.
There should be one reference file for each of the 16 consonants used.
These files are needed to compute the similarity between the reference
phonemes and new, unseen phonemes (done during testing and inference).
These files should be in the directory /reference/
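As an illustration of the pairs format described above, here is a minimal Python sketch of how one pair could be parsed into a labeled example. The function name `parse_pair_line` is hypothetical and not taken from siamese.py. Taking the absolute difference of the two numbers as the label is an assumption about how "the difference" is computed.

```python
def parse_pair_line(line):
    """Parse one pairs-file entry, e.g. "phoneme1,0,phoneme2,1".

    Returns (first_phoneme, second_phoneme, label).
    Hypothetical helper; the actual parsing is done in siamese.py.
    """
    first, first_num, second, second_num = line.strip().split(",")
    # Assumption: label = absolute difference of the two numbers,
    # so 0 marks a positive pair and 1 marks a negative pair.
    label = abs(int(first_num) - int(second_num))
    return first, second, label


positive = parse_pair_line("phoneme1,0,phoneme2,0")
negative = parse_pair_line("phoneme1,0,phoneme2,1")
print(positive)
print(negative)
```

If the label convention is changed in siamese.py, the line computing `label` in this sketch would have to change accordingly.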
The directory /checkpoints/ is the output directory where the trained
models are stored. Because training is fast, the system writes only one
file, which is overwritten at every checkpoint interval.
The directory /pats_test_mfccs/ contains the speakers used to validate the
system. Inside this directory, several subdirectories hold the different
phonemes, split by class.
The directory /pats_full_mfccs/ contains several subdirectories, each one
corresponding to an unseen speaker. Inside each of these subdirectories there
are further subdirectories corresponding to the different phonemes, split by
class.
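The speaker/class layout described above for /pats_full_mfccs/ could be traversed as in the following sketch. The function name `list_phoneme_files` is hypothetical, and the assumption that each class subdirectory directly contains the feature files is illustrative rather than taken from the repository.

```python
from pathlib import Path


def list_phoneme_files(root):
    """Map (speaker, phoneme_class) to a sorted list of files under root.

    Assumed layout: root/<speaker>/<phoneme_class>/<feature file>.
    """
    files_by_group = {}
    for speaker_dir in sorted(Path(root).iterdir()):
        if not speaker_dir.is_dir():
            continue
        for class_dir in sorted(speaker_dir.iterdir()):
            if class_dir.is_dir():
                key = (speaker_dir.name, class_dir.name)
                # Keep only regular files; nested directories are ignored.
                files_by_group[key] = sorted(
                    p for p in class_dir.iterdir() if p.is_file()
                )
    return files_by_group
```

Calling `list_phoneme_files("./pats_full_mfccs")` would then give one entry per speaker and phoneme class, under the layout assumed here.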
The file **classes.py** contains the dataloaders and the detailed
architecture of the system.
The file **hparams.py** contains the hyperparameters used. Each hyperparameter
is accompanied by a short explanation.
The file **siamese.py** contains the full training pipeline of the system.
The file **test_phonemes.py** contains the code that computes similarity
measures for unseen phonemes, comparing each unseen phoneme with all the
reference phonemes of a specific class that were seen during training.
Usage:
python test_phonemes.py ./pats_test_mfccs/s-cons/ ./reference/s_filenames.csv ./checkpoints/model.pth
In this case, every /s/ phoneme present in the validation set is compared
with the reference /s/ phonemes seen during training, using a trained model
stored at a given path (here, ./checkpoints/model.pth).
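To make the comparison step more concrete, the sketch below scores one unseen phoneme against a set of reference phonemes. It is not the code in test_phonemes.py: the embeddings are made-up toy vectors, the function names are hypothetical, and cosine similarity is an assumed measure, since this README does not state which similarity the system uses.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def similarities_to_references(unseen, references):
    """Score one unseen embedding against every reference embedding."""
    return [cosine_similarity(unseen, ref) for ref in references]


# Toy embeddings (assumption): in practice these would come from the
# trained siamese model applied to the phoneme feature files.
unseen_embedding = [1.0, 0.0]
reference_embeddings = [[1.0, 0.0], [0.0, 1.0]]
scores = similarities_to_references(unseen_embedding, reference_embeddings)
print(scores)
```

In this toy case the first reference is identical to the unseen embedding and the second is orthogonal to it, so the two scores sit at opposite ends of the range for non-negative vectors.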
The file **siamese.py** trains the siamese system according to the settings
defined in hparams.py, using the training files described above.
The file **sim_calc_eval.sh** is a bash script that automates the validation
process.
The file **sim_calc.sh** is a bash script that computes the similarity measures
for the new unseen speakers.