[egs] Updating SITW recipe to account for changes to VoxCeleb1 #2690

Merged: 3 commits, Sep 7, 2018
18 changes: 18 additions & 0 deletions egs/sitw/v1/local/make_voxceleb1.pl
@@ -26,6 +26,10 @@
  system("wget -O $out_dir/voxceleb1_sitw_overlap.txt http://www.openslr.org/resources/49/voxceleb1_sitw_overlap.txt");
}

if (! -e "$data_base/vox1_meta.csv") {
  system("wget -O $data_base/vox1_meta.csv http://www.openslr.org/resources/49/vox1_meta.csv");
}

# sitw_overlap contains the list of speakers that also exist in our evaluation set, SITW.
my %sitw_overlap = ();
open(OVERLAP, "<", "$out_dir/voxceleb1_sitw_overlap.txt") or die "Could not open the overlap file $out_dir/voxceleb1_sitw_overlap.txt";
@@ -34,6 +38,20 @@
  my $spkr_id = $_;
  $sitw_overlap{$spkr_id} = ();
}
close(OVERLAP) or die;

open(META_IN, "<", "$data_base/vox1_meta.csv") or die "Could not open the meta data file $data_base/vox1_meta.csv";

# Also add the banned speakers to sitw_overlap using their ID format in the
# newest version of VoxCeleb.
while (<META_IN>) {
  chomp;
  my ($vox_id, $spkr_id, $gender, $nation, $set) = split;
  if (exists($sitw_overlap{$spkr_id})) {
    $sitw_overlap{$vox_id} = ();
  }
}
close(META_IN) or die;

opendir my $dh, "$data_base/voxceleb1_wav" or die "Cannot open directory: $!";
my @spkr_dirs = grep {-d "$data_base/voxceleb1_wav/$_" && ! /^\.{1,2}$/} readdir($dh);
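
The ID-mapping step added above can be illustrated on toy data. The following is a minimal sketch in shell, not the recipe's own code: the file names, speaker names, and IDs are all hypothetical, and it assumes the metadata file has whitespace-separated columns in the order new-style ID, speaker name, gender, nationality, set. Unlike the Perl change, which adds the new-style IDs to the existing set of banned names, this sketch only prints the new-style IDs.

```shell
#!/usr/bin/env bash
# Sketch of the ID-mapping step on toy data. All file contents below are
# made-up examples, not real VoxCeleb1 or SITW entries.
set -euo pipefail

tmp=$(mktemp -d)

# Hypothetical overlap list: one speaker name per line.
printf '%s\n' 'Speaker_B' > "$tmp/overlap.txt"

# Hypothetical metadata: new-style ID, speaker name, gender, nationality, set.
printf '%s\t%s\t%s\t%s\t%s\n' \
  'id00001' 'Speaker_A' 'f' 'USA' 'dev' \
  'id00002' 'Speaker_B' 'm' 'UK' 'dev' \
  'id00003' 'Speaker_C' 'm' 'USA' 'test' > "$tmp/meta.csv"

# First pass (NR == FNR) loads the banned names; the second pass prints the
# new-style ID of every metadata row whose speaker name is banned.
banned_ids=$(awk 'NR == FNR { banned[$1]; next } $2 in banned { print $1 }' \
  "$tmp/overlap.txt" "$tmp/meta.csv")

echo "$banned_ids"   # prints id00002
rm -rf "$tmp"
```

Keying the exclusion set by both the speaker name and the new-style ID, as the Perl change does, lets the later directory scan skip overlapping speakers whichever naming scheme the local copy of VoxCeleb1 uses.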
4 changes: 2 additions & 2 deletions egs/sre16/v2/run.sh
@@ -82,7 +82,7 @@ if [ $stage -le 0 ]; then
fi

if [ $stage -le 1 ]; then
- # Make filterbanks and compute the energy-based VAD for each dataset
+ # Make MFCCs and compute the energy-based VAD for each dataset
if [[ $(hostname -f) == *.clsp.jhu.edu ]] && [ ! -d $mfccdir/storage ]; then
utils/create_split_dir.pl \
/export/b{14,15,16,17}/$USER/kaldi-data/egs/sre16/v2/xvector-$(date +'%m_%d_%H_%M')/mfccs/storage $mfccdir/storage
@@ -159,7 +159,7 @@ if [ $stage -le 2 ]; then
utils/subset_data_dir.sh data/swbd_sre_aug 128000 data/swbd_sre_aug_128k
utils/fix_data_dir.sh data/swbd_sre_aug_128k

- # Make filterbanks for the augmented data. Note that we do not compute a new
+ # Make MFCCs for the augmented data. Note that we do not compute a new
# vad.scp file here. Instead, we use the vad.scp from the clean version of
# the list.
steps/make_mfcc.sh --mfcc-config conf/mfcc.conf --nj 40 --cmd "$train_cmd" \