From a3f0f5e23ebb03c52836a82768afa74dccb36a2c Mon Sep 17 00:00:00 2001 From: nithinraok Date: Thu, 12 May 2022 23:07:13 -0700 Subject: [PATCH 1/3] update speaker docs Signed-off-by: nithinraok --- .../asr/speaker_diarization/datasets.rst | 10 ++--- .../asr/speaker_recognition/datasets.rst | 34 ++++++++--------- examples/speaker_tasks/recognition/README.md | 12 +++--- scripts/dataset_processing/get_hi-mia_data.py | 4 +- scripts/speaker_tasks/filelist_to_manifest.py | 38 +++++++++---------- ...st.py => pathfiles_to_diarize_manifest.py} | 0 .../ASR_with_SpeakerDiarization.ipynb | 4 +- .../Speaker_Diarization_Inference.ipynb | 4 +- .../Speaker_Identification_Verification.ipynb | 22 +++++------ 9 files changed, 64 insertions(+), 64 deletions(-) rename scripts/speaker_tasks/{pathsfiles_to_manifest.py => pathfiles_to_diarize_manifest.py} (100%) diff --git a/docs/source/asr/speaker_diarization/datasets.rst b/docs/source/asr/speaker_diarization/datasets.rst index d2c8d2a93944..ab38243fbb81 100644 --- a/docs/source/asr/speaker_diarization/datasets.rst +++ b/docs/source/asr/speaker_diarization/datasets.rst @@ -14,11 +14,11 @@ Diarization inference is based on Hydra configurations which are fulfilled by `` {"audio_filepath": "/path/to/abcd.wav", "offset": 0, "duration": null, "label": "infer", "text": "-", "num_speakers": null, "rttm_filepath": "/path/to/rttm/abcd.rttm", "uem_filepath": "/path/to/uem/abcd.uem"} -In each line of the input manifest file, ``audio_filepath`` item is mandatory while the rest of the items are optional and can be passed for desired diarization setting. We refer to this file as a manifest file. This manifest file can be created by using the script in ``/scripts/speaker_tasks/pathsfiles_to_manifest.py``. The following example shows how to run ``pathsfiles_to_manifest.py`` by providing path list files. 
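As a reference for the manifest format described above, here is a minimal, hypothetical Python sketch that assembles one such JSON line; the helper name and paths are illustrative and not part of the scripts in this patch:

```python
import json

def diarize_manifest_entry(audio_filepath, rttm_filepath=None, uem_filepath=None,
                           num_speakers=None, offset=0, duration=None):
    """Build one JSON line of a diarization inference manifest.

    Only audio_filepath is mandatory; every other field falls back to the
    null / "infer" placeholders shown in the example above.
    """
    return json.dumps({
        "audio_filepath": audio_filepath,
        "offset": offset,
        "duration": duration,          # null -> diarize the whole file
        "label": "infer",
        "text": "-",
        "num_speakers": num_speakers,  # null -> estimate the speaker count
        "rttm_filepath": rttm_filepath,
        "uem_filepath": uem_filepath,
    })

# One line per audio file, written as JSON-lines:
print(diarize_manifest_entry("/path/to/abcd.wav", rttm_filepath="/path/to/rttm/abcd.rttm"))
```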
+In each line of the input manifest file, the ``audio_filepath`` item is mandatory, while the rest of the items are optional and can be passed for the desired diarization setting. We refer to this file as a manifest file. This manifest file can be created by using the script in ``/scripts/speaker_tasks/pathfiles_to_diarize_manifest.py``. The following example shows how to run ``pathfiles_to_diarize_manifest.py`` by providing path list files. .. code-block:: bash - python pathsfiles_to_manifest.py --paths2audio_files /path/to/audio_file_path_list.txt \ + python pathfiles_to_diarize_manifest.py --paths2audio_files /path/to/audio_file_path_list.txt \ --paths2txt_files /path/to/transcript_file_path_list.txt \ --paths2rttm_files /path/to/rttm_file_path_list.txt \ --paths2uem_files /path/to/uem_file_path_list.txt \ @@ -40,7 +40,7 @@ The ``--paths2audio_files`` and ``--manifest_filepath`` are required arguments. /path/to/abcd02.rttm -The path list files containing the absolute paths to these WAV, RTTM, TXT, CTM and UEM files should be provided as in the above example. ``pathsfiles_to_manifest.py`` script will match each file using the unique filename (e.g. ``abcd``). Finally, the absolute path of the created manifest file should be provided through Hydra configuration as shown below: +The path list files containing the absolute paths to these WAV, RTTM, TXT, CTM and UEM files should be provided as in the above example. The ``pathfiles_to_diarize_manifest.py`` script will match each file using the unique filename (e.g. ``abcd``). Finally, the absolute path of the created manifest file should be provided through Hydra configuration as shown below: .. code-block:: yaml @@ -127,7 +127,7 @@ To evaluate the performance on AMI Meeting Corpus, the following instructions ca - Download AMI Meeting Corpus from `AMI website `_. Choose ``Headset mix`` which has a mono wav file for each session. - Download the test set (whitelist) from `Pyannotate AMI test set whitelist `_. 
- The merged RTTM file for AMI test set can be downloaded from `Pyannotate AMI test set RTTM file `_. Note that this file should be split into individual rttm files. Download split rttm files for AMI test set from `AMI test set split RTTM files `_. - - Generate an input manifest file using ``/scripts/speaker_tasks/pathsfiles_to_manifest.py`` + - Generate an input manifest file using ``/scripts/speaker_tasks/pathfiles_to_diarize_manifest.py`` CallHome American English Speech (CHAES), LDC97S42 @@ -154,5 +154,5 @@ To evaluate the performance on AMI Meeting Corpus, the following instructions ca - Download CHAES Meeting Corpus at LDC website `LDC97S42 `_ (CHAES is not publicly available). - Download the CH109 filename list (whitelist) from `CH109 whitelist `_. - Download RTTM files for CH109 set from `CH109 RTTM files `_. - - Generate an input manifest file using ``/scripts/speaker_tasks/pathsfiles_to_manifest.py`` + - Generate an input manifest file using ``/scripts/speaker_tasks/pathfiles_to_diarize_manifest.py`` diff --git a/docs/source/asr/speaker_recognition/datasets.rst b/docs/source/asr/speaker_recognition/datasets.rst index 50a0eaec8a9a..88c600b3c523 100644 --- a/docs/source/asr/speaker_recognition/datasets.rst +++ b/docs/source/asr/speaker_recognition/datasets.rst @@ -24,35 +24,35 @@ After download and conversion, your `data` folder should contain directories wit All-other Datasets ------------------ -These methods can be applied to any dataset to get similar training manifest files. +These methods can be applied to any dataset to get similar training or inference manifest files. -First we prepare scp file(s) containing absolute paths to all the wav files required for each of the train, dev, and test set. This can be easily prepared by using ``find`` bash command as follows: +`filelist_to_manifest.py` script in `$/scripts/speaker_tasks/` folder generates manifest file from a text file containing paths to audio files. -.. 
code-block:: bash - - !find {data_dir}/{train_dir} -iname "*.wav" > data/train_all.scp - !head -n 3 data/train_all.scp +Sample `filelist.txt` file contents: +.. code-block:: bash -Based on the created scp file, we use `scp_to_manifest.py` script to convert it to a manifest file. This script takes three optional arguments: + /data/datasets/voxceleb/data/dev/aac_wav/id00179/Q3G6nMr1ji0/00086.wav + /data/datasets/voxceleb/data/dev/aac_wav/id00806/VjpQLxHQQe4/00302.wav + /data/datasets/voxceleb/data/dev/aac_wav/id01510/k2tzXQXvNPU/00132.wav -* id: This value is used to assign speaker label to each audio file. This is the field number separated by `/` from the audio file path. For example if all audio file paths follow the convention of path/to/speaker_folder/unique_speaker_label/file_name.wav, by picking `id=3 or id=-2` script picks unique_speaker_label as label for that utterance. -* split: Optional argument to split the manifest in to train and dev json files -* create_chunks: Optional argument to randomly spit each audio file in to chunks of 1.5 sec, 2 sec and 3 sec for robust training of speaker embedding extractor model. +This list file is used to generate the manifest file. The script has optional arguments to split the whole manifest file into train and dev sets and also to segment audio files into smaller segments for robust training (for testing, we don't need to create segments for each utterance). +Sample usage: -After the download and conversion, your data folder should contain directories with manifest files as: - -* `data//train.json` -* `data//dev.json` -* `data//train_all.json` +.. code-block:: bash -Each line in the manifest file describes a training sample - audio_filepath contains the path to the wav file, duration it's duration in seconds, and label is the speaker class label: + python filelist_to_manifest.py --filelist=filelist.txt --id=-3 --out=speaker_manifest.json +This would create a manifest with contents as shown below: .. 
code-block:: json - {"audio_filepath": "/audio_file.wav", "duration": 3.9, "label": "speaker_id"} + {"audio_filepath": "/data/datasets/voxceleb/data/dev/aac_wav/id00179/Q3G6nMr1ji0/00086.wav", "offset": 0, "duration": 4.16, "label": "id00179"} + {"audio_filepath": "/data/datasets/voxceleb/data/dev/aac_wav/id00806/VjpQLxHQQe4/00302.wav", "offset": 0, "duration": 12.288, "label": "id00806"} + {"audio_filepath": "/data/datasets/voxceleb/data/dev/aac_wav/id01510/k2tzXQXvNPU/00132.wav", "offset": 0, "duration": 4.608, "label": "id01510"} +For other optional arguments, such as splitting the manifest file into train and dev sets or creating segments from each utterance, refer to the arguments +described in the script. Tarred Datasets --------------- diff --git a/examples/speaker_tasks/recognition/README.md b/examples/speaker_tasks/recognition/README.md index b8dbdbf26388..fe96c33a5ad0 100644 --- a/examples/speaker_tasks/recognition/README.md +++ b/examples/speaker_tasks/recognition/README.md @@ -48,8 +48,8 @@ We first generate manifest file to get embeddings.
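The manifest produced by `filelist_to_manifest.py` is plain JSON-lines, so it can be consumed with the standard library alone. A sketch that groups utterances by speaker label; the inlined lines stand in for a real manifest file, with paths shortened:

```python
import json
from collections import defaultdict

# Stand-ins for lines read from speaker_manifest.json (paths shortened).
manifest_lines = [
    '{"audio_filepath": "/data/.../id00179/.../00086.wav", "offset": 0, "duration": 4.16, "label": "id00179"}',
    '{"audio_filepath": "/data/.../id00179/.../00091.wav", "offset": 0, "duration": 3.20, "label": "id00179"}',
    '{"audio_filepath": "/data/.../id00806/.../00302.wav", "offset": 0, "duration": 12.288, "label": "id00806"}',
]

utts_per_speaker = defaultdict(list)
for line in manifest_lines:  # in practice: for line in open("speaker_manifest.json")
    entry = json.loads(line)
    utts_per_speaker[entry["label"]].append(entry["audio_filepath"])

print({spk: len(paths) for spk, paths in utts_per_speaker.items()})
# {'id00179': 2, 'id00806': 1}
```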
The embeddings are then used ```bash # create list of files from voxceleb1 test folder (40 speaker test set) -find -iname '*.wav' > voxceleb1_test_files.scp -python /scripts/speaker_tasks/scp_to_manifest.py --scp voxceleb1_test_files.scp --id -3 --out voxceleb1_test_manifest.json +find -iname '*.wav' > voxceleb1_test_files.txt +python /scripts/speaker_tasks/filelist_to_manifest.py --filelist voxceleb1_test_files.txt --id -3 --out voxceleb1_test_manifest.json ``` ### Embedding Extraction Now using the manifest file created, we can extract embeddings to `data` folder using: @@ -92,14 +92,14 @@ ffmpeg -v 8 -i -f wav -acodec pcm_s16le Generate a list file that contains paths to all the dev audio files from voxceleb1 and voxceleb2 using find command as shown below: ```bash -find -iname '*.wav' > voxceleb1_dev.scp -find -iname '*.wav' > voxceleb2_dev.scp -cat voxceleb1_dev.scp voxceleb2_dev.scp > voxceleb12.scp +find -iname '*.wav' > voxceleb1_dev.txt +find -iname '*.wav' > voxceleb2_dev.txt +cat voxceleb1_dev.txt voxceleb2_dev.txt > voxceleb12.txt ``` This list file is now used to generate training and validation manifest files using a script provided in `/scripts/speaker_tasks/`. This script has optional arguments to split the whole manifest file in to train and dev and also chunk audio files to smaller chunks for robust training (for testing, we don't need this). ```bash -python /scripts/speaker_tasks/scp_to_manifest.py --scp voxceleb12.scp --id -3 --out voxceleb12_manifest.json --split --create_chunks +python /scripts/speaker_tasks/filelist_to_manifest.py --filelist voxceleb12.txt --id -3 --out voxceleb12_manifest.json --split --create_chunks ``` This creates `train.json, dev.json` in the current working directory. 
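The effect of the `--split` flag can be approximated as below. This is a hypothetical re-implementation for illustration, not the script's exact logic: it holds out a fixed fraction of each speaker's utterances so that every speaker appears in both `train.json` and `dev.json`:

```python
import random
from collections import defaultdict

def split_per_speaker(entries, dev_fraction=0.1, seed=42):
    """Split manifest entries into train/dev, stratified by speaker label."""
    rng = random.Random(seed)
    by_speaker = defaultdict(list)
    for e in entries:
        by_speaker[e["label"]].append(e)
    train, dev = [], []
    for spk, utts in by_speaker.items():
        rng.shuffle(utts)
        n_dev = max(1, int(len(utts) * dev_fraction))  # at least one dev utterance per speaker
        dev.extend(utts[:n_dev])
        train.extend(utts[n_dev:])
    return train, dev

# Toy manifest: 3 speakers with 20 utterances each (paths are placeholders).
entries = [{"audio_filepath": f"/wav/spk{s}/{i}.wav", "label": f"spk{s}"}
           for s in range(3) for i in range(20)]
train, dev = split_per_speaker(entries)
print(len(train), len(dev))  # 54 6
```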
diff --git a/scripts/dataset_processing/get_hi-mia_data.py b/scripts/dataset_processing/get_hi-mia_data.py index 19572ac55472..4fbc3bcc26f9 100644 --- a/scripts/dataset_processing/get_hi-mia_data.py +++ b/scripts/dataset_processing/get_hi-mia_data.py @@ -135,7 +135,7 @@ def __process_data(data_folder: str, data_set: str): """ fullpath = os.path.abspath(data_folder) - scp = glob(fullpath + "/**/*.wav", recursive=True) + filelist = glob(fullpath + "/**/*.wav", recursive=True) out = os.path.join(fullpath, data_set + "_all.json") utt2spk = os.path.join(fullpath, "utt2spk") utt2spk_file = open(utt2spk, "w") @@ -152,7 +152,7 @@ def __process_data(data_folder: str, data_set: str): speakers = [] lines = [] with open(out, "w") as outfile: - for line in tqdm(scp): + for line in tqdm(filelist): line = line.strip() y, sr = l.load(line, sr=None) if sr != 16000: diff --git a/scripts/speaker_tasks/filelist_to_manifest.py b/scripts/speaker_tasks/filelist_to_manifest.py index 18ad6579a551..7f84b39c6053 100644 --- a/scripts/speaker_tasks/filelist_to_manifest.py +++ b/scripts/speaker_tasks/filelist_to_manifest.py @@ -30,21 +30,21 @@ This scipt converts a filelist file where each line contains to a manifest json file. Optionally post processes the manifest file to create dev and train split for speaker embedding -training, also optionally chunk an audio file in to segments of random DURATIONS and create those +training, also optionally segment an audio file in to segments of random DURATIONS and create those wav files in CWD. -While creating chunks, if audio is not sampled at 16Khz, it resamples to 16Khz and write the wav file. +While creating segments, if audio is not sampled at 16Khz, it resamples to 16Khz and write the wav file. 
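The resampling behaviour described in the docstring (the script itself relies on librosa and soundfile) reduces to: load, compare the native rate with 16 kHz, resample when they differ, and write the result. A dependency-light sketch using linear interpolation as a stand-in for librosa's resampler:

```python
import numpy as np

TARGET_SR = 16000

def resample_if_needed(signal: np.ndarray, sr: int, target_sr: int = TARGET_SR) -> np.ndarray:
    """Return the signal at target_sr; np.interp stands in for the
    higher-quality resampler the actual script uses via librosa."""
    if sr == target_sr:
        return signal
    n_out = int(round(len(signal) * target_sr / sr))
    old_t = np.linspace(0.0, 1.0, num=len(signal), endpoint=False)
    new_t = np.linspace(0.0, 1.0, num=n_out, endpoint=False)
    return np.interp(new_t, old_t, signal)

one_sec_at_8k = np.zeros(8000, dtype=np.float32)
print(len(resample_if_needed(one_sec_at_8k, sr=8000)))  # 16000
```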
Args: --filelist: path to file containing list of audio files ---manifest(optional): if you already have manifest file, but would like to process it for creating chunks and splitting then use manifest ignoring filelist +--manifest(optional): if you already have manifest file, but would like to process it for creating + segments and splitting then use manifest ignoring filelist --id: index of speaker label in filename present in filelist file that is separated by '/' --out: output manifest file name --split: if you would want to split the manifest file for training purposes - you may not need this for test set. output file names is _.json - Defaults to False ---create_chunks:if you would want to chunk each manifest line to chunks of 4 sec or less - you may not need this for test set, Defaults to False ---min_spkrs_count: min number of samples per speaker to consider and ignore otherwise + you may not need this for test set. output file names is _.json, defaults to False +--create_segments: if you would want to segment each manifest line to segments of [1,2,3,4] sec or less + you may not need this for test set, defaults to False +--min_spkrs_count: min number of samples per speaker to consider and ignore otherwise, defaults to 0 (all speakers) """ DURATIONS = sorted([1, 2, 3, 4], reverse=True) @@ -60,7 +60,7 @@ def filter_manifest_line(manifest_line): dur = manifest_line['duration'] label = manifest_line['label'] endname = os.path.splitext(audio_path.split(label, 1)[-1])[0] - to_path = os.path.join(CWD, 'chunks', label) + to_path = os.path.join(CWD, 'segments', label) to_path = os.path.join(to_path, endname[1:]) os.makedirs(os.path.dirname(to_path), exist_ok=True) @@ -87,8 +87,8 @@ def filter_manifest_line(manifest_line): c_start = int(float(start * sr)) c_end = c_start + int(float(temp_dur * sr)) - chunk = signal[c_start:c_end] - sf.write(to_file, chunk, sr) + segment = signal[c_start:c_end] + sf.write(to_file, segment, sr) meta = manifest_line.copy() 
meta['audio_filepath'] = to_file @@ -172,7 +172,7 @@ def get_labels(lines): return labels -def main(filelist, manifest, id, out, split=False, create_chunks=False, min_count=10): +def main(filelist, manifest, id, out, split=False, create_segments=False, min_count=10): if os.path.exists(out): os.remove(out) if filelist: @@ -185,8 +185,8 @@ def main(filelist, manifest, id, out, split=False, create_chunks=False, min_coun lines = process_map(get_duration, lines, chunksize=100) - if create_chunks: - print(f"creating and writing chunks to {CWD}") + if create_segments: + print(f"creating and writing segments to {CWD}") lines = process_map(filter_manifest_line, lines, chunksize=100) temp = [] for line in lines: @@ -197,7 +197,7 @@ def main(filelist, manifest, id, out, split=False, create_chunks=False, min_coun speakers = [x['label'] for x in lines] if min_count: - speakers, lines = count_and_consider_only(speakers, lines, min_count) + speakers, lines = count_and_consider_only(speakers, lines, abs(min_count)) write_file(out, lines, range(len(lines))) path = os.path.dirname(out) @@ -232,14 +232,14 @@ def main(filelist, manifest, id, out, split=False, create_chunks=False, min_coun action='store_true', ) parser.add_argument( - "--create_chunks", - help="bool if you would want to chunk each manifest line to chunks of 4 sec or less", + "--create_segments", + help="bool if you would want to segment each manifest line to segments of 4 sec or less", required=False, action='store_true', ) parser.add_argument( "--min_spkrs_count", - default=10, + default=0, type=int, help="min number of samples per speaker to consider and ignore otherwise", ) @@ -247,5 +247,5 @@ def main(filelist, manifest, id, out, split=False, create_chunks=False, min_coun args = parser.parse_args() main( - args.filelist, args.manifest, args.id, args.out, args.split, args.create_chunks, args.min_spkrs_count, + args.filelist, args.manifest, args.id, args.out, args.split, args.create_segments, args.min_spkrs_count, ) 
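The segmentation logic in the diff above cuts each utterance greedily into the longest durations from `DURATIONS` that still fit. A standalone sketch of that planning step (a simplification; the real script also writes the segment wav files and copies the manifest metadata):

```python
DURATIONS = sorted([1, 2, 3, 4], reverse=True)  # prefer longer segments, as in the script

def plan_segments(total_dur: float):
    """Greedily plan (start, duration) segments covering an utterance."""
    segments, start = [], 0.0
    remaining = total_dur
    while remaining >= min(DURATIONS):
        for dur in DURATIONS:  # pick the longest duration that still fits
            if dur <= remaining:
                segments.append((start, dur))
                start += dur
                remaining -= dur
                break
    return segments

print(plan_segments(9.5))  # [(0.0, 4), (4.0, 4), (8.0, 1)]
```

Any tail shorter than the smallest duration (0.5 s here) is dropped, which mirrors the idea of keeping only usable training segments.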
diff --git a/scripts/speaker_tasks/pathsfiles_to_manifest.py b/scripts/speaker_tasks/pathfiles_to_diarize_manifest.py similarity index 100% rename from scripts/speaker_tasks/pathsfiles_to_manifest.py rename to scripts/speaker_tasks/pathfiles_to_diarize_manifest.py diff --git a/tutorials/speaker_tasks/ASR_with_SpeakerDiarization.ipynb b/tutorials/speaker_tasks/ASR_with_SpeakerDiarization.ipynb index 76dbf7bd12e1..45a3787641b3 100644 --- a/tutorials/speaker_tasks/ASR_with_SpeakerDiarization.ipynb +++ b/tutorials/speaker_tasks/ASR_with_SpeakerDiarization.ipynb @@ -235,7 +235,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "Lets create a manifest file with the an4 audio and rttm available. If you have more than one file you may also use the script `NeMo/scripts/speaker_tasks/pathsfiles_to_manifest.py` to generate a manifest file from a list of audio files. In addition, you can optionally include rttm files to evaluate the diarization results." + "Lets create a manifest file with the an4 audio and rttm available. If you have more than one file you may also use the script `NeMo/scripts/speaker_tasks/pathfiles_to_diarize_manifest.py` to generate a manifest file from a list of audio files. In addition, you can optionally include rttm files to evaluate the diarization results." ] }, { @@ -663,4 +663,4 @@ }, "nbformat": 4, "nbformat_minor": 4 -} \ No newline at end of file +} diff --git a/tutorials/speaker_tasks/Speaker_Diarization_Inference.ipynb b/tutorials/speaker_tasks/Speaker_Diarization_Inference.ipynb index d3671f5ff776..02fd31c02b71 100644 --- a/tutorials/speaker_tasks/Speaker_Diarization_Inference.ipynb +++ b/tutorials/speaker_tasks/Speaker_Diarization_Inference.ipynb @@ -169,7 +169,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "Lets create manifest with the an4 audio and rttm available. 
If you have more than one files you may also use the script `pathsfiles_to_manifest.py` to generate manifest file from list of audio files and optionally rttm files " + "Lets create manifest with the an4 audio and rttm available. If you have more than one files you may also use the script `pathfiles_to_diarize_manifest.py` to generate manifest file from list of audio files and optionally rttm files " ] }, { @@ -593,4 +593,4 @@ }, "nbformat": 4, "nbformat_minor": 4 -} \ No newline at end of file +} diff --git a/tutorials/speaker_tasks/Speaker_Identification_Verification.ipynb b/tutorials/speaker_tasks/Speaker_Identification_Verification.ipynb index f2d0a45327a2..0c1b3213987a 100644 --- a/tutorials/speaker_tasks/Speaker_Identification_Verification.ipynb +++ b/tutorials/speaker_tasks/Speaker_Identification_Verification.ipynb @@ -114,7 +114,7 @@ "source": [ "Since an4 is not designed for speaker recognition, this facilitates the opportunity to demonstrate how you can generate manifest files that are necessary for training. These methods can be applied to any dataset to get similar training manifest files. \n", "\n", - "First get an scp file(s) which has all the wav files with absolute paths for each of the train, dev, and test set. This can be easily done by the `find` bash command" + "First, create a list file which has all the wav files with absolute paths for each of the train, dev, and test set. This can be easily done by the `find` bash command" ] }, { @@ -127,7 +127,7 @@ }, "outputs": [], "source": [ - "!find {data_dir}/an4/wav/an4_clstk -iname \"*.wav\" > data/an4/wav/an4_clstk/train_all.scp" + "!find {data_dir}/an4/wav/an4_clstk -iname \"*.wav\" > data/an4/wav/an4_clstk/train_all.txt" ] }, { @@ -137,7 +137,7 @@ "id": "BhWVg2QoDhL3" }, "source": [ - "Let's look at the first 3 lines of scp file for train." + "Let's look at the first 3 lines of text file for train." 
] }, { @@ -150,7 +150,7 @@ }, "outputs": [], "source": [ - "!head -n 3 {data_dir}/an4/wav/an4_clstk/train_all.scp" + "!head -n 3 {data_dir}/an4/wav/an4_clstk/train_all.txt" ] }, { @@ -160,7 +160,7 @@ "id": "Y9L9Tl0XDw5Z" }, "source": [ - "Since we created the scp file for the train, we use `scp_to_manifest.py` to convert this scp file to a manifest file and then optionally split the files to train \\& dev for evaluating the models while training by using the `--split` flag. We wouldn't be needing the `--split` option for the test folder. \n", + "Since we created the list text file for the train, we use `filelist_to_manifest.py` to convert this text file to a manifest file and then optionally split the files to train \\& dev for evaluating the models during training by using the `--split` flag. We wouldn't be needing the `--split` option for the test folder. \n", "Accordingly please mention the `id` number, which is the field num separated by `/` to be considered as the speaker label " ] }, @@ -195,8 +195,8 @@ "if not os.path.exists('scripts'):\n", " print(\"Downloading necessary scripts\")\n", " !mkdir -p scripts/speaker_tasks\n", - " !wget -P scripts/speaker_tasks/ https://raw.githubusercontent.com/NVIDIA/NeMo/$BRANCH/scripts/speaker_tasks/scp_to_manifest.py\n", - "!python {NEMO_ROOT}/scripts/speaker_tasks/scp_to_manifest.py --scp {data_dir}/an4/wav/an4_clstk/train_all.scp --id -2 --out {data_dir}/an4/wav/an4_clstk/all_manifest.json --split" + " !wget -P scripts/speaker_tasks/ https://raw.githubusercontent.com/NVIDIA/NeMo/$BRANCH/scripts/speaker_tasks/filelist_to_manifest.py\n", + "!python {NEMO_ROOT}/scripts/speaker_tasks/filelist_to_manifest.py --filelist {data_dir}/an4/wav/an4_clstk/train_all.txt --id -2 --out {data_dir}/an4/wav/an4_clstk/all_manifest.json --split" ] }, { @@ -206,7 +206,7 @@ "id": "5kPCmx5DHvY5" }, "source": [ - "Generate the scp for the test folder and then convert it to a manifest." 
+ "Generate the list text file for the test folder and then convert it to a manifest." ] }, { @@ -219,8 +219,8 @@ }, "outputs": [], "source": [ - "!find {data_dir}/an4/wav/an4test_clstk -iname \"*.wav\" > {data_dir}/an4/wav/an4test_clstk/test_all.scp\n", - "!python {NEMO_ROOT}/scripts/speaker_tasks/scp_to_manifest.py --scp {data_dir}/an4/wav/an4test_clstk/test_all.scp --id -2 --out {data_dir}/an4/wav/an4test_clstk/test.json" + "!find {data_dir}/an4/wav/an4test_clstk -iname \"*.wav\" > {data_dir}/an4/wav/an4test_clstk/test_all.txt\n", + "!python {NEMO_ROOT}/scripts/speaker_tasks/filelist_to_manifest.py --filelist {data_dir}/an4/wav/an4test_clstk/test_all.txt --id -2 --out {data_dir}/an4/wav/an4test_clstk/test.json" ] }, { @@ -1264,4 +1264,4 @@ }, "nbformat": 4, "nbformat_minor": 1 -} \ No newline at end of file +} From 59a8c0aceba1c08019c8075daa9aeb954d94126d Mon Sep 17 00:00:00 2001 From: nithinraok Date: Thu, 12 May 2022 23:18:56 -0700 Subject: [PATCH 2/3] chunks -> segments Signed-off-by: nithinraok --- examples/speaker_tasks/recognition/README.md | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/examples/speaker_tasks/recognition/README.md b/examples/speaker_tasks/recognition/README.md index fe96c33a5ad0..459fc77d4b55 100644 --- a/examples/speaker_tasks/recognition/README.md +++ b/examples/speaker_tasks/recognition/README.md @@ -97,9 +97,9 @@ find -iname '*.wav' > voxceleb2_dev.txt cat voxceleb1_dev.txt voxceleb2_dev.txt > voxceleb12.txt ``` -This list file is now used to generate training and validation manifest files using a script provided in `/scripts/speaker_tasks/`. This script has optional arguments to split the whole manifest file in to train and dev and also chunk audio files to smaller chunks for robust training (for testing, we don't need this). +This list file is now used to generate training and validation manifest files using a script provided in `/scripts/speaker_tasks/`. 
This script has optional arguments to split the whole manifest file into train and dev sets and also to segment audio files into smaller segments for robust training (for testing, we don't need this). ```bash python /scripts/speaker_tasks/filelist_to_manifest.py --filelist voxceleb12.txt --id -3 --out voxceleb12_manifest.json --split --create_segments ``` This creates `train.json, dev.json` in the current working directory. From 55e6264613027d4633b60ece9f2a09c30e2cfeca Mon Sep 17 00:00:00 2001 From: nithinraok Date: Fri, 13 May 2022 11:12:52 -0700 Subject: [PATCH 3/3] Khz -> kHz Signed-off-by: nithinraok --- scripts/speaker_tasks/filelist_to_manifest.py | 2 +- .../speaker_tasks/Speaker_Identification_Verification.ipynb | 2 +- 2 files changed, 2 insertions(+), 2 deletions(-) diff --git a/scripts/speaker_tasks/filelist_to_manifest.py b/scripts/speaker_tasks/filelist_to_manifest.py index 7f84b39c6053..3a6c27d39377 100644 --- a/scripts/speaker_tasks/filelist_to_manifest.py +++ b/scripts/speaker_tasks/filelist_to_manifest.py @@ -33,7 +33,7 @@ training, also optionally segment an audio file in to segments of random DURATIONS and create those wav files in CWD. -While creating segments, if audio is not sampled at 16Khz, it resamples to 16Khz and write the wav file. +While creating segments, if audio is not sampled at 16kHz, it resamples to 16kHz and writes the wav file. 
Args: --filelist: path to file containing list of audio files --manifest(optional): if you already have manifest file, but would like to process it for creating diff --git a/tutorials/speaker_tasks/Speaker_Identification_Verification.ipynb b/tutorials/speaker_tasks/Speaker_Identification_Verification.ipynb index 0c1b3213987a..2f81df174b17 100644 --- a/tutorials/speaker_tasks/Speaker_Identification_Verification.ipynb +++ b/tutorials/speaker_tasks/Speaker_Identification_Verification.ipynb @@ -137,7 +137,7 @@ "id": "BhWVg2QoDhL3" }, "source": [ - "Let's look at the first 3 lines of text file for train." + "Let's look at the first 3 lines of the filelist text file for train." ] }, {