Commit

Tutorial fixes (#6717) (#6718)
Signed-off-by: smajumdar <titu1994@gmail.com>
Co-authored-by: Somshubra Majumdar <titu1994@gmail.com>
github-actions[bot] and titu1994 authored May 24, 2023
1 parent 8685468 commit 0150b91
Showing 3 changed files with 3 additions and 3 deletions.
2 changes: 1 addition & 1 deletion tutorials/AudioTranslationSample.ipynb
@@ -63,7 +63,7 @@
"import nemo\n",
"# Import Speech Recognition collection\n",
"import nemo.collections.asr as nemo_asr\n",
"# Import Natural Language Processing colleciton\n",
"# Import Natural Language Processing collection\n",
"import nemo.collections.nlp as nemo_nlp\n",
"# Import Speech Synthesis collection\n",
"import nemo.collections.tts as nemo_tts\n",
2 changes: 1 addition & 1 deletion tutorials/Publish_NeMo_Model_On_Hugging_Face_Hub.ipynb
@@ -627,7 +627,7 @@
"<DECLARE ANY POTENTIAL LIMITATIONS OF THE MODEL>\n",
"\n",
"Eg: \n",
"Since this model was trained on publically available speech datasets, the performance of this model might degrade for speech which includes technical terms, or vernacular that the model has not been trained on. The model might also perform worse for accented speech.\n",
"Since this model was trained on publicly available speech datasets, the performance of this model might degrade for speech which includes technical terms, or vernacular that the model has not been trained on. The model might also perform worse for accented speech.\n",
"\n",
"\n",
"## References\n",
2 changes: 1 addition & 1 deletion tutorials/tools/CTC_Segmentation_Tutorial.ipynb
@@ -280,7 +280,7 @@
"* `max_length` argument - max number of words in a segment for alignment (used only if there are no punctuation marks present in the original text. Long non-speech segments are better for segments split and are more likely to co-occur with punctuation marks. Random text split could deteriorate the quality of the alignment.\n",
"* out-of-vocabulary words will be removed based on pre-trained ASR model vocabulary, and the text will be changed to lowercase \n",
"* sentences for alignment with the original punctuation and capitalization will be stored under `$OUTPUT_DIR/processed/*_with_punct.txt`\n",
"* numbers will be converted from written to their spoken form with `num2words` package. For English, it's recommended to use NeMo normalization tool use `--use_nemo_normalization` argument (not supported if running this segmentation tutorial in Colab, see the text normalization tutorial: [`https://github.com/NVIDIA/NeMo-text-processing/blob/main/tutorials/Text_(Inverse)_Normalization.ipynb`](https://colab.research.google.com/github/NVIDIA/NeMo-text-processing/blob/main/tutorials/Text_(Inverse)_Normalization.ipynb) for more details). Even `num2words` normalization is usually enough for proper segmentation. However, it does not take audio into account. NeMo supports audio-based normalization for English, German and Russian languages that can be applied to the segmented data as a post-processing step. Audio-based normalization produces multiple normalization options. For example, `901` could be normalized as `nine zero one` or `nine hundred and one`. The audio-based normalization chooses the best match among the possible normalization options and the transcript based on the character error rate. See [https://github.com/NVIDIA/NeMo-text-processing/blob/main/nemo_text_processing/text_normalization/normalize_with_audio.py](https://github.com/NVIDIA/NeMo-text-processing/blob/main/nemo_text_processing/text_normalization/normalize_with_audio.py) for more details.\n",
"* numbers will be converted from written to their spoken form with `num2words` package. For English, it's recommended to use NeMo normalization tool use `--use_nemo_normalization` argument (not supported if running this segmentation tutorial in Colab, see the text normalization tutorial: [`https://github.com/NVIDIA/NeMo-text-processing/blob/r1.19.0/tutorials/Text_(Inverse)_Normalization.ipynb`](https://colab.research.google.com/github/NVIDIA/NeMo-text-processing/blob/r1.19.0/tutorials/Text_(Inverse)_Normalization.ipynb) for more details). Even `num2words` normalization is usually enough for proper segmentation. However, it does not take audio into account. NeMo supports audio-based normalization for English, German and Russian languages that can be applied to the segmented data as a post-processing step. Audio-based normalization produces multiple normalization options. For example, `901` could be normalized as `nine zero one` or `nine hundred and one`. The audio-based normalization chooses the best match among the possible normalization options and the transcript based on the character error rate. See [https://github.com/NVIDIA/NeMo-text-processing/blob/main/nemo_text_processing/text_normalization/normalize_with_audio.py](https://github.com/NVIDIA/NeMo-text-processing/blob/r1.19.0/nemo_text_processing/text_normalization/normalize_with_audio.py) for more details.\n",
"\n",
"### Audio preprocessing:\n",
"* non '.wav' audio files will be converted to `.wav` format\n",
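Note on the CTC segmentation hunk: the changed bullet says audio-based normalization picks, among several normalization options, the one that best matches the transcript by character error rate. A minimal sketch of that selection step follows. It is not the code in `normalize_with_audio.py`; the function names (`edit_distance`, `cer`, `pick_normalization`) are hypothetical, and treating the ASR transcript as the reference string is an assumption made for illustration.

```python
from typing import List


def edit_distance(ref: str, hyp: str) -> int:
    """Character-level Levenshtein distance between two strings."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def cer(ref: str, hyp: str) -> float:
    """Character error rate of hyp against ref (assumption: ref is the ASR transcript)."""
    if not ref:
        return 0.0 if not hyp else 1.0
    return edit_distance(ref, hyp) / len(ref)


def pick_normalization(candidates: List[str], transcript: str) -> str:
    """Return the candidate with the lowest CER against the transcript."""
    return min(candidates, key=lambda c: cer(transcript, c))


# Hypothetical example: two spoken forms of "call 901 now".
options = ["call nine zero one now", "call nine hundred and one now"]
print(pick_normalization(options, "call nine hundred and one now"))
# prints "call nine hundred and one now"
```

The example transcript matches the second option exactly, so its CER is 0 and it is selected. With a noisy transcript the same rule applies, only the winning CER is nonzero.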
