Merge r1.7.0 main (#3773)
* Tn bug 1.7.0 (#3730)

* fix es and fr bug

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* add file

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* [TTS] Fix bugs in E2E TTS, Mixer-TTS and FastPitch (#3740)

* fix bugs

Signed-off-by: Oktai Tatanov <oktai.tatanov@gmail.com>

* fix bug in e2e tts and mixer tts

Signed-off-by: Oktai Tatanov <oktai.tatanov@gmail.com>

* Mirror AN4 data while servers are down (#3743)

Signed-off-by: smajumdar <titu1994@gmail.com>

* Bugfix for GPT eval  (#3744)

* use tokens_cut not tokens

Signed-off-by: ericharper <complex451@gmail.com>

* remove precision conversion and comment jit for bias gelu

Signed-off-by: ericharper <complex451@gmail.com>

* revert comment update mbs in config

Signed-off-by: ericharper <complex451@gmail.com>

* calculate micro_batch_size during complete and compute_logprobs

Signed-off-by: ericharper <complex451@gmail.com>
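One of the fixes above derives `micro_batch_size` from the requests passed to `complete` and `compute_logprobs` at inference time. The helper below is a minimal sketch of that idea; the function name and the even-split assumption are ours, not NeMo's actual code:

```python
def infer_micro_batch_size(num_requests: int, data_parallel_size: int) -> int:
    """Derive a per-rank micro batch size for inference from the request count."""
    if num_requests % data_parallel_size != 0:
        raise ValueError("requests must split evenly across data-parallel ranks")
    return num_requests // data_parallel_size

# e.g. 8 prompts served by 2 data-parallel ranks -> micro batch of 4 per rank
```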

* ASR SSL update (#3746)

* ssl update

Signed-off-by: sam1373 <samuelkriman@gmail.com>

* tutorial update

Signed-off-by: sam1373 <samuelkriman@gmail.com>

* Fix SSL configs for 1.7 (#3748)

* ssl update

Signed-off-by: sam1373 <samuelkriman@gmail.com>

* tutorial update

Signed-off-by: sam1373 <samuelkriman@gmail.com>

* revert configs

Signed-off-by: sam1373 <samuelkriman@gmail.com>

* revert configs

Signed-off-by: sam1373 <samuelkriman@gmail.com>

* punct process bug fix (#3747)

Signed-off-by: ekmb <ebakhturina@nvidia.com>

Co-authored-by: Somshubra Majumdar <titu1994@gmail.com>

* updated conformer models. (#3741)

Signed-off-by: Vahid <vnoroozi@nvidia.com>

Co-authored-by: Somshubra Majumdar <titu1994@gmail.com>

* Yuya/megatron t5 glue eval (#3751)

* Add megatron t5 glue eval-only script

Signed-off-by: Yu Yao <yuya@nvidia.com>

* Update megatron t5 glue eval default configs

Signed-off-by: Yu Yao <yuya@nvidia.com>

* Update megatron t5 glue eval configs

Signed-off-by: Yu Yao <yuya@nvidia.com>

* Update config comments

Signed-off-by: Yu Yao <yuya@nvidia.com>

Co-authored-by: Yu Yao <yuya@nvidia.com>

* Specify gpus in SSL notebook (#3753)

* ssl update

Signed-off-by: sam1373 <samuelkriman@gmail.com>

* tutorial update

Signed-off-by: sam1373 <samuelkriman@gmail.com>

* revert configs

Signed-off-by: sam1373 <samuelkriman@gmail.com>

* revert configs

Signed-off-by: sam1373 <samuelkriman@gmail.com>

* specify gpus

Signed-off-by: sam1373 <samuelkriman@gmail.com>

* Duplex model inference fix, money encoder fix (#3754)

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* Update docs for RNNT and overriding fused batch size (#3755)

Signed-off-by: smajumdar <titu1994@gmail.com>

* fix consumed samples calculation + PTune Model bugs (#3738)

* fix the way computing consumed samples

Signed-off-by: Yi Dong <yidong@nvidia.com>

* fixed ptune model

Signed-off-by: Yi Dong <yidong@nvidia.com>

* make sure notebook is working

Signed-off-by: Yi Dong <yidong@nvidia.com>

* added try-catch

Signed-off-by: Yi Dong <yidong@nvidia.com>
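The consumed-samples fix above concerns how many training samples a checkpoint has seen. A common way to compute it, sketched under the assumption that one optimizer step consumes exactly one global batch (not NeMo's exact code):

```python
def compute_consumed_samples(global_step: int, micro_batch_size: int,
                             data_parallel_size: int,
                             accumulate_grad_batches: int = 1) -> int:
    # samples consumed = completed optimizer steps * effective global batch size
    global_batch_size = micro_batch_size * data_parallel_size * accumulate_grad_batches
    return global_step * global_batch_size
```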

Co-authored-by: Micha Livne <michalivne@users.noreply.github.com>
Co-authored-by: Eric Harper <complex451@gmail.com>

* fix directories in ssl notebook (#3758)

* ssl update

Signed-off-by: sam1373 <samuelkriman@gmail.com>

* tutorial update

Signed-off-by: sam1373 <samuelkriman@gmail.com>

* revert configs

Signed-off-by: sam1373 <samuelkriman@gmail.com>

* revert configs

Signed-off-by: sam1373 <samuelkriman@gmail.com>

* specify gpus

Signed-off-by: sam1373 <samuelkriman@gmail.com>

* update dirs

Signed-off-by: sam1373 <samuelkriman@gmail.com>

* TN docs update (#3735)

* TN docs update: audio based docs added, quick start, ref fixed, etc

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* add deployment script dir and Sp TN

Signed-off-by: ekmb <ebakhturina@nvidia.com>

Co-authored-by: Yang Zhang <yzhang123@users.noreply.github.com>

* Update Tacotron2_Training.ipynb (#3769)

Signed-off-by: Jason <jasoli@nvidia.com>

* update branch

Signed-off-by: ericharper <complex451@gmail.com>

* update requirements and package info

Signed-off-by: ericharper <complex451@gmail.com>

* revert

Signed-off-by: ericharper <complex451@gmail.com>

* revert

Signed-off-by: ericharper <complex451@gmail.com>

* update branch

Signed-off-by: ericharper <complex451@gmail.com>

* remove unused import

Signed-off-by: ericharper <complex451@gmail.com>

Co-authored-by: Yang Zhang <yzhang123@users.noreply.github.com>
Co-authored-by: Oktai Tatanov <oktai.tatanov@gmail.com>
Co-authored-by: Somshubra Majumdar <titu1994@gmail.com>
Co-authored-by: Samuel Kriman <samuelkriman@gmail.com>
Co-authored-by: Evelina <10428420+ekmb@users.noreply.github.com>
Co-authored-by: Vahid Noroozi <VahidooX@users.noreply.github.com>
Co-authored-by: yaoyu-33 <54727607+yaoyu-33@users.noreply.github.com>
Co-authored-by: Yu Yao <yuya@nvidia.com>
Co-authored-by: Yi Dong <43824965+yidong72@users.noreply.github.com>
Co-authored-by: Micha Livne <michalivne@users.noreply.github.com>
Co-authored-by: Jason <jasoli@nvidia.com>
12 people authored and fayejf committed Mar 2, 2022
1 parent 3bc4725 commit 0fcd281
Showing 63 changed files with 819 additions and 317 deletions.
9 changes: 8 additions & 1 deletion docs/source/asr/configs.rst
@@ -671,9 +671,16 @@ The most important component at the top level is the ``strategy``. It can take o
decoding:
strategy: "greedy_batch"
# preserve decoding alignments
preserve_alignments: false
# Overrides the fused batch size after training.
# Setting it to -1 will process the whole batch at once when combined with the `greedy_batch` decoding strategy
fused_batch_size: Optional[int] = -1
# greedy strategy config
greedy:
max_symbols: 30
max_symbols: 10
# beam strategy config
beam:
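The `fused_batch_size: -1` override documented in the hunk above can be read as "fall back to the full batch". A hedged sketch of that resolution logic (the helper name is ours, not NeMo's):

```python
from typing import Optional

def resolve_fused_batch_size(fused_batch_size: Optional[int], batch_size: int) -> int:
    # -1 (or unset) means: run the fused joint computation over the whole batch at once
    if fused_batch_size is None or fused_batch_size == -1:
        return batch_size
    return min(fused_batch_size, batch_size)
```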
2 changes: 1 addition & 1 deletion docs/source/conf.py
@@ -113,8 +113,8 @@
bibtex_bibfiles = [
'asr/asr_all.bib',
'nlp/nlp_all.bib',
'nlp/text_normalization/tn_itn_all.bib',
'tools/tools_all.bib',
'nemo_text_processing/textprocessing_all.bib',
'tts_all.bib',
]

13 changes: 4 additions & 9 deletions docs/source/index.rst
@@ -31,13 +31,14 @@ NVIDIA NeMo User Guide
asr/speaker_diarization/intro

.. toctree::
:maxdepth: 2
:maxdepth: 3
:caption: Natural Language Processing
:name: Natural Language Processing

nlp/megatron

nlp/models
nlp/megatron
nlp/api
nlp/text_normalization/intro

.. toctree::
:maxdepth: 2
@@ -55,12 +56,6 @@ NVIDIA NeMo User Guide

common/intro

.. toctree::
:maxdepth: 2
:caption: Text Processing
:name: Text Processing

nemo_text_processing/intro

.. toctree::
:maxdepth: 2
17 changes: 0 additions & 17 deletions docs/source/nemo_text_processing/intro.rst

This file was deleted.

78 changes: 0 additions & 78 deletions docs/source/nemo_text_processing/text_normalization.rst

This file was deleted.

1 change: 0 additions & 1 deletion docs/source/nlp/models.rst
@@ -21,4 +21,3 @@ NeMo's NLP collection provides the following task-specific models:
entity_linking
nlp_model
machine_translation
text_normalization
21 changes: 21 additions & 0 deletions docs/source/nlp/text_normalization/intro.rst
@@ -0,0 +1,21 @@
(Inverse) Text Normalization
============================

NeMo supports Text Normalization (TN) and Inverse Text Normalization (ITN) tasks via the rule-based `nemo_text_processing` Python package and a neural TN/ITN model.

Rule-based (WFST) TN/ITN:

.. toctree::
:maxdepth: 1

wfst/intro


Neural TN/ITN:

.. toctree::
:maxdepth: 1

nn_text_normalization


@@ -1,7 +1,7 @@
.. _text_normalization:
.. _nn_text_normalization:

Text Normalization Models
==========================
Neural Text Normalization Models
================================
Text normalization is the task of converting a written text into its spoken form. For example,
``$123`` should be verbalized as ``one hundred twenty three dollars``, while ``123 King Ave``
should be verbalized as ``one twenty three King Avenue``. At the same time, the inverse problem
@@ -279,7 +279,7 @@ The argument ``data.train_ds.decoder_data_augmentation`` in the config file cont
References
----------

.. bibliography:: nlp_all.bib
.. bibliography:: tn_itn_all.bib
:style: plain
:labelprefix: NLP-TEXTNORM
:keyprefix: nlp-textnorm-
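The written-to-spoken examples above ($123 -> "one hundred twenty three dollars") can be mimicked with a toy lookup table. This is an illustration of the TN task only, not NeMo code:

```python
# toy written -> spoken substitutions illustrating the text normalization task
TN_EXAMPLES = {
    "$123": "one hundred twenty three dollars",
    "123 King Ave": "one twenty three King Avenue",
}

def toy_normalize(text: str) -> str:
    """Replace each known written form with its spoken verbalization."""
    for written, spoken in TN_EXAMPLES.items():
        text = text.replace(written, spoken)
    return text
```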
56 changes: 56 additions & 0 deletions docs/source/nlp/text_normalization/tn_itn_all.bib
@@ -0,0 +1,56 @@
@article{ebden2015kestrel,
title={The Kestrel TTS text normalization system},
author={Ebden, Peter and Sproat, Richard},
journal={Natural Language Engineering},
volume={21},
number={3},
pages={333},
year={2015},
publisher={Cambridge University Press}
}

@article{sproat2016rnn,
title={RNN approaches to text normalization: A challenge},
author={Sproat, Richard and Jaitly, Navdeep},
journal={arXiv preprint arXiv:1611.00068},
year={2016}
}

@book{taylor2009text,
title={Text-to-speech synthesis},
author={Taylor, Paul},
year={2009},
publisher={Cambridge university press}
}

@misc{zhang2021nemo,
title={NeMo Inverse Text Normalization: From Development To Production},
author={Yang Zhang and Evelina Bakhturina and Kyle Gorman and Boris Ginsburg},
year={2021},
eprint={2104.05055},
archivePrefix={arXiv},
primaryClass={cs.CL}
}

@inproceedings{sparrowhawk,
title = {TTS for Low Resource Languages: A Bangla Synthesizer},
author = {Alexander Gutkin and Linne Ha and Martin Jansche and Knot Pipatsrisawat and Richard Sproat},
booktitle = {10th Language Resources and Evaluation Conference},
year = {2016},
}

@article{mohri2005weighted,
title={Weighted automata in text and speech processing},
author={Mohri, Mehryar and Pereira, Fernando and Riley, Michael},
journal={arXiv preprint cs/0503077},
year={2005}
}

@incollection{mohri2009weighted,
title={Weighted automata algorithms},
author={Mohri, Mehryar},
booktitle={Handbook of weighted automata},
pages={213--254},
year={2009},
publisher={Springer}
}
22 changes: 22 additions & 0 deletions docs/source/nlp/text_normalization/wfst/intro.rst
@@ -0,0 +1,22 @@
WFST-based (Inverse) Text Normalization
=======================================

NeMo supports Text Normalization (TN) and Inverse Text Normalization (ITN) tasks via the rule-based `nemo_text_processing` Python package and a neural TN/ITN model.

The `nemo_text_processing` package is installed with the `nemo_toolkit`; see :doc:`NeMo Introduction <../starthere/intro>` for installation details.
Additional requirements can be found in `setup.sh <https://github.com/NVIDIA/NeMo/blob/stable/nemo_text_processing/setup.sh>`_.

Tutorials on how to get started with WFST-based NeMo text normalization can be found in `tutorials/text_processing <https://github.com/NVIDIA/NeMo/tree/stable/tutorials/text_processing>`_.

Rule-based (WFST) TN/ITN:

.. toctree::
:maxdepth: 2

wfst_text_normalization
wfst_inverse_text_normalization
wfst_text_processing_deployment
wfst_api



@@ -1,3 +1,5 @@
.. _wfst_api:

NeMo Text Processing API
========================

@@ -1,12 +1,29 @@
.. _wfst_itn:

Inverse Text Normalization
==========================

Inverse text normalization (ITN) is a part of the Automatic Speech Recognition (ASR) post-processing pipeline.
ITN is the task of converting the raw spoken output of the ASR model into its written form to improve text readability.

For example,
`"in nineteen seventy"` -> `"in 1970"`
and `"it costs one hundred and twenty three dollars"` -> `"it costs $123"`.

Quick Start Guide
-----------------

.. code-block:: python

    # import the WFST-based ITN module
    from nemo_text_processing.inverse_text_normalization.inverse_normalize import InverseNormalizer

    # initialize the inverse normalizer
    inverse_normalizer = InverseNormalizer(lang="en")

    # try the normalizer on a few examples
    print(inverse_normalizer.normalize("it costs one hundred and twenty three dollars"))
    # >>> "it costs $123"
    print(inverse_normalizer.normalize("in nineteen seventy"))
    # >>> "in 1970"
NeMo ITN :cite:`textprocessing-itn-zhang2021nemo` is based on WFST grammars :cite:`textprocessing-itn-mohri2009weighted`. We also provide a deployment route to C++ using `Sparrowhawk <https://github.com/google/sparrowhawk>`_ :cite:`textprocessing-itn-sparrowhawk` -- an open-source version of Google Kestrel :cite:`textprocessing-itn-ebden2015kestrel`.
See :doc:`Text Processing Deployment <../tools/text_processing_deployment>` for details.
@@ -17,11 +34,8 @@ See :doc:`Text Processing Deployment <../tools/text_processing_deployment>` for d






Classes
----------------------------------
--------


The base class for every grammar is :class:`GraphFst<nemo_text_processing.text_normalization.en.GraphFst>`.
@@ -75,13 +89,25 @@ Example evaluation run on (cleaned) `Google's text normalization dataset <https:
python run_evaluation.py --input=./en_with_types/output-00001-of-00100 <--language LANGUAGE> [--cat CLASS_CATEGORY] [--filter]
Supported Languages
-------------------

ITN supports: English, Spanish, German, French, Vietnamese, and Russian languages.

Installation
------------

`nemo_text_processing` is installed with the `nemo_toolkit`.

See :doc:`NeMo Introduction <../starthere/intro>` for installation details.

Additional requirements can be found in `setup.sh <https://github.com/NVIDIA/NeMo/blob/stable/nemo_text_processing/setup.sh>`_.


References
----------

.. bibliography:: textprocessing_all.bib
.. bibliography:: ../tn_itn_all.bib
:style: plain
:labelprefix: TEXTPROCESSING-ITN
:keyprefix: textprocessing-itn-
