merge r1.11 to main (#4920)

* update branch Signed-off-by: ericharper <complex451@gmail.com> * update package info and dockerfile Signed-off-by: ericharper <complex451@gmail.com> * [TTS] bugfix for missing configs. (#4725) Signed-off-by: Xuesong Yang <1646669+XuesongYang@users.noreply.github.com> * Fix pynini install in TTS tutorials (#4729) Signed-off-by: Jocelyn Huang <jocelynh@nvidia.com> Signed-off-by: Jocelyn Huang <jocelynh@nvidia.com> * [TTS] updated config with a German IPA phoneme tokenizer (#4756) * [TTS] added a German IPA phoneme tokenizer * [TTS][ASR] enabled customized arguments for trimming the leading and trailing silence. * [TTS] disabled spline interpolation for beta-binomial distribution. Let it generate align prior and save to disks. Use a new phoneme tokenizer. * [TTS] use consistent spline interpolation with fastpitch checkpoint when generating mel-spectrograms for hifigan finetune. Signed-off-by: Xuesong Yang <1646669+XuesongYang@users.noreply.github.com> * Update r1.11 to new heteronyms list (#4745) * Update configs to new heteronyms list * Remove old heteronyms list, add alt 'merchandise' pron to CMUdict * Update remaining references to old heteronyms list Signed-off-by: Jocelyn Huang <jocelynh@nvidia.com> Co-authored-by: Xuesong Yang <1646669+XuesongYang@users.noreply.github.com> * Fix tutorial formatting (#4778) Signed-off-by: Jocelyn Huang <jocelynh@nvidia.com> * update branch and typos (#4788) Signed-off-by: ericharper <complex451@gmail.com> Signed-off-by: ericharper <complex451@gmail.com> * Adding support for models trained with full context for cache-aware streaming. (#4687) * added support for models trained with full context. Signed-off-by: Vahid <vnoroozi@nvidia.com> * fixed style. Signed-off-by: Vahid <vnoroozi@nvidia.com> * dropped seq_range Signed-off-by: Vahid <vnoroozi@nvidia.com> * fixed indexing in caching methods. Signed-off-by: Vahid <vnoroozi@nvidia.com> * fixed code style. Signed-off-by: Vahid <vnoroozi@nvidia.com> * fixed code style. Signed-off-by: Vahid <vnoroozi@nvidia.com> * updated docs. Signed-off-by: Vahid <vnoroozi@nvidia.com> * addressed comments. Signed-off-by: Vahid <vnoroozi@nvidia.com> * fixed code style. Signed-off-by: Vahid <vnoroozi@nvidia.com> * fixed code style. Signed-off-by: Vahid <vnoroozi@nvidia.com> * fixed code style. Signed-off-by: Vahid <vnoroozi@nvidia.com> * change frame-wise to cache-aware. Signed-off-by: Vahid <vnoroozi@nvidia.com> * change frame-wise to cache-aware. Signed-off-by: Vahid <vnoroozi@nvidia.com> * change frame-wise to cache-aware. Signed-off-by: Vahid <vnoroozi@nvidia.com> * fixed code style. Signed-off-by: Vahid <vnoroozi@nvidia.com> Signed-off-by: Vahid <vnoroozi@nvidia.com> * Update megatron encoder decoder model to support py37 for colab (#4791) * [ASR] Add pretrained ASR models for Croatian (#4682) * [ASR] Add pretrained ASR models for Croatian Signed-off-by: Ante Jukić <ajukic@nvidia.com> * Fix style for import Signed-off-by: Ante Jukić <ajukic@nvidia.com> Signed-off-by: Ante Jukić <ajukic@nvidia.com> Co-authored-by: Ante Jukić <ajukic@nvidia.com> Co-authored-by: Nithin Rao <nithinrao.koluguri@gmail.com> Co-authored-by: Eric Harper <complex451@gmail.com> Co-authored-by: Somshubra Majumdar <titu1994@gmail.com> * added/fixed export for Megatron models (#4712) * added/fixed export for Megatron models Signed-off-by: David Mosallanezhad <dmosallanezh@nvidia.com> * fixed style Signed-off-by: David Mosallanezhad <dmosallanezh@nvidia.com> * fixed FusedScaleMaskSoftmax in BioMegatron Signed-off-by: David Mosallanezhad <dmosallanezh@nvidia.com> * included comments Signed-off-by: David Mosallanezhad <dmosallanezh@nvidia.com> Signed-off-by: David Mosallanezhad <dmosallanezh@nvidia.com> Co-authored-by: David Mosallanezhad <dmosallanezh@nvidia.com> Co-authored-by: Eric Harper <complex451@gmail.com> * update branch for qa notebook Signed-off-by: ericharper <complex451@gmail.com> * Fix initializing weights from ptl ckpt with exclude (#4807) Signed-off-by: sam1373 <samuelkriman@gmail.com> Signed-off-by: sam1373 <samuelkriman@gmail.com> * Fix index error from addition of voiced_mask and p_voiced (#4811) Signed-off-by: Jocelyn Huang <jocelynh@nvidia.com> Signed-off-by: Jocelyn Huang <jocelynh@nvidia.com> * T5 prompt learning fixes (#4771) * RPE, hidden size and config fixes Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca> * Update to reflect new config names Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca> * Sentencepiece fixes Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca> * Style Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca> * Fix finetuning Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca> * Add encoder seq len to gpt Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca> * Style Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca> * Add finetune eval script Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca> * Fix name Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca> * Update Jenkinsfile Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca> * Update config Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca> * Fix CI test Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca> * Update check Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca> * Style Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca> * Backward compat Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca> * Update CI test Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca> * Split rank for Enc-Dec models Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca> * Address comments Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca> * Style Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca> Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca> Co-authored-by: Virginia Adams <78445382+vadam5@users.noreply.github.com> * G2P docs (#4841) * g2p docs added Signed-off-by: ekmb <ebakhturina@nvidia.com> * fix references Signed-off-by: ekmb <ebakhturina@nvidia.com> * address review feedback Signed-off-by: ekmb <ebakhturina@nvidia.com> Signed-off-by: ekmb <ebakhturina@nvidia.com> * Fix providing glue in seq2seq eval (#4843) * Fix providing glue in seq2seq eval Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca> * Fix Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca> * Style Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca> Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca> * Updated inference code and squad scripts (#4835) * Updated inference code and squad scripts Signed-off-by: Virginia Adams <vadams@nvidia.com> * Reverted GPT & T5 inference files back to use NLPDDPlugin Signed-off-by: Virginia Adams <vadams@nvidia.com> * Overwrite frozen LM to use fused adam Signed-off-by: Virginia Adams <vadams@nvidia.com> * Added padded vocab size Signed-off-by: Virginia Adams <vadams@nvidia.com> * Fixed val check interval value Signed-off-by: Virginia Adams <vadams@nvidia.com> * Python format fix Signed-off-by: Virginia Adams <vadams@nvidia.com> * Make t5 prompt learning preds write to file Signed-off-by: Virginia Adams <vadams@nvidia.com> * Added back dp=1 check Signed-off-by: Virginia Adams <vadams@nvidia.com> Signed-off-by: Virginia Adams <vadams@nvidia.com> Co-authored-by: Sandeep Subramanian <sandeep.subramanian.1@umontreal.ca> * Set the number of workers to 0 for validation and test sets in all enc-dec models (#4790) * Set workers to 0 for validation and test Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca> * Revert pin memory Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca> * Style Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca> Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca> Co-authored-by: Sean Naren <sean.narenthiran@gmail.com> * Fix Megatron NMT consumed samples and ckpt_to_nemo split rank (#4884) * Fix nmt and ckpt_to_nemo Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca> * Style Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca> Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca> * added utf8 encoding (#4892) Signed-off-by: Virginia Adams <vadams@nvidia.com> Signed-off-by: Virginia Adams <vadams@nvidia.com> * update readme with apex commit Signed-off-by: ericharper <complex451@gmail.com> * Add support for Apex distributed Adam optimizer with GPT-3 (#4487) * Add support for Apex distributed Adam optimizer with GPT-3 Signed-off-by: Tim Moon <tmoon@nvidia.com> * Fix bug in grad clipping with dist Adam Grad norm was computed over all params, not respecting model parallelism. Signed-off-by: Tim Moon <tmoon@nvidia.com> * Fix bug with DDP initialization Signed-off-by: Tim Moon <tmoon@nvidia.com> * Make distopt dependent on megatron_amp_o2 Signed-off-by: Tim Moon <tmoon@nvidia.com> * Fix code formatting Signed-off-by: Tim Moon <tmoon@nvidia.com> * Handle dist Adam in optimizer unit tests Signed-off-by: Tim Moon <tmoon@nvidia.com> Signed-off-by: Tim Moon <tmoon@nvidia.com> Co-authored-by: Eric Harper <complex451@gmail.com> * update readme Signed-off-by: ericharper <complex451@gmail.com> * update readme Signed-off-by: ericharper <complex451@gmail.com> * fixed styles Signed-off-by: Xuesong Yang <1646669+XuesongYang@users.noreply.github.com> * removed unsued import. Signed-off-by: Xuesong Yang <1646669+XuesongYang@users.noreply.github.com> * removed duplicated func defintion. Signed-off-by: Xuesong Yang <1646669+XuesongYang@users.noreply.github.com> * replace 'r1.11.0' with 'main' in Jenkinsfile and all tutorials. Signed-off-by: Xuesong Yang <1646669+XuesongYang@users.noreply.github.com> * fix: PRE_RELEASE = 'rc0' Signed-off-by: Xuesong Yang <1646669+XuesongYang@users.noreply.github.com> * replace branch name to main for asr_with_adapters.ipynb. Signed-off-by: Xuesong Yang <1646669+XuesongYang@users.noreply.github.com> * fix Fastpitch mixertts tutorial format to align with main to distingshuish diff Signed-off-by: Xuesong Yang <1646669+XuesongYang@users.noreply.github.com> * fix: correct path for tokenizers. Signed-off-by: Xuesong Yang <1646669+XuesongYang@users.noreply.github.com> Signed-off-by: ericharper <complex451@gmail.com> Signed-off-by: Xuesong Yang <1646669+XuesongYang@users.noreply.github.com> Signed-off-by: Jocelyn Huang <jocelynh@nvidia.com> Signed-off-by: Vahid <vnoroozi@nvidia.com> Signed-off-by: Ante Jukić <ajukic@nvidia.com> Signed-off-by: David Mosallanezhad <dmosallanezh@nvidia.com> Signed-off-by: sam1373 <samuelkriman@gmail.com> Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca> Signed-off-by: ekmb <ebakhturina@nvidia.com> Signed-off-by: Virginia Adams <vadams@nvidia.com> Signed-off-by: Tim Moon <tmoon@nvidia.com> Co-authored-by: ericharper <complex451@gmail.com> Co-authored-by: Jocelyn <jocelynh@nvidia.com> Co-authored-by: Vahid Noroozi <VahidooX@users.noreply.github.com> Co-authored-by: Zhilin Wang <wangzhilin12061996@hotmail.com> Co-authored-by: anteju <108555623+anteju@users.noreply.github.com> Co-authored-by: Ante Jukić <ajukic@nvidia.com> Co-authored-by: Nithin Rao <nithinrao.koluguri@gmail.com> Co-authored-by: Somshubra Majumdar <titu1994@gmail.com> Co-authored-by: David <amosalla@asu.edu> Co-authored-by: David Mosallanezhad <dmosallanezh@nvidia.com> Co-authored-by: Samuel Kriman <samuelkriman@gmail.com> Co-authored-by: Sandeep Subramanian <sandeep.subramanian.1@umontreal.ca> Co-authored-by: Virginia Adams <78445382+vadam5@users.noreply.github.com> Co-authored-by: Evelina <10428420+ekmb@users.noreply.github.com> Co-authored-by: Sean Naren <sean.narenthiran@gmail.com> Co-authored-by: Tim Moon <4406448+timmoon10@users.noreply.github.com>
NVIDIA · Sep 13, 2022 · cf83c5f · cf83c5f
1 parent 4f6aba1
commit cf83c5f
Show file tree

Hide file tree

Showing 68 changed files with 1,533 additions and 359 deletions.
diff --git a/Jenkinsfile b/Jenkinsfile
@@ -3146,8 +3146,10 @@ assert_frame_equal(training_curve, gt_curve, rtol=1e-3, atol=1e-3)"'''
                 inference.add_BOS=False \
                 trainer.devices=2 \
                 tensor_model_parallel_size=2 \
+                pred_file_path=/home/TestData/nlp/prompt_learning/p_tuning_test_tp_preds.txt \
                 data_paths=['/home/TestData/nlp/prompt_learning/rte_CI_test.jsonl']"
             sh "rm -rf /home/TestData/nlp/prompt_learning/p_tuning_test_tp.nemo"
+            sh "rm -rf /home/TestData/nlp/prompt_learning/p_tuning_test_tp_preds.txt"
           }
         }
         stage('GPT Prompt Learning TP=1 PP=2') {
@@ -3173,8 +3175,10 @@ assert_frame_equal(training_curve, gt_curve, rtol=1e-3, atol=1e-3)"'''
                 inference.add_BOS=False \
                 trainer.devices=2 \
                 pipeline_model_parallel_size=2 \
+                pred_file_path=/home/TestData/nlp/prompt_learning/p_tuning_test_pp_preds.txt \
                 data_paths=['/home/TestData/nlp/prompt_learning/boolq_CI_test.jsonl']"
             sh "rm -rf /home/TestData/nlp/prompt_learning/p_tuning_test_pp.nemo"
+            sh "rm -rf /home/TestData/nlp/prompt_learning/p_tuning_test_pp_preds.txt"
           }
         }
       }
@@ -3433,7 +3437,7 @@ assert_frame_equal(training_curve, gt_curve, rtol=1e-3, atol=1e-3)"'''
                 trainer.max_steps=6 \
                 trainer.max_epochs=null \
                 model.tensor_model_parallel_size=1 \
-                model.pretrained_language_model_path='/home/TestData/nlp/megatron_t5/8m/megatron_t5_8m-refactor.nemo' \
+                model.language_model_path='/home/TestData/nlp/megatron_t5/8m/megatron_t5_8m-refactor.nemo' \
                 model.existing_tasks=[] \
                 model.new_tasks=['squad'] \
                 model.data.train_ds=['/home/TestData/nlp/prompt_learning/squad_CI_test.jsonl'] \
@@ -3443,11 +3447,13 @@ assert_frame_equal(training_curve, gt_curve, rtol=1e-3, atol=1e-3)"'''
             sh "rm -rf /home/TestData/nlp/prompt_learning/t5_p_tuning_test"
             sh "python examples/nlp/language_modeling/megatron_t5_prompt_learning_eval.py \
                 virtual_prompt_model_file='/home/TestData/nlp/prompt_learning/t5_p_tuning_test.nemo' \
-                pretrained_language_model_file='/home/TestData/nlp/megatron_t5/8m/megatron_t5_8m-refactor.nemo' \
+                language_model_path='/home/TestData/nlp/megatron_t5/8m/megatron_t5_8m-refactor.nemo' \
                 data.test_ds=['/home/TestData/nlp/prompt_learning/squad_CI_test.jsonl'] \
+                pred_file_path='/home/TestData/nlp/prompt_learning/t5_p_tuning_test_preds.txt' \
                 data.global_batch_size=4 \
                 data.micro_batch_size=4"
             sh "rm -rf /home/TestData/nlp/prompt_learning/t5_p_tuning_test.nemo"
+            sh "rm -rf /home/TestData/nlp/prompt_learning/t5_p_tuning_test_preds.txt"
           }
         }
         stage('T5 Prompt Learning TP=2 PP=1') {
@@ -3459,7 +3465,7 @@ assert_frame_equal(training_curve, gt_curve, rtol=1e-3, atol=1e-3)"'''
                 trainer.max_steps=6 \
                 trainer.max_epochs=null \
                 model.tensor_model_parallel_size=2 \
-                model.pretrained_language_model_path='/home/TestData/nlp/megatron_t5/8m/megatron_t5_8m_tp2.nemo' \
+                model.language_model_path='/home/TestData/nlp/megatron_t5/8m/megatron_t5_8m_tp2.nemo' \
                 model.existing_tasks=[] \
                 model.new_tasks=['squad'] \
                 model.data.train_ds=['/home/TestData/nlp/prompt_learning/squad_CI_test.jsonl'] \
@@ -3469,13 +3475,15 @@ assert_frame_equal(training_curve, gt_curve, rtol=1e-3, atol=1e-3)"'''
             sh "rm -rf /home/TestData/nlp/prompt_learning/t5_p_tuning_test_tp2"
             sh "python examples/nlp/language_modeling/megatron_t5_prompt_learning_eval.py \
                 virtual_prompt_model_file='/home/TestData/nlp/prompt_learning/t5_p_tuning_test_tp2.nemo' \
-                pretrained_language_model_file='/home/TestData/nlp/megatron_t5/8m/megatron_t5_8m_tp2.nemo' \
+                language_model_path='/home/TestData/nlp/megatron_t5/8m/megatron_t5_8m_tp2.nemo' \
                 data.test_ds=['/home/TestData/nlp/prompt_learning/squad_CI_test.jsonl'] \
+                pred_file_path='/home/TestData/nlp/prompt_learning/t5_p_tuning_test_tp2_preds.txt' \
                 tensor_model_parallel_size=2 \
                 trainer.devices=2 \
                 data.global_batch_size=8 \
                 data.micro_batch_size=8"
             sh "rm -rf /home/TestData/nlp/prompt_learning/t5_p_tuning_test_tp2.nemo"
+            sh "rm -rf /home/TestData/nlp/prompt_learning/t5_p_tuning_test_tp2_preds.txt"
           }
         }
       }

diff --git a/README.rst b/README.rst
@@ -200,16 +200,10 @@ Megatron GPT training requires NVIDIA Apex to be installed.
 
 .. code-block:: bash
 
-    git clone https://github.com/NVIDIA/apex
+    git clone https://github.com/ericharper/apex.git
     cd apex
-    git checkout 3c19f1061879394f28272a99a7ea26d58f72dace
-    pip install -v --disable-pip-version-check --no-cache-dir --global-option="--cpp_ext" --global-option="--cuda_ext" --global-option="--fast_layer_norm" ./
-
-.. note::
-
-  You may need to modify [setup.py](https://github.com/NVIDIA/apex/blob/3c19f1061879394f28272a99a7ea26d58f72dace/setup.py) if 
-  your version of CUDA does not match the version used to compile Pytorch binaries, comment lines 33-41 in the above link
-  before installing.
+    git checkout nm_v1.11.0
+    pip install -v --disable-pip-version-check --no-cache-dir --global-option="--cpp_ext" --global-option="--cuda_ext" --global-option="--fast_layer_norm" --global-option="--distributed_adam" --global-option="--deprecated_fused_adam" ./
 
 Docker containers:
 ~~~~~~~~~~~~~~~~~~

diff --git a/docs/source/asr/data/benchmark_hr.csv b/docs/source/asr/data/benchmark_hr.csv
@@ -0,0 +1,3 @@
+Model,Model Base Class,Model Card
+stt_hr_conformer_ctc_large,EncDecCTCModel,"https://ngc.nvidia.com/catalog/models/nvidia:nemo:stt_hr_conformer_ctc_large"
+stt_hr_conformer_transducer_large,EncDecRNNTBPEModel,"https://ngc.nvidia.com/catalog/models/nvidia:nemo:stt_hr_conformer_transducer_large"
diff --git a/docs/source/asr/data/scores/hr/conformer_hr.csv b/docs/source/asr/data/scores/hr/conformer_hr.csv
@@ -0,0 +1,3 @@
+Model Name,Language,ParlaSpeech-HR v1.0 (dev),ParlaSpeech-HR v1.0 (test)
+stt_hr_conformer_ctc_large,hr,4.43,4.70
+stt_hr_conformer_transducer_large,hr,4.56,4.69
diff --git a/docs/source/asr/datasets.rst b/docs/source/asr/datasets.rst
@@ -171,7 +171,7 @@ The audio files can be of any format supported by `Pydub <https://github.com/jia
 WAV files as they are the default and have been most thoroughly tested.
 
 There should be one manifest file per dataset that will be passed in, therefore, if the user wants separate training and validation
-datasets, they should also have separate manifests. Otherwise, thay will be loading validation data with their training data and vice
+datasets, they should also have separate manifests. Otherwise, they will be loading validation data with their training data and vice
 versa.
 
 Each line of the manifest should be in the following format:

diff --git a/docs/source/asr/models.rst b/docs/source/asr/models.rst
@@ -132,15 +132,17 @@ Cache-aware Streaming Conformer
 
 Buffered streaming uses overlapping chunks to make an offline ASR model to be used for streaming with reasonable accuracy. However, it uses significant amount of duplication in computations due to the overlapping chunks.
 Also there is a accuracy gep between the offline model and the streaming one as there is inconsistency between how we train the model and how we perform inference for streaming.
-The Cache-aware Streaming Conformer models would tackle and address these disadvantages. They are variants of Conformer which are trained with limited right context and it would make it possible to match the training and inference.
+The Cache-aware Streaming Conformer models would tackle and address these disadvantages. These streaming Conformers are trained with limited right context that it would make it possible to match how the model is being used in both the training and inference.
+They also uses caching to store intermediate activations to avoid any duplication in compute.
 The cache-aware approach is supported for both the Conformer-CTC and Conformer-Transducer and enables the model to be used very efficiently for streaming.
 
-Three categories of layers in Conformer have access to right tokens: 1-depthwise convolutions 2-self-attention, and 3-convolutions in downsampling layers.
+Three categories of layers in Conformer have access to right tokens: 1-depthwise convolutions 2-self-attention, and 3-convolutions in the downsampling layers.
 Streaming Conformer models uses causal convolutions or convolutions with lower right context and also self-attention with limited right context to limit the effective right context for the input.
-The model trained with such limitations can be used in streaming mode and give the exact same output and accuracy as when the whole audio is given to the model in offline mode.
+The model trained with such limitations can be used in streaming mode and give the exact same outputs and accuracy as when the whole audio is given to the model in offline mode.
 These model can use caching mechanism to store and reuse the activations during streaming inference to avoid any duplications in the computations as much as possible.
 
 We support the following three right context modeling:
+
 *  fully causal model with zero look-ahead: tokens would not see any future tokens. convolution layers are all causal and right tokens are masked for self-attention.
 It gives zero latency but with limited accuracy.
 To train such a model, you need to set `encoder.att_context_size=[left_context, 0]` and `encoder.conv_context_size=causal` in the config.
@@ -155,9 +157,9 @@ This approach is more efficient than regular look-ahead in terms of computations
 In terms of accuracy, this approach gives similar or even better results in term of accuracy than regular look-ahead as each token in each layer have access to more tokens on average. That is why we recommend to use this approach for streaming.
 
 
-** Note: Latencies are based on the assumption that the forward time of the network is zero.
+** Note: Latencies are based on the assumption that the forward time of the network is zero and it just estimates the time needed after a frame would be available until it is passed through the model.
 
-Approaches with non-zero look-ahead can give significantly better accuracy by sacrificing latency. The latency can get controlled by the left context size.
+Approaches with non-zero look-ahead can give significantly better accuracy by sacrificing latency. The latency can get controlled by the left context size. Increasing the right context would help the accuracy to a limit but would increase the compuation time.
 
 
 In all modes, left context can be controlled by the number of tokens to be visible in the self-attention and the kernel size of the convolutions.
@@ -168,12 +170,16 @@ Left context of convolutions is dependent to the their kernel size while it can
 Self-attention left context of around 6 secs would give close result to have unlimited left context. For a model with 4x downsampling and shift window of 10ms in the preprocessor, each token corresponds to 4*10=40ms.
 
 If striding approach is used for downsampling, all the convolutions in downsampling would be fully causal and don't see future tokens.
-It is recommended to use stacking for streaming model which is significantly faster and uses less memory.
+You may use stacking for downsampling in the streaming models which is significantly faster and uses less memory.
+It also does not some of the the limitations with striding and vggnet and you may use any downsampling rate.
 
 You may find the example config files of cache-aware streaming Conformer models at
 ``<NeMo_git_root>/examples/asr/conf/conformer/streaming/conformer_transducer_bpe_streaming.yaml`` for Transducer variant and
 at ``<NeMo_git_root>/examples/asr/conf/conformer/streaming/conformer_ctc_bpe.yaml`` for CTC variant.
 
+To simulate cache-aware stremaing, you may use the script at ``<NeMo_git_root>/examples/asr/asr_streaming/speech_to_text_streaming_infer.py``. It can simulate streaming in single stream or multi-stream mode (in batches) for an ASR model.
+This script can be used for models trained offline with full-context but the accuracy would not be great unless the chunk size is large enough which would result in high latency.
+It is recommended to train a model in streaming model with limited context for this script. More info can be found in the script.
 
 .. _LSTM-Transducer_model:
 

diff --git a/docs/source/asr/scores.rst b/docs/source/asr/scores.rst
@@ -169,6 +169,16 @@ FR
 
 --------------------
 
+HR
+^^
+
+.. csv-table::
+    :header-rows: 1
+    :align: left
+    :file: data/scores/hr/conformer_hr.csv
+
+--------------------
+
 IT
 ^^
 

diff --git a/docs/source/conf.py b/docs/source/conf.py
@@ -120,6 +120,7 @@
     'nlp/text_normalization/tn_itn_all.bib',
     'tools/tools_all.bib',
     'tts_all.bib',
+    'text_processing/text_processing_all.bib',
     'core/adapters/adapter_bib.bib',
 ]
 

diff --git a/docs/source/index.rst b/docs/source/index.rst
@@ -44,6 +44,7 @@ NVIDIA NeMo User Guide
    nlp/machine_translation/machine_translation
    nlp/text_normalization/intro
    nlp/api
+   nlp/models
 
 
 .. toctree::
@@ -60,6 +61,14 @@ NVIDIA NeMo User Guide
    :caption: Common
    :name: Common
 
+   text_processing/intro
+
+.. toctree::
+   :maxdepth: 2
+   :caption: Text Processing
+   :name: Text Processing
+
+   text_processing/g2p/g2p
    common/intro