merge r1.11 to main (NVIDIA#4920)
* update branch

Signed-off-by: ericharper <complex451@gmail.com>

* update package info and dockerfile

Signed-off-by: ericharper <complex451@gmail.com>

* [TTS] bugfix for missing configs. (NVIDIA#4725)

Signed-off-by: Xuesong Yang <1646669+XuesongYang@users.noreply.github.com>

* Fix pynini install in TTS tutorials (NVIDIA#4729)

Signed-off-by: Jocelyn Huang <jocelynh@nvidia.com>

Signed-off-by: Jocelyn Huang <jocelynh@nvidia.com>

* [TTS] updated config with a German IPA phoneme tokenizer (NVIDIA#4756)

* [TTS] added a German IPA phoneme tokenizer
* [TTS][ASR] enabled customized arguments for trimming the leading and trailing silence.
* [TTS] disabled spline interpolation for beta-binomial distribution. Let it generate align prior and save to disks. Use a new phoneme tokenizer.
* [TTS] use consistent spline interpolation with fastpitch checkpoint when generating mel-spectrograms for hifigan finetune.

Signed-off-by: Xuesong Yang <1646669+XuesongYang@users.noreply.github.com>

* Update r1.11 to new heteronyms list (NVIDIA#4745)

* Update configs to new heteronyms list
* Remove old heteronyms list, add alt 'merchandise' pron to CMUdict
* Update remaining references to old heteronyms list

Signed-off-by: Jocelyn Huang <jocelynh@nvidia.com>
Co-authored-by: Xuesong Yang <1646669+XuesongYang@users.noreply.github.com>

* Fix tutorial formatting (NVIDIA#4778)

Signed-off-by: Jocelyn Huang <jocelynh@nvidia.com>

* update branch and typos (NVIDIA#4788)

Signed-off-by: ericharper <complex451@gmail.com>

Signed-off-by: ericharper <complex451@gmail.com>

* Adding support for models trained with full context for cache-aware streaming. (NVIDIA#4687)

* added support for models trained with full context.

Signed-off-by: Vahid <vnoroozi@nvidia.com>

* fixed style.

Signed-off-by: Vahid <vnoroozi@nvidia.com>

* dropped seq_range

Signed-off-by: Vahid <vnoroozi@nvidia.com>

* fixed indexing in caching methods.

Signed-off-by: Vahid <vnoroozi@nvidia.com>

* fixed code style.

Signed-off-by: Vahid <vnoroozi@nvidia.com>

* fixed code style.

Signed-off-by: Vahid <vnoroozi@nvidia.com>

* updated docs.

Signed-off-by: Vahid <vnoroozi@nvidia.com>

* addressed comments.

Signed-off-by: Vahid <vnoroozi@nvidia.com>

* fixed code style.

Signed-off-by: Vahid <vnoroozi@nvidia.com>

* fixed code style.

Signed-off-by: Vahid <vnoroozi@nvidia.com>

* fixed code style.

Signed-off-by: Vahid <vnoroozi@nvidia.com>

* change frame-wise to cache-aware.

Signed-off-by: Vahid <vnoroozi@nvidia.com>

* change frame-wise to cache-aware.

Signed-off-by: Vahid <vnoroozi@nvidia.com>

* change frame-wise to cache-aware.

Signed-off-by: Vahid <vnoroozi@nvidia.com>

* fixed code style.

Signed-off-by: Vahid <vnoroozi@nvidia.com>

Signed-off-by: Vahid <vnoroozi@nvidia.com>

* Update megatron encoder decoder model to support py37 for colab (NVIDIA#4791)

* [ASR] Add pretrained ASR models for Croatian (NVIDIA#4682)

* [ASR] Add pretrained ASR models for Croatian

Signed-off-by: Ante Jukić <ajukic@nvidia.com>

* Fix style for import

Signed-off-by: Ante Jukić <ajukic@nvidia.com>

Signed-off-by: Ante Jukić <ajukic@nvidia.com>
Co-authored-by: Ante Jukić <ajukic@nvidia.com>
Co-authored-by: Nithin Rao <nithinrao.koluguri@gmail.com>
Co-authored-by: Eric Harper <complex451@gmail.com>
Co-authored-by: Somshubra Majumdar <titu1994@gmail.com>

* added/fixed export for Megatron models (NVIDIA#4712)

* added/fixed export for Megatron models

Signed-off-by: David Mosallanezhad <dmosallanezh@nvidia.com>

* fixed style

Signed-off-by: David Mosallanezhad <dmosallanezh@nvidia.com>

* fixed FusedScaleMaskSoftmax in BioMegatron

Signed-off-by: David Mosallanezhad <dmosallanezh@nvidia.com>

* included comments

Signed-off-by: David Mosallanezhad <dmosallanezh@nvidia.com>

Signed-off-by: David Mosallanezhad <dmosallanezh@nvidia.com>
Co-authored-by: David Mosallanezhad <dmosallanezh@nvidia.com>
Co-authored-by: Eric Harper <complex451@gmail.com>

* update branch for qa notebook

Signed-off-by: ericharper <complex451@gmail.com>

* Fix initializing weights from ptl ckpt with exclude (NVIDIA#4807)

Signed-off-by: sam1373 <samuelkriman@gmail.com>

Signed-off-by: sam1373 <samuelkriman@gmail.com>

* Fix index error from addition of voiced_mask and p_voiced (NVIDIA#4811)

Signed-off-by: Jocelyn Huang <jocelynh@nvidia.com>

Signed-off-by: Jocelyn Huang <jocelynh@nvidia.com>

* T5 prompt learning fixes (NVIDIA#4771)

* RPE, hidden size and config fixes

Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>

* Update to reflect new config names

Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>

* Sentencepiece fixes

Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>

* Style

Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>

* Fix finetuning

Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>

* Add encoder seq len to gpt

Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>

* Style

Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>

* Add finetune eval script

Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>

* Fix name

Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>

* Update Jenkinsfile

Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>

* Update config

Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>

* Fix CI test

Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>

* Update check

Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>

* Style

Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>

* Backward compat

Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>

* Update CI test

Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>

* Split rank for Enc-Dec models

Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>

* Address comments

Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>

* Style

Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>

Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>
Co-authored-by: Virginia Adams <78445382+vadam5@users.noreply.github.com>

* G2P docs (NVIDIA#4841)

* g2p docs added

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* fix references

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* address review feedback

Signed-off-by: ekmb <ebakhturina@nvidia.com>

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* Fix providing glue in seq2seq eval (NVIDIA#4843)

* Fix providing glue in seq2seq eval

Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>

* Fix

Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>

* Style

Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>

Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>

* Updated inference code and squad scripts (NVIDIA#4835)

* Updated inference code and squad scripts

Signed-off-by: Virginia Adams <vadams@nvidia.com>

* Reverted GPT & T5 inference files back to use NLPDDPlugin

Signed-off-by: Virginia Adams <vadams@nvidia.com>

* Overwrite frozen LM to use fused adam

Signed-off-by: Virginia Adams <vadams@nvidia.com>

* Added padded vocab size

Signed-off-by: Virginia Adams <vadams@nvidia.com>

* Fixed val check interval value

Signed-off-by: Virginia Adams <vadams@nvidia.com>

* Python format fix

Signed-off-by: Virginia Adams <vadams@nvidia.com>

* Make t5 prompt learning preds write to file

Signed-off-by: Virginia Adams <vadams@nvidia.com>

* Added back dp=1 check

Signed-off-by: Virginia Adams <vadams@nvidia.com>

Signed-off-by: Virginia Adams <vadams@nvidia.com>
Co-authored-by: Sandeep Subramanian <sandeep.subramanian.1@umontreal.ca>

* Set the number of workers to 0 for validation and test sets in all enc-dec models (NVIDIA#4790)

* Set workers to 0 for validation and test

Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>

* Revert pin memory

Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>

* Style

Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>

Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>
Co-authored-by: Sean Naren <sean.narenthiran@gmail.com>

* Fix Megatron NMT consumed samples and ckpt_to_nemo split rank (NVIDIA#4884)

* Fix nmt and ckpt_to_nemo

Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>

* Style

Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>

Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>

* added utf8 encoding (NVIDIA#4892)

Signed-off-by: Virginia Adams <vadams@nvidia.com>

Signed-off-by: Virginia Adams <vadams@nvidia.com>

* update readme with apex commit

Signed-off-by: ericharper <complex451@gmail.com>

* Add support for Apex distributed Adam optimizer with GPT-3 (NVIDIA#4487)

* Add support for Apex distributed Adam optimizer with GPT-3

Signed-off-by: Tim Moon <tmoon@nvidia.com>

* Fix bug in grad clipping with dist Adam

Grad norm was computed over all params, not respecting model parallelism.

Signed-off-by: Tim Moon <tmoon@nvidia.com>

* Fix bug with DDP initialization

Signed-off-by: Tim Moon <tmoon@nvidia.com>

* Make distopt dependent on megatron_amp_o2

Signed-off-by: Tim Moon <tmoon@nvidia.com>

* Fix code formatting

Signed-off-by: Tim Moon <tmoon@nvidia.com>

* Handle dist Adam in optimizer unit tests

Signed-off-by: Tim Moon <tmoon@nvidia.com>

Signed-off-by: Tim Moon <tmoon@nvidia.com>
Co-authored-by: Eric Harper <complex451@gmail.com>

* update readme

Signed-off-by: ericharper <complex451@gmail.com>

* update readme

Signed-off-by: ericharper <complex451@gmail.com>

* fixed styles

Signed-off-by: Xuesong Yang <1646669+XuesongYang@users.noreply.github.com>

* removed unused import.

Signed-off-by: Xuesong Yang <1646669+XuesongYang@users.noreply.github.com>

* removed duplicated func definition.

Signed-off-by: Xuesong Yang <1646669+XuesongYang@users.noreply.github.com>

* replace 'r1.11.0' with 'main' in Jenkinsfile and all tutorials.

Signed-off-by: Xuesong Yang <1646669+XuesongYang@users.noreply.github.com>

* fix: PRE_RELEASE = 'rc0'

Signed-off-by: Xuesong Yang <1646669+XuesongYang@users.noreply.github.com>

* replace branch name to main for asr_with_adapters.ipynb.

Signed-off-by: Xuesong Yang <1646669+XuesongYang@users.noreply.github.com>

* fix Fastpitch mixertts tutorial format to align with main to distinguish diff

Signed-off-by: Xuesong Yang <1646669+XuesongYang@users.noreply.github.com>

* fix: correct path for tokenizers.

Signed-off-by: Xuesong Yang <1646669+XuesongYang@users.noreply.github.com>

Signed-off-by: ericharper <complex451@gmail.com>
Signed-off-by: Xuesong Yang <1646669+XuesongYang@users.noreply.github.com>
Signed-off-by: Jocelyn Huang <jocelynh@nvidia.com>
Signed-off-by: Vahid <vnoroozi@nvidia.com>
Signed-off-by: Ante Jukić <ajukic@nvidia.com>
Signed-off-by: David Mosallanezhad <dmosallanezh@nvidia.com>
Signed-off-by: sam1373 <samuelkriman@gmail.com>
Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>
Signed-off-by: ekmb <ebakhturina@nvidia.com>
Signed-off-by: Virginia Adams <vadams@nvidia.com>
Signed-off-by: Tim Moon <tmoon@nvidia.com>
Co-authored-by: ericharper <complex451@gmail.com>
Co-authored-by: Jocelyn <jocelynh@nvidia.com>
Co-authored-by: Vahid Noroozi <VahidooX@users.noreply.github.com>
Co-authored-by: Zhilin Wang <wangzhilin12061996@hotmail.com>
Co-authored-by: anteju <108555623+anteju@users.noreply.github.com>
Co-authored-by: Ante Jukić <ajukic@nvidia.com>
Co-authored-by: Nithin Rao <nithinrao.koluguri@gmail.com>
Co-authored-by: Somshubra Majumdar <titu1994@gmail.com>
Co-authored-by: David <amosalla@asu.edu>
Co-authored-by: David Mosallanezhad <dmosallanezh@nvidia.com>
Co-authored-by: Samuel Kriman <samuelkriman@gmail.com>
Co-authored-by: Sandeep Subramanian <sandeep.subramanian.1@umontreal.ca>
Co-authored-by: Virginia Adams <78445382+vadam5@users.noreply.github.com>
Co-authored-by: Evelina <10428420+ekmb@users.noreply.github.com>
Co-authored-by: Sean Naren <sean.narenthiran@gmail.com>
Co-authored-by: Tim Moon <4406448+timmoon10@users.noreply.github.com>
Signed-off-by: Matvei Novikov <mattyson.so@gmail.com>
17 people authored and jubick1337 committed Sep 28, 2022
1 parent 8dfe123 commit d97bed9
Showing 68 changed files with 1,533 additions and 359 deletions.
16 changes: 12 additions & 4 deletions Jenkinsfile
Expand Up @@ -3146,8 +3146,10 @@ assert_frame_equal(training_curve, gt_curve, rtol=1e-3, atol=1e-3)"'''
inference.add_BOS=False \
trainer.devices=2 \
tensor_model_parallel_size=2 \
pred_file_path=/home/TestData/nlp/prompt_learning/p_tuning_test_tp_preds.txt \
data_paths=['/home/TestData/nlp/prompt_learning/rte_CI_test.jsonl']"
sh "rm -rf /home/TestData/nlp/prompt_learning/p_tuning_test_tp.nemo"
sh "rm -rf /home/TestData/nlp/prompt_learning/p_tuning_test_tp_preds.txt"
}
}
stage('GPT Prompt Learning TP=1 PP=2') {
Expand All @@ -3173,8 +3175,10 @@ assert_frame_equal(training_curve, gt_curve, rtol=1e-3, atol=1e-3)"'''
inference.add_BOS=False \
trainer.devices=2 \
pipeline_model_parallel_size=2 \
pred_file_path=/home/TestData/nlp/prompt_learning/p_tuning_test_pp_preds.txt \
data_paths=['/home/TestData/nlp/prompt_learning/boolq_CI_test.jsonl']"
sh "rm -rf /home/TestData/nlp/prompt_learning/p_tuning_test_pp.nemo"
sh "rm -rf /home/TestData/nlp/prompt_learning/p_tuning_test_pp_preds.txt"
}
}
}
Expand Down Expand Up @@ -3433,7 +3437,7 @@ assert_frame_equal(training_curve, gt_curve, rtol=1e-3, atol=1e-3)"'''
trainer.max_steps=6 \
trainer.max_epochs=null \
model.tensor_model_parallel_size=1 \
model.pretrained_language_model_path='/home/TestData/nlp/megatron_t5/8m/megatron_t5_8m-refactor.nemo' \
model.language_model_path='/home/TestData/nlp/megatron_t5/8m/megatron_t5_8m-refactor.nemo' \
model.existing_tasks=[] \
model.new_tasks=['squad'] \
model.data.train_ds=['/home/TestData/nlp/prompt_learning/squad_CI_test.jsonl'] \
Expand All @@ -3443,11 +3447,13 @@ assert_frame_equal(training_curve, gt_curve, rtol=1e-3, atol=1e-3)"'''
sh "rm -rf /home/TestData/nlp/prompt_learning/t5_p_tuning_test"
sh "python examples/nlp/language_modeling/megatron_t5_prompt_learning_eval.py \
virtual_prompt_model_file='/home/TestData/nlp/prompt_learning/t5_p_tuning_test.nemo' \
pretrained_language_model_file='/home/TestData/nlp/megatron_t5/8m/megatron_t5_8m-refactor.nemo' \
language_model_path='/home/TestData/nlp/megatron_t5/8m/megatron_t5_8m-refactor.nemo' \
data.test_ds=['/home/TestData/nlp/prompt_learning/squad_CI_test.jsonl'] \
pred_file_path='/home/TestData/nlp/prompt_learning/t5_p_tuning_test_preds.txt' \
data.global_batch_size=4 \
data.micro_batch_size=4"
sh "rm -rf /home/TestData/nlp/prompt_learning/t5_p_tuning_test.nemo"
sh "rm -rf /home/TestData/nlp/prompt_learning/t5_p_tuning_test_preds.txt"
}
}
stage('T5 Prompt Learning TP=2 PP=1') {
Expand All @@ -3459,7 +3465,7 @@ assert_frame_equal(training_curve, gt_curve, rtol=1e-3, atol=1e-3)"'''
trainer.max_steps=6 \
trainer.max_epochs=null \
model.tensor_model_parallel_size=2 \
model.pretrained_language_model_path='/home/TestData/nlp/megatron_t5/8m/megatron_t5_8m_tp2.nemo' \
model.language_model_path='/home/TestData/nlp/megatron_t5/8m/megatron_t5_8m_tp2.nemo' \
model.existing_tasks=[] \
model.new_tasks=['squad'] \
model.data.train_ds=['/home/TestData/nlp/prompt_learning/squad_CI_test.jsonl'] \
Expand All @@ -3469,13 +3475,15 @@ assert_frame_equal(training_curve, gt_curve, rtol=1e-3, atol=1e-3)"'''
sh "rm -rf /home/TestData/nlp/prompt_learning/t5_p_tuning_test_tp2"
sh "python examples/nlp/language_modeling/megatron_t5_prompt_learning_eval.py \
virtual_prompt_model_file='/home/TestData/nlp/prompt_learning/t5_p_tuning_test_tp2.nemo' \
pretrained_language_model_file='/home/TestData/nlp/megatron_t5/8m/megatron_t5_8m_tp2.nemo' \
language_model_path='/home/TestData/nlp/megatron_t5/8m/megatron_t5_8m_tp2.nemo' \
data.test_ds=['/home/TestData/nlp/prompt_learning/squad_CI_test.jsonl'] \
pred_file_path='/home/TestData/nlp/prompt_learning/t5_p_tuning_test_tp2_preds.txt' \
tensor_model_parallel_size=2 \
trainer.devices=2 \
data.global_batch_size=8 \
data.micro_batch_size=8"
sh "rm -rf /home/TestData/nlp/prompt_learning/t5_p_tuning_test_tp2.nemo"
sh "rm -rf /home/TestData/nlp/prompt_learning/t5_p_tuning_test_tp2_preds.txt"
}
}
}
12 changes: 3 additions & 9 deletions README.rst
Expand Up @@ -200,16 +200,10 @@ Megatron GPT training requires NVIDIA Apex to be installed.

.. code-block:: bash
git clone https://github.com/NVIDIA/apex
git clone https://github.com/ericharper/apex.git
cd apex
git checkout 3c19f1061879394f28272a99a7ea26d58f72dace
pip install -v --disable-pip-version-check --no-cache-dir --global-option="--cpp_ext" --global-option="--cuda_ext" --global-option="--fast_layer_norm" ./
.. note::

You may need to modify [setup.py](https://github.com/NVIDIA/apex/blob/3c19f1061879394f28272a99a7ea26d58f72dace/setup.py) if
your version of CUDA does not match the version used to compile Pytorch binaries, comment lines 33-41 in the above link
before installing.
git checkout nm_v1.11.0
pip install -v --disable-pip-version-check --no-cache-dir --global-option="--cpp_ext" --global-option="--cuda_ext" --global-option="--fast_layer_norm" --global-option="--distributed_adam" --global-option="--deprecated_fused_adam" ./
Docker containers:
~~~~~~~~~~~~~~~~~~
3 changes: 3 additions & 0 deletions docs/source/asr/data/benchmark_hr.csv
@@ -0,0 +1,3 @@
Model,Model Base Class,Model Card
stt_hr_conformer_ctc_large,EncDecCTCModel,"https://ngc.nvidia.com/catalog/models/nvidia:nemo:stt_hr_conformer_ctc_large"
stt_hr_conformer_transducer_large,EncDecRNNTBPEModel,"https://ngc.nvidia.com/catalog/models/nvidia:nemo:stt_hr_conformer_transducer_large"
3 changes: 3 additions & 0 deletions docs/source/asr/data/scores/hr/conformer_hr.csv
@@ -0,0 +1,3 @@
Model Name,Language,ParlaSpeech-HR v1.0 (dev),ParlaSpeech-HR v1.0 (test)
stt_hr_conformer_ctc_large,hr,4.43,4.70
stt_hr_conformer_transducer_large,hr,4.56,4.69
2 changes: 1 addition & 1 deletion docs/source/asr/datasets.rst
Expand Up @@ -171,7 +171,7 @@ The audio files can be of any format supported by `Pydub <https://github.com/jia
WAV files as they are the default and have been most thoroughly tested.

There should be one manifest file per dataset that will be passed in, therefore, if the user wants separate training and validation
datasets, they should also have separate manifests. Otherwise, thay will be loading validation data with their training data and vice
datasets, they should also have separate manifests. Otherwise, they will be loading validation data with their training data and vice
versa.

Each line of the manifest should be in the following format:
18 changes: 12 additions & 6 deletions docs/source/asr/models.rst
Expand Up @@ -132,15 +132,17 @@ Cache-aware Streaming Conformer

Buffered streaming uses overlapping chunks so that an offline ASR model can be used for streaming with reasonable accuracy. However, the overlapping chunks cause a significant amount of duplicated computation.
Also, there is an accuracy gap between the offline model and the streaming one, as there is inconsistency between how we train the model and how we perform inference for streaming.
The Cache-aware Streaming Conformer models would tackle and address these disadvantages. They are variants of Conformer which are trained with limited right context and it would make it possible to match the training and inference.
The Cache-aware Streaming Conformer models would tackle and address these disadvantages. These streaming Conformers are trained with limited right context that it would make it possible to match how the model is being used in both the training and inference.
They also use caching to store intermediate activations to avoid any duplication in compute.
The cache-aware approach is supported for both the Conformer-CTC and Conformer-Transducer and enables the model to be used very efficiently for streaming.

Three categories of layers in Conformer have access to right tokens: 1-depthwise convolutions 2-self-attention, and 3-convolutions in downsampling layers.
Three categories of layers in Conformer have access to right tokens: 1-depthwise convolutions 2-self-attention, and 3-convolutions in the downsampling layers.
Streaming Conformer models use causal convolutions or convolutions with lower right context, and also self-attention with limited right context, to limit the effective right context for the input.
The model trained with such limitations can be used in streaming mode and give the exact same output and accuracy as when the whole audio is given to the model in offline mode.
The model trained with such limitations can be used in streaming mode and give the exact same outputs and accuracy as when the whole audio is given to the model in offline mode.
These models can use a caching mechanism to store and reuse activations during streaming inference, avoiding duplicated computation as much as possible.
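
As a rough illustration of self-attention with limited right context, the sketch below builds a boolean visibility mask in plain Python. The function name `limited_context_mask` and its parameters are hypothetical and are not NeMo's implementation; the `left_context` and `right_context` values are counted in tokens.

```python
def limited_context_mask(num_tokens, left_context, right_context):
    """Return mask[i][j] == True when token i may attend to token j.

    Hypothetical helper for illustration only: token i sees at most
    `left_context` past tokens and `right_context` future tokens.
    """
    return [
        [i - left_context <= j <= i + right_context for j in range(num_tokens)]
        for i in range(num_tokens)
    ]


# Fully causal case: two tokens of left context, zero look-ahead.
mask = limited_context_mask(num_tokens=5, left_context=2, right_context=0)
for row in mask:
    print("".join("x" if visible else "." for visible in row))
# x....
# xx...
# xxx..
# .xxx.
# ..xxx
```

With `right_context=0` no entry above the diagonal is visible, which is the masking behaviour a zero look-ahead model relies on.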

We support the following three approaches to right context modeling:

* fully causal model with zero look-ahead: tokens would not see any future tokens. Convolution layers are all causal and right tokens are masked for self-attention.
It gives zero latency but with limited accuracy.
To train such a model, you need to set `encoder.att_context_size=[left_context, 0]` and `encoder.conv_context_size=causal` in the config.
Expand All @@ -155,9 +157,9 @@ This approach is more efficient than regular look-ahead in terms of computations
In terms of accuracy, this approach gives similar or even better results than regular look-ahead, as each token in each layer has access to more tokens on average. That is why we recommend using this approach for streaming.


** Note: Latencies are based on the assumption that the forward time of the network is zero.
** Note: Latencies are based on the assumption that the forward time of the network is zero and it just estimates the time needed after a frame would be available until it is passed through the model.

Approaches with non-zero look-ahead can give significantly better accuracy by sacrificing latency. The latency can get controlled by the left context size.
Approaches with non-zero look-ahead can give significantly better accuracy by sacrificing latency. The latency can get controlled by the left context size. Increasing the right context would help the accuracy to a limit but would increase the computation time.


In all modes, left context can be controlled by the number of tokens to be visible in the self-attention and the kernel size of the convolutions.
Expand All @@ -168,12 +170,16 @@ Left context of convolutions is dependent to the their kernel size while it can
A self-attention left context of around 6 seconds gives results close to those with unlimited left context. For a model with 4x downsampling and a window shift of 10 ms in the preprocessor, each token corresponds to 4*10=40 ms.
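
The arithmetic above can be turned into a token count for a given left context duration. This is a minimal sketch; the helper name `tokens_for_duration` and its default arguments are assumptions chosen to match the 10 ms shift and 4x downsampling in the example, not part of any NeMo API.

```python
import math


def tokens_for_duration(seconds, window_shift_ms=10, downsampling=4):
    """Number of encoder tokens needed to cover `seconds` of audio.

    Hypothetical helper: each token spans window_shift_ms * downsampling ms.
    """
    token_ms = window_shift_ms * downsampling  # 10 ms * 4 = 40 ms per token
    return math.ceil(seconds * 1000 / token_ms)


# About 6 seconds of left context at 40 ms per token.
print(tokens_for_duration(6))  # 6000 / 40 = 150
```

Under these assumptions, a left context of roughly 150 tokens corresponds to the 6 seconds mentioned above; a model with 8x downsampling would need half as many.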

If striding approach is used for downsampling, all the convolutions in downsampling would be fully causal and don't see future tokens.
It is recommended to use stacking for streaming model which is significantly faster and uses less memory.
You may use stacking for downsampling in the streaming models which is significantly faster and uses less memory.
It also does not have some of the limitations of striding and vggnet, and you may use any downsampling rate.

You may find the example config files of cache-aware streaming Conformer models at
``<NeMo_git_root>/examples/asr/conf/conformer/streaming/conformer_transducer_bpe_streaming.yaml`` for Transducer variant and
at ``<NeMo_git_root>/examples/asr/conf/conformer/streaming/conformer_ctc_bpe.yaml`` for CTC variant.

To simulate cache-aware streaming, you may use the script at ``<NeMo_git_root>/examples/asr/asr_streaming/speech_to_text_streaming_infer.py``. It can simulate streaming in single stream or multi-stream mode (in batches) for an ASR model.
This script can be used for models trained offline with full-context but the accuracy would not be great unless the chunk size is large enough which would result in high latency.
It is recommended to train a model in streaming mode with limited context for this script. More info can be found in the script.
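
To make the caching idea concrete, here is a toy sketch (not the NeMo script, and not its API) of a causal 1-D convolution processed chunk by chunk. The function `causal_conv`, the kernel, and the chunk size are all hypothetical; the point is only that carrying the last `len(kernel) - 1` inputs as a cache reproduces the offline output without recomputing overlapping frames.

```python
def causal_conv(frames, kernel, cache=None):
    """Causal 1-D convolution over one chunk of frames.

    `cache` holds the last len(kernel) - 1 inputs seen in earlier chunks
    (zeros at the very start). Returns (outputs, new_cache).
    """
    k = len(kernel)
    if cache is None:
        cache = [0.0] * (k - 1)
    padded = cache + frames
    out = [
        sum(w * padded[t + i] for i, w in enumerate(kernel))
        for t in range(len(frames))
    ]
    # Keep only the most recent k - 1 inputs for the next chunk.
    new_cache = padded[len(padded) - (k - 1):]
    return out, new_cache


audio = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
kernel = [0.25, 0.25, 0.5]

# Offline: the whole signal in one call.
offline, _ = causal_conv(audio, kernel)

# Streaming: chunks of 3 frames, passing the cache between calls.
streamed, cache = [], None
for start in range(0, len(audio), 3):
    chunk_out, cache = causal_conv(audio[start:start + 3], kernel, cache)
    streamed.extend(chunk_out)

print(streamed == offline)
```

Because every output depends only on current and past inputs, the chunked run matches the offline run; a layer with right context would need to wait for future frames instead, which is where latency comes from.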

.. _LSTM-Transducer_model:

10 changes: 10 additions & 0 deletions docs/source/asr/scores.rst
Expand Up @@ -169,6 +169,16 @@ FR

--------------------

HR
^^

.. csv-table::
:header-rows: 1
:align: left
:file: data/scores/hr/conformer_hr.csv

--------------------

IT
^^

1 change: 1 addition & 0 deletions docs/source/conf.py
Expand Up @@ -120,6 +120,7 @@
'nlp/text_normalization/tn_itn_all.bib',
'tools/tools_all.bib',
'tts_all.bib',
'text_processing/text_processing_all.bib',
'core/adapters/adapter_bib.bib',
]

9 changes: 9 additions & 0 deletions docs/source/index.rst
Expand Up @@ -44,6 +44,7 @@ NVIDIA NeMo User Guide
nlp/machine_translation/machine_translation
nlp/text_normalization/intro
nlp/api
nlp/models


.. toctree::
Expand All @@ -60,6 +61,14 @@ NVIDIA NeMo User Guide
:caption: Common
:name: Common

text_processing/intro

.. toctree::
:maxdepth: 2
:caption: Text Processing
:name: Text Processing

text_processing/g2p/g2p
common/intro

