Merge r1.11.0 main (NVIDIA#4787)

* NeMo Megatron doc updates
  Signed-off-by: Oleksii Kuchaiev <okuchaiev@nvidia.com>
* update branch
  Signed-off-by: ericharper <complex451@gmail.com>
* update package info and dockerfile
  Signed-off-by: ericharper <complex451@gmail.com>
* fix fastpitch export (NVIDIA#4676)
  Signed-off-by: Jason <jasoli@nvidia.com>
* [TTS] fixed wrong pronunciations for r1.11. (NVIDIA#4677)
* [TTS] fixed wrong pronunciations.
  Signed-off-by: Xuesong Yang <1646669+XuesongYang@users.noreply.github.com>
* incremented the version number to 22.08 as @blisc suggested.
  Signed-off-by: Xuesong Yang <1646669+XuesongYang@users.noreply.github.com>
* correct cmudict versions in world-wide places.
  Signed-off-by: Xuesong Yang <1646669+XuesongYang@users.noreply.github.com>
* Fix for incorrect batch size issue while decoding (NVIDIA#4675)
  Co-authored-by: Micha Livne <michalivne@users.noreply.github.com>
  Co-authored-by: Eric Harper <complex451@gmail.com>
* [TTS] incremented the version number to 22.08 in tutorials. (NVIDIA#4684)
* [TTS] incremented the version number to 22.08 in tutorials.
  Signed-off-by: Xuesong Yang <1646669+XuesongYang@users.noreply.github.com>
* Megatron encode function with RPE fix (NVIDIA#4692)
* Fix for RPE
  Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>
* Style
  Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>
* fix to fetch config file (NVIDIA#4699)
  Signed-off-by: nithinraok <nithinrao.koluguri@gmail.com>
* Fix notebook for buffered inference (NVIDIA#4703)
  Signed-off-by: smajumdar <smajumdar@nvidia.com>
* Prompt Learning Notebook Bug Fix (NVIDIA#4689)
* Added back dataset class list of dict input for generation in tutorial notebook
  Signed-off-by: Virginia Adams <vadams@nvidia.com>
* updated argument name for build dataset
  Signed-off-by: Virginia Adams <vadams@nvidia.com>
  Signed-off-by: Virginia Adams <vadams@nvidia.com>
* add psutils to mock imports (NVIDIA#4728)
  Signed-off-by: ericharper <complex451@gmail.com>
  Signed-off-by: ericharper <complex451@gmail.com>
* Update Aligner model and tutorial to add NGC checkpoint loading (NVIDIA#4714)
* Update Aligner model and tutorial to add NGC checkpoint loading
  Signed-off-by: Jocelyn Huang <jocelynh@nvidia.com>
* Fix pynini install for Aligner notebook, minor formatting fix for model
  Signed-off-by: Jocelyn Huang <jocelynh@nvidia.com>
* Aligner notebook formatting consistency
  Signed-off-by: Jocelyn Huang <jocelynh@nvidia.com>
  Signed-off-by: Jocelyn Huang <jocelynh@nvidia.com>
  Co-authored-by: Xuesong Yang <1646669+XuesongYang@users.noreply.github.com>
* [TTS] bugfix for missing configs. (NVIDIA#4725)
  Signed-off-by: Xuesong Yang <1646669+XuesongYang@users.noreply.github.com>
* docs typo fix
  Signed-off-by: Oleksii Kuchaiev <okuchaiev@nvidia.com>
* Fix pynini install in TTS tutorials (NVIDIA#4729)
  Signed-off-by: Jocelyn Huang <jocelynh@nvidia.com>
  Signed-off-by: Jocelyn Huang <jocelynh@nvidia.com>
* Fix ASR notebooks (NVIDIA#4738)
  Signed-off-by: smajumdar <smajumdar@nvidia.com>
  Signed-off-by: smajumdar <smajumdar@nvidia.com>
* Multilingual VAD model (NVIDIA#4734)
* add ngc link
  Signed-off-by: fayejf <fayejf07@gmail.com>
* add tuned VAD config on ASR data
  Signed-off-by: fayejf <fayejf07@gmail.com>
* yaml note
  Signed-off-by: fayejf <fayejf07@gmail.com>
* update vad asr notebook with mVAD
  Signed-off-by: fayejf <fayejf07@gmail.com>
* update vad infer config comment
  Signed-off-by: fayejf <fayejf07@gmail.com>
* fix
  Signed-off-by: fayejf <fayejf07@gmail.com>
* mvad sd config for ch109
  Signed-off-by: fayejf <fayejf07@gmail.com>
* update sd readme
  Signed-off-by: fayejf <fayejf07@gmail.com>
* add new mVAD model to doc
  Signed-off-by: fayejf <fayejf07@gmail.com>
* style fix
  Signed-off-by: fayejf <fayejf07@gmail.com>
* update sd tutorial with mVAD
  Signed-off-by: fayejf <fayejf07@gmail.com>
* typo fix
  Signed-off-by: fayejf <fayejf07@gmail.com>
  Signed-off-by: fayejf <fayejf07@gmail.com>
* publish pretrained itn t5 model for English (NVIDIA#4748)
  Signed-off-by: Alexandra Antonova <aleksandraa@nvidia.com>
  Signed-off-by: Alexandra Antonova <aleksandraa@nvidia.com>
  Co-authored-by: Alexandra Antonova <aleksandraa@nvidia.com>
* Updated docs and doc paths (NVIDIA#4754)
* Updated docs and doc paths
  Signed-off-by: Virginia Adams <vadams@nvidia.com>
* Update Multitask_Prompt_and_PTuning.ipynb
* Update README.rst
* Changed branch name to use single quotes
  Signed-off-by: Virginia Adams <vadams@nvidia.com>
  Signed-off-by: Virginia Adams <vadams@nvidia.com>
* fix bug relating to ddp strategy in joint intent slot classification tutorial (NVIDIA#4762)
* [TTS] updated config with a German IPA phoneme tokenizer (NVIDIA#4756)
* [TTS] added a German IPA phoneme tokenizer
* [TTS][ASR] enabled customized arguments for trimming the leading and trailing silence.
* [TTS] disabled spline interpolation for beta-binomial distribution. Let it generate align prior and save to disks. Use a new phoneme tokenizer.
* [TTS] use consistent spline interpolation with fastpitch checkpoint when generating mel-spectrograms for hifigan finetune.
  Signed-off-by: Xuesong Yang <1646669+XuesongYang@users.noreply.github.com>
* Update r1.11 to new heteronyms list (NVIDIA#4745)
* Update configs to new heteronyms list
* Remove old heteronyms list, add alt 'merchandise' pron to CMUdict
* Update remaining references to old heteronyms list
  Signed-off-by: Jocelyn Huang <jocelynh@nvidia.com>
  Co-authored-by: Xuesong Yang <1646669+XuesongYang@users.noreply.github.com>
* [TTS] Add multi-speaker German FastPitch and HiFiGAN NGC checkpoints (NVIDIA#4763)
  Signed-off-by: Xuesong Yang <1646669+XuesongYang@users.noreply.github.com>
  Signed-off-by: Xuesong Yang <1646669+XuesongYang@users.noreply.github.com>
* [TTS] Add single male speaker German FastPitch and HiFiGAN NGC checkpoints (NVIDIA#4770)
  Signed-off-by: Xuesong Yang <1646669+XuesongYang@users.noreply.github.com>
* Update CMUdict with more recent 0.7b entries (NVIDIA#4768)
  Signed-off-by: Jocelyn Huang <jocelynh@nvidia.com>
  Signed-off-by: Jocelyn Huang <jocelynh@nvidia.com>
  Co-authored-by: Xuesong Yang <1646669+XuesongYang@users.noreply.github.com>
* Install pynini in docker container (NVIDIA#4733)
  Signed-off-by: Vladimir Bataev <vbataev@nvidia.com>
  Signed-off-by: Vladimir Bataev <vbataev@nvidia.com>
  Co-authored-by: Nithin Rao <nithinrao.koluguri@gmail.com>
  Co-authored-by: Xuesong Yang <1646669+XuesongYang@users.noreply.github.com>
  Co-authored-by: Eric Harper <complex451@gmail.com>
* Fix tutorial formatting (NVIDIA#4778)
  Signed-off-by: Jocelyn Huang <jocelynh@nvidia.com>
* [TTS] deprecated old scripts for ljspeech. (NVIDIA#4780)
* deprecated old scripts for ljspeech.
* removed relevent function calls in TTS docs.
  Signed-off-by: Xuesong Yang <1646669+XuesongYang@users.noreply.github.com>
* update branch
  Signed-off-by: ericharper <complex451@gmail.com>
* update package info and requirements
  Signed-off-by: ericharper <complex451@gmail.com>
* update container
  Signed-off-by: ericharper <complex451@gmail.com>
* Update stragglers to new cmudict and heteronyms paths
  Signed-off-by: Jocelyn Huang <jocelynh@nvidia.com>

Signed-off-by: Oleksii Kuchaiev <okuchaiev@nvidia.com>
Signed-off-by: ericharper <complex451@gmail.com>
Signed-off-by: Jason <jasoli@nvidia.com>
Signed-off-by: Xuesong Yang <1646669+XuesongYang@users.noreply.github.com>
Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>
Signed-off-by: nithinraok <nithinrao.koluguri@gmail.com>
Signed-off-by: smajumdar <smajumdar@nvidia.com>
Signed-off-by: Virginia Adams <vadams@nvidia.com>
Signed-off-by: Jocelyn Huang <jocelynh@nvidia.com>
Signed-off-by: fayejf <fayejf07@gmail.com>
Signed-off-by: Alexandra Antonova <aleksandraa@nvidia.com>
Signed-off-by: Vladimir Bataev <vbataev@nvidia.com>
Co-authored-by: Oleksii Kuchaiev <okuchaiev@nvidia.com>
Co-authored-by: Jason <jasoli@nvidia.com>
Co-authored-by: Xuesong Yang <1646669+XuesongYang@users.noreply.github.com>
Co-authored-by: Rajesh Ilango <rilango@gmail.com>
Co-authored-by: Micha Livne <michalivne@users.noreply.github.com>
Co-authored-by: Sandeep Subramanian <sandeep.subramanian.1@umontreal.ca>
Co-authored-by: Nithin Rao <nithinrao.koluguri@gmail.com>
Co-authored-by: Somshubra Majumdar <titu1994@gmail.com>
Co-authored-by: Virginia Adams <78445382+vadam5@users.noreply.github.com>
Co-authored-by: Jocelyn <jocelynh@nvidia.com>
Co-authored-by: fayejf <36722593+fayejf@users.noreply.github.com>
Co-authored-by: bene-ges <61418381+bene-ges@users.noreply.github.com>
Co-authored-by: Alexandra Antonova <aleksandraa@nvidia.com>
Co-authored-by: Zhilin Wang <wangzhilin12061996@hotmail.com>
Co-authored-by: Vladimir Bataev <vbataev@nvidia.com>
Signed-off-by: Hainan Xu <hainanx@nvidia.com>
hainan-xv · Nov 29, 2022 · d167953
1 parent 881240c

Showing 65 changed files with 1,077 additions and 1,468 deletions.
diff --git a/Dockerfile b/Dockerfile
@@ -34,9 +34,9 @@ RUN apt-get update && \
 
 # FIXME a workaround to update apex. Remove when base image is updated
 WORKDIR /tmp/
-RUN git clone https://github.com/ericharper/apex.git && \
+RUN git clone https://github.com/NVIDIA/apex.git && \
     cd apex && \
-    git checkout 19e4f55eb402452f74dead19f68b65d6291cfdb2 && \
+    git checkout 3c19f1061879394f28272a99a7ea26d58f72dace && \
     pip install -v --disable-pip-version-check --no-cache-dir --global-option="--cpp_ext" --global-option="--cuda_ext" --global-option="--fast_layer_norm" ./
 
 # uninstall stuff from base container
@@ -60,6 +60,10 @@ WORKDIR /tmp/nemo
 COPY requirements .
 RUN for f in $(ls requirements*.txt); do pip install --disable-pip-version-check --no-cache-dir -r $f; done
 
+# install pynini
+COPY nemo_text_processing/install_pynini.sh /tmp/nemo/
+RUN /bin/bash /tmp/nemo/install_pynini.sh
+
 # install k2, skip if installation fails
 COPY scripts /tmp/nemo/scripts/
 RUN /bin/bash /tmp/nemo/scripts/speech_recognition/k2/setup.sh || exit 0
@@ -70,7 +74,7 @@ COPY . .
 
 # start building the final container
 FROM nemo-deps as nemo
-ARG NEMO_VERSION=1.11.0
+ARG NEMO_VERSION=1.12.0
 
 # Check that NEMO_VERSION is set. Build will fail without this. Expose NEMO and base container
 # version information as runtime environment variable for introspection purposes

diff --git a/Jenkinsfile b/Jenkinsfile
@@ -2,7 +2,7 @@ pipeline {
   agent {
         docker {
       //image 'nvcr.io/nvidia/pytorch:22.05-py3'
-      image 'gitlab-master.nvidia.com:5005/eharper/nemo_containers:megatron_gpt_v16'
+      image 'gitlab-master.nvidia.com:5005/eharper/nemo_containers:nemo_ci_pytorch_22.07_apex_3c19f1061879394f28272a99a7ea26d58f72dace'
       args '--device=/dev/nvidia0 --gpus all -e TRANSFORMERS_OFFLINE=1 --user 0:128 -v /home/TestData:/home/TestData -v $HOME/.cache:/root/.cache --shm-size=8g'
         }
   }

diff --git a/README.rst b/README.rst
@@ -68,7 +68,7 @@ Key Features
     * `Information retrieval <https://docs.nvidia.com/deeplearning/nemo/user-guide/docs/en/main/nlp/information_retrieval.html>`_
     * `Entity Linking <https://docs.nvidia.com/deeplearning/nemo/user-guide/docs/en/main/nlp/entity_linking.html>`_
     * `Dialogue State Tracking <https://docs.nvidia.com/deeplearning/nemo/user-guide/docs/en/main/nlp/sgd_qa.html>`_   
-    * `Prompt Learning <https://docs.nvidia.com/deeplearning/nemo/user-guide/docs/en/main/nlp/prompt_learning.html>`_
+    * `Prompt Learning <https://docs.nvidia.com/deeplearning/nemo/user-guide/docs/en/main/nlp/nemo_megatron/prompt_learning.html>`_
     * `NGC collection of pre-trained NLP models. <https://ngc.nvidia.com/catalog/collections/nvidia:nemo_nlp>`_
 * `Speech synthesis (TTS) <https://docs.nvidia.com/deeplearning/nemo/user-guide/docs/en/main/tts/intro.html#>`_
     * Spectrogram generation: Tacotron2, GlowTTS, TalkNet, FastPitch, FastSpeech2, Mixer-TTS, Mixer-TTS-X
@@ -205,6 +205,12 @@ Megatron GPT training requires NVIDIA Apex to be installed.
     git checkout 3c19f1061879394f28272a99a7ea26d58f72dace
     pip install -v --disable-pip-version-check --no-cache-dir --global-option="--cpp_ext" --global-option="--cuda_ext" --global-option="--fast_layer_norm" ./
 
+.. note::
+
+  You may need to modify [setup.py](https://github.com/NVIDIA/apex/blob/3c19f1061879394f28272a99a7ea26d58f72dace/setup.py) if 
+  your version of CUDA does not match the version used to compile Pytorch binaries, comment lines 33-41 in the above link
+  before installing.
+
 Docker containers:
 ~~~~~~~~~~~~~~~~~~
 To build a nemo container with Dockerfile from a branch, please run 
@@ -214,13 +220,13 @@ To build a nemo container with Dockerfile from a branch, please run
     DOCKER_BUILDKIT=1 docker build -f Dockerfile -t nemo:latest .
 
 
-If you chose to work with main branch, we recommend using NVIDIA's PyTorch container version 22.05-py3 and then installing from GitHub.
+If you chose to work with main branch, we recommend using NVIDIA's PyTorch container version 22.07-py3 and then installing from GitHub.
 
 .. code-block:: bash
 
     docker run --gpus all -it --rm -v <nemo_github_folder>:/NeMo --shm-size=8g \
     -p 8888:8888 -p 6006:6006 --ulimit memlock=-1 --ulimit \
-    stack=67108864 --device=/dev/snd nvcr.io/nvidia/pytorch:22.05-py3
+    stack=67108864 --device=/dev/snd nvcr.io/nvidia/pytorch:22.07-py3
 
 Examples
 --------

diff --git a/docs/source/asr/speaker_diarization/data/diarization_results.csv b/docs/source/asr/speaker_diarization/data/diarization_results.csv
@@ -1,4 +1,5 @@
 Model Name,Model Base Class,Model Card
+vad_multilingual_marblenet,EncDecClassificationModel,"https://catalog.ngc.nvidia.com/orgs/nvidia/teams/nemo/models/vad_multilingual_marblenet"
 vad_marblenet,EncDecClassificationModel,"https://ngc.nvidia.com/catalog/models/nvidia:nemo:vad_marblenet"
 vad_telephony_marblenet,EncDecClassificationModel,"https://ngc.nvidia.com/catalog/models/nvidia:nemo:vad_telephony_marblenet"
 titanet_large,EncDecSpeakerLabelModel,"https://ngc.nvidia.com/catalog/models/nvidia:nemo:titanet_large"

diff --git a/docs/source/asr/speaker_diarization/results.rst b/docs/source/asr/speaker_diarization/results.rst
@@ -11,7 +11,7 @@ Load VAD model
 
 .. code-block:: bash
 
-  pretrained_vad_model='/path/to/vad_marblenet.nemo' # local .nemo or pretrained vad model name
+  pretrained_vad_model='/path/to/vad_multilingual_marblenet.nemo' # local .nemo or pretrained vad model name
   ...
   # pass with hydra config
   config.diarizer.vad.model_path=pretrained_vad_model
@@ -58,7 +58,7 @@ In general, you can load models with model name in the following format,
 
 .. code-block:: python
 
-  pretrained_vad_model='vad_telephony_marblenet' 
+  pretrained_vad_model='vad_multilingual_marblenet' 
   pretrained_speaker_model='titanet_large' 
   ...
   config.diarizer.vad.model_path=retrained_vad_model \
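
Reviewer note: the two snippets in this file pass either a local `.nemo` path or a pretrained model name through the same `model_path` setting. A minimal sketch of how such a value could be told apart is below. The helper `resolve_vad_model` and its return labels are hypothetical and not part of NeMo; the model names are the three VAD entries listed in the model tables touched by this commit.

```python
from typing import Tuple

# VAD model names taken from the CSV model tables in this commit.
KNOWN_VAD_MODELS = {
    "vad_multilingual_marblenet",
    "vad_marblenet",
    "vad_telephony_marblenet",
}


def resolve_vad_model(spec: str) -> Tuple[str, str]:
    """Classify a model spec as a local checkpoint or a pretrained name.

    Hypothetical helper for illustration only; it does not load anything.
    """
    if spec.endswith(".nemo"):
        # Treat anything ending in .nemo as a path to a local checkpoint.
        return ("local_checkpoint", spec)
    if spec in KNOWN_VAD_MODELS:
        # Otherwise accept only names we know from the model table.
        return ("pretrained_name", spec)
    raise ValueError(f"Unrecognized VAD model spec: {spec!r}")


print(resolve_vad_model("/path/to/vad_multilingual_marblenet.nemo"))
print(resolve_vad_model("vad_multilingual_marblenet"))
```

Rejecting unknown names early, as this sketch does, would surface a misspelled model name before any download is attempted.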

diff --git a/docs/source/asr/speech_classification/data/classification_results.csv b/docs/source/asr/speech_classification/data/classification_results.csv
@@ -1,4 +1,5 @@
 Model Name,Model Base Class,Model Card
+vad_multilingual_marblenet,EncDecClassificationModel,"https://catalog.ngc.nvidia.com/orgs/nvidia/teams/nemo/models/vad_multilingual_marblenet"
 vad_marblenet,EncDecClassificationModel,"https://ngc.nvidia.com/catalog/models/nvidia:nemo:vad_marblenet"
 vad_telephony_marblenet,EncDecClassificationModel,"https://ngc.nvidia.com/catalog/models/nvidia:nemo:vad_telephony_marblenet"
 commandrecognition_en_matchboxnet3x1x64_v1,EncDecClassificationModel,"https://ngc.nvidia.com/catalog/models/nvidia:nemo:commandrecognition_en_matchboxnet3x1x64_v1"

diff --git a/docs/source/conf.py b/docs/source/conf.py
@@ -58,6 +58,7 @@
     'joblib',
     'IPython',
     'ipadic',
+    'psutil',
 ]
 
 _skipped_autodoc_mock_imports = ['wrapt', 'numpy']
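
Reviewer note: adding `psutil` to this list lets the docs build import modules that depend on it without installing it. The sketch below shows roughly what mocking an import amounts to, using only the standard library. It is an illustration under assumptions, not how Sphinx implements the feature: `install_mocks` and the module name `hypothetical_heavy_dep` are made up, and the way the skip list is applied is assumed.

```python
import sys
from unittest.mock import MagicMock


def install_mocks(module_names, skipped=()):
    """Register stand-in modules so later imports succeed without the real package.

    Hypothetical helper: names in `skipped`, or already imported, are left alone.
    """
    installed = []
    for name in module_names:
        if name in skipped or name in sys.modules:
            continue
        sys.modules[name] = MagicMock(name=name)
        installed.append(name)
    return installed


# 'hypothetical_heavy_dep' stands in for a package missing from the docs environment.
installed = install_mocks(["hypothetical_heavy_dep", "numpy"], skipped=["numpy"])

import hypothetical_heavy_dep  # resolves to the stand-in registered above

print(installed)
```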

diff --git a/docs/source/nlp/nemo_megatron/prompt_learning.rst b/docs/source/nlp/nemo_megatron/prompt_learning.rst
@@ -5,7 +5,7 @@ Prompt Learning
 
 Within NeMo we refer to **p-tuning** and **prompt tuning** methods collectively as prompt learning. Both methods are parameter efficient alternatives to fine-tuning pretrained language models. Our NeMo implementation makes it possible to use one pretrained GPT model on many downstream tasks without needing to tune the model's full set of parameters. It also allows for adding new tasks to your model without overwriting or disrupting previous tasks for which the model has already been p-tuned/prompt-tuned. Because the original model parameters are frozen and never altered by either method, p-tuning/prompt-tuning also avoids catastrophic forgetting issues often encountered when fine-tuning models. 
 
-Instead of selecting discrete text prompts in a manual or automated fashion, prompt tuning and p-tuning utilize virtual prompt embeddings that can be optimized via gradient decent. The only difference between prompt tuning and p-tuning within NeMo-Megatron is the architecture used to tune the soft prompt tokens during training.
+Instead of selecting discrete text prompts in a manual or automated fashion, prompt tuning and p-tuning utilize virtual prompt embeddings that can be optimized via gradient descent. The only difference between prompt tuning and p-tuning within NeMo-Megatron is the architecture used to tune the soft prompt tokens during training.
 
 - Our prompt tuning implementation is based off Lester et. al’s EMNLP 2021 paper "`The Power of Scale for Parameter-Efficient Prompt Tuning <https://arxiv.org/abs/2104.08691>`_"
 - Our p-tuning implementation is based off Liu et al's paper "`GPT Understands, Too <https://arxiv.org/abs/2103.10385>`_"
@@ -217,13 +217,12 @@ First define a config called ``multitask-prompt-learning.yaml`` demonstrated bel
   model:
     seed: 1234
     nemo_path: ${name}.nemo 
-    lm_finetune: False 
-    pseudo_token_base: "PROMPT_" 
     virtual_prompt_style: "prompt-tuning" 
     encoder_seq_length: 2048 
     tensor_model_parallel_size: 1 
     pipeline_model_parallel_size: 1 
-    batch_size: 8
+    global_batch_size: 16
+    micro_batch_size: 4
 
     restore_path: null 
     language_model_path: models/megatron_125M_gpt.nemo
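
Reviewer note: this hunk replaces the single `batch_size` with `global_batch_size` and `micro_batch_size`. In Megatron-style training the global batch is usually the micro batch times the data-parallel size times the number of gradient-accumulation steps. The sketch below works through that arithmetic for the values in the config. The function name and the `data_parallel_size` argument are assumptions for illustration; the config itself does not state the number of data-parallel ranks.

```python
def grad_accumulation_steps(global_batch_size: int,
                            micro_batch_size: int,
                            data_parallel_size: int) -> int:
    """Return how many micro-batches each data-parallel rank accumulates per step.

    Assumes global = micro * data_parallel_size * accumulation_steps.
    """
    samples_per_pass = micro_batch_size * data_parallel_size
    if global_batch_size % samples_per_pass != 0:
        raise ValueError(
            "global_batch_size must be divisible by "
            "micro_batch_size * data_parallel_size"
        )
    return global_batch_size // samples_per_pass


# Values from the config above, on an assumed single data-parallel rank.
print(grad_accumulation_steps(16, 4, 1))  # 4
# The same config on two assumed data-parallel ranks.
print(grad_accumulation_steps(16, 4, 2))  # 2
```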
@@ -281,58 +280,57 @@ In this example, the SQuAD task includes the question context as part of the pro
   trainer: ...
   exp_manager: ...
   model:
-  seed: 1234
-  nemo_path: ${name}.nemo 
-  lm_finetune: False 
-  pseudo_token_base: "PROMPT_" 
-  virtual_prompt_style: "p-tuning" # ***
-  encoder_seq_length: 2048 
-  tensor_model_parallel_size: 1 
-  pipeline_model_parallel_size: 1 
-  batch_size: 8
-
-  restore_path: multitask_prompt_tuning.nemo # ***
-  language_model_path: models/megatron_125M_gpt.nemo
-  existing_tasks: ["sentiment", "intent_and_slot"] # ***
-  new_tasks: ["squad"] 
-
-  task_templates: 
-  - taskname: "sentiment" 
-    prompt_template: "<|VIRTUAL_PROMPT_0|> {sentence} sentiment: {label}" 
-    total_virtual_tokens: 100 
-    virtual_token_splits: [100] 
-    truncate_field: null
-    answer_only_loss: False
-
-  - taskname: "intent_and_slot"
-    prompt_template: "<|VIRTUAL_PROMPT_0|> Predict intent and slot <|VIRTUAL_PROMPT_1|> :\n{utterance}{label}" 
-    total_virtual_tokens: 100 
-    virtual_token_splits: [80, 20]
-    truncate_field: null
-    answer_only_loss: True
-    answer_field: "label"
-
-  - taskname: "squad" # ***
-    prompt_template: "<|VIRTUAL_PROMPT_0|> Answer the question from the context {question} {context} Answer: {answer}" # *** 
-    total_virtual_tokens: 9 # ***
-    virtual_token_splits: [9] # ***
-    truncate_field: context # ***
-    answer_only_loss: True # ***
-    answer_field: "answer" # ***
-
-  p_tuning: # ***
-      dropout: 0.0 # ***
-      num_layers: 2 # ***
-      
-  data:
-    train_ds: ["data/squad_train.jsonl"] # ***
-    validation_ds: ["data/squad_val.jsonl"] # ***
-    add_eos: True
-    shuffle: True
-    num_workers: 1
-    pin_memory: True
-
-  optim: ...
+    seed: 1234
+    nemo_path: ${name}.nemo 
+    virtual_prompt_style: "p-tuning" # ***
+    encoder_seq_length: 2048 
+    tensor_model_parallel_size: 1 
+    pipeline_model_parallel_size: 1 
+    global_batch_size: 16
+    micro_batch_size: 4
+
+    restore_path: multitask_prompt_tuning.nemo # ***
+    language_model_path: models/megatron_125M_gpt.nemo
+    existing_tasks: ["sentiment", "intent_and_slot"] # ***
+    new_tasks: ["squad"] 
+
+    task_templates: 
+    - taskname: "sentiment" 
+      prompt_template: "<|VIRTUAL_PROMPT_0|> {sentence} sentiment: {label}" 
+      total_virtual_tokens: 100 
+      virtual_token_splits: [100] 
+      truncate_field: null
+      answer_only_loss: False
+
+    - taskname: "intent_and_slot"
+      prompt_template: "<|VIRTUAL_PROMPT_0|> Predict intent and slot <|VIRTUAL_PROMPT_1|> :\n{utterance}{label}" 
+      total_virtual_tokens: 100 
+      virtual_token_splits: [80, 20]
+      truncate_field: null
+      answer_only_loss: True
+      answer_field: "label"
+
+    - taskname: "squad" # ***
+      prompt_template: "<|VIRTUAL_PROMPT_0|> Answer the question from the context {question} {context} Answer: {answer}" # *** 
+      total_virtual_tokens: 9 # ***
+      virtual_token_splits: [9] # ***
+      truncate_field: context # ***
+      answer_only_loss: True # ***
+      answer_field: "answer" # ***
+
+    p_tuning: # ***
+        dropout: 0.0 # ***
+        num_layers: 2 # ***
+        
+    data:
+      train_ds: ["data/squad_train.jsonl"] # ***
+      validation_ds: ["data/squad_val.jsonl"] # ***
+      add_eos: True
+      shuffle: True
+      num_workers: 1
+      pin_memory: True
+
+    optim: ...
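
Reviewer note: in all three task templates above, `virtual_token_splits` sums to `total_virtual_tokens`, and the number of splits matches the number of `<|VIRTUAL_PROMPT_i|>` markers in the template. A minimal sketch of a consistency check for those two properties follows. The function `check_template` is hypothetical and not part of NeMo; whether NeMo enforces exactly these rules is an assumption based on the examples in this config.

```python
import re

# Task templates copied from the config above (fields not needed here are omitted).
TASK_TEMPLATES = [
    {
        "taskname": "sentiment",
        "prompt_template": "<|VIRTUAL_PROMPT_0|> {sentence} sentiment: {label}",
        "total_virtual_tokens": 100,
        "virtual_token_splits": [100],
    },
    {
        "taskname": "intent_and_slot",
        "prompt_template": "<|VIRTUAL_PROMPT_0|> Predict intent and slot <|VIRTUAL_PROMPT_1|> :\n{utterance}{label}",
        "total_virtual_tokens": 100,
        "virtual_token_splits": [80, 20],
    },
    {
        "taskname": "squad",
        "prompt_template": "<|VIRTUAL_PROMPT_0|> Answer the question from the context {question} {context} Answer: {answer}",
        "total_virtual_tokens": 9,
        "virtual_token_splits": [9],
    },
]

MARKER = re.compile(r"<\|VIRTUAL_PROMPT_(\d+)\|>")


def check_template(task: dict) -> list:
    """Return a list of human-readable problems found in one task template."""
    problems = []
    splits = task["virtual_token_splits"]
    markers = MARKER.findall(task["prompt_template"])
    if sum(splits) != task["total_virtual_tokens"]:
        problems.append("virtual_token_splits does not sum to total_virtual_tokens")
    if len(markers) != len(splits):
        problems.append("number of virtual prompt markers differs from number of splits")
    return problems


for task in TASK_TEMPLATES:
    print(task["taskname"], check_template(task) or "ok")
```
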
 
 Then run the command again:
 
@@ -356,7 +354,7 @@ The inference file can contain a mix of prompts from all the tasks the model has
             trainer.num_nodes=1 \
             tensor_model_parallel_size=1 \
             pipeline_model_parallel_size=1 \
-            data_paths=[path/to/dataset1.jsonl, path/to/dataset2.jsonl]
+            prompts=[prompt1,prompt2]
             
 ``virtual_prompt_model_file`` should be a path to a .nemo file saved after p-tuning/prompt tuning and ``model_file`` is still the path to the gpt model's .nemo file.   
 
@@ -384,7 +382,9 @@ And the dataset class will automatically format your input to have the form:
       '<|VIRTUAL_PROMPT_0|> Context: some paragraph Question: question related to paragraph Answer: ',
       '<|VIRTUAL_PROMPT_0|> Context: another paragraph Question: a different question related to paragraph Answer: '
   ]
+        
+Generally prompt learning inference is just like running inference with a GPT model. The only difference is you need to add ``virtual_prompt_model_file=PATH_TO_NEMO_PROMPT_LEARNING_MODEL_FILE`` to your command if you're using a p-tuned/prompt-tuned model. 
 
 Example prompt learning script: `NeMo/examples/nlp/language_modeling/megatron_gpt_prompt_learning.py.py <https://github.com/NVIDIA/NeMo/blob/main/examples/nlp/language_modeling/megatron_gpt_prompt_learning.py>`__.
 
-Example prompt tuned inference script: `NeMo/examples/nlp/language_modeling/megatron_gpt_prompt_learning_eval.py <https://github.com/NVIDIA/NeMo/blob/main/examples/nlp/language_modeling/megatron_gpt_prompt_learning_eval.py>`__.
+Example prompt tuned inference script: `NeMo/examples/nlp/language_modeling/megatron_gpt_eval.py <https://github.com/NVIDIA/NeMo/blob/main/examples/nlp/language_modeling/megatron_gpt_eval.py>`__.
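
Reviewer note: the last hunk shows inputs formatted as `<|VIRTUAL_PROMPT_0|> Context: ... Question: ... Answer: ` with the answer left empty at inference time. The sketch below reproduces that formatting with plain `str.format`. It is an illustration only: `render_prompt` is a hypothetical helper, and the template string is inferred from the formatted output shown in the doc rather than copied from a config.

```python
# Template inferred from the formatted examples in the doc (an assumption).
TEMPLATE = "<|VIRTUAL_PROMPT_0|> Context: {context} Question: {question} Answer: {answer}"


def render_prompt(template: str, example: dict, answer_field: str = "answer") -> str:
    """Fill a prompt template from one example, leaving the answer empty if absent."""
    fields = dict(example)
    # At inference time the answer is unknown, so it renders as an empty string.
    fields.setdefault(answer_field, "")
    return template.format(**fields)


prompt = render_prompt(
    TEMPLATE,
    {"context": "some paragraph", "question": "question related to paragraph"},
)
print(repr(prompt))
```

The trailing space after `Answer:` is kept on purpose, since the formatted examples in the doc end the same way.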