Developer Documents for mcore RETRO (NVIDIA#9026)
* update branch
Signed-off-by: eharper <eharper@nvidia.com>
* Add dist ckpt support for regular optimizers (NVIDIA#7749)
* Add dist ckpt support for regular optimizers
Signed-off-by: Mikołaj Błaż <mblaz@nvidia.com>
* [tutorial] fixed missing RIR scripts file. (NVIDIA#8257)
Signed-off-by: Xuesong Yang <1646669+XuesongYang@users.noreply.github.com>
* fix imports
Signed-off-by: dimapihtar <dpihtar@gmail.com>
* imports fix
Signed-off-by: dimapihtar <dpihtar@gmail.com>
* [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci
* ci imports fix
Signed-off-by: dimapihtar <dpihtar@gmail.com>
* [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci
* revert asr notebook
Signed-off-by: dimapihtar <dpihtar@gmail.com>
* revert asr notebook
Signed-off-by: dimapihtar <dpihtar@gmail.com>
---------
Signed-off-by: Mikołaj Błaż <mblaz@nvidia.com>
Signed-off-by: Xuesong Yang <1646669+XuesongYang@users.noreply.github.com>
Signed-off-by: dimapihtar <dpihtar@gmail.com>
Co-authored-by: Eric Harper <complex451@gmail.com>
Co-authored-by: Xuesong Yang <1646669+XuesongYang@users.noreply.github.com>
Co-authored-by: Dmytro Pykhtar <37850217+dimapihtar@users.noreply.github.com>
Co-authored-by: dimapihtar <dpihtar@gmail.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
* Pin lhotse=1.19.2 in r1.23.0 (NVIDIA#8303)
Signed-off-by: Piotr Żelasko <petezor@gmail.com>
* Cache Aware Streaming tutorial notebook (NVIDIA#8296)
* add notebook
Signed-off-by: Elena Rastorgueva <erastorgueva@nvidia.com>
* rename old notebook to Buffered_Streaming
Signed-off-by: Elena Rastorgueva <erastorgueva@nvidia.com>
* call setup_streaming_params in set_default_att_context_size method
Signed-off-by: Elena Rastorgueva <erastorgueva@nvidia.com>
* update links in docs
Signed-off-by: Elena Rastorgueva <erastorgueva@nvidia.com>
* update links to tutorials in docs
Signed-off-by: Elena Rastorgueva <erastorgueva@nvidia.com>
* remove hard-coding
Signed-off-by: Elena Rastorgueva <erastorgueva@nvidia.com>
* rename var
Signed-off-by: Elena Rastorgueva <erastorgueva@nvidia.com>
---------
Signed-off-by: Elena Rastorgueva <erastorgueva@nvidia.com>
* fix path location and branch (NVIDIA#8304)
* fix path location and branch
Signed-off-by: Nithin Rao Koluguri <nithinraok>
* change to a floating point number
Signed-off-by: Nithin Rao Koluguri <nithinraok>
---------
Signed-off-by: Nithin Rao Koluguri <nithinraok>
Co-authored-by: Nithin Rao Koluguri <nithinraok>
Co-authored-by: Somshubra Majumdar <titu1994@gmail.com>
* add deallocate pipeline output optimization (NVIDIA#8279)
* add deallocate pipeline output optimization
Signed-off-by: Jimmy Zhang <jiemingz@nvidia.com>
* [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci
---------
Signed-off-by: Jimmy Zhang <jiemingz@nvidia.com>
Co-authored-by: Jimmy Zhang <jiemingz@nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
* Fix memory leak caused by context parallelism hanging references by omegaconf (NVIDIA#8299)
* save cp_size to self
Signed-off-by: Jimmy Zhang <jiemingz@nvidia.com>
* use parallel_state instead of self
Signed-off-by: Jimmy Zhang <jiemingz@nvidia.com>
---------
Signed-off-by: Jimmy Zhang <jiemingz@nvidia.com>
Co-authored-by: Jimmy Zhang <jiemingz@nvidia.com>
Co-authored-by: Eric Harper <complex451@gmail.com>
* remove assertion (NVIDIA#8302)
Signed-off-by: dimapihtar <dpihtar@gmail.com>
* Update PEFT Doc (NVIDIA#8262)
* update peft doc
Signed-off-by: Chen Cui <chcui@nvidia.com>
* remove old prompt learning doc and notebook
Signed-off-by: Chen Cui <chcui@nvidia.com>
* fix table
Signed-off-by: Chen Cui <chcui@nvidia.com>
* fix table
Signed-off-by: Chen Cui <chcui@nvidia.com>
* fix table
Signed-off-by: Chen Cui <chcui@nvidia.com>
* Merge branch 'r1.23.0' into chcui/update_peft_doc
Signed-off-by: Chen Cui <chcui@nvidia.com>
* revert accidental changes
Signed-off-by: Chen Cui <chcui@nvidia.com>
* revert accidental changes
Signed-off-by: Chen Cui <chcui@nvidia.com>
---------
Signed-off-by: Chen Cui <chcui@nvidia.com>
* Attention encoder-decoder models for multiple speech-to-text tasks (NVIDIA#8242) (NVIDIA#8324)
* Rebasing canary changes at current main
Signed-off-by: Piotr Żelasko <petezor@gmail.com>
* Move the changes from asr transformer to nlp transformer as originally intended
Signed-off-by: Piotr Żelasko <petezor@gmail.com>
* update eval to strip spaces before punctuations
Signed-off-by: stevehuang52 <heh@nvidia.com>
* update pc strip
Signed-off-by: stevehuang52 <heh@nvidia.com>
* [canary] Refactor: `PromptedAudioToTextLhotseDataset` and `EncDecMultiTaskModel` (NVIDIA#8247)
* Create a separate CanaryDataset and use it inside `transformer_bpe_models.py`. Ditches `token_sequence_format`.
Signed-off-by: Piotr Żelasko <petezor@gmail.com>
* [canary] Refactor: move changes in transformer_bpe_models.py to Canar… (NVIDIA#8252)
* [canary] Refactor: move changes in transformer_bpe_models.py to CanaryModel
Signed-off-by: Piotr Żelasko <petezor@gmail.com>
* Rename `CanaryModel` to `EncDecMultiTaskModel` and remove inheritance from `EncDecTransfModelBPE`; add a separate config for this model
Signed-off-by: Piotr Żelasko <petezor@gmail.com>
---------
Signed-off-by: Piotr Żelasko <petezor@gmail.com>
* Rename `CanaryDataset` to `PromptedAudioToTextLhotseDataset`; add `prompt_format_fn` argument; clean-up the `_canary_prompt_format` function a bit
Signed-off-by: Piotr Żelasko <petezor@gmail.com>
* Move tokenization into `prompt_format_fn`, fix usage, add docs
Signed-off-by: Piotr Żelasko <petezor@gmail.com>
* Backward-compatible utterance validation
Signed-off-by: Piotr Żelasko <petezor@gmail.com>
* Improve type annotations
Signed-off-by: Piotr Żelasko <petezor@gmail.com>
* config and prompt_fn registration changes from review
Signed-off-by: Piotr Żelasko <petezor@gmail.com>
---------
Signed-off-by: Piotr Żelasko <petezor@gmail.com>
* fix transcribe config
Signed-off-by: stevehuang52 <heh@nvidia.com>
* Refactor Canary to follow schema of remaining ASR models (NVIDIA#8260)
* Initial draft of multi task beam decoding strategy
Signed-off-by: smajumdar <titu1994@gmail.com>
* Stabilize inference
Signed-off-by: smajumdar <titu1994@gmail.com>
* Update AED Multi Task model to mostly conform to Archetype-Type format. Update config
Signed-off-by: smajumdar <titu1994@gmail.com>
* [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci
* Add change decoding strategy
Signed-off-by: smajumdar <titu1994@gmail.com>
* Remove redundant imports
Signed-off-by: smajumdar <titu1994@gmail.com>
* [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci
* Cleanup
Signed-off-by: smajumdar <titu1994@gmail.com>
* Cleanup
Signed-off-by: smajumdar <titu1994@gmail.com>
* remove asr transformer dependency on nlp
Signed-off-by: stevehuang52 <heh@nvidia.com>
* clean up
Signed-off-by: stevehuang52 <heh@nvidia.com>
* copy token_classifier from nlp to asr
Signed-off-by: stevehuang52 <heh@nvidia.com>
* Address comments
Signed-off-by: smajumdar <titu1994@gmail.com>
* Add typing to beam decoding
Signed-off-by: smajumdar <titu1994@gmail.com>
* Make prompt format configurable
Signed-off-by: smajumdar <titu1994@gmail.com>
* [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci
* drop asr dependency on nlp
Signed-off-by: stevehuang52 <heh@nvidia.com>
---------
Signed-off-by: smajumdar <titu1994@gmail.com>
Signed-off-by: stevehuang52 <heh@nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: stevehuang52 <heh@nvidia.com>
* fix transcribe, update asr evaluator
Signed-off-by: stevehuang52 <heh@nvidia.com>
* Extend the docs for the canary prompt_fn
Signed-off-by: Piotr Żelasko <petezor@gmail.com>
* Incorporate changes from Nithin's code review
Signed-off-by: Piotr Żelasko <petezor@gmail.com>
* training bug fix and adding launch script for speech_multitask (NVIDIA#8270)
* bug fix and adding launch script for speech_multitask
Signed-off-by: Krishna Puvvada <kpuvvada@nvidia.com>
* update launch script example in speech_to_text_aed.py
Signed-off-by: Krishna Puvvada <kpuvvada@nvidia.com>
---------
Signed-off-by: Krishna Puvvada <kpuvvada@nvidia.com>
Co-authored-by: Krishna Puvvada <kpuvvada@nvidia.com>
* Fix: drop_last must be true in validation/test otherwise the training will hang
Signed-off-by: Piotr Żelasko <pzelasko@nvidia.com>
* revert to current transcribe API
Signed-off-by: stevehuang52 <heh@nvidia.com>
* revert changes to NLP, update docs
Signed-off-by: stevehuang52 <heh@nvidia.com>
* update eval utils
Signed-off-by: stevehuang52 <heh@nvidia.com>
* update docs
Signed-off-by: stevehuang52 <heh@nvidia.com>
* Remove DALI; rename compute_audio_loss to compute_loss
Signed-off-by: Piotr Żelasko <petezor@gmail.com>
* set default use_model_transcribe=False
Signed-off-by: stevehuang52 <heh@nvidia.com>
* change os.path.dirname to pathlib
Signed-off-by: stevehuang52 <heh@nvidia.com>
* [canary] Test for CanaryTokenizer + refactoring (NVIDIA#8285)
* Test for CanaryTokenizer
Signed-off-by: Piotr Żelasko <petezor@gmail.com>
* Attempt at refactor...
Signed-off-by: Piotr Żelasko <petezor@gmail.com>
---------
Signed-off-by: Piotr Żelasko <petezor@gmail.com>
* Update config for AED models (NVIDIA#8294)
Signed-off-by: smajumdar <titu1994@gmail.com>
* set default calculate_wer=False in transcribe_speech.py
Signed-off-by: stevehuang52 <heh@nvidia.com>
* Attention encoder-decoder models for multiple speech-to-text tasks
Signed-off-by: Piotr Żelasko <petezor@gmail.com>
* Apply suggestions from code review, part 1
Co-authored-by: Nithin Rao <nithinrao.koluguri@gmail.com>
Signed-off-by: Piotr Żelasko <petezor@gmail.com>
* Apply suggestions from code review, part 2
Signed-off-by: Piotr Żelasko <petezor@gmail.com>
* Document compute_loss
Signed-off-by: Piotr Żelasko <petezor@gmail.com>
* update transcribe_speech.py
Signed-off-by: stevehuang52 <heh@nvidia.com>
* add docstring
Signed-off-by: stevehuang52 <heh@nvidia.com>
* Attention encoder-decoder models for multiple speech-to-text tasks
Signed-off-by: Piotr Żelasko <petezor@gmail.com>
---------
Signed-off-by: Piotr Żelasko <petezor@gmail.com>
Signed-off-by: stevehuang52 <heh@nvidia.com>
Signed-off-by: smajumdar <titu1994@gmail.com>
Signed-off-by: Krishna Puvvada <kpuvvada@nvidia.com>
Signed-off-by: Piotr Żelasko <pzelasko@nvidia.com>
Co-authored-by: stevehuang52 <heh@nvidia.com>
Co-authored-by: Somshubra Majumdar <titu1994@gmail.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Krishna Puvvada <93558329+krishnacpuvvada@users.noreply.github.com>
Co-authored-by: Krishna Puvvada <kpuvvada@nvidia.com>
Co-authored-by: He Huang (Steve) <105218074+stevehuang52@users.noreply.github.com>
Co-authored-by: Nithin Rao <nithinrao.koluguri@gmail.com>
(cherry picked from commit d10726d)
Co-authored-by: Piotr Żelasko <petezor@gmail.com>
* add code for calling mcore_retro in NeMo
* add code for calling mcore_retro in NeMo
* runnable, training curve match retro mcore and nemo
* working on retro inference
* working on megatron_retro_eval.py and megatron_retro_inference.yaml
* refactoring text_generation_utils code and retro inference relevant files
* clean PR
* resolving quick hacks (reading number of train/valid samples from workdir, discrepancy in total samples and samples with neighbors retrieved, tokenizers)
* clean repository
* revert changes to inference/eval code to original in main
* clean code
* runable training code, with already implemented eval code
* [tutorial] fixed missing RIR scripts file. (NVIDIA#8257)
Signed-off-by: Xuesong Yang <1646669+XuesongYang@users.noreply.github.com>
* add values to en tts dict (NVIDIA#7879)
Signed-off-by: Mariana Graterol Fuenmayor <marianag@nvidia.com>
* Add Bert HF checkpoint converter (NVIDIA#8088)
* Add Bert HF checkpoint converter
Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>
* [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci
* Reformat
Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>
* Add BERT ONNX export
* [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci
* Add NeMo BERT to HF BERT script
* [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci
* Clean code
Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>
* [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci
* Update argument names
Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>
* Update build_transformer_config in Bert
Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>
---------
Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Bobby Chen <bobchen@nvidia.com>
* revert to original eval code files
* revert to original eval code files 2
* revert to original eval code files 3
* revert to original eval code files 4
* clean code
* clean code
* update my code to support changes from lastest main
* commit before rebase r1.23.0
* Multimodal r1.23.0 bug fix (NVIDIA#8315)
* Rename quick-gelu
Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>
* ddpm config guard
Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>
* Fix ddpm edit api
Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>
* Fix insert_image_token cfg issue
Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>
* neva updates
Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>
* reformat
Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>
* Add back jenkins
Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>
* [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci
* Fix jenkins
Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>
* [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci
* Fix bugs
Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>
* Update default neva template
Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>
---------
Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>
Co-authored-by: Eric Harper <complex451@gmail.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
* copy paste files from r1.23.0
* clean PR
* Fixes for MoE parameter passing & use of AutoTokenizer/Model for mistral. (NVIDIA#8272)
Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>
* Keep max_seqlen and cu_seqlens_argmin for later micro-batches when PP>1 (NVIDIA#8334)
Signed-off-by: Sangkug Lym <slym@nvidia.com>
Co-authored-by: Eric Harper <complex451@gmail.com>
* Remove asr webapp (NVIDIA#8347)
Signed-off-by: smajumdar <titu1994@gmail.com>
* remove _target_ at model level in aed config (NVIDIA#8351)
Signed-off-by: Krishna Puvvada <kpuvvada@nvidia.com>
Co-authored-by: Krishna Puvvada <kpuvvada@nvidia.com>
* revert changes for tts and asr
* Add change_vocabulary and save_tokenizers() support to Multitask ASR models (NVIDIA#8357)
* Add change_vocabulary and save_tokenizers() support
Signed-off-by: smajumdar <titu1994@gmail.com>
* [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci
* Update nemo/collections/asr/models/aed_multitask_models.py
Co-authored-by: Piotr Żelasko <petezor@gmail.com>
Signed-off-by: Somshubra Majumdar <titu1994@gmail.com>
---------
Signed-off-by: smajumdar <titu1994@gmail.com>
Signed-off-by: Somshubra Majumdar <titu1994@gmail.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Piotr Żelasko <petezor@gmail.com>
* Change default (NVIDIA#8371)
Signed-off-by: smajumdar <titu1994@gmail.com>
* implement retro's own fwd_bwd_step() and validation_step() to not have argument first_val_step, which the MLM commit doesn't support
* adding megatron compile_helpers(), in future can be fixed with correct MLM commit
* bug fix in fast-conformer-aed.yaml and adding jenkins test for speech_to_text_aed model (NVIDIA#8368)
Signed-off-by: Krishna Puvvada <kpuvvada@nvidia.com>
Co-authored-by: Krishna Puvvada <kpuvvada@nvidia.com>
Co-authored-by: Somshubra Majumdar <titu1994@gmail.com>
* Enable megatron core loggers for GPT pretraining (NVIDIA#8354)
* Logging changes tested for gpt_pretraining
Signed-off-by: Aishwarya Bhandare <abhandare@nvidia.com>
* Additional args
Signed-off-by: Aishwarya Bhandare <abhandare@nvidia.com>
* [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci
---------
Signed-off-by: Aishwarya Bhandare <abhandare@nvidia.com>
Co-authored-by: Aishwarya Bhandare <abhandare@nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Eric Harper <complex451@gmail.com>
* mcore ds fix (NVIDIA#8283)
* [tutorial] fixed missing RIR scripts file. (NVIDIA#8257)
Signed-off-by: Xuesong Yang <1646669+XuesongYang@users.noreply.github.com>
* add values to en tts dict (NVIDIA#7879)
Signed-off-by: Mariana Graterol Fuenmayor <marianag@nvidia.com>
* mcore ds fix
Signed-off-by: Dmytro Pykhtar <dpykhtar@login-eos01.eos.clusters.nvidia.com>
* [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci
* update mcore
Signed-off-by: dimapihtar <dpihtar@gmail.com>
* revert asr files
Signed-off-by: dimapihtar <dpihtar@gmail.com>
* add comments
Signed-off-by: dimapihtar <dpihtar@gmail.com>
* [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci
* add support for mcore mock dataset
Signed-off-by: dimapihtar <dpihtar@gmail.com>
* update mcore version
Signed-off-by: dimapihtar <dpihtar@gmail.com>
* [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci
* update gpt cfg
Signed-off-by: dimapihtar <dpihtar@gmail.com>
* update mcore commit
Signed-off-by: dimapihtar <dpihtar@gmail.com>
* fix Bert unit tests
Signed-off-by: dimapihtar <dpihtar@gmail.com>
* update bert tests
Signed-off-by: dimapihtar <dpihtar@gmail.com>
* fix bert mcore test
Signed-off-by: dimapihtar <dpihtar@gmail.com>
* fix gpt jenkins tests
Signed-off-by: dimapihtar <dpihtar@gmail.com>
* [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci
* update apex & TE commits
Signed-off-by: dimapihtar <dpihtar@gmail.com>
* revert apex installation
Signed-off-by: dimapihtar <dpihtar@gmail.com>
* turn off the fusion for jenkins
Signed-off-by: dimapihtar <dpihtar@gmail.com>
---------
Signed-off-by: Xuesong Yang <1646669+XuesongYang@users.noreply.github.com>
Signed-off-by: Mariana Graterol Fuenmayor <marianag@nvidia.com>
Signed-off-by: Dmytro Pykhtar <dpykhtar@login-eos01.eos.clusters.nvidia.com>
Signed-off-by: dimapihtar <dpihtar@gmail.com>
Co-authored-by: Xuesong Yang <1646669+XuesongYang@users.noreply.github.com>
Co-authored-by: Mariana <47233618+mgrafu@users.noreply.github.com>
Co-authored-by: Dmytro Pykhtar <dpykhtar@login-eos01.eos.clusters.nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Pablo Garay <palenq@gmail.com>
* addressing Eric's reviews
* adding existing implementation RETRO files
* adding existing implementation RETRO files
* Add Finetuning tutorial with HF Datasets (NVIDIA#8356)
* Add Finetuning tutorial with HF Datasets
Signed-off-by: Nithin Rao Koluguri <nithinraok>
* update on Som comments
Signed-off-by: Nithin Rao Koluguri <nithinraok>
---------
Signed-off-by: Nithin Rao Koluguri <nithinraok>
Co-authored-by: Nithin Rao Koluguri <nithinraok>
* release updates (NVIDIA#8378)
* [tutorial] fixed missing RIR scripts file. (NVIDIA#8257)
Signed-off-by: Xuesong Yang <1646669+XuesongYang@users.noreply.github.com>
* add values to en tts dict (NVIDIA#7879)
Signed-off-by: Mariana Graterol Fuenmayor <marianag@nvidia.com>
* mcore ds fix
Signed-off-by: Dmytro Pykhtar <dpykhtar@login-eos01.eos.clusters.nvidia.com>
* [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci
* update mcore
Signed-off-by: dimapihtar <dpihtar@gmail.com>
* revert asr files
Signed-off-by: dimapihtar <dpihtar@gmail.com>
* add comments
Signed-off-by: dimapihtar <dpihtar@gmail.com>
* [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci
* add support for mcore mock dataset
Signed-off-by: dimapihtar <dpihtar@gmail.com>
* update mcore version
Signed-off-by: dimapihtar <dpihtar@gmail.com>
* [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci
* update gpt cfg
Signed-off-by: dimapihtar <dpihtar@gmail.com>
* update mcore commit
Signed-off-by: dimapihtar <dpihtar@gmail.com>
* fix Bert unit tests
Signed-off-by: dimapihtar <dpihtar@gmail.com>
* update bert tests
Signed-off-by: dimapihtar <dpihtar@gmail.com>
* fix bert mcore test
Signed-off-by: dimapihtar <dpihtar@gmail.com>
* fix gpt jenkins tests
Signed-off-by: dimapihtar <dpihtar@gmail.com>
* [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci
* add support for dict data input type
Signed-off-by: dimapihtar <dpihtar@gmail.com>
* add mock ds test
Signed-off-by: dimapihtar <dpihtar@gmail.com>
* add test for dict data input type
Signed-off-by: dimapihtar <dpihtar@gmail.com>
* mcore ds fix
Signed-off-by: dimapihtar <dpihtar@gmail.com>
* data input fix
Signed-off-by: dimapihtar <dpihtar@gmail.com>
---------
Signed-off-by: Xuesong Yang <1646669+XuesongYang@users.noreply.github.com>
Signed-off-by: Mariana Graterol Fuenmayor <marianag@nvidia.com>
Signed-off-by: Dmytro Pykhtar <dpykhtar@login-eos01.eos.clusters.nvidia.com>
Signed-off-by: dimapihtar <dpihtar@gmail.com>
Signed-off-by: Dmytro Pykhtar <37850217+dimapihtar@users.noreply.github.com>
Co-authored-by: Xuesong Yang <1646669+XuesongYang@users.noreply.github.com>
Co-authored-by: Mariana <47233618+mgrafu@users.noreply.github.com>
Co-authored-by: Dmytro Pykhtar <dpykhtar@login-eos01.eos.clusters.nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Pablo Garay <palenq@gmail.com>
* MCore dataset compatibility for tokenizers (NVIDIA#8390)
* Add unique_identifiers for all tokenizers and eod for SentencePieceTokenizer
Signed-off-by: Valerie Sarge <vsarge@nvidia.com>
* Add generalized token aliases to TokenizerSpec to conform with MegatronTokenizer's interface. Remove now-redundant individual fixes from AutoTokenizer and SentencePieceTokenizer.
Signed-off-by: Valerie Sarge <vsarge@nvidia.com>
---------
Signed-off-by: Valerie Sarge <vsarge@nvidia.com>
Co-authored-by: Pablo Garay <palenq@gmail.com>
* Mcore customization doc (NVIDIA#8298)
* [tutorial] fixed missing RIR scripts file. (NVIDIA#8257)
Signed-off-by: Xuesong Yang <1646669+XuesongYang@users.noreply.github.com>
* add values to en tts dict (NVIDIA#7879)
Signed-off-by: Mariana Graterol Fuenmayor <marianag@nvidia.com>
* Add Bert HF checkpoint converter (NVIDIA#8088)
* Add Bert HF checkpoint converter
Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>
* [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci
* Reformat
Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>
* Add BERT ONNX export
* [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci
* Add NeMo BERT to HF BERT script
* [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci
* Clean code
Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>
* [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci
* Update argument names
Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>
* Update build_transformer_config in Bert
Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>
---------
Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Bobby Chen <bobchen@nvidia.com>
* initial placeholder
Signed-off-by: Huiying Li <huiyingl@nvidia.com>
* add to intro/index.rst
Signed-off-by: Huiying Li <huiyingl@nvidia.com>
* initial content update
Signed-off-by: Huiying Li <willwin.lee@gmail.com>
* add diff images
Signed-off-by: Huiying Li <willwin.lee@gmail.com>
size
Signed-off-by: Huiying Li <willwin.lee@gmail.com>
* minor fixes
* minor language change
Signed-off-by: Chen Cui <chcui@nvidia.com>
* clean changes
---------
Signed-off-by: Xuesong Yang <1646669+XuesongYang@users.noreply.github.com>
Signed-off-by: Mariana Graterol Fuenmayor <marianag@nvidia.com>
Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>
Signed-off-by: Huiying Li <huiyingl@nvidia.com>
Signed-off-by: Huiying Li <willwin.lee@gmail.com>
Signed-off-by: Chen Cui <chcui@nvidia.com>
Co-authored-by: Xuesong Yang <1646669+XuesongYang@users.noreply.github.com>
Co-authored-by: Mariana <47233618+mgrafu@users.noreply.github.com>
Co-authored-by: yaoyu-33 <54727607+yaoyu-33@users.noreply.github.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Bobby Chen <bobchen@nvidia.com>
Co-authored-by: Huiying Li <huiyingl@nvidia.com>
Co-authored-by: Chen Cui <chcui@nvidia.com>
* wer fix (NVIDIA#8404)
Signed-off-by: Travis Bartley <tbartley@nvidia.com>
* updated link to pubmed (NVIDIA#8402)
Signed-off-by: Nithin Rao Koluguri <nithinraok>
Co-authored-by: Nithin Rao Koluguri <nithinraok>
* Update NFA video download link (NVIDIA#8406)
* update nfa nasa video link
Signed-off-by: Elena Rastorgueva <erastorgueva@nvidia.com>
* update link in markdown
Signed-off-by: Elena Rastorgueva <erastorgueva@nvidia.com>
---------
Signed-off-by: Elena Rastorgueva <erastorgueva@nvidia.com>
* revert changes (NVIDIA#8410)
Signed-off-by: Chen Cui <chcui@nvidia.com>
* Fix dreambooth data sampler issue (NVIDIA#8400)
* Turn on drop last
Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>
* Some neva fixes
Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>
* [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci
---------
Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
* Fixed errors in the CTM gen functions (NVIDIA#8416)
Signed-off-by: Taejin Park <tango4j@gmail.com>
* add ensemble decoding fix (NVIDIA#8427)
Signed-off-by: Nithin Rao Koluguri <nithinraok>
Co-authored-by: Nithin Rao Koluguri <nithinraok>
* SDE bugfix log (NVIDIA#8430)
Signed-off-by: George <gzelenfroind@nvidia.com>
* mcore customization doc minor fix (NVIDIA#8421)
Signed-off-by: Huiying Li <willwin.lee@gmail.com>
* NeMo-Mistral to HF converter bugfix. (NVIDIA#8353)
Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>
* Fixing mcore bert for TP, PP and SP (NVIDIA#8336)
* Fixing mcore bert for TP, PP and SP
* Fixing mcore bert for TP, PP and SP
* Fixing mcore version
* Fixing mcore version
* Update Jenkinsfile
Signed-off-by: Shanmugam Ramasamy <111910568+shanmugamr1992@users.noreply.github.com>
* Update Jenkinsfile
Signed-off-by: Shanmugam Ramasamy <111910568+shanmugamr1992@users.noreply.github.com>
* Update Jenkinsfile
Signed-off-by: Shanmugam Ramasamy <111910568+shanmugamr1992@users.noreply.github.com>
---------
Signed-off-by: Shanmugam Ramasamy <111910568+shanmugamr1992@users.noreply.github.com>
Co-authored-by: Shanmugam Ramasamy <shanmugamr@shanmugamr-mlt.client.nvidia.com>
Co-authored-by: Eric Harper <complex451@gmail.com>
* Add settings to suppress bf16 compile errors in CI on V100 (NVIDIA#8481)
* Add settings to suppress bf16 compile errors in CI on V100
Signed-off-by: Abhishree <abhishreetm@gmail.com>
* [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci
---------
Signed-off-by: Abhishree <abhishreetm@gmail.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
* MoE parameter passing (NVIDIA#8255)
* MoE parameter passing
Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>
* Pass EP/MoE params in consumer scripts.
Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com> * PR fixes Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com> * Use latest commit of mcore-0.5 Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com> * CI fix Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com> Co-authored-by: Alexandros Koumparoulis <akoumparouli@dgx1v-loki-21.nvidia.com> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> * Update k2 version (NVIDIA#8478) (NVIDIA#8492) Signed-off-by: Vladimir Bataev <vbataev@nvidia.com> * Add fp8 support for SD/Update notebook paths (NVIDIA#8489) * Add fp8 support for SD/Update notebook paths Signed-off-by: Mingyuan Ma <mingyuanm@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Mingyuan Ma <mingyuanm@nvidia.com> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Eric Harper <complex451@gmail.com> * pin to 0.5.0 (NVIDIA#8465) Signed-off-by: eharper <eharper@nvidia.com> * Update NeMo Multimodal Requirements (NVIDIA#8515) * Update requirements_multimodal.txt Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> * update github raw content link (NVIDIA#8517) Signed-off-by: Chen Cui <chcui@nvidia.com> * Add dep notice for notebooks (NVIDIA#8522) * add dep notice Signed-off-by: eharper <eharper@nvidia.com> * revert Signed-off-by: eharper <eharper@nvidia.com> --------- Signed-off-by: eharper <eharper@nvidia.com> * Revert FP8 
integration (NVIDIA#8520) * Revert FP8 integration Signed-off-by: Mingyuan Ma <mingyuanm@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Mingyuan Ma <mingyuanm@nvidia.com> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> * Update data prep notebook (NVIDIA#8532) Signed-off-by: Mingyuan Ma <mingyuanm@nvidia.com> * before update branch with latest r1.23.0 * update to run with MLM ae2817b3dde4efb1515061a5311d01d8f85bd99c (runnable training and saving checkpoint) * remove compile_helpers * reverse changes from main branch to r1.23.0 * adding *_legacy files * update MLM commit in Jenkinsfile to latest * debugging Jenkinstest: test different mcore import in retro_dataset * update Jenkinsfile edit megatron_retro_mutransfer_pretrain_legacy.py * removing all mcore RETRO to pass the Jenkinstest * fixing import legacy problem for tests/collections/nlp/test_indexed_retrieval_dataset.py * update Jenkinsfile file to use TE v0.7 * update NeMo to work with latest mcore RETRO (solving TE problems) * update TE commit Jenkinsfile to be the same with r1.23.0's Jenkinsfile * update commit for MLM * jenkinstest debugging * temporary fix RETRO's __init__ for jenkinstest * edit splits_string in jenkinsfile to correct format; put RETRO test in front to test faster * edit splits_string in jenkinsfile to correct format; put RETRO test in front to test faster * edit splits_string in jenkinsfile to correct format; put RETRO test in front to test faster * edit splits_string in jenkinsfile to correct format; put RETRO test in front to test faster * add model.data.dataloader_type=cyclic to jenkinsfile * update code to work with latest megatron-lm main 81dab6067 * update M-LM commit in Jenkinsfile to latest main M-LM 81dab6067 * fix to by pass CI test bf16 problem (following this PR https://github.com/NVIDIA/NeMo/pull/8481/files) * isort and black * adjusting 
model.micro_batch_size to 1 * fix BRANCH = 'r1.23.0' * replace tutorials dir from main branch to huvu/mcore_retro * fix minor merge conflicts * update Jenkinsfile * runnable with a temporary fix from Jacek (unfound -unfinished problem) * runnable with a temporary fix from Jacek (unfound -unfinished problem) * modified nlp_overrides.py back to original * fix checkpoint from Jacek Bieniusiewicz * config Jenkinsfile test * set RETRO Jenkins MBS to 1 * black fix * isort fix * update TE commit * update to latest Jenkinsfile with latest container and commits * remove new RETRO jenkinstest * merge latest main * put RETRO Jenkinstest to the right place * update code for megatron_retro_pretraining_legacy.py * untrack ipa_cmudict-0.7b_nv23.01.txt * untrack ipa_cmudict-0.7b_nv23.01.txt * set config in megatron_retro_pretraining_legacy.py to megatron_retro_config_legacy * update new RETRO jenkinstest to run faster * merging latest main, and edit Jenkinstest * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * huvu/mcore_retro_docs first commit * update with main * update RETRO docs * fix scripts/tts_dataset_files/ipa_cmudict-0.7b_nv23.01.txt * update docs * update docs * update RETRO docs * update with Jennifer's comments --------- Signed-off-by: eharper <eharper@nvidia.com> Signed-off-by: Mikołaj Błaż <mblaz@nvidia.com> Signed-off-by: Xuesong Yang <1646669+XuesongYang@users.noreply.github.com> Signed-off-by: dimapihtar <dpihtar@gmail.com> Signed-off-by: Piotr Żelasko <petezor@gmail.com> Signed-off-by: Elena Rastorgueva <erastorgueva@nvidia.com> Signed-off-by: Nithin Rao Koluguri <nithinraok> Signed-off-by: Jimmy Zhang <jiemingz@nvidia.com> Signed-off-by: Chen Cui <chcui@nvidia.com> Signed-off-by: Mariana Graterol Fuenmayor <marianag@nvidia.com> Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com> Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com> Signed-off-by: Sangkug Lym <slym@nvidia.com> Signed-off-by: smajumdar 
<titu1994@gmail.com> Signed-off-by: Krishna Puvvada <kpuvvada@nvidia.com> Signed-off-by: Somshubra Majumdar <titu1994@gmail.com> Signed-off-by: Aishwarya Bhandare <abhandare@nvidia.com> Signed-off-by: Dmytro Pykhtar <dpykhtar@login-eos01.eos.clusters.nvidia.com> Signed-off-by: Dmytro Pykhtar <37850217+dimapihtar@users.noreply.github.com> Signed-off-by: Valerie Sarge <vsarge@nvidia.com> Signed-off-by: Huiying Li <huiyingl@nvidia.com> Signed-off-by: Huiying Li <willwin.lee@gmail.com> Signed-off-by: Travis Bartley <tbartley@nvidia.com> Signed-off-by: Taejin Park <tango4j@gmail.com> Signed-off-by: George <gzelenfroind@nvidia.com> Signed-off-by: Shanmugam Ramasamy <111910568+shanmugamr1992@users.noreply.github.com> Signed-off-by: Abhishree <abhishreetm@gmail.com> Signed-off-by: Vladimir Bataev <vbataev@nvidia.com> Signed-off-by: Mingyuan Ma <mingyuanm@nvidia.com> Co-authored-by: eharper <eharper@nvidia.com> Co-authored-by: mikolajblaz <mikolajblaz@users.noreply.github.com> Co-authored-by: Eric Harper <complex451@gmail.com> Co-authored-by: Xuesong Yang <1646669+XuesongYang@users.noreply.github.com> Co-authored-by: Dmytro Pykhtar <37850217+dimapihtar@users.noreply.github.com> Co-authored-by: dimapihtar <dpihtar@gmail.com> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Piotr Żelasko <petezor@gmail.com> Co-authored-by: Elena Rastorgueva <80532067+erastorgueva-nv@users.noreply.github.com> Co-authored-by: Nithin Rao <nithinrao.koluguri@gmail.com> Co-authored-by: Somshubra Majumdar <titu1994@gmail.com> Co-authored-by: JimmyZhang12 <67203904+JimmyZhang12@users.noreply.github.com> Co-authored-by: Jimmy Zhang <jiemingz@nvidia.com> Co-authored-by: Chen Cui <chcui@nvidia.com> Co-authored-by: Huy Vu2 <huvu@login-eos01.eos.clusters.nvidia.com> Co-authored-by: Mariana <47233618+mgrafu@users.noreply.github.com> Co-authored-by: yaoyu-33 <54727607+yaoyu-33@users.noreply.github.com> Co-authored-by: Bobby Chen <bobchen@nvidia.com> 
Co-authored-by: akoumpa <153118171+akoumpa@users.noreply.github.com> Co-authored-by: Sangkug Lym <slym@nvidia.com> Co-authored-by: Krishna Puvvada <93558329+krishnacpuvvada@users.noreply.github.com> Co-authored-by: Krishna Puvvada <kpuvvada@nvidia.com> Co-authored-by: ashbhandare <ash.bhandare@gmail.com> Co-authored-by: Aishwarya Bhandare <abhandare@nvidia.com> Co-authored-by: Dmytro Pykhtar <dpykhtar@login-eos01.eos.clusters.nvidia.com> Co-authored-by: Pablo Garay <palenq@gmail.com> Co-authored-by: Valerie Sarge <vsarge@nvidia.com> Co-authored-by: Huiying <willwin.lee@gmail.com> Co-authored-by: Huiying Li <huiyingl@nvidia.com> Co-authored-by: tbartley94 <90423858+tbartley94@users.noreply.github.com> Co-authored-by: Taejin Park <tango4j@gmail.com> Co-authored-by: George <37293288+Jorjeous@users.noreply.github.com> Co-authored-by: Shanmugam Ramasamy <111910568+shanmugamr1992@users.noreply.github.com> Co-authored-by: Shanmugam Ramasamy <shanmugamr@shanmugamr-mlt.client.nvidia.com> Co-authored-by: Abhishree Thittenamane <47577437+athitten@users.noreply.github.com> Co-authored-by: Alexandros Koumparoulis <akoumparouli@dgx1v-loki-21.nvidia.com> Co-authored-by: Vladimir Bataev <vbataev@nvidia.com> Co-authored-by: Ming <111467530+Victor49152@users.noreply.github.com> Co-authored-by: Huy Vu2 <huvu@login-eos02.eos.clusters.nvidia.com>