TDT compute timestamps option and Extra Whitespace handling for SPE #10875
Signed-off-by: monica-sekoyan <msekoyan@nvidia.com>
Signed-off-by: monica-sekoyan <monica-sekoyan@users.noreply.github.com>
…NVIDIA/NeMo into msekoyan/tdt_compute_timestamps
Thanks much!
assert ans["input_ids"].tolist() == [4, 8, 7, 8, 5, 11, 91, 30, 40, 3]
assert ans["context_ids"].tolist() == [4, 8, 7, 8, 5]
assert ans["answer_ids"].tolist() == [11, 91, 30, 40, 3]
assert canary_tokenizer.ids_to_text(ans["input_ids"].tolist()) == '<|startoftranscript|><|en|><|transcribe|><|en|><|pnc|> TEST<|endoftext|>'
Thank you, these prompt formatting tests are much better now - readable and future-proof.
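The three ID assertions above encode a single invariant: `input_ids` is `context_ids` followed by `answer_ids`. A minimal sketch of a helper that checks this directly is below. `check_prompt_split` is a hypothetical name, not part of NeMo's test suite, and it works on plain Python lists (which the tests obtain via `.tolist()`).

```python
from typing import Dict, List


def check_prompt_split(ans: Dict[str, List[int]]) -> None:
    """Hypothetical helper: assert that input_ids == context_ids + answer_ids."""
    input_ids = ans["input_ids"]
    context_ids = ans["context_ids"]
    answer_ids = ans["answer_ids"]
    # The full sequence must be exactly the context followed by the answer.
    assert input_ids == context_ids + answer_ids, "input_ids != context_ids + answer_ids"
    # Equivalently: the context is a prefix and the answer is the remaining suffix.
    n = len(context_ids)
    assert input_ids[:n] == context_ids
    assert input_ids[n:] == answer_ids


# Token ids copied from the assertions above.
example = {
    "input_ids": [4, 8, 7, 8, 5, 11, 91, 30, 40, 3],
    "context_ids": [4, 8, 7, 8, 5],
    "answer_ids": [11, 91, 30, 40, 3],
}
check_prompt_split(example)
```

Checking the split structurally means a change to the prompt template that moves tokens between context and answer fails loudly, even if the concatenated ids happen to look plausible.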
@@ -36,13 +37,19 @@ class SentencePieceTokenizer(TokenizerSpec, ChatTemplateMixin):
special_tokens: either list of special tokens or dictionary of token name to token value
legacy: when set to True, the previous behavior of the SentencePiece wrapper will be restored,
including the possibility to add special tokens inside wrapper.
ignore_extra_whitespaces: whether to ignore extra whitespaces in the input text while encoding.
I'm not sure I understand - why is this needed?
This is for already-trained tokenizers (which were trained to ignore extra whitespace). By adding this parameter, we can make these tokenizers stop ignoring extra whitespace at inference time.
So, for example, if we encode an ASR model's prediction "text with a lot of spaces" in which the model predicted several consecutive spaces before "a", we get these tokens (and their corresponding ids): ["_text", "_with", "_", "a", "_lot", "_of", "_spaces"]. But we should have gotten ["_text", "_with", "_", "_", "_", "_", "a", "_lot", "_of", "_spaces"], which is what the model actually predicted and can be used for alignment purposes (I believe there are other use cases too).
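The effect described above can be reproduced with a toy tokenizer. This is a sketch under stated assumptions, not NeMo's or SentencePiece's implementation: `toy_encode`, `normalize`, and `TOY_VOCAB` are hypothetical; greedy longest-match stands in for SentencePiece's real segmentation; and the input is assumed to contain four spaces before "a", since the exact spacing of the original example was lost and four reproduces both token lists quoted above. It uses SentencePiece's actual space symbol "▁" (U+2581) where the comment writes "_".

```python
import re
from typing import List

# The meta symbol SentencePiece uses to represent spaces (U+2581).
SPIECE_UNDERLINE = "\u2581"

# Hypothetical toy vocabulary: it contains "▁" and "a" but no "▁a" piece,
# so the output mirrors the token lists quoted in the comment.
TOY_VOCAB = {SPIECE_UNDERLINE + w for w in ("text", "with", "lot", "of", "spaces")}
TOY_VOCAB |= {SPIECE_UNDERLINE, "a"}


def normalize(text: str, ignore_extra_whitespaces: bool) -> str:
    if ignore_extra_whitespaces:
        # Strip the ends and collapse whitespace runs to one space, similar in
        # spirit to SentencePiece's remove_extra_whitespaces normalization.
        text = re.sub(r"\s+", " ", text).strip()
    # Add a dummy leading space, then map every space to the meta symbol.
    return (" " + text).replace(" ", SPIECE_UNDERLINE)


def toy_encode(text: str, ignore_extra_whitespaces: bool) -> List[str]:
    s = normalize(text, ignore_extra_whitespaces)
    pieces: List[str] = []
    i = 0
    while i < len(s):
        # Greedy longest match against the toy vocabulary (a stand-in for
        # SentencePiece's real unigram/BPE segmentation).
        for j in range(len(s), i, -1):
            if s[i:j] in TOY_VOCAB:
                pieces.append(s[i:j])
                i = j
                break
        else:
            # No vocabulary match: fall back to the single character.
            pieces.append(s[i])
            i += 1
    return pieces


text = "text with    a lot of spaces"  # assumed: four spaces before "a"
print(toy_encode(text, ignore_extra_whitespaces=True))
# ['▁text', '▁with', '▁', 'a', '▁lot', '▁of', '▁spaces']
print(toy_encode(text, ignore_extra_whitespaces=False))
# ['▁text', '▁with', '▁', '▁', '▁', '▁', 'a', '▁lot', '▁of', '▁spaces']
```

The only difference between the two outputs is the three extra "▁" pieces, one per predicted space beyond the first, which is exactly the information an alignment step would lose if extra whitespace were collapsed.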
[🤖]: Hi @monica-sekoyan 👋, we wanted to let you know that a CICD pipeline for this PR just finished successfully. So it might be time to merge this PR or get some approvals. I'm just a bot, so I'll leave it to you what to do next. //cc @pablo-garay @ko3n1g
LGTM, thank you!
…10875)

* add token duration
  Signed-off-by: monica-sekoyan <msekoyan@nvidia.com>
* revert rnnt change
  Signed-off-by: monica-sekoyan <msekoyan@nvidia.com>
* add remove_extra_whitespaces arg to spe tokenizer
  Signed-off-by: monica-sekoyan <msekoyan@nvidia.com>
* add token duration retrieval
  Signed-off-by: monica-sekoyan <msekoyan@nvidia.com>
* add ignore_extra_whitespace to spe
  Signed-off-by: monica-sekoyan <msekoyan@nvidia.com>
* add compute_timestamp support for tdt
  Signed-off-by: monica-sekoyan <msekoyan@nvidia.com>
* fix config field name
  Signed-off-by: monica-sekoyan <msekoyan@nvidia.com>
* add refinement for tdt timestamps
  Signed-off-by: monica-sekoyan <msekoyan@nvidia.com>
* add segments timestamp support and refinement for ctc
  Signed-off-by: monica-sekoyan <msekoyan@nvidia.com>
* modify tests for ctc decoding timestamps
  Signed-off-by: monica-sekoyan <msekoyan@nvidia.com>
* add rnnt timestamp tests
  Signed-off-by: monica-sekoyan <msekoyan@nvidia.com>
* updated doc
  Signed-off-by: monica-sekoyan <msekoyan@nvidia.com>
* fix in test
  Signed-off-by: monica-sekoyan <msekoyan@nvidia.com>
* Apply isort and black reformatting
  Signed-off-by: monica-sekoyan <monica-sekoyan@users.noreply.github.com>
* fix of unicode char
  Signed-off-by: monica-sekoyan <msekoyan@nvidia.com>
* fix rnnt_decoding test
  Signed-off-by: monica-sekoyan <msekoyan@nvidia.com>
* workaround for tesst tokenizer
  Signed-off-by: monica-sekoyan <msekoyan@nvidia.com>
* Apply isort and black reformatting
  Signed-off-by: monica-sekoyan <monica-sekoyan@users.noreply.github.com>
* modify segments formation
  Signed-off-by: monica-sekoyan <msekoyan@nvidia.com>
* modify segments for ctc
  Signed-off-by: monica-sekoyan <msekoyan@nvidia.com>
* fix in ctc refinement
  Signed-off-by: monica-sekoyan <msekoyan@nvidia.com>
* Apply isort and black reformatting
  Signed-off-by: monica-sekoyan <monica-sekoyan@users.noreply.github.com>
* minor changes
  Signed-off-by: monica-sekoyan <msekoyan@nvidia.com>
* reverse offset change
  Signed-off-by: monica-sekoyan <msekoyan@nvidia.com>
* Apply isort and black reformatting
  Signed-off-by: monica-sekoyan <monica-sekoyan@users.noreply.github.com>
* warning mode=once
  Signed-off-by: monica-sekoyan <msekoyan@nvidia.com>
* Apply isort and black reformatting
  Signed-off-by: monica-sekoyan <monica-sekoyan@users.noreply.github.com>
* make ignore_extrawhitespaces false
  Signed-off-by: monica-sekoyan <msekoyan@nvidia.com>
* minor changes
  Signed-off-by: monica-sekoyan <msekoyan@nvidia.com>
* adjust changes to the tests
  Signed-off-by: monica-sekoyan <msekoyan@nvidia.com>
* modify prompt_formatter tests
  Signed-off-by: monica-sekoyan <msekoyan@nvidia.com>
* Apply isort and black reformatting
  Signed-off-by: monica-sekoyan <monica-sekoyan@users.noreply.github.com>

---------

Signed-off-by: monica-sekoyan <msekoyan@nvidia.com>
Signed-off-by: monica-sekoyan <monica-sekoyan@users.noreply.github.com>
Co-authored-by: monica-sekoyan <monica-sekoyan@users.noreply.github.com>
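In TDT (Token-and-Duration Transducer) decoding, each emission comes with a predicted duration that tells the decoder how many encoder frames to advance, which is what makes per-token timestamps straightforward to derive. The sketch below assumes a token emitted at frame `t` with duration `d` spans frames `[t, t + d)` and uses a hypothetical frame shift of 0.08 s (for example, a 10 ms hop with 8x subsampling). `tdt_timestamps`, `BLANK`, and `TokenTimestamp` are illustrative names only; NeMo's actual offset computation and timestamp refinement may differ. It also assumes the input is a valid decoding path and does not model how real decoders handle zero-duration blanks.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

# Hypothetical marker for a blank emission in this sketch.
BLANK: Optional[str] = None


@dataclass
class TokenTimestamp:
    token: str
    start_frame: int
    end_frame: int
    start_sec: float
    end_sec: float


def tdt_timestamps(
    steps: List[Tuple[Optional[str], int]], frame_shift_sec: float
) -> List[TokenTimestamp]:
    """Turn a sequence of (token or BLANK, duration) decisions into token timestamps."""
    t = 0  # current encoder frame index
    out: List[TokenTimestamp] = []
    for token, duration in steps:
        if duration < 0:
            raise ValueError("durations must be non-negative")
        if token is not BLANK:
            start, end = t, t + duration
            out.append(
                TokenTimestamp(
                    token=token,
                    start_frame=start,
                    end_frame=end,
                    start_sec=round(start * frame_shift_sec, 3),
                    end_sec=round(end * frame_shift_sec, 3),
                )
            )
        # Tokens and blanks both advance time by their predicted duration;
        # a duration of 0 lets several tokens share one frame.
        t += duration
    return out


steps = [("▁hel", 0), ("lo", 2), (BLANK, 3), ("▁world", 2)]
for ts in tdt_timestamps(steps, frame_shift_sec=0.08):
    print(ts.token, ts.start_frame, ts.end_frame, ts.start_sec, ts.end_sec)
```

Because time advances by the predicted durations rather than one frame per step, blanks that skip several frames cost a single decoding step, and the token offsets fall out of the running frame counter with no extra alignment pass.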
…VIDIA#10875) * add token duration Signed-off-by: monica-sekoyan <msekoyan@nvidia.com> * revert rnnt change Signed-off-by: monica-sekoyan <msekoyan@nvidia.com> * add remove_extra_whitespaces arg to spe tokenizer Signed-off-by: monica-sekoyan <msekoyan@nvidia.com> * add token duration retrieval Signed-off-by: monica-sekoyan <msekoyan@nvidia.com> * add ignore_extra_whitespace to spe Signed-off-by: monica-sekoyan <msekoyan@nvidia.com> * add compute_timestamp support for tdt Signed-off-by: monica-sekoyan <msekoyan@nvidia.com> * fix config field name Signed-off-by: monica-sekoyan <msekoyan@nvidia.com> * add refinement for tdt timestamps Signed-off-by: monica-sekoyan <msekoyan@nvidia.com> * add segments timestamp support and refinement for ctc Signed-off-by: monica-sekoyan <msekoyan@nvidia.com> * modify tests for ctc decoding timestamps Signed-off-by: monica-sekoyan <msekoyan@nvidia.com> * add rnnt timestamp tests Signed-off-by: monica-sekoyan <msekoyan@nvidia.com> * updated doc Signed-off-by: monica-sekoyan <msekoyan@nvidia.com> * fix in test Signed-off-by: monica-sekoyan <msekoyan@nvidia.com> * Apply isort and black reformatting Signed-off-by: monica-sekoyan <monica-sekoyan@users.noreply.github.com> * fix of unicode char Signed-off-by: monica-sekoyan <msekoyan@nvidia.com> * fix rnnt_decoding test Signed-off-by: monica-sekoyan <msekoyan@nvidia.com> * workaround for tesst tokenizer Signed-off-by: monica-sekoyan <msekoyan@nvidia.com> * Apply isort and black reformatting Signed-off-by: monica-sekoyan <monica-sekoyan@users.noreply.github.com> * modify segments formation Signed-off-by: monica-sekoyan <msekoyan@nvidia.com> * modify segments for ctc Signed-off-by: monica-sekoyan <msekoyan@nvidia.com> * fix in ctc refinement Signed-off-by: monica-sekoyan <msekoyan@nvidia.com> * Apply isort and black reformatting Signed-off-by: monica-sekoyan <monica-sekoyan@users.noreply.github.com> * minor changes Signed-off-by: monica-sekoyan <msekoyan@nvidia.com> * reverse offset change 
Signed-off-by: monica-sekoyan <msekoyan@nvidia.com> * Apply isort and black reformatting Signed-off-by: monica-sekoyan <monica-sekoyan@users.noreply.github.com> * warning mode=once Signed-off-by: monica-sekoyan <msekoyan@nvidia.com> * Apply isort and black reformatting Signed-off-by: monica-sekoyan <monica-sekoyan@users.noreply.github.com> * make ignore_extrawhitespaces false Signed-off-by: monica-sekoyan <msekoyan@nvidia.com> * minor changes Signed-off-by: monica-sekoyan <msekoyan@nvidia.com> * adjust changes to the tests Signed-off-by: monica-sekoyan <msekoyan@nvidia.com> * modify prompt_formatter tests Signed-off-by: monica-sekoyan <msekoyan@nvidia.com> * Apply isort and black reformatting Signed-off-by: monica-sekoyan <monica-sekoyan@users.noreply.github.com> --------- Signed-off-by: monica-sekoyan <msekoyan@nvidia.com> Signed-off-by: monica-sekoyan <monica-sekoyan@users.noreply.github.com> Co-authored-by: monica-sekoyan <monica-sekoyan@users.noreply.github.com> Signed-off-by: Hainan Xu <hainanx@nvidia.com>
* add initial code for llama vlm Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com> * some restructure Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com> * add mock data placeholder Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com> * Fix some importing Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com> * add language component for vlm llama * update code Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com> * now match num of params * update language part and fix vision part Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com> * minor fix Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com> * model can now init Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com> * minor update for llama32 text config Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com> * make checkpoint loading work * missing import * match vision part tensor shapes with configs Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com> * solve some fwd issues and mismatch issues Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com> * add vision import * fixes Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com> * update importer to convert both text and image weights * importer typos and reduce clutter * fix import qkv * some fixes for LLM Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com> * Add embedding * some updates Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com> * enable loading only text or only vision * add example script * TP fix Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com> * update * upload examples Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com> * update generate Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com> * update to newer version Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com> * upload for sharing * update to new pyt ckpt * xattn_caches matches (except small differences due to TE RMSNorm) * cleanup * embeddings match * match precision of weights * update sharded state dict Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com> * change xattn layer num to 3 7 11 etc * upload llama generation * minor fix * fix dummy layer input format * fix vision qkv order * fix shareded state dict 
Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com> * fix vision precision * fix rope * match cross attn layer * remove nrep * Remove cross attention in ImageTransformerLayer and fix _gate_ffn * PP draft Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com> * Fix intermediate tensor * temp save for pp2 is working Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com> * fix pp issues Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com> * merge * update mcore parallelism initialization Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com> * small update to pretrain script Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com> * update mcore parallelism initialization Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com> * Apply isort and black reformatting Signed-off-by: yaoyu-33 <yaoyu-33@users.noreply.github.com> * added energon dataloader for neva training (#10451) * added energon dataloader for neva training * Apply isort and black reformatting Signed-off-by: yashaswikarnati <yashaswikarnati@users.noreply.github.com> * specify global batch size to support grad accumulation * adding neva pretrain example * Apply isort and black reformatting Signed-off-by: yashaswikarnati <yashaswikarnati@users.noreply.github.com> * change pretraine example to handle new ckpt reloading * fixed code quality warnings and unused imports Signed-off-by: ykarnati <ykarnati@nvidia.com> * minor changes for PR comments * Apply isort and black reformatting Signed-off-by: yashaswikarnati <yashaswikarnati@users.noreply.github.com> * refactor conversation template config * Apply isort and black reformatting Signed-off-by: yashaswikarnati <yashaswikarnati@users.noreply.github.com> * remove optional import --------- Signed-off-by: yashaswikarnati <yashaswikarnati@users.noreply.github.com> Signed-off-by: ykarnati <ykarnati@nvidia.com> Co-authored-by: yashaswikarnati <yashaswikarnati@users.noreply.github.com> (cherry picked from commit 7354740) * llama energon dataloader * have tokenizer for base task encoder class * Update megatron_init.py 
Signed-off-by: Yu Yao <54727607+yaoyu-33@users.noreply.github.com> * Add simple inference * evian3 update Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com> * add encoder parallel default config Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com> * add encoder parallel default config Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com> * clean up Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com> * add aspect ratio in model * support energon dataloader * some pp update Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com> * fixes Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com> * fix kv merging Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com> * fix get_key_value_tensors Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com> * rename files Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com> * update to HF style position embedding Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com> * fix energon dataloader and support batching * update forward args Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com> * clean up and move to aspect_ratio_ids Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com> * rename back to language.py Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com> * fix loss function Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com> * update and fix energon Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com> * Add hf import * Fix type * Change config * update energon pretrain Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com> * clean up * clean up * reformat Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com> * update inference files for new code * update to instruct * update to instruct * update few names Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com> * update generation Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com> * fix importer embedding.weight * few fixes Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com> * add hf script Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com> * fix kv import * remove interleaved * fixes and updates Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com> * lora fixes Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com> * some code clean ups 
Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com> * update training scripts Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com> * refactors Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com> * add LoRA finetuning * fixes and nemo update Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com> * fix importer registering issue by adding 11B and 90B configs * update `decoder_seq_len` Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com> * science vqa script Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com> * clean up script name Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com> * fix ckpt save serialization issue * fix predefined config classes * add num_chunks in input Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com> * fix format Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com> * update finetuning scripts for PEFT * add 11b recipe (need #10645 to test) * fix mask generation Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com> * minor fix code style Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com> * Apply isort and black reformatting Signed-off-by: yaoyu-33 <yaoyu-33@users.noreply.github.com> * Support no image inference * add llama svqa eval * fix masking Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com> * Apply isort and black reformatting Signed-off-by: yaoyu-33 <yaoyu-33@users.noreply.github.com> * fix generation Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com> * Apply isort and black reformatting Signed-off-by: yaoyu-33 <yaoyu-33@users.noreply.github.com> * add 90b recipe and revise 11b recipe * Apply isort and black reformatting Signed-off-by: cuichenx <cuichenx@users.noreply.github.com> * clean up typing * add option to disable vision padding * Apply isort and black reformatting Signed-off-by: cuichenx <cuichenx@users.noreply.github.com> * base model finetuning (does not work yet) * Apply isort and black reformatting Signed-off-by: cuichenx <cuichenx@users.noreply.github.com> * fixed default conversation template config for MLLama * Update svqa * add multinode * bot happy * Apply isort and black reformatting Signed-off-by: 
cuichenx <cuichenx@users.noreply.github.com> * Apply isort and black reformatting Signed-off-by: yaoyu-33 <yaoyu-33@users.noreply.github.com> * Apply isort and black reformatting Signed-off-by: yaoyu-33 <yaoyu-33@users.noreply.github.com> * Apply isort and black reformatting Signed-off-by: artbataev <artbataev@users.noreply.github.com> * Perf improvements. Mainly from XAttn mask calculation (#10901) * Perf improvements. Mainly from XAttn mask calculation * Apply isort and black reformatting Signed-off-by: parthmannan <parthmannan@users.noreply.github.com> --------- Signed-off-by: parthmannan <parthmannan@users.noreply.github.com> Co-authored-by: parthmannan <parthmannan@users.noreply.github.com> * fix existing issues Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com> * fix scripts Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com> * Apply isort and black reformatting Signed-off-by: yaoyu-33 <yaoyu-33@users.noreply.github.com> * fix lora * few fixes for non image support Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com> * update masking gen Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com> * update lazy dataset Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com> * fix data sampler and loading issue Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com> * Add vlm generation * Apply isort and black reformatting Signed-off-by: meatybobby <meatybobby@users.noreply.github.com> * Apply isort and black reformatting Signed-off-by: yaoyu-33 <yaoyu-33@users.noreply.github.com> * generation update Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com> * update lazy dataset Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com> * Fix _strategy_lib.py Signed-off-by: Yu Yao <54727607+yaoyu-33@users.noreply.github.com> * Apply isort and black reformatting Signed-off-by: yaoyu-33 <yaoyu-33@users.noreply.github.com> * fix warning Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com> * hide vlm examples Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com> * Revert "Add vlm generation" This reverts commit 4711c75 Signed-off-by: yaoyu-33 
<yaoyu.094@gmail.com> * Fix VisionEncoder multi-batch bug * update mcore parallelism initialization Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com> * Apply isort and black reformatting Signed-off-by: yaoyu-33 <yaoyu-33@users.noreply.github.com> * Update megatron_init.py Signed-off-by: Yu Yao <54727607+yaoyu-33@users.noreply.github.com> * add encoder parallel default config Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com> * Fix _strategy_lib.py Signed-off-by: Yu Yao <54727607+yaoyu-33@users.noreply.github.com> * llm.generate fixes (#10983) * fix context path, disable optimizer init, add tp Signed-off-by: HuiyingLi <willwin.lee@gmail.com> * format Signed-off-by: HuiyingLi <willwin.lee@gmail.com> * address comments, require user to provide trainer Signed-off-by: HuiyingLi <willwin.lee@gmail.com> * minor fix Signed-off-by: HuiyingLi <willwin.lee@gmail.com> * minor fixes Signed-off-by: HuiyingLi <willwin.lee@gmail.com> --------- Signed-off-by: HuiyingLi <willwin.lee@gmail.com> * use __dict__ in check (#11012) * check is_hf_model in leaf module Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com> * Apply isort and black reformatting Signed-off-by: akoumpa <akoumpa@users.noreply.github.com> * disable getattr alternative path Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com> * fix Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com> * undo; Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com> --------- Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com> Signed-off-by: akoumpa <akoumpa@users.noreply.github.com> Co-authored-by: akoumpa <akoumpa@users.noreply.github.com> * LoRA support for HF::AutoModelForCausalLM (#10982) * add LinearAdapter Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com> * add hf lora example Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com> * remove unused imports Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com> * fix Signed-off-by: Alexandros 
Koumparoulis <akoumparouli@nvidia.com> * fix Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com> * subclass mixin Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com> * remove stale imports Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com> * undo Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com> * fix scale Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com> * regex selector for peft Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com> * move lora Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com> * fmt Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com> * hf_auto_model_for_causal_lm finetune recipe Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com> * Apply isort and black reformatting Signed-off-by: akoumpa <akoumpa@users.noreply.github.com> --------- Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com> Signed-off-by: akoumpa <akoumpa@users.noreply.github.com> Co-authored-by: akoumpa <akoumpa@users.noreply.github.com> * Change default for always_save_context to True (#11014) Signed-off-by: Abhishree <abhishreetm@gmail.com> Co-authored-by: Pablo Garay <pagaray@nvidia.com> * Add a build option to load_context (#10713) * Add a build option to load_context Signed-off-by: Marc Romeijn <mromeijn@nvidia.com> Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com> * Adding test Signed-off-by: Marc Romeijn <mromeijn@nvidia.com> Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com> * Trying to fix failing CPU test Signed-off-by: Marc Romeijn <mromeijn@nvidia.com> Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com> * cherry-pick fix Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com> --------- Signed-off-by: Marc Romeijn <mromeijn@nvidia.com> Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com> Co-authored-by: Alexandros Koumparoulis <akoumparouli@nvidia.com> * Fix pip install (#11026) * 
Move AutoTokenizer inline Signed-off-by: Marc Romeyn <mromeijn@nvidia.com> * Move einops to common requirements Signed-off-by: Marc Romeyn <mromeijn@nvidia.com> * Move AutoTokenizer import to top-level again in fine_tuning Signed-off-by: Marc Romeyn <mromeijn@nvidia.com> * Move megatron init inside nemo.lightning Signed-off-by: Marc Romeyn <mromeijn@nvidia.com> * Make megatron_lazy_init_context work when transformer-engine is not installed Signed-off-by: Marc Romeyn <mromeijn@nvidia.com> * Only import get_nmt_tokenizer when needed Signed-off-by: Marc Romeyn <mromeijn@nvidia.com> * Apply isort and black reformatting Signed-off-by: marcromeyn <marcromeyn@users.noreply.github.com> --------- Signed-off-by: Marc Romeyn <mromeijn@nvidia.com> Signed-off-by: marcromeyn <marcromeyn@users.noreply.github.com> Co-authored-by: marcromeyn <marcromeyn@users.noreply.github.com> * [WIP] Add docs for NEST SSL (#10804) * add docs Signed-off-by: stevehuang52 <heh@nvidia.com> * update doc and fix missing param Signed-off-by: stevehuang52 <heh@nvidia.com> --------- Signed-off-by: stevehuang52 <heh@nvidia.com> * Change dist ckpt defaults (#10913) * Enable ckpt features by default (async ckpt), ckpt every 15mins and reduce preemption time to 1min Signed-off-by: Shriya Palsamudram <spalsamudram@nvidia.com> * fix ssm tests Signed-off-by: Shriya Palsamudram <spalsamudram@nvidia.com> * Make note that ckpt_async_save is disabled for SSMs Signed-off-by: Shriya Palsamudram <spalsamudram@nvidia.com> * Enable async ckpt for SSMs with fix Signed-off-by: Shriya Palsamudram <spalsamudram@nvidia.com> * Disable async ckpt in the peft test as it is a known bug, add note. 
Signed-off-by: Shriya Palsamudram <spalsamudram@nvidia.com> * Fix failing unit tests Signed-off-by: Shriya Palsamudram <spalsamudram@nvidia.com> * Ashors/peft async ckpt (#11010) * [WIP] prototype for supporting async checkpointing with peft Signed-off-by: ashors1 <ashors@nvidia.com> Signed-off-by: Shriya Palsamudram <spalsamudram@nvidia.com> * Enable async ckpt for the peft test Signed-off-by: Shriya Palsamudram <spalsamudram@nvidia.com> * Fix peft setup test Signed-off-by: Shriya Palsamudram <spalsamudram@nvidia.com> --------- Signed-off-by: Shriya Palsamudram <spalsamudram@nvidia.com> Signed-off-by: ashors1 <ashors@nvidia.com> Co-authored-by: ataghibakhsh <ataghibakhsh@nvidia.com> * Akoumparouli/mixtral recipe fix r2.0.0 (#10994) * Mixtral TP8 EP1 Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com> * Apply isort and black reformatting Signed-off-by: akoumpa <akoumpa@users.noreply.github.com> --------- Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com> Signed-off-by: akoumpa <akoumpa@users.noreply.github.com> Co-authored-by: akoumpa <akoumpa@users.noreply.github.com> * Fix _strategy_lib tests (#11033) * fix world size and don't mock Signed-off-by: Maanu Grover <maanug@nvidia.com> * cleanup global state Signed-off-by: Maanu Grover <maanug@nvidia.com> * check app state instead Signed-off-by: Maanu Grover <maanug@nvidia.com> * fix syntax nemo logger test Signed-off-by: Maanu Grover <maanug@nvidia.com> --------- Signed-off-by: Maanu Grover <maanug@nvidia.com> * Update `BaseMegatronSampler` for compatibility with PTL's `_BatchProgress` (#11016) * Revert "[NeMo-UX] Use custom `BatchProgress` class which does not restore states (#10383)" This reverts commit b5798de. 
* make megatron sampler return the total number of batches in the dataset Signed-off-by: ashors1 <ashors@nvidia.com> --------- Signed-off-by: ashors1 <ashors@nvidia.com> * PTQ example for NeMo 2.0 (#10642) * initial commit Signed-off-by: Piotr Kaminski <pikaminski@nvidia.com> * create Quantizer for NeMo 2.0 Signed-off-by: Piotr Kaminski <pikaminski@nvidia.com> * refactor Signed-off-by: Piotr Kaminski <pikaminski@nvidia.com> * Call quantize on an unwrapped mcore model Signed-off-by: Piotr Kaminski <pikaminski@nvidia.com> * Apply isort and black reformatting Signed-off-by: Laplasjan107 <Laplasjan107@users.noreply.github.com> * Add tests, adjust unwrapping Signed-off-by: Piotr Kaminski <pikaminski@nvidia.com> * Apply isort and black reformatting Signed-off-by: Laplasjan107 <Laplasjan107@users.noreply.github.com> * fix export Signed-off-by: Piotr Kaminski <pikaminski@nvidia.com> * Apply isort and black reformatting Signed-off-by: Laplasjan107 <Laplasjan107@users.noreply.github.com> * Apply isort and black reformatting Signed-off-by: artbataev <artbataev@users.noreply.github.com> * Fix output_path argument for HF import Signed-off-by: Piotr Kamiński <67481570+Laplasjan107@users.noreply.github.com> * fix fabric ckpt loading Signed-off-by: Piotr Kaminski <pikaminski@nvidia.com> * Apply isort and black reformatting Signed-off-by: Laplasjan107 <Laplasjan107@users.noreply.github.com> * code review suggestions Signed-off-by: Piotr Kaminski <pikaminski@nvidia.com> * Apply isort and black reformatting Signed-off-by: Laplasjan107 <Laplasjan107@users.noreply.github.com> * remove unused import Signed-off-by: Piotr Kaminski <pikaminski@nvidia.com> * use cnn dataset in github ci Signed-off-by: Piotr Kaminski <pikaminski@nvidia.com> * applied code review Signed-off-by: Piotr Kaminski <pikaminski@nvidia.com> * code review changes Signed-off-by: Piotr Kaminski <pikaminski@nvidia.com> * Apply isort and black reformatting Signed-off-by: Laplasjan107 
<Laplasjan107@users.noreply.github.com> * simplify interface for data iterator Signed-off-by: Piotr Kaminski <pikaminski@nvidia.com> * Apply isort and black reformatting Signed-off-by: Laplasjan107 <Laplasjan107@users.noreply.github.com> * (partial) PP fix Signed-off-by: Piotr Kaminski <pikaminski@nvidia.com> * Apply isort and black reformatting Signed-off-by: Laplasjan107 <Laplasjan107@users.noreply.github.com> --------- Signed-off-by: Piotr Kaminski <pikaminski@nvidia.com> Signed-off-by: Laplasjan107 <Laplasjan107@users.noreply.github.com> Signed-off-by: Piotr Kamiński <67481570+Laplasjan107@users.noreply.github.com> Signed-off-by: artbataev <artbataev@users.noreply.github.com> Co-authored-by: Piotr Kaminski <pikaminski@nvidia.com> Co-authored-by: Laplasjan107 <Laplasjan107@users.noreply.github.com> Co-authored-by: artbataev <artbataev@users.noreply.github.com> * TDT compute timestamps option and Extra Whitespace handling for SPE (#10875) * add token duration Signed-off-by: monica-sekoyan <msekoyan@nvidia.com> * revert rnnt change Signed-off-by: monica-sekoyan <msekoyan@nvidia.com> * add remove_extra_whitespaces arg to spe tokenizer Signed-off-by: monica-sekoyan <msekoyan@nvidia.com> * add token duration retrieval Signed-off-by: monica-sekoyan <msekoyan@nvidia.com> * add ignore_extra_whitespace to spe Signed-off-by: monica-sekoyan <msekoyan@nvidia.com> * add compute_timestamp support for tdt Signed-off-by: monica-sekoyan <msekoyan@nvidia.com> * fix config field name Signed-off-by: monica-sekoyan <msekoyan@nvidia.com> * add refinement for tdt timestamps Signed-off-by: monica-sekoyan <msekoyan@nvidia.com> * add segments timestamp support and refinement for ctc Signed-off-by: monica-sekoyan <msekoyan@nvidia.com> * modify tests for ctc decoding timestamps Signed-off-by: monica-sekoyan <msekoyan@nvidia.com> * add rnnt timestamp tests Signed-off-by: monica-sekoyan <msekoyan@nvidia.com> * updated doc Signed-off-by: monica-sekoyan <msekoyan@nvidia.com> * fix in test 
Signed-off-by: monica-sekoyan <msekoyan@nvidia.com> * Apply isort and black reformatting Signed-off-by: monica-sekoyan <monica-sekoyan@users.noreply.github.com> * fix of unicode char Signed-off-by: monica-sekoyan <msekoyan@nvidia.com> * fix rnnt_decoding test Signed-off-by: monica-sekoyan <msekoyan@nvidia.com> * workaround for test tokenizer Signed-off-by: monica-sekoyan <msekoyan@nvidia.com> * Apply isort and black reformatting Signed-off-by: monica-sekoyan <monica-sekoyan@users.noreply.github.com> * modify segments formation Signed-off-by: monica-sekoyan <msekoyan@nvidia.com> * modify segments for ctc Signed-off-by: monica-sekoyan <msekoyan@nvidia.com> * fix in ctc refinement Signed-off-by: monica-sekoyan <msekoyan@nvidia.com> * Apply isort and black reformatting Signed-off-by: monica-sekoyan <monica-sekoyan@users.noreply.github.com> * minor changes Signed-off-by: monica-sekoyan <msekoyan@nvidia.com> * reverse offset change Signed-off-by: monica-sekoyan <msekoyan@nvidia.com> * Apply isort and black reformatting Signed-off-by: monica-sekoyan <monica-sekoyan@users.noreply.github.com> * warning mode=once Signed-off-by: monica-sekoyan <msekoyan@nvidia.com> * Apply isort and black reformatting Signed-off-by: monica-sekoyan <monica-sekoyan@users.noreply.github.com> * make ignore_extrawhitespaces false Signed-off-by: monica-sekoyan <msekoyan@nvidia.com> * minor changes Signed-off-by: monica-sekoyan <msekoyan@nvidia.com> * adjust changes to the tests Signed-off-by: monica-sekoyan <msekoyan@nvidia.com> * modify prompt_formatter tests Signed-off-by: monica-sekoyan <msekoyan@nvidia.com> * Apply isort and black reformatting Signed-off-by: monica-sekoyan <monica-sekoyan@users.noreply.github.com> --------- Signed-off-by: monica-sekoyan <msekoyan@nvidia.com> Signed-off-by: monica-sekoyan <monica-sekoyan@users.noreply.github.com> Co-authored-by: monica-sekoyan <monica-sekoyan@users.noreply.github.com> * Basic online dynamic FP8 quantization with vLLM (#10904) * Basic online
dynamic quantization with vLLM Signed-off-by: Jan Lasek <janek.lasek@gmail.com> * Apply isort and black reformatting Signed-off-by: janekl <janekl@users.noreply.github.com> * vllm 0.6.3 updates Signed-off-by: Jan Lasek <janek.lasek@gmail.com> * Pass quantization param in deploy_vllm_triton.py script Signed-off-by: Jan Lasek <janek.lasek@gmail.com> --------- Signed-off-by: Jan Lasek <janek.lasek@gmail.com> Signed-off-by: janekl <janekl@users.noreply.github.com> Co-authored-by: janekl <janekl@users.noreply.github.com> * ci: Improve VM maintenance (#10758) * ci: Improve VM maintenance Signed-off-by: Oliver Koenig <okoenig@nvidia.com> * rename stuff Signed-off-by: Oliver Koenig <okoenig@nvidia.com> * title Signed-off-by: Oliver Koenig <okoenig@nvidia.com> * use team Signed-off-by: Oliver Koenig <okoenig@nvidia.com> * run on failure too Signed-off-by: Oliver Koenig <okoenig@nvidia.com> * fix Signed-off-by: Oliver Koenig <okoenig@nvidia.com> * yrdy Signed-off-by: Oliver Koenig <okoenig@nvidia.com> * f Signed-off-by: Oliver Koenig <okoenig@nvidia.com> * test Signed-off-by: Oliver Koenig <okoenig@nvidia.com> * fix Signed-off-by: Oliver Koenig <okoenig@nvidia.com> * f Signed-off-by: Oliver Koenig <okoenig@nvidia.com> * f Signed-off-by: Oliver Koenig <okoenig@nvidia.com> * f Signed-off-by: Oliver Koenig <okoenig@nvidia.com> --------- Signed-off-by: Oliver Koenig <okoenig@nvidia.com> * Add comment for vision transpose * update megatron_init.py inside lightning Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com> * rename llama to mllama folder name Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com> * update to attention bias Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com> * Apply isort and black reformatting Signed-off-by: yaoyu-33 <yaoyu-33@users.noreply.github.com> * update dropout to 0 Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com> * fix attention bias Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com> * remove disable_vision_padding since we now have a fix Signed-off-by: yaoyu-33 
<yaoyu.094@gmail.com> * Apply isort and black reformatting Signed-off-by: yaoyu-33 <yaoyu-33@users.noreply.github.com> * Update init for mllama Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com> * Apply isort and black reformatting Signed-off-by: yaoyu-33 <yaoyu-33@users.noreply.github.com> * Address comments Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com> * Apply isort and black reformatting Signed-off-by: yaoyu-33 <yaoyu-33@users.noreply.github.com> * fix copyright title Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com> * fix code scan Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com> * update vision code Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com> * revert attention bias changes until latest MLM code got merged Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com> * fix warning Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com> * Turn off system message check, as it's "" now Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com> * Rollback megatron_parallel.py Signed-off-by: Yu Yao <54727607+yaoyu-33@users.noreply.github.com> --------- Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com> Signed-off-by: yaoyu-33 <yaoyu-33@users.noreply.github.com> Signed-off-by: Yu Yao <54727607+yaoyu-33@users.noreply.github.com> Signed-off-by: cuichenx <cuichenx@users.noreply.github.com> Signed-off-by: Chen Cui <chcui@nvidia.com> Signed-off-by: artbataev <artbataev@users.noreply.github.com> Signed-off-by: parthmannan <parthmannan@users.noreply.github.com> Signed-off-by: meatybobby <meatybobby@users.noreply.github.com> Signed-off-by: HuiyingLi <willwin.lee@gmail.com> Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com> Signed-off-by: akoumpa <akoumpa@users.noreply.github.com> Signed-off-by: Abhishree <abhishreetm@gmail.com> Signed-off-by: Marc Romeijn <mromeijn@nvidia.com> Signed-off-by: Marc Romeyn <mromeijn@nvidia.com> Signed-off-by: marcromeyn <marcromeyn@users.noreply.github.com> Signed-off-by: stevehuang52 <heh@nvidia.com> Signed-off-by: Shriya Palsamudram <spalsamudram@nvidia.com> Signed-off-by: 
ashors1 <ashors@nvidia.com> Signed-off-by: Maanu Grover <maanug@nvidia.com> Signed-off-by: Piotr Kaminski <pikaminski@nvidia.com> Signed-off-by: Laplasjan107 <Laplasjan107@users.noreply.github.com> Signed-off-by: Piotr Kamiński <67481570+Laplasjan107@users.noreply.github.com> Signed-off-by: monica-sekoyan <msekoyan@nvidia.com> Signed-off-by: monica-sekoyan <monica-sekoyan@users.noreply.github.com> Signed-off-by: Jan Lasek <janek.lasek@gmail.com> Signed-off-by: janekl <janekl@users.noreply.github.com> Signed-off-by: Oliver Koenig <okoenig@nvidia.com> Co-authored-by: Ao Tang <aot@nvidia.com> Co-authored-by: Chen Cui <chcui@nvidia.com> Co-authored-by: Bobby Chen <bobchen@nvidia.com> Co-authored-by: yaoyu-33 <yaoyu-33@users.noreply.github.com> Co-authored-by: Yashaswi Karnati <144376261+yashaswikarnati@users.noreply.github.com> Co-authored-by: ykarnati <ykarnati@nvidia.com> Co-authored-by: cuichenx <cuichenx@users.noreply.github.com> Co-authored-by: Yashaswi Karnati <ykarnati@login-eos02.eos.clusters.nvidia.com> Co-authored-by: artbataev <artbataev@users.noreply.github.com> Co-authored-by: Parth Mannan <38387286+parthmannan@users.noreply.github.com> Co-authored-by: parthmannan <parthmannan@users.noreply.github.com> Co-authored-by: meatybobby <meatybobby@users.noreply.github.com> Co-authored-by: Huiying <willwin.lee@gmail.com> Co-authored-by: Alexandros Koumparoulis <153118171+akoumpa@users.noreply.github.com> Co-authored-by: akoumpa <akoumpa@users.noreply.github.com> Co-authored-by: Abhishree Thittenamane <47577437+athitten@users.noreply.github.com> Co-authored-by: Pablo Garay <pagaray@nvidia.com> Co-authored-by: Marc Romeyn <mromeijn@nvidia.com> Co-authored-by: Alexandros Koumparoulis <akoumparouli@nvidia.com> Co-authored-by: marcromeyn <marcromeyn@users.noreply.github.com> Co-authored-by: He Huang (Steve) <105218074+stevehuang52@users.noreply.github.com> Co-authored-by: Shriya Rishab <69161273+ShriyaPalsamudram@users.noreply.github.com> Co-authored-by: ataghibakhsh 
<ataghibakhsh@nvidia.com> Co-authored-by: Maanu Grover <109391026+maanug-nv@users.noreply.github.com> Co-authored-by: Anna Shors <71393111+ashors1@users.noreply.github.com> Co-authored-by: Piotr Kamiński <67481570+Laplasjan107@users.noreply.github.com> Co-authored-by: Piotr Kaminski <pikaminski@nvidia.com> Co-authored-by: Laplasjan107 <Laplasjan107@users.noreply.github.com> Co-authored-by: monica-sekoyan <166123533+monica-sekoyan@users.noreply.github.com> Co-authored-by: monica-sekoyan <monica-sekoyan@users.noreply.github.com> Co-authored-by: Jan Lasek <janek.lasek@gmail.com> Co-authored-by: janekl <janekl@users.noreply.github.com> Co-authored-by: oliver könig <okoenig@nvidia.com>
* add initial code for llama vlm Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com> * some restructure Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com> * add mock data placeholder Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com> * Fix some importing Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com> * add language component for vlm llama * update code Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com> * now match num of params * update language part and fix vision part Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com> * minor fix Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com> * model can now init Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com> * minor update for llama32 text config Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com> * make checkpoint loading work * missing import * match vision part tensor shapes with configs Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com> * solve some fwd issues and mismatch issues Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com> * add vision import * fixes Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com> * update importer to convert both text and image weights * importer typos and reduce clutter * fix import qkv * some fixes for LLM Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com> * Add embedding * some updates Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com> * enable loading only text or only vision * add example script * TP fix Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com> * update * upload examples Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com> * update generate Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com> * update to newer version Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com> * upload for sharing * update to new pyt ckpt * xattn_caches matches (except small differences due to TE RMSNorm) * cleanup * embeddings match * match precision of weights * update sharded state dict Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com> * change xattn layer num to 3 7 11 etc * upload llama generation * minor fix * fix dummy layer input format * fix vision qkv order * fix sharded state dict 
Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com> * fix vision precision * fix rope * match cross attn layer * remove nrep * Remove cross attention in ImageTransformerLayer and fix _gate_ffn * PP draft Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com> * Fix intermediate tensor * temp save for pp2 is working Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com> * fix pp issues Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com> * merge * update mcore parallelism initialization Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com> * small update to pretrain script Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com> * update mcore parallelism initialization Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com> * Apply isort and black reformatting Signed-off-by: yaoyu-33 <yaoyu-33@users.noreply.github.com> * added energon dataloader for neva training (NVIDIA#10451) * added energon dataloader for neva training * Apply isort and black reformatting Signed-off-by: yashaswikarnati <yashaswikarnati@users.noreply.github.com> * specify global batch size to support grad accumulation * adding neva pretrain example * Apply isort and black reformatting Signed-off-by: yashaswikarnati <yashaswikarnati@users.noreply.github.com> * change pretrain example to handle new ckpt reloading * fixed code quality warnings and unused imports Signed-off-by: ykarnati <ykarnati@nvidia.com> * minor changes for PR comments * Apply isort and black reformatting Signed-off-by: yashaswikarnati <yashaswikarnati@users.noreply.github.com> * refactor conversation template config * Apply isort and black reformatting Signed-off-by: yashaswikarnati <yashaswikarnati@users.noreply.github.com> * remove optional import --------- Signed-off-by: yashaswikarnati <yashaswikarnati@users.noreply.github.com> Signed-off-by: ykarnati <ykarnati@nvidia.com> Co-authored-by: yashaswikarnati <yashaswikarnati@users.noreply.github.com> (cherry picked from commit 7354740) * llama energon dataloader * have tokenizer for base task encoder class * Update megatron_init.py 
Signed-off-by: Yu Yao <54727607+yaoyu-33@users.noreply.github.com> * Add simple inference * evian3 update Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com> * add encoder parallel default config Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com> * add encoder parallel default config Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com> * clean up Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com> * add aspect ratio in model * support energon dataloader * some pp update Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com> * fixes Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com> * fix kv merging Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com> * fix get_key_value_tensors Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com> * rename files Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com> * update to HF style position embedding Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com> * fix energon dataloader and support batching * update forward args Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com> * clean up and move to aspect_ratio_ids Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com> * rename back to language.py Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com> * fix loss function Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com> * update and fix energon Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com> * Add hf import * Fix type * Change config * update energon pretrain Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com> * clean up * clean up * reformat Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com> * update inference files for new code * update to instruct * update to instruct * update few names Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com> * update generation Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com> * fix importer embedding.weight * few fixes Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com> * add hf script Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com> * fix kv import * remove interleaved * fixes and updates Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com> * lora fixes Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com> * some code clean ups 
Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com> * update training scripts Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com> * refactors Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com> * add LoRA finetuning * fixes and nemo update Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com> * fix importer registering issue by adding 11B and 90B configs * update `decoder_seq_len` Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com> * science vqa script Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com> * clean up script name Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com> * fix ckpt save serialization issue * fix predefined config classes * add num_chunks in input Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com> * fix format Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com> * update finetuning scripts for PEFT * add 11b recipe (need NVIDIA#10645 to test) * fix mask generation Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com> * minor fix code style Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com> * Apply isort and black reformatting Signed-off-by: yaoyu-33 <yaoyu-33@users.noreply.github.com> * Support no image inference * add llama svqa eval * fix masking Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com> * Apply isort and black reformatting Signed-off-by: yaoyu-33 <yaoyu-33@users.noreply.github.com> * fix generation Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com> * Apply isort and black reformatting Signed-off-by: yaoyu-33 <yaoyu-33@users.noreply.github.com> * add 90b recipe and revise 11b recipe * Apply isort and black reformatting Signed-off-by: cuichenx <cuichenx@users.noreply.github.com> * clean up typing * add option to disable vision padding * Apply isort and black reformatting Signed-off-by: cuichenx <cuichenx@users.noreply.github.com> * base model finetuning (does not work yet) * Apply isort and black reformatting Signed-off-by: cuichenx <cuichenx@users.noreply.github.com> * fixed default conversation template config for MLLama * Update svqa * add multinode * bot happy * Apply isort and black reformatting 
Signed-off-by: cuichenx <cuichenx@users.noreply.github.com> * Apply isort and black reformatting Signed-off-by: yaoyu-33 <yaoyu-33@users.noreply.github.com> * Apply isort and black reformatting Signed-off-by: yaoyu-33 <yaoyu-33@users.noreply.github.com> * Apply isort and black reformatting Signed-off-by: artbataev <artbataev@users.noreply.github.com> * Perf improvements. Mainly from XAttn mask calculation (NVIDIA#10901) * Perf improvements. Mainly from XAttn mask calculation * Apply isort and black reformatting Signed-off-by: parthmannan <parthmannan@users.noreply.github.com> --------- Signed-off-by: parthmannan <parthmannan@users.noreply.github.com> Co-authored-by: parthmannan <parthmannan@users.noreply.github.com> * fix existing issues Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com> * fix scripts Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com> * Apply isort and black reformatting Signed-off-by: yaoyu-33 <yaoyu-33@users.noreply.github.com> * fix lora * few fixes for non image support Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com> * update masking gen Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com> * update lazy dataset Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com> * fix data sampler and loading issue Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com> * Add vlm generation * Apply isort and black reformatting Signed-off-by: meatybobby <meatybobby@users.noreply.github.com> * Apply isort and black reformatting Signed-off-by: yaoyu-33 <yaoyu-33@users.noreply.github.com> * generation update Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com> * update lazy dataset Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com> * Fix _strategy_lib.py Signed-off-by: Yu Yao <54727607+yaoyu-33@users.noreply.github.com> * Apply isort and black reformatting Signed-off-by: yaoyu-33 <yaoyu-33@users.noreply.github.com> * fix warning Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com> * hide vlm examples Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com> * Revert "Add vlm generation" This reverts commit 4711c75 Signed-off-by: 
yaoyu-33 <yaoyu.094@gmail.com> * Fix VisionEncoder multi-batch bug * update mcore parallelism initialization Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com> * Apply isort and black reformatting Signed-off-by: yaoyu-33 <yaoyu-33@users.noreply.github.com> * Update megatron_init.py Signed-off-by: Yu Yao <54727607+yaoyu-33@users.noreply.github.com> * add encoder parallel default config Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com> * Fix _strategy_lib.py Signed-off-by: Yu Yao <54727607+yaoyu-33@users.noreply.github.com> * llm.generate fixes (NVIDIA#10983) * fix context path, disable optimizer init, add tp Signed-off-by: HuiyingLi <willwin.lee@gmail.com> * format Signed-off-by: HuiyingLi <willwin.lee@gmail.com> * address comments, require user to provide trainer Signed-off-by: HuiyingLi <willwin.lee@gmail.com> * minor fix Signed-off-by: HuiyingLi <willwin.lee@gmail.com> * minor fixes Signed-off-by: HuiyingLi <willwin.lee@gmail.com> --------- Signed-off-by: HuiyingLi <willwin.lee@gmail.com> * use __dict__ in check (NVIDIA#11012) * check is_hf_model in leaf module Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com> * Apply isort and black reformatting Signed-off-by: akoumpa <akoumpa@users.noreply.github.com> * disable getattr alternative path Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com> * fix Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com> * undo; Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com> --------- Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com> Signed-off-by: akoumpa <akoumpa@users.noreply.github.com> Co-authored-by: akoumpa <akoumpa@users.noreply.github.com> * LoRA support for HF::AutoModelForCausalLM (NVIDIA#10982) * add LinearAdapter Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com> * add hf lora example Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com> * remove unused imports Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com> * fix 
Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com> * fix Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com> * subclass mixin Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com> * remove stale imports Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com> * undo Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com> * fix scale Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com> * regex selector for peft Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com> * move lora Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com> * fmt Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com> * hf_auto_model_for_causal_lm finetune recipe Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com> * Apply isort and black reformatting Signed-off-by: akoumpa <akoumpa@users.noreply.github.com> --------- Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com> Signed-off-by: akoumpa <akoumpa@users.noreply.github.com> Co-authored-by: akoumpa <akoumpa@users.noreply.github.com> * Change default for always_save_context to True (NVIDIA#11014) Signed-off-by: Abhishree <abhishreetm@gmail.com> Co-authored-by: Pablo Garay <pagaray@nvidia.com> * Add a build option to load_context (NVIDIA#10713) * Add a build option to load_context Signed-off-by: Marc Romeijn <mromeijn@nvidia.com> Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com> * Adding test Signed-off-by: Marc Romeijn <mromeijn@nvidia.com> Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com> * Trying to fix failing CPU test Signed-off-by: Marc Romeijn <mromeijn@nvidia.com> Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com> * cherry-pick fix Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com> --------- Signed-off-by: Marc Romeijn <mromeijn@nvidia.com> Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com> Co-authored-by: Alexandros Koumparoulis 
<akoumparouli@nvidia.com> * Fix pip install (NVIDIA#11026) * Move AutoTokenizer inline Signed-off-by: Marc Romeyn <mromeijn@nvidia.com> * Move einops to common requirements Signed-off-by: Marc Romeyn <mromeijn@nvidia.com> * Move AutoTokenizer import to top-level again in fine_tuning Signed-off-by: Marc Romeyn <mromeijn@nvidia.com> * Move megatron init inside nemo.lightning Signed-off-by: Marc Romeyn <mromeijn@nvidia.com> * Make megatron_lazy_init_context work when transformer-engine is not installed Signed-off-by: Marc Romeyn <mromeijn@nvidia.com> * Only import get_nmt_tokenizer when needed Signed-off-by: Marc Romeyn <mromeijn@nvidia.com> * Apply isort and black reformatting Signed-off-by: marcromeyn <marcromeyn@users.noreply.github.com> --------- Signed-off-by: Marc Romeyn <mromeijn@nvidia.com> Signed-off-by: marcromeyn <marcromeyn@users.noreply.github.com> Co-authored-by: marcromeyn <marcromeyn@users.noreply.github.com> * [WIP] Add docs for NEST SSL (NVIDIA#10804) * add docs Signed-off-by: stevehuang52 <heh@nvidia.com> * update doc and fix missing param Signed-off-by: stevehuang52 <heh@nvidia.com> --------- Signed-off-by: stevehuang52 <heh@nvidia.com> * Change dist ckpt defaults (NVIDIA#10913) * Enable ckpt features by default (async ckpt), ckpt every 15mins and reduce preemption time to 1min Signed-off-by: Shriya Palsamudram <spalsamudram@nvidia.com> * fix ssm tests Signed-off-by: Shriya Palsamudram <spalsamudram@nvidia.com> * Make note that ckpt_async_save is disabled for SSMs Signed-off-by: Shriya Palsamudram <spalsamudram@nvidia.com> * Enable async ckpt for SSMs with fix Signed-off-by: Shriya Palsamudram <spalsamudram@nvidia.com> * Disable async ckpt in the peft test as it is a known bug, add note. 
Signed-off-by: Shriya Palsamudram <spalsamudram@nvidia.com> * Fix failing unit tests Signed-off-by: Shriya Palsamudram <spalsamudram@nvidia.com> * Ashors/peft async ckpt (NVIDIA#11010) * [WIP] prototype for supporting async checkpointing with peft Signed-off-by: ashors1 <ashors@nvidia.com> Signed-off-by: Shriya Palsamudram <spalsamudram@nvidia.com> * Enable async ckpt for the peft test Signed-off-by: Shriya Palsamudram <spalsamudram@nvidia.com> * Fix peft setup test Signed-off-by: Shriya Palsamudram <spalsamudram@nvidia.com> --------- Signed-off-by: Shriya Palsamudram <spalsamudram@nvidia.com> Signed-off-by: ashors1 <ashors@nvidia.com> Co-authored-by: ataghibakhsh <ataghibakhsh@nvidia.com> * Akoumparouli/mixtral recipe fix r2.0.0 (NVIDIA#10994) * Mixtral TP8 EP1 Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com> * Apply isort and black reformatting Signed-off-by: akoumpa <akoumpa@users.noreply.github.com> --------- Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com> Signed-off-by: akoumpa <akoumpa@users.noreply.github.com> Co-authored-by: akoumpa <akoumpa@users.noreply.github.com> * Fix _strategy_lib tests (NVIDIA#11033) * fix world size and don't mock Signed-off-by: Maanu Grover <maanug@nvidia.com> * cleanup global state Signed-off-by: Maanu Grover <maanug@nvidia.com> * check app state instead Signed-off-by: Maanu Grover <maanug@nvidia.com> * fix syntax nemo logger test Signed-off-by: Maanu Grover <maanug@nvidia.com> --------- Signed-off-by: Maanu Grover <maanug@nvidia.com> * Update `BaseMegatronSampler` for compatibility with PTL's `_BatchProgress` (NVIDIA#11016) * Revert "[NeMo-UX] Use custom `BatchProgress` class which does not restore states (NVIDIA#10383)" This reverts commit b5798de. 
* make megatron sampler return the total number of batches in the dataset Signed-off-by: ashors1 <ashors@nvidia.com> --------- Signed-off-by: ashors1 <ashors@nvidia.com> * PTQ example for NeMo 2.0 (NVIDIA#10642) * initial commit Signed-off-by: Piotr Kaminski <pikaminski@nvidia.com> * create Quantizer for NeMo 2.0 Signed-off-by: Piotr Kaminski <pikaminski@nvidia.com> * refactor Signed-off-by: Piotr Kaminski <pikaminski@nvidia.com> * Call quantize on an unwrapped mcore model Signed-off-by: Piotr Kaminski <pikaminski@nvidia.com> * Apply isort and black reformatting Signed-off-by: Laplasjan107 <Laplasjan107@users.noreply.github.com> * Add tests, adjust unwrapping Signed-off-by: Piotr Kaminski <pikaminski@nvidia.com> * Apply isort and black reformatting Signed-off-by: Laplasjan107 <Laplasjan107@users.noreply.github.com> * fix export Signed-off-by: Piotr Kaminski <pikaminski@nvidia.com> * Apply isort and black reformatting Signed-off-by: Laplasjan107 <Laplasjan107@users.noreply.github.com> * Apply isort and black reformatting Signed-off-by: artbataev <artbataev@users.noreply.github.com> * Fix output_path argument for HF import Signed-off-by: Piotr Kamiński <67481570+Laplasjan107@users.noreply.github.com> * fix fabric ckpt loading Signed-off-by: Piotr Kaminski <pikaminski@nvidia.com> * Apply isort and black reformatting Signed-off-by: Laplasjan107 <Laplasjan107@users.noreply.github.com> * code review suggestions Signed-off-by: Piotr Kaminski <pikaminski@nvidia.com> * Apply isort and black reformatting Signed-off-by: Laplasjan107 <Laplasjan107@users.noreply.github.com> * remove unused import Signed-off-by: Piotr Kaminski <pikaminski@nvidia.com> * use cnn dataset in github ci Signed-off-by: Piotr Kaminski <pikaminski@nvidia.com> * applied code review Signed-off-by: Piotr Kaminski <pikaminski@nvidia.com> * code review changes Signed-off-by: Piotr Kaminski <pikaminski@nvidia.com> * Apply isort and black reformatting Signed-off-by: Laplasjan107 
<Laplasjan107@users.noreply.github.com> * simplify interface for data iterator Signed-off-by: Piotr Kaminski <pikaminski@nvidia.com> * Apply isort and black reformatting Signed-off-by: Laplasjan107 <Laplasjan107@users.noreply.github.com> * (partial) PP fix Signed-off-by: Piotr Kaminski <pikaminski@nvidia.com> * Apply isort and black reformatting Signed-off-by: Laplasjan107 <Laplasjan107@users.noreply.github.com> --------- Signed-off-by: Piotr Kaminski <pikaminski@nvidia.com> Signed-off-by: Laplasjan107 <Laplasjan107@users.noreply.github.com> Signed-off-by: Piotr Kamiński <67481570+Laplasjan107@users.noreply.github.com> Signed-off-by: artbataev <artbataev@users.noreply.github.com> Co-authored-by: Piotr Kaminski <pikaminski@nvidia.com> Co-authored-by: Laplasjan107 <Laplasjan107@users.noreply.github.com> Co-authored-by: artbataev <artbataev@users.noreply.github.com> * TDT compute timestamps option and Extra Whitespace handling for SPE (NVIDIA#10875) * add token duration Signed-off-by: monica-sekoyan <msekoyan@nvidia.com> * revert rnnt change Signed-off-by: monica-sekoyan <msekoyan@nvidia.com> * add remove_extra_whitespaces arg to spe tokenizer Signed-off-by: monica-sekoyan <msekoyan@nvidia.com> * add token duration retrieval Signed-off-by: monica-sekoyan <msekoyan@nvidia.com> * add ignore_extra_whitespace to spe Signed-off-by: monica-sekoyan <msekoyan@nvidia.com> * add compute_timestamp support for tdt Signed-off-by: monica-sekoyan <msekoyan@nvidia.com> * fix config field name Signed-off-by: monica-sekoyan <msekoyan@nvidia.com> * add refinement for tdt timestamps Signed-off-by: monica-sekoyan <msekoyan@nvidia.com> * add segments timestamp support and refinement for ctc Signed-off-by: monica-sekoyan <msekoyan@nvidia.com> * modify tests for ctc decoding timestamps Signed-off-by: monica-sekoyan <msekoyan@nvidia.com> * add rnnt timestamp tests Signed-off-by: monica-sekoyan <msekoyan@nvidia.com> * updated doc Signed-off-by: monica-sekoyan <msekoyan@nvidia.com> * fix in 
test Signed-off-by: monica-sekoyan <msekoyan@nvidia.com> * Apply isort and black reformatting Signed-off-by: monica-sekoyan <monica-sekoyan@users.noreply.github.com> * fix of unicode char Signed-off-by: monica-sekoyan <msekoyan@nvidia.com> * fix rnnt_decoding test Signed-off-by: monica-sekoyan <msekoyan@nvidia.com> * workaround for tesst tokenizer Signed-off-by: monica-sekoyan <msekoyan@nvidia.com> * Apply isort and black reformatting Signed-off-by: monica-sekoyan <monica-sekoyan@users.noreply.github.com> * modify segments formation Signed-off-by: monica-sekoyan <msekoyan@nvidia.com> * modify segments for ctc Signed-off-by: monica-sekoyan <msekoyan@nvidia.com> * fix in ctc refinement Signed-off-by: monica-sekoyan <msekoyan@nvidia.com> * Apply isort and black reformatting Signed-off-by: monica-sekoyan <monica-sekoyan@users.noreply.github.com> * minor changes Signed-off-by: monica-sekoyan <msekoyan@nvidia.com> * reverse offset change Signed-off-by: monica-sekoyan <msekoyan@nvidia.com> * Apply isort and black reformatting Signed-off-by: monica-sekoyan <monica-sekoyan@users.noreply.github.com> * warning mode=once Signed-off-by: monica-sekoyan <msekoyan@nvidia.com> * Apply isort and black reformatting Signed-off-by: monica-sekoyan <monica-sekoyan@users.noreply.github.com> * make ignore_extrawhitespaces false Signed-off-by: monica-sekoyan <msekoyan@nvidia.com> * minor changes Signed-off-by: monica-sekoyan <msekoyan@nvidia.com> * adjust changes to the tests Signed-off-by: monica-sekoyan <msekoyan@nvidia.com> * modify prompt_formatter tests Signed-off-by: monica-sekoyan <msekoyan@nvidia.com> * Apply isort and black reformatting Signed-off-by: monica-sekoyan <monica-sekoyan@users.noreply.github.com> --------- Signed-off-by: monica-sekoyan <msekoyan@nvidia.com> Signed-off-by: monica-sekoyan <monica-sekoyan@users.noreply.github.com> Co-authored-by: monica-sekoyan <monica-sekoyan@users.noreply.github.com> * Basic online dynamic FP8 quantization with vLLM (NVIDIA#10904) * Basic 
online dynamic quantization with vLLM Signed-off-by: Jan Lasek <janek.lasek@gmail.com> * Apply isort and black reformatting Signed-off-by: janekl <janekl@users.noreply.github.com> * vllm 0.6.3 updates Signed-off-by: Jan Lasek <janek.lasek@gmail.com> * Pass quantization param in deploy_vllm_triton.py script Signed-off-by: Jan Lasek <janek.lasek@gmail.com> --------- Signed-off-by: Jan Lasek <janek.lasek@gmail.com> Signed-off-by: janekl <janekl@users.noreply.github.com> Co-authored-by: janekl <janekl@users.noreply.github.com> * ci: Improve VM maintenance (NVIDIA#10758) * ci: Improve VM maintenance Signed-off-by: Oliver Koenig <okoenig@nvidia.com> * rename stuff Signed-off-by: Oliver Koenig <okoenig@nvidia.com> * title Signed-off-by: Oliver Koenig <okoenig@nvidia.com> * use team Signed-off-by: Oliver Koenig <okoenig@nvidia.com> * run on failure too Signed-off-by: Oliver Koenig <okoenig@nvidia.com> * fix Signed-off-by: Oliver Koenig <okoenig@nvidia.com> * yrdy Signed-off-by: Oliver Koenig <okoenig@nvidia.com> * f Signed-off-by: Oliver Koenig <okoenig@nvidia.com> * test Signed-off-by: Oliver Koenig <okoenig@nvidia.com> * fix Signed-off-by: Oliver Koenig <okoenig@nvidia.com> * f Signed-off-by: Oliver Koenig <okoenig@nvidia.com> * f Signed-off-by: Oliver Koenig <okoenig@nvidia.com> * f Signed-off-by: Oliver Koenig <okoenig@nvidia.com> --------- Signed-off-by: Oliver Koenig <okoenig@nvidia.com> * Add comment for vision transpose * update megatron_init.py inside lightning Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com> * rename llama to mllama folder name Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com> * update to attention bias Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com> * Apply isort and black reformatting Signed-off-by: yaoyu-33 <yaoyu-33@users.noreply.github.com> * update dropout to 0 Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com> * fix attention bias Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com> * remove disable_vision_padding since we now have a fix Signed-off-by: 
yaoyu-33 <yaoyu.094@gmail.com> * Apply isort and black reformatting Signed-off-by: yaoyu-33 <yaoyu-33@users.noreply.github.com> * Update init for mllama Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com> * Apply isort and black reformatting Signed-off-by: yaoyu-33 <yaoyu-33@users.noreply.github.com> * Address comments Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com> * Apply isort and black reformatting Signed-off-by: yaoyu-33 <yaoyu-33@users.noreply.github.com> * fix copyright title Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com> * fix code scan Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com> * update vision code Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com> * revert attention bias changes until latest MLM code got merged Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com> * fix warning Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com> * Turn off system message check, as it's "" now Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com> * Rolllback megatron_parallel.py Signed-off-by: Yu Yao <54727607+yaoyu-33@users.noreply.github.com> --------- Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com> Signed-off-by: yaoyu-33 <yaoyu-33@users.noreply.github.com> Signed-off-by: Yu Yao <54727607+yaoyu-33@users.noreply.github.com> Signed-off-by: cuichenx <cuichenx@users.noreply.github.com> Signed-off-by: Chen Cui <chcui@nvidia.com> Signed-off-by: artbataev <artbataev@users.noreply.github.com> Signed-off-by: parthmannan <parthmannan@users.noreply.github.com> Signed-off-by: meatybobby <meatybobby@users.noreply.github.com> Signed-off-by: HuiyingLi <willwin.lee@gmail.com> Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com> Signed-off-by: akoumpa <akoumpa@users.noreply.github.com> Signed-off-by: Abhishree <abhishreetm@gmail.com> Signed-off-by: Marc Romeijn <mromeijn@nvidia.com> Signed-off-by: Marc Romeyn <mromeijn@nvidia.com> Signed-off-by: marcromeyn <marcromeyn@users.noreply.github.com> Signed-off-by: stevehuang52 <heh@nvidia.com> Signed-off-by: Shriya Palsamudram <spalsamudram@nvidia.com> 
Signed-off-by: ashors1 <ashors@nvidia.com> Signed-off-by: Maanu Grover <maanug@nvidia.com> Signed-off-by: Piotr Kaminski <pikaminski@nvidia.com> Signed-off-by: Laplasjan107 <Laplasjan107@users.noreply.github.com> Signed-off-by: Piotr Kamiński <67481570+Laplasjan107@users.noreply.github.com> Signed-off-by: monica-sekoyan <msekoyan@nvidia.com> Signed-off-by: monica-sekoyan <monica-sekoyan@users.noreply.github.com> Signed-off-by: Jan Lasek <janek.lasek@gmail.com> Signed-off-by: janekl <janekl@users.noreply.github.com> Signed-off-by: Oliver Koenig <okoenig@nvidia.com> Co-authored-by: Ao Tang <aot@nvidia.com> Co-authored-by: Chen Cui <chcui@nvidia.com> Co-authored-by: Bobby Chen <bobchen@nvidia.com> Co-authored-by: yaoyu-33 <yaoyu-33@users.noreply.github.com> Co-authored-by: Yashaswi Karnati <144376261+yashaswikarnati@users.noreply.github.com> Co-authored-by: ykarnati <ykarnati@nvidia.com> Co-authored-by: cuichenx <cuichenx@users.noreply.github.com> Co-authored-by: Yashaswi Karnati <ykarnati@login-eos02.eos.clusters.nvidia.com> Co-authored-by: artbataev <artbataev@users.noreply.github.com> Co-authored-by: Parth Mannan <38387286+parthmannan@users.noreply.github.com> Co-authored-by: parthmannan <parthmannan@users.noreply.github.com> Co-authored-by: meatybobby <meatybobby@users.noreply.github.com> Co-authored-by: Huiying <willwin.lee@gmail.com> Co-authored-by: Alexandros Koumparoulis <153118171+akoumpa@users.noreply.github.com> Co-authored-by: akoumpa <akoumpa@users.noreply.github.com> Co-authored-by: Abhishree Thittenamane <47577437+athitten@users.noreply.github.com> Co-authored-by: Pablo Garay <pagaray@nvidia.com> Co-authored-by: Marc Romeyn <mromeijn@nvidia.com> Co-authored-by: Alexandros Koumparoulis <akoumparouli@nvidia.com> Co-authored-by: marcromeyn <marcromeyn@users.noreply.github.com> Co-authored-by: He Huang (Steve) <105218074+stevehuang52@users.noreply.github.com> Co-authored-by: Shriya Rishab <69161273+ShriyaPalsamudram@users.noreply.github.com> Co-authored-by: 
ataghibakhsh <ataghibakhsh@nvidia.com> Co-authored-by: Maanu Grover <109391026+maanug-nv@users.noreply.github.com> Co-authored-by: Anna Shors <71393111+ashors1@users.noreply.github.com> Co-authored-by: Piotr Kamiński <67481570+Laplasjan107@users.noreply.github.com> Co-authored-by: Piotr Kaminski <pikaminski@nvidia.com> Co-authored-by: Laplasjan107 <Laplasjan107@users.noreply.github.com> Co-authored-by: monica-sekoyan <166123533+monica-sekoyan@users.noreply.github.com> Co-authored-by: monica-sekoyan <monica-sekoyan@users.noreply.github.com> Co-authored-by: Jan Lasek <janek.lasek@gmail.com> Co-authored-by: janekl <janekl@users.noreply.github.com> Co-authored-by: oliver könig <okoenig@nvidia.com>
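The squashed log above includes "add remove_extra_whitespaces arg to spe tokenizer". In SentencePiece, the normalizer option of that name is on by default. It removes leading, trailing, and duplicate internal whitespace before encoding. The plain-Python sketch below approximates that normalization so you can see what is lost when the option is on. It is not NeMo's or SentencePiece's code. The helper names `remove_extra_whitespaces` and `normalize` are hypothetical, and the sketch assumes only the ASCII space counts as whitespace.

```python
import re


def remove_extra_whitespaces(text: str) -> str:
    """Approximate SentencePiece's remove_extra_whitespaces normalization.

    Assumption (hypothetical helper): only the ASCII space ' ' is treated as
    whitespace; the real normalizer works on its own normalized character stream.
    """
    # Collapse every run of two or more spaces into a single space.
    collapsed = re.sub(r" {2,}", " ", text)
    # Drop leading and trailing spaces.
    return collapsed.strip(" ")


def normalize(text: str, remove_extra: bool = True) -> str:
    # With the option disabled, whitespace is passed through untouched,
    # so a decode(encode(text)) round trip can preserve it.
    return remove_extra_whitespaces(text) if remove_extra else text


if __name__ == "__main__":
    print(repr(normalize("  hello   world  ")))                      # 'hello world'
    print(repr(normalize("  hello   world  ", remove_extra=False)))  # '  hello   world  '
```

With the default behaviour, two inputs that differ only in spacing map to the same pieces. Disabling the option keeps them distinct. That can matter wherever exact whitespace is significant, such as formatted prompts.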
…VIDIA#10875) * add token duration Signed-off-by: monica-sekoyan <msekoyan@nvidia.com> * revert rnnt change Signed-off-by: monica-sekoyan <msekoyan@nvidia.com> * add remove_extra_whitespaces arg to spe tokenizer Signed-off-by: monica-sekoyan <msekoyan@nvidia.com> * add token duration retrieval Signed-off-by: monica-sekoyan <msekoyan@nvidia.com> * add ignore_extra_whitespace to spe Signed-off-by: monica-sekoyan <msekoyan@nvidia.com> * add compute_timestamp support for tdt Signed-off-by: monica-sekoyan <msekoyan@nvidia.com> * fix config field name Signed-off-by: monica-sekoyan <msekoyan@nvidia.com> * add refinement for tdt timestamps Signed-off-by: monica-sekoyan <msekoyan@nvidia.com> * add segments timestamp support and refinement for ctc Signed-off-by: monica-sekoyan <msekoyan@nvidia.com> * modify tests for ctc decoding timestamps Signed-off-by: monica-sekoyan <msekoyan@nvidia.com> * add rnnt timestamp tests Signed-off-by: monica-sekoyan <msekoyan@nvidia.com> * updated doc Signed-off-by: monica-sekoyan <msekoyan@nvidia.com> * fix in test Signed-off-by: monica-sekoyan <msekoyan@nvidia.com> * Apply isort and black reformatting Signed-off-by: monica-sekoyan <monica-sekoyan@users.noreply.github.com> * fix of unicode char Signed-off-by: monica-sekoyan <msekoyan@nvidia.com> * fix rnnt_decoding test Signed-off-by: monica-sekoyan <msekoyan@nvidia.com> * workaround for tesst tokenizer Signed-off-by: monica-sekoyan <msekoyan@nvidia.com> * Apply isort and black reformatting Signed-off-by: monica-sekoyan <monica-sekoyan@users.noreply.github.com> * modify segments formation Signed-off-by: monica-sekoyan <msekoyan@nvidia.com> * modify segments for ctc Signed-off-by: monica-sekoyan <msekoyan@nvidia.com> * fix in ctc refinement Signed-off-by: monica-sekoyan <msekoyan@nvidia.com> * Apply isort and black reformatting Signed-off-by: monica-sekoyan <monica-sekoyan@users.noreply.github.com> * minor changes Signed-off-by: monica-sekoyan <msekoyan@nvidia.com> * reverse offset change 
Signed-off-by: monica-sekoyan <msekoyan@nvidia.com> * Apply isort and black reformatting Signed-off-by: monica-sekoyan <monica-sekoyan@users.noreply.github.com> * warning mode=once Signed-off-by: monica-sekoyan <msekoyan@nvidia.com> * Apply isort and black reformatting Signed-off-by: monica-sekoyan <monica-sekoyan@users.noreply.github.com> * make ignore_extrawhitespaces false Signed-off-by: monica-sekoyan <msekoyan@nvidia.com> * minor changes Signed-off-by: monica-sekoyan <msekoyan@nvidia.com> * adjust changes to the tests Signed-off-by: monica-sekoyan <msekoyan@nvidia.com> * modify prompt_formatter tests Signed-off-by: monica-sekoyan <msekoyan@nvidia.com> * Apply isort and black reformatting Signed-off-by: monica-sekoyan <monica-sekoyan@users.noreply.github.com> --------- Signed-off-by: monica-sekoyan <msekoyan@nvidia.com> Signed-off-by: monica-sekoyan <monica-sekoyan@users.noreply.github.com> Co-authored-by: monica-sekoyan <monica-sekoyan@users.noreply.github.com>
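The commits above add token durations and a `compute_timestamps` option for TDT (Token-and-Duration Transducer) decoding. A TDT decoder predicts a duration, in encoder frames, alongside each token. Each emitted token therefore carries a start frame (where it was emitted) and an end frame (start plus predicted duration). The sketch below turns such (token, frame, duration) triples into token and word timestamps in seconds. It assumes SentencePiece-style "▁" word-start markers. It is an illustration, not NeMo's implementation: `TokenStep`, `token_timestamps`, `word_timestamps`, and the 0.08 s frame shift (10 ms window stride times 8x subsampling) are all assumptions. It also does not reproduce the timestamp "refinement" step mentioned in the log.

```python
from dataclasses import dataclass
from typing import List, Tuple

# Assumption: seconds per encoder frame (10 ms window stride x 8x subsampling).
FRAME_SHIFT_S = 0.08


@dataclass
class TokenStep:
    """Hypothetical record of one emitted (non-blank) token."""
    token: str     # subword piece; a leading "▁" marks the start of a new word
    frame: int     # encoder frame at which the token was emitted
    duration: int  # predicted duration in encoder frames


def token_timestamps(
    steps: List[TokenStep], frame_shift: float = FRAME_SHIFT_S
) -> List[Tuple[str, float, float]]:
    """Start = emission frame; end = emission frame + predicted duration."""
    return [
        (s.token, s.frame * frame_shift, (s.frame + s.duration) * frame_shift)
        for s in steps
    ]


def word_timestamps(
    steps: List[TokenStep], frame_shift: float = FRAME_SHIFT_S
) -> List[Tuple[str, float, float]]:
    """Merge subword pieces into words; a word spans its first to last piece."""
    words: List[list] = []
    for token, start, end in token_timestamps(steps, frame_shift):
        if token.startswith("▁") or not words:
            # New word: strip the SentencePiece boundary marker.
            words.append([token.lstrip("▁"), start, end])
        else:
            # Continuation piece: extend the current word and its end time.
            words[-1][0] += token
            words[-1][2] = end
    # Round to milliseconds to hide floating-point noise from frame * shift.
    return [(w, round(s, 3), round(e, 3)) for w, s, e in words]


if __name__ == "__main__":
    steps = [TokenStep("▁he", 0, 2), TokenStep("llo", 2, 3), TokenStep("▁world", 6, 4)]
    print(word_timestamps(steps))  # [('hello', 0.0, 0.4), ('world', 0.48, 0.8)]
```

Because durations come straight from the decoder, end times need no look-ahead to the next token. The gap between frames 5 and 6 in the example corresponds to frames the decoder skipped or spent on blanks.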
* add initial code for llama vlm Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com> * some restructure Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com> * add mock data placeholder Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com> * Fix some importing Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com> * add language component for vlm llama * update code Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com> * now match num of params * update language part and fix vision part Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com> * minor fix Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com> * model can now init Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com> * minor update for llama32 text config Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com> * make checkpoint loading work * missing import * match vision part tensor shapes with configs Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com> * solve some fwd issues and mismatch issues Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com> * add vision import * fixes Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com> * update importer to convert both text and image weights * importer typos and reduce clutter * fix import qkv * some fixes for LLM Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com> * Add embedding * some updates Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com> * enable loading only text or only vision * add example script * TP fix Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com> * update * upload examples Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com> * update generate Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com> * update to newer version Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com> * upload for sharing * update to new pyt ckpt * xattn_caches matches (except small differences due to TE RMSNorm) * cleanup * embeddings match * match precision of weights * update sharded state dict Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com> * change xattn layer num to 3 7 11 etc * upload llama generation * minor fix * fix dummy layer input format * fix vision qkv order * fix shareded state dict 
Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com> * fix vision precision * fix rope * match cross attn layer * remove nrep * Remove cross attention in ImageTransformerLayer and fix _gate_ffn * PP draft Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com> * Fix intermediate tensor * temp save for pp2 is working Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com> * fix pp issues Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com> * merge * update mcore parallelism initialization Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com> * small update to pretrain script Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com> * update mcore parallelism initialization Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com> * Apply isort and black reformatting Signed-off-by: yaoyu-33 <yaoyu-33@users.noreply.github.com> * added energon dataloader for neva training (NVIDIA#10451) * added energon dataloader for neva training * Apply isort and black reformatting Signed-off-by: yashaswikarnati <yashaswikarnati@users.noreply.github.com> * specify global batch size to support grad accumulation * adding neva pretrain example * Apply isort and black reformatting Signed-off-by: yashaswikarnati <yashaswikarnati@users.noreply.github.com> * change pretraine example to handle new ckpt reloading * fixed code quality warnings and unused imports Signed-off-by: ykarnati <ykarnati@nvidia.com> * minor changes for PR comments * Apply isort and black reformatting Signed-off-by: yashaswikarnati <yashaswikarnati@users.noreply.github.com> * refactor conversation template config * Apply isort and black reformatting Signed-off-by: yashaswikarnati <yashaswikarnati@users.noreply.github.com> * remove optional import --------- Signed-off-by: yashaswikarnati <yashaswikarnati@users.noreply.github.com> Signed-off-by: ykarnati <ykarnati@nvidia.com> Co-authored-by: yashaswikarnati <yashaswikarnati@users.noreply.github.com> (cherry picked from commit 7354740) * llama energon dataloader * have tokenizer for base task encoder class * Update megatron_init.py 
Signed-off-by: Yu Yao <54727607+yaoyu-33@users.noreply.github.com> * Add simple inference * evian3 update Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com> * add encoder parallel default config Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com> * add encoder parallel default config Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com> * clean up Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com> * add aspect ratio in model * support energon dataloader * some pp update Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com> * fixes Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com> * fix kv merging Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com> * fix get_key_value_tensors Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com> * rename files Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com> * update to HF style position embedding Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com> * fix energon dataloader and support batching * update forward args Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com> * clean up and move to aspect_ratio_ids Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com> * rename back to language.py Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com> * fix loss function Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com> * update and fix energon Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com> * Add hf import * Fix type * Change config * update energon pretrain Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com> * clean up * clean up * reformat Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com> * update inference files for new code * update to instruct * update to instruct * update few names Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com> * update generation Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com> * fix importer embedding.weight * few fixes Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com> * add hf script Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com> * fix kv import * remove interleaved * fixes and updates Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com> * lora fixes Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com> * some code clean ups 
Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com> * update training scripts Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com> * refactors Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com> * add LoRA finetuning * fixes and nemo update Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com> * fix importer registering issue by adding 11B and 90B configs * update `decoder_seq_len` Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com> * science vqa script Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com> * clean up script name Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com> * fix ckpt save serialization issue * fix predefined config classes * add num_chunks in input Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com> * fix format Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com> * update finetuning scripts for PEFT * add 11b recipe (need NVIDIA#10645 to test) * fix mask generation Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com> * minor fix code style Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com> * Apply isort and black reformatting Signed-off-by: yaoyu-33 <yaoyu-33@users.noreply.github.com> * Support no image inference * add llama svqa eval * fix masking Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com> * Apply isort and black reformatting Signed-off-by: yaoyu-33 <yaoyu-33@users.noreply.github.com> * fix generation Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com> * Apply isort and black reformatting Signed-off-by: yaoyu-33 <yaoyu-33@users.noreply.github.com> * add 90b recipe and revise 11b recipe * Apply isort and black reformatting Signed-off-by: cuichenx <cuichenx@users.noreply.github.com> * clean up typing * add option to disable vision padding * Apply isort and black reformatting Signed-off-by: cuichenx <cuichenx@users.noreply.github.com> * base model finetuning (does not work yet) * Apply isort and black reformatting Signed-off-by: cuichenx <cuichenx@users.noreply.github.com> * fixed default conversation template config for MLLama * Update svqa * add multinode * bot happy * Apply isort and black reformatting 
Signed-off-by: cuichenx <cuichenx@users.noreply.github.com> * Apply isort and black reformatting Signed-off-by: yaoyu-33 <yaoyu-33@users.noreply.github.com> * Apply isort and black reformatting Signed-off-by: yaoyu-33 <yaoyu-33@users.noreply.github.com> * Apply isort and black reformatting Signed-off-by: artbataev <artbataev@users.noreply.github.com> * Perf improvements. Mainly from XAttn mask calculation (NVIDIA#10901) * Perf improvements. Mainly from XAttn mask calculation * Apply isort and black reformatting Signed-off-by: parthmannan <parthmannan@users.noreply.github.com> --------- Signed-off-by: parthmannan <parthmannan@users.noreply.github.com> Co-authored-by: parthmannan <parthmannan@users.noreply.github.com> * fix existing issues Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com> * fix scripts Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com> * Apply isort and black reformatting Signed-off-by: yaoyu-33 <yaoyu-33@users.noreply.github.com> * fix lora * few fixes for non image support Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com> * update masking gen Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com> * update lazy dataset Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com> * fix data sampler and loading issue Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com> * Add vlm generation * Apply isort and black reformatting Signed-off-by: meatybobby <meatybobby@users.noreply.github.com> * Apply isort and black reformatting Signed-off-by: yaoyu-33 <yaoyu-33@users.noreply.github.com> * generation update Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com> * update lazy dataset Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com> * Fix _strategy_lib.py Signed-off-by: Yu Yao <54727607+yaoyu-33@users.noreply.github.com> * Apply isort and black reformatting Signed-off-by: yaoyu-33 <yaoyu-33@users.noreply.github.com> * fix warning Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com> * hide vlm examples Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com> * Revert "Add vlm generation" This reverts commit 4711c75 Signed-off-by: 
yaoyu-33 <yaoyu.094@gmail.com> * Fix VisionEncoder multi-batch bug * update mcore parallelism initialization Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com> * Apply isort and black reformatting Signed-off-by: yaoyu-33 <yaoyu-33@users.noreply.github.com> * Update megatron_init.py Signed-off-by: Yu Yao <54727607+yaoyu-33@users.noreply.github.com> * add encoder parallel default config Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com> * Fix _strategy_lib.py Signed-off-by: Yu Yao <54727607+yaoyu-33@users.noreply.github.com> * llm.generate fixes (NVIDIA#10983) * fix context path, disable optimizer init, add tp Signed-off-by: HuiyingLi <willwin.lee@gmail.com> * format Signed-off-by: HuiyingLi <willwin.lee@gmail.com> * address comments, require user to provide trainer Signed-off-by: HuiyingLi <willwin.lee@gmail.com> * minor fix Signed-off-by: HuiyingLi <willwin.lee@gmail.com> * minor fixes Signed-off-by: HuiyingLi <willwin.lee@gmail.com> --------- Signed-off-by: HuiyingLi <willwin.lee@gmail.com> * use __dict__ in check (NVIDIA#11012) * check is_hf_model in leaf module Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com> * Apply isort and black reformatting Signed-off-by: akoumpa <akoumpa@users.noreply.github.com> * disable getattr alternative path Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com> * fix Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com> * undo; Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com> --------- Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com> Signed-off-by: akoumpa <akoumpa@users.noreply.github.com> Co-authored-by: akoumpa <akoumpa@users.noreply.github.com> * LoRA support for HF::AutoModelForCausalLM (NVIDIA#10982) * add LinearAdapter Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com> * add hf lora example Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com> * remove unused imports Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com> * fix 
Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com> * fix Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com> * subclass mixin Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com> * remove stale imports Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com> * undo Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com> * fix scale Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com> * regex selector for peft Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com> * move lora Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com> * fmt Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com> * hf_auto_model_for_causal_lm finetune recipe Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com> * Apply isort and black reformatting Signed-off-by: akoumpa <akoumpa@users.noreply.github.com> --------- Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com> Signed-off-by: akoumpa <akoumpa@users.noreply.github.com> Co-authored-by: akoumpa <akoumpa@users.noreply.github.com> * Change default for always_save_context to True (NVIDIA#11014) Signed-off-by: Abhishree <abhishreetm@gmail.com> Co-authored-by: Pablo Garay <pagaray@nvidia.com> * Add a build option to load_context (NVIDIA#10713) * Add a build option to load_context Signed-off-by: Marc Romeijn <mromeijn@nvidia.com> Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com> * Adding test Signed-off-by: Marc Romeijn <mromeijn@nvidia.com> Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com> * Trying to fix failing CPU test Signed-off-by: Marc Romeijn <mromeijn@nvidia.com> Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com> * cherry-pick fix Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com> --------- Signed-off-by: Marc Romeijn <mromeijn@nvidia.com> Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com> Co-authored-by: Alexandros Koumparoulis 
<akoumparouli@nvidia.com> * Fix pip install (NVIDIA#11026) * Move AutoTokenizer inline Signed-off-by: Marc Romeyn <mromeijn@nvidia.com> * Move einops to common requirements Signed-off-by: Marc Romeyn <mromeijn@nvidia.com> * Move AutoTokenizer import to top-level again in fine_tuning Signed-off-by: Marc Romeyn <mromeijn@nvidia.com> * Move megatron init inside nemo.lightning Signed-off-by: Marc Romeyn <mromeijn@nvidia.com> * Make megatron_lazy_init_context work when transformer-engine is not installed Signed-off-by: Marc Romeyn <mromeijn@nvidia.com> * Only import get_nmt_tokenizer when needed Signed-off-by: Marc Romeyn <mromeijn@nvidia.com> * Apply isort and black reformatting Signed-off-by: marcromeyn <marcromeyn@users.noreply.github.com> --------- Signed-off-by: Marc Romeyn <mromeijn@nvidia.com> Signed-off-by: marcromeyn <marcromeyn@users.noreply.github.com> Co-authored-by: marcromeyn <marcromeyn@users.noreply.github.com> * [WIP] Add docs for NEST SSL (NVIDIA#10804) * add docs Signed-off-by: stevehuang52 <heh@nvidia.com> * update doc and fix missing param Signed-off-by: stevehuang52 <heh@nvidia.com> --------- Signed-off-by: stevehuang52 <heh@nvidia.com> * Change dist ckpt defaults (NVIDIA#10913) * Enable ckpt features by default (async ckpt), ckpt every 15mins and reduce preemption time to 1min Signed-off-by: Shriya Palsamudram <spalsamudram@nvidia.com> * fix ssm tests Signed-off-by: Shriya Palsamudram <spalsamudram@nvidia.com> * Make note that ckpt_async_save is disabled for SSMs Signed-off-by: Shriya Palsamudram <spalsamudram@nvidia.com> * Enable async ckpt for SSMs with fix Signed-off-by: Shriya Palsamudram <spalsamudram@nvidia.com> * Disable async ckpt in the peft test as it is a known bug, add note. 
Signed-off-by: Shriya Palsamudram <spalsamudram@nvidia.com> * Fix failing unit tests Signed-off-by: Shriya Palsamudram <spalsamudram@nvidia.com> * Ashors/peft async ckpt (NVIDIA#11010) * [WIP] prototype for supporting async checkpointing with peft Signed-off-by: ashors1 <ashors@nvidia.com> Signed-off-by: Shriya Palsamudram <spalsamudram@nvidia.com> * Enable async ckpt for the peft test Signed-off-by: Shriya Palsamudram <spalsamudram@nvidia.com> * Fix peft setup test Signed-off-by: Shriya Palsamudram <spalsamudram@nvidia.com> --------- Signed-off-by: Shriya Palsamudram <spalsamudram@nvidia.com> Signed-off-by: ashors1 <ashors@nvidia.com> Co-authored-by: ataghibakhsh <ataghibakhsh@nvidia.com> * Akoumparouli/mixtral recipe fix r2.0.0 (NVIDIA#10994) * Mixtral TP8 EP1 Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com> * Apply isort and black reformatting Signed-off-by: akoumpa <akoumpa@users.noreply.github.com> --------- Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com> Signed-off-by: akoumpa <akoumpa@users.noreply.github.com> Co-authored-by: akoumpa <akoumpa@users.noreply.github.com> * Fix _strategy_lib tests (NVIDIA#11033) * fix world size and don't mock Signed-off-by: Maanu Grover <maanug@nvidia.com> * cleanup global state Signed-off-by: Maanu Grover <maanug@nvidia.com> * check app state instead Signed-off-by: Maanu Grover <maanug@nvidia.com> * fix syntax nemo logger test Signed-off-by: Maanu Grover <maanug@nvidia.com> --------- Signed-off-by: Maanu Grover <maanug@nvidia.com> * Update `BaseMegatronSampler` for compatibility with PTL's `_BatchProgress` (NVIDIA#11016) * Revert "[NeMo-UX] Use custom `BatchProgress` class which does not restore states (NVIDIA#10383)" This reverts commit b5798de. 
* make megatron sampler return the total number of batches in the dataset Signed-off-by: ashors1 <ashors@nvidia.com>
---------
Signed-off-by: ashors1 <ashors@nvidia.com>
* PTQ example for NeMo 2.0 (NVIDIA#10642)
* initial commit Signed-off-by: Piotr Kaminski <pikaminski@nvidia.com>
* create Quantizer for NeMo 2.0 Signed-off-by: Piotr Kaminski <pikaminski@nvidia.com>
* refactor Signed-off-by: Piotr Kaminski <pikaminski@nvidia.com>
* Call quantize on an unwrapped mcore model Signed-off-by: Piotr Kaminski <pikaminski@nvidia.com>
* Apply isort and black reformatting Signed-off-by: Laplasjan107 <Laplasjan107@users.noreply.github.com>
* Add tests, adjust unwrapping Signed-off-by: Piotr Kaminski <pikaminski@nvidia.com>
* Apply isort and black reformatting Signed-off-by: Laplasjan107 <Laplasjan107@users.noreply.github.com>
* fix export Signed-off-by: Piotr Kaminski <pikaminski@nvidia.com>
* Apply isort and black reformatting Signed-off-by: Laplasjan107 <Laplasjan107@users.noreply.github.com>
* Apply isort and black reformatting Signed-off-by: artbataev <artbataev@users.noreply.github.com>
* Fix output_path argument for HF import Signed-off-by: Piotr Kamiński <67481570+Laplasjan107@users.noreply.github.com>
* fix fabric ckpt loading Signed-off-by: Piotr Kaminski <pikaminski@nvidia.com>
* Apply isort and black reformatting Signed-off-by: Laplasjan107 <Laplasjan107@users.noreply.github.com>
* code review suggestions Signed-off-by: Piotr Kaminski <pikaminski@nvidia.com>
* Apply isort and black reformatting Signed-off-by: Laplasjan107 <Laplasjan107@users.noreply.github.com>
* remove unused import Signed-off-by: Piotr Kaminski <pikaminski@nvidia.com>
* use cnn dataset in github ci Signed-off-by: Piotr Kaminski <pikaminski@nvidia.com>
* applied code review Signed-off-by: Piotr Kaminski <pikaminski@nvidia.com>
* code review changes Signed-off-by: Piotr Kaminski <pikaminski@nvidia.com>
* Apply isort and black reformatting Signed-off-by: Laplasjan107 <Laplasjan107@users.noreply.github.com>
* simplify interface for data iterator Signed-off-by: Piotr Kaminski <pikaminski@nvidia.com>
* Apply isort and black reformatting Signed-off-by: Laplasjan107 <Laplasjan107@users.noreply.github.com>
* (partial) PP fix Signed-off-by: Piotr Kaminski <pikaminski@nvidia.com>
* Apply isort and black reformatting Signed-off-by: Laplasjan107 <Laplasjan107@users.noreply.github.com>
---------
Signed-off-by: Piotr Kaminski <pikaminski@nvidia.com>
Signed-off-by: Laplasjan107 <Laplasjan107@users.noreply.github.com>
Signed-off-by: Piotr Kamiński <67481570+Laplasjan107@users.noreply.github.com>
Signed-off-by: artbataev <artbataev@users.noreply.github.com>
Co-authored-by: Piotr Kaminski <pikaminski@nvidia.com>
Co-authored-by: Laplasjan107 <Laplasjan107@users.noreply.github.com>
Co-authored-by: artbataev <artbataev@users.noreply.github.com>
* TDT compute timestamps option and Extra Whitespace handling for SPE (NVIDIA#10875)
* add token duration Signed-off-by: monica-sekoyan <msekoyan@nvidia.com>
* revert rnnt change Signed-off-by: monica-sekoyan <msekoyan@nvidia.com>
* add remove_extra_whitespaces arg to spe tokenizer Signed-off-by: monica-sekoyan <msekoyan@nvidia.com>
* add token duration retrieval Signed-off-by: monica-sekoyan <msekoyan@nvidia.com>
* add ignore_extra_whitespace to spe Signed-off-by: monica-sekoyan <msekoyan@nvidia.com>
* add compute_timestamp support for tdt Signed-off-by: monica-sekoyan <msekoyan@nvidia.com>
* fix config field name Signed-off-by: monica-sekoyan <msekoyan@nvidia.com>
* add refinement for tdt timestamps Signed-off-by: monica-sekoyan <msekoyan@nvidia.com>
* add segments timestamp support and refinement for ctc Signed-off-by: monica-sekoyan <msekoyan@nvidia.com>
* modify tests for ctc decoding timestamps Signed-off-by: monica-sekoyan <msekoyan@nvidia.com>
* add rnnt timestamp tests Signed-off-by: monica-sekoyan <msekoyan@nvidia.com>
* updated doc Signed-off-by: monica-sekoyan <msekoyan@nvidia.com>
* fix in test Signed-off-by: monica-sekoyan <msekoyan@nvidia.com>
* Apply isort and black reformatting Signed-off-by: monica-sekoyan <monica-sekoyan@users.noreply.github.com>
* fix of unicode char Signed-off-by: monica-sekoyan <msekoyan@nvidia.com>
* fix rnnt_decoding test Signed-off-by: monica-sekoyan <msekoyan@nvidia.com>
* workaround for tesst tokenizer Signed-off-by: monica-sekoyan <msekoyan@nvidia.com>
* Apply isort and black reformatting Signed-off-by: monica-sekoyan <monica-sekoyan@users.noreply.github.com>
* modify segments formation Signed-off-by: monica-sekoyan <msekoyan@nvidia.com>
* modify segments for ctc Signed-off-by: monica-sekoyan <msekoyan@nvidia.com>
* fix in ctc refinement Signed-off-by: monica-sekoyan <msekoyan@nvidia.com>
* Apply isort and black reformatting Signed-off-by: monica-sekoyan <monica-sekoyan@users.noreply.github.com>
* minor changes Signed-off-by: monica-sekoyan <msekoyan@nvidia.com>
* reverse offset change Signed-off-by: monica-sekoyan <msekoyan@nvidia.com>
* Apply isort and black reformatting Signed-off-by: monica-sekoyan <monica-sekoyan@users.noreply.github.com>
* warning mode=once Signed-off-by: monica-sekoyan <msekoyan@nvidia.com>
* Apply isort and black reformatting Signed-off-by: monica-sekoyan <monica-sekoyan@users.noreply.github.com>
* make ignore_extrawhitespaces false Signed-off-by: monica-sekoyan <msekoyan@nvidia.com>
* minor changes Signed-off-by: monica-sekoyan <msekoyan@nvidia.com>
* adjust changes to the tests Signed-off-by: monica-sekoyan <msekoyan@nvidia.com>
* modify prompt_formatter tests Signed-off-by: monica-sekoyan <msekoyan@nvidia.com>
* Apply isort and black reformatting Signed-off-by: monica-sekoyan <monica-sekoyan@users.noreply.github.com>
---------
Signed-off-by: monica-sekoyan <msekoyan@nvidia.com>
Signed-off-by: monica-sekoyan <monica-sekoyan@users.noreply.github.com>
Co-authored-by: monica-sekoyan <monica-sekoyan@users.noreply.github.com>
* Basic online dynamic FP8 quantization with vLLM (NVIDIA#10904)
* Basic online dynamic quantization with vLLM Signed-off-by: Jan Lasek <janek.lasek@gmail.com>
* Apply isort and black reformatting Signed-off-by: janekl <janekl@users.noreply.github.com>
* vllm 0.6.3 updates Signed-off-by: Jan Lasek <janek.lasek@gmail.com>
* Pass quantization param in deploy_vllm_triton.py script Signed-off-by: Jan Lasek <janek.lasek@gmail.com>
---------
Signed-off-by: Jan Lasek <janek.lasek@gmail.com>
Signed-off-by: janekl <janekl@users.noreply.github.com>
Co-authored-by: janekl <janekl@users.noreply.github.com>
* ci: Improve VM maintenance (NVIDIA#10758)
* ci: Improve VM maintenance Signed-off-by: Oliver Koenig <okoenig@nvidia.com>
* rename stuff Signed-off-by: Oliver Koenig <okoenig@nvidia.com>
* title Signed-off-by: Oliver Koenig <okoenig@nvidia.com>
* use team Signed-off-by: Oliver Koenig <okoenig@nvidia.com>
* run on failure too Signed-off-by: Oliver Koenig <okoenig@nvidia.com>
* fix Signed-off-by: Oliver Koenig <okoenig@nvidia.com>
* yrdy Signed-off-by: Oliver Koenig <okoenig@nvidia.com>
* f Signed-off-by: Oliver Koenig <okoenig@nvidia.com>
* test Signed-off-by: Oliver Koenig <okoenig@nvidia.com>
* fix Signed-off-by: Oliver Koenig <okoenig@nvidia.com>
* f Signed-off-by: Oliver Koenig <okoenig@nvidia.com>
* f Signed-off-by: Oliver Koenig <okoenig@nvidia.com>
* f Signed-off-by: Oliver Koenig <okoenig@nvidia.com>
---------
Signed-off-by: Oliver Koenig <okoenig@nvidia.com>
* Add comment for vision transpose
* update megatron_init.py inside lightning Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>
* rename llama to mllama folder name Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>
* update to attention bias Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>
* Apply isort and black reformatting Signed-off-by: yaoyu-33 <yaoyu-33@users.noreply.github.com>
* update dropout to 0 Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>
* fix attention bias Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>
* remove disable_vision_padding since we now have a fix Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>
* Apply isort and black reformatting Signed-off-by: yaoyu-33 <yaoyu-33@users.noreply.github.com>
* Update init for mllama Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>
* Apply isort and black reformatting Signed-off-by: yaoyu-33 <yaoyu-33@users.noreply.github.com>
* Address comments Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>
* Apply isort and black reformatting Signed-off-by: yaoyu-33 <yaoyu-33@users.noreply.github.com>
* fix copyright title Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>
* fix code scan Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>
* update vision code Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>
* revert attention bias changes until latest MLM code got merged Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>
* fix warning Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>
* Turn off system message check, as it's "" now Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>
* Rolllback megatron_parallel.py Signed-off-by: Yu Yao <54727607+yaoyu-33@users.noreply.github.com>
---------
Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>
Signed-off-by: yaoyu-33 <yaoyu-33@users.noreply.github.com>
Signed-off-by: Yu Yao <54727607+yaoyu-33@users.noreply.github.com>
Signed-off-by: cuichenx <cuichenx@users.noreply.github.com>
Signed-off-by: Chen Cui <chcui@nvidia.com>
Signed-off-by: artbataev <artbataev@users.noreply.github.com>
Signed-off-by: parthmannan <parthmannan@users.noreply.github.com>
Signed-off-by: meatybobby <meatybobby@users.noreply.github.com>
Signed-off-by: HuiyingLi <willwin.lee@gmail.com>
Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>
Signed-off-by: akoumpa <akoumpa@users.noreply.github.com>
Signed-off-by: Abhishree <abhishreetm@gmail.com>
Signed-off-by: Marc Romeijn <mromeijn@nvidia.com>
Signed-off-by: Marc Romeyn <mromeijn@nvidia.com>
Signed-off-by: marcromeyn <marcromeyn@users.noreply.github.com>
Signed-off-by: stevehuang52 <heh@nvidia.com>
Signed-off-by: Shriya Palsamudram <spalsamudram@nvidia.com>
Signed-off-by: ashors1 <ashors@nvidia.com>
Signed-off-by: Maanu Grover <maanug@nvidia.com>
Signed-off-by: Piotr Kaminski <pikaminski@nvidia.com>
Signed-off-by: Laplasjan107 <Laplasjan107@users.noreply.github.com>
Signed-off-by: Piotr Kamiński <67481570+Laplasjan107@users.noreply.github.com>
Signed-off-by: monica-sekoyan <msekoyan@nvidia.com>
Signed-off-by: monica-sekoyan <monica-sekoyan@users.noreply.github.com>
Signed-off-by: Jan Lasek <janek.lasek@gmail.com>
Signed-off-by: janekl <janekl@users.noreply.github.com>
Signed-off-by: Oliver Koenig <okoenig@nvidia.com>
Co-authored-by: Ao Tang <aot@nvidia.com>
Co-authored-by: Chen Cui <chcui@nvidia.com>
Co-authored-by: Bobby Chen <bobchen@nvidia.com>
Co-authored-by: yaoyu-33 <yaoyu-33@users.noreply.github.com>
Co-authored-by: Yashaswi Karnati <144376261+yashaswikarnati@users.noreply.github.com>
Co-authored-by: ykarnati <ykarnati@nvidia.com>
Co-authored-by: cuichenx <cuichenx@users.noreply.github.com>
Co-authored-by: Yashaswi Karnati <ykarnati@login-eos02.eos.clusters.nvidia.com>
Co-authored-by: artbataev <artbataev@users.noreply.github.com>
Co-authored-by: Parth Mannan <38387286+parthmannan@users.noreply.github.com>
Co-authored-by: parthmannan <parthmannan@users.noreply.github.com>
Co-authored-by: meatybobby <meatybobby@users.noreply.github.com>
Co-authored-by: Huiying <willwin.lee@gmail.com>
Co-authored-by: Alexandros Koumparoulis <153118171+akoumpa@users.noreply.github.com>
Co-authored-by: akoumpa <akoumpa@users.noreply.github.com>
Co-authored-by: Abhishree Thittenamane <47577437+athitten@users.noreply.github.com>
Co-authored-by: Pablo Garay <pagaray@nvidia.com>
Co-authored-by: Marc Romeyn <mromeijn@nvidia.com>
Co-authored-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>
Co-authored-by: marcromeyn <marcromeyn@users.noreply.github.com>
Co-authored-by: He Huang (Steve) <105218074+stevehuang52@users.noreply.github.com>
Co-authored-by: Shriya Rishab <69161273+ShriyaPalsamudram@users.noreply.github.com>
Co-authored-by: ataghibakhsh <ataghibakhsh@nvidia.com>
Co-authored-by: Maanu Grover <109391026+maanug-nv@users.noreply.github.com>
Co-authored-by: Anna Shors <71393111+ashors1@users.noreply.github.com>
Co-authored-by: Piotr Kamiński <67481570+Laplasjan107@users.noreply.github.com>
Co-authored-by: Piotr Kaminski <pikaminski@nvidia.com>
Co-authored-by: Laplasjan107 <Laplasjan107@users.noreply.github.com>
Co-authored-by: monica-sekoyan <166123533+monica-sekoyan@users.noreply.github.com>
Co-authored-by: monica-sekoyan <monica-sekoyan@users.noreply.github.com>
Co-authored-by: Jan Lasek <janek.lasek@gmail.com>
Co-authored-by: janekl <janekl@users.noreply.github.com>
Co-authored-by: oliver könig <okoenig@nvidia.com>
* evian3 update Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>
* add encoder parallel default config Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>
* add encoder parallel default config Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>
* clean up Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>
* add aspect ratio in model
* support energon dataloader
* some pp update Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>
* fixes Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>
* fix kv merging Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>
* fix get_key_value_tensors Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>
* rename files Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>
* update to HF style position embedding Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>
* fix energon dataloader and support batching
* update forward args Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>
* clean up and move to aspect_ratio_ids Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>
* rename back to language.py Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>
* fix loss function Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>
* update and fix energon Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>
* Add hf import
* Fix type
* Change config
* update energon pretrain Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>
* clean up
* clean up
* reformat Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>
* update inference files for new code
* update to instruct
* update to instruct
* update few names Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>
* update generation Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>
* fix importer embedding.weight
* few fixes Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>
* add hf script Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>
* fix kv import
* remove interleaved
* fixes and updates Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>
* lora fixes Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>
* some code clean ups Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>
* update training scripts Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>
* refactors Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>
* add LoRA finetuning
* fixes and nemo update Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>
* fix importer registering issue by adding 11B and 90B configs
* update `decoder_seq_len` Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>
* science vqa script Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>
* clean up script name Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>
* fix ckpt save serialization issue
* fix predefined config classes
* add num_chunks in input Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>
* fix format Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>
* update finetuning scripts for PEFT
* add 11b recipe (need #10645 to test)
* fix mask generation Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>
* minor fix code style Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>
* Apply isort and black reformatting Signed-off-by: yaoyu-33 <yaoyu-33@users.noreply.github.com>
* Support no image inference
* add llama svqa eval
* fix masking Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>
* Apply isort and black reformatting Signed-off-by: yaoyu-33 <yaoyu-33@users.noreply.github.com>
* fix generation Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>
* Apply isort and black reformatting Signed-off-by: yaoyu-33 <yaoyu-33@users.noreply.github.com>
* add 90b recipe and revise 11b recipe
* Apply isort and black reformatting Signed-off-by: cuichenx <cuichenx@users.noreply.github.com>
* clean up typing
* add option to disable vision padding
* Apply isort and black reformatting Signed-off-by: cuichenx <cuichenx@users.noreply.github.com>
* base model finetuning (does not work yet)
* Apply isort and black reformatting Signed-off-by: cuichenx <cuichenx@users.noreply.github.com>
* fixed default conversation template config for MLLama
* Update svqa
* add multinode
* bot happy
* Apply isort and black reformatting Signed-off-by: cuichenx <cuichenx@users.noreply.github.com>
* Apply isort and black reformatting Signed-off-by: yaoyu-33 <yaoyu-33@users.noreply.github.com>
* Apply isort and black reformatting Signed-off-by: yaoyu-33 <yaoyu-33@users.noreply.github.com>
* Apply isort and black reformatting Signed-off-by: artbataev <artbataev@users.noreply.github.com>
* Perf improvements. Mainly from XAttn mask calculation (#10901)
* Perf improvements. Mainly from XAttn mask calculation
* Apply isort and black reformatting Signed-off-by: parthmannan <parthmannan@users.noreply.github.com>
---------
Signed-off-by: parthmannan <parthmannan@users.noreply.github.com>
Co-authored-by: parthmannan <parthmannan@users.noreply.github.com>
* fix existing issues Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>
* fix scripts Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>
* Apply isort and black reformatting Signed-off-by: yaoyu-33 <yaoyu-33@users.noreply.github.com>
* fix lora
* few fixes for non image support Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>
* update masking gen Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>
* update lazy dataset Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>
* fix data sampler and loading issue Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>
* Add vlm generation
* Apply isort and black reformatting Signed-off-by: meatybobby <meatybobby@users.noreply.github.com>
* Apply isort and black reformatting Signed-off-by: yaoyu-33 <yaoyu-33@users.noreply.github.com>
* generation update Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>
* update lazy dataset Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>
* Fix _strategy_lib.py Signed-off-by: Yu Yao <54727607+yaoyu-33@users.noreply.github.com>
* Apply isort and black reformatting Signed-off-by: yaoyu-33 <yaoyu-33@users.noreply.github.com>
* fix warning Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>
* hide vlm examples Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>
* Revert "Add vlm generation" This reverts commit 4711c75 Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>
* Fix VisionEncoder multi-batch bug
* update mcore parallelism initialization Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>
* Apply isort and black reformatting Signed-off-by: yaoyu-33 <yaoyu-33@users.noreply.github.com>
* Update megatron_init.py Signed-off-by: Yu Yao <54727607+yaoyu-33@users.noreply.github.com>
* add encoder parallel default config Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>
* Fix _strategy_lib.py Signed-off-by: Yu Yao <54727607+yaoyu-33@users.noreply.github.com>
* llm.generate fixes (#10983)
* fix context path, disable optimizer init, add tp Signed-off-by: HuiyingLi <willwin.lee@gmail.com>
* format Signed-off-by: HuiyingLi <willwin.lee@gmail.com>
* address comments, require user to provide trainer Signed-off-by: HuiyingLi <willwin.lee@gmail.com>
* minor fix Signed-off-by: HuiyingLi <willwin.lee@gmail.com>
* minor fixes Signed-off-by: HuiyingLi <willwin.lee@gmail.com>
---------
Signed-off-by: HuiyingLi <willwin.lee@gmail.com>
* use __dict__ in check (#11012)
* check is_hf_model in leaf module Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>
* Apply isort and black reformatting Signed-off-by: akoumpa <akoumpa@users.noreply.github.com>
* disable getattr alternative path Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>
* fix Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>
* undo; Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>
---------
Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>
Signed-off-by: akoumpa <akoumpa@users.noreply.github.com>
Co-authored-by: akoumpa <akoumpa@users.noreply.github.com>
* LoRA support for HF::AutoModelForCausalLM (#10982)
* add LinearAdapter Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>
* add hf lora example Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>
* remove unused imports Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>
* fix Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>
* fix Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>
* subclass mixin Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>
* remove stale imports Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>
* undo Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>
* fix scale Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>
* regex selector for peft Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>
* move lora Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>
* fmt Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>
* hf_auto_model_for_causal_lm finetune recipe Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>
* Apply isort and black reformatting Signed-off-by: akoumpa <akoumpa@users.noreply.github.com>
---------
Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>
Signed-off-by: akoumpa <akoumpa@users.noreply.github.com>
Co-authored-by: akoumpa <akoumpa@users.noreply.github.com>
* Change default for always_save_context to True (#11014) Signed-off-by: Abhishree <abhishreetm@gmail.com> Co-authored-by: Pablo Garay <pagaray@nvidia.com>
* Add a build option to load_context (#10713)
* Add a build option to load_context Signed-off-by: Marc Romeijn <mromeijn@nvidia.com> Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>
* Adding test Signed-off-by: Marc Romeijn <mromeijn@nvidia.com> Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>
* Trying to fix failing CPU test Signed-off-by: Marc Romeijn <mromeijn@nvidia.com> Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>
* cherry-pick fix Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>
---------
Signed-off-by: Marc Romeijn <mromeijn@nvidia.com>
Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>
Co-authored-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>
* Fix pip install (#11026)
* Move AutoTokenizer inline Signed-off-by: Marc Romeyn <mromeijn@nvidia.com>
* Move einops to common requirements Signed-off-by: Marc Romeyn <mromeijn@nvidia.com>
* Move AutoTokenizer import to top-level again in fine_tuning Signed-off-by: Marc Romeyn <mromeijn@nvidia.com>
* Move megatron init inside nemo.lightning Signed-off-by: Marc Romeyn <mromeijn@nvidia.com>
* Make megatron_lazy_init_context work when transformer-engine is not installed Signed-off-by: Marc Romeyn <mromeijn@nvidia.com>
* Only import get_nmt_tokenizer when needed Signed-off-by: Marc Romeyn <mromeijn@nvidia.com>
* Apply isort and black reformatting Signed-off-by: marcromeyn <marcromeyn@users.noreply.github.com>
---------
Signed-off-by: Marc Romeyn <mromeijn@nvidia.com>
Signed-off-by: marcromeyn <marcromeyn@users.noreply.github.com>
Co-authored-by: marcromeyn <marcromeyn@users.noreply.github.com>
* [WIP] Add docs for NEST SSL (#10804)
* add docs Signed-off-by: stevehuang52 <heh@nvidia.com>
* update doc and fix missing param Signed-off-by: stevehuang52 <heh@nvidia.com>
---------
Signed-off-by: stevehuang52 <heh@nvidia.com>
* Change dist ckpt defaults (#10913)
* Enable ckpt features by default (async ckpt), ckpt every 15mins and reduce preemption time to 1min Signed-off-by: Shriya Palsamudram <spalsamudram@nvidia.com>
* fix ssm tests Signed-off-by: Shriya Palsamudram <spalsamudram@nvidia.com>
* Make note that ckpt_async_save is disabled for SSMs Signed-off-by: Shriya Palsamudram <spalsamudram@nvidia.com>
* Enable async ckpt for SSMs with fix Signed-off-by: Shriya Palsamudram <spalsamudram@nvidia.com>
* Disable async ckpt in the peft test as it is a known bug, add note. Signed-off-by: Shriya Palsamudram <spalsamudram@nvidia.com>
* Fix failing unit tests Signed-off-by: Shriya Palsamudram <spalsamudram@nvidia.com>
* Ashors/peft async ckpt (#11010)
* [WIP] prototype for supporting async checkpointing with peft Signed-off-by: ashors1 <ashors@nvidia.com> Signed-off-by: Shriya Palsamudram <spalsamudram@nvidia.com>
* Enable async ckpt for the peft test Signed-off-by: Shriya Palsamudram <spalsamudram@nvidia.com>
* Fix peft setup test Signed-off-by: Shriya Palsamudram <spalsamudram@nvidia.com>
---------
Signed-off-by: Shriya Palsamudram <spalsamudram@nvidia.com>
Signed-off-by: ashors1 <ashors@nvidia.com>
Co-authored-by: ataghibakhsh <ataghibakhsh@nvidia.com>
* Akoumparouli/mixtral recipe fix r2.0.0 (#10994)
* Mixtral TP8 EP1 Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>
* Apply isort and black reformatting Signed-off-by: akoumpa <akoumpa@users.noreply.github.com>
---------
Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>
Signed-off-by: akoumpa <akoumpa@users.noreply.github.com>
Co-authored-by: akoumpa <akoumpa@users.noreply.github.com>
* Fix _strategy_lib tests (#11033)
* fix world size and don't mock Signed-off-by: Maanu Grover <maanug@nvidia.com>
* cleanup global state Signed-off-by: Maanu Grover <maanug@nvidia.com>
* check app state instead Signed-off-by: Maanu Grover <maanug@nvidia.com>
* fix syntax nemo logger test Signed-off-by: Maanu Grover <maanug@nvidia.com>
---------
Signed-off-by: Maanu Grover <maanug@nvidia.com>
* Update `BaseMegatronSampler` for compatibility with PTL's `_BatchProgress` (#11016)
* Revert "[NeMo-UX] Use custom `BatchProgress` class which does not restore states (#10383)" This reverts commit b5798de.
* make megatron sampler return the total number of batches in the dataset Signed-off-by: ashors1 <ashors@nvidia.com>
---------
Signed-off-by: ashors1 <ashors@nvidia.com>
* PTQ example for NeMo 2.0 (#10642)
* initial commit Signed-off-by: Piotr Kaminski <pikaminski@nvidia.com>
* create Quantizer for NeMo 2.0 Signed-off-by: Piotr Kaminski <pikaminski@nvidia.com>
* refactor Signed-off-by: Piotr Kaminski <pikaminski@nvidia.com>
* Call quantize on an unwrapped mcore model Signed-off-by: Piotr Kaminski <pikaminski@nvidia.com>
* Apply isort and black reformatting Signed-off-by: Laplasjan107 <Laplasjan107@users.noreply.github.com>
* Add tests, adjust unwrapping Signed-off-by: Piotr Kaminski <pikaminski@nvidia.com>
* Apply isort and black reformatting Signed-off-by: Laplasjan107 <Laplasjan107@users.noreply.github.com>
* fix export Signed-off-by: Piotr Kaminski <pikaminski@nvidia.com>
* Apply isort and black reformatting Signed-off-by: Laplasjan107 <Laplasjan107@users.noreply.github.com>
* Apply isort and black reformatting Signed-off-by: artbataev <artbataev@users.noreply.github.com>
* Fix output_path argument for HF import Signed-off-by: Piotr Kamiński <67481570+Laplasjan107@users.noreply.github.com>
* fix fabric ckpt loading Signed-off-by: Piotr Kaminski <pikaminski@nvidia.com>
* Apply isort and black reformatting Signed-off-by: Laplasjan107 <Laplasjan107@users.noreply.github.com>
* code review suggestions Signed-off-by: Piotr Kaminski <pikaminski@nvidia.com>
* Apply isort and black reformatting Signed-off-by: Laplasjan107 <Laplasjan107@users.noreply.github.com>
* remove unused import Signed-off-by: Piotr Kaminski <pikaminski@nvidia.com>
* use cnn dataset in github ci Signed-off-by: Piotr Kaminski <pikaminski@nvidia.com>
* applied code review Signed-off-by: Piotr Kaminski <pikaminski@nvidia.com>
* code review changes Signed-off-by: Piotr Kaminski <pikaminski@nvidia.com>
* Apply isort and black reformatting Signed-off-by: Laplasjan107 <Laplasjan107@users.noreply.github.com>
* simplify interface for data iterator Signed-off-by: Piotr Kaminski <pikaminski@nvidia.com>
* Apply isort and black reformatting Signed-off-by: Laplasjan107 <Laplasjan107@users.noreply.github.com>
* (partial) PP fix Signed-off-by: Piotr Kaminski <pikaminski@nvidia.com>
* Apply isort and black reformatting Signed-off-by: Laplasjan107 <Laplasjan107@users.noreply.github.com>
---------
Signed-off-by: Piotr Kaminski <pikaminski@nvidia.com>
Signed-off-by: Laplasjan107 <Laplasjan107@users.noreply.github.com>
Signed-off-by: Piotr Kamiński <67481570+Laplasjan107@users.noreply.github.com>
Signed-off-by: artbataev <artbataev@users.noreply.github.com>
Co-authored-by: Piotr Kaminski <pikaminski@nvidia.com>
Co-authored-by: Laplasjan107 <Laplasjan107@users.noreply.github.com>
Co-authored-by: artbataev <artbataev@users.noreply.github.com>
* TDT compute timestamps option and Extra Whitespace handling for SPE (#10875)
* add token duration Signed-off-by: monica-sekoyan <msekoyan@nvidia.com>
* revert rnnt change Signed-off-by: monica-sekoyan <msekoyan@nvidia.com>
* add remove_extra_whitespaces arg to spe tokenizer Signed-off-by: monica-sekoyan <msekoyan@nvidia.com>
* add token duration retrieval Signed-off-by: monica-sekoyan <msekoyan@nvidia.com>
* add ignore_extra_whitespace to spe Signed-off-by: monica-sekoyan <msekoyan@nvidia.com>
* add compute_timestamp support for tdt Signed-off-by: monica-sekoyan <msekoyan@nvidia.com>
* fix config field name Signed-off-by: monica-sekoyan <msekoyan@nvidia.com>
* add refinement for tdt timestamps Signed-off-by: monica-sekoyan <msekoyan@nvidia.com>
* add segments timestamp support and refinement for ctc Signed-off-by: monica-sekoyan <msekoyan@nvidia.com>
* modify tests for ctc decoding timestamps Signed-off-by: monica-sekoyan <msekoyan@nvidia.com>
* add rnnt timestamp tests Signed-off-by: monica-sekoyan <msekoyan@nvidia.com>
* updated doc Signed-off-by: monica-sekoyan <msekoyan@nvidia.com>
* fix in test Signed-off-by: monica-sekoyan <msekoyan@nvidia.com>
* Apply isort and black reformatting Signed-off-by: monica-sekoyan <monica-sekoyan@users.noreply.github.com>
* fix of unicode char Signed-off-by: monica-sekoyan <msekoyan@nvidia.com>
* fix rnnt_decoding test Signed-off-by: monica-sekoyan <msekoyan@nvidia.com>
* workaround for tesst tokenizer Signed-off-by: monica-sekoyan <msekoyan@nvidia.com>
* Apply isort and black reformatting Signed-off-by: monica-sekoyan <monica-sekoyan@users.noreply.github.com>
* modify segments formation Signed-off-by: monica-sekoyan <msekoyan@nvidia.com>
* modify segments for ctc Signed-off-by: monica-sekoyan <msekoyan@nvidia.com>
* fix in ctc refinement Signed-off-by: monica-sekoyan <msekoyan@nvidia.com>
* Apply isort and black reformatting Signed-off-by: monica-sekoyan <monica-sekoyan@users.noreply.github.com>
* minor changes Signed-off-by: monica-sekoyan <msekoyan@nvidia.com>
* reverse offset change Signed-off-by: monica-sekoyan <msekoyan@nvidia.com>
* Apply isort and black reformatting Signed-off-by: monica-sekoyan <monica-sekoyan@users.noreply.github.com>
* warning mode=once Signed-off-by: monica-sekoyan <msekoyan@nvidia.com>
* Apply isort and black reformatting Signed-off-by: monica-sekoyan <monica-sekoyan@users.noreply.github.com>
* make ignore_extrawhitespaces false Signed-off-by: monica-sekoyan <msekoyan@nvidia.com>
* minor changes Signed-off-by: monica-sekoyan <msekoyan@nvidia.com>
* adjust changes to the tests Signed-off-by: monica-sekoyan <msekoyan@nvidia.com>
* modify prompt_formatter tests Signed-off-by: monica-sekoyan <msekoyan@nvidia.com>
* Apply isort and black reformatting Signed-off-by: monica-sekoyan <monica-sekoyan@users.noreply.github.com>
---------
Signed-off-by: monica-sekoyan <msekoyan@nvidia.com>
Signed-off-by: monica-sekoyan <monica-sekoyan@users.noreply.github.com>
Co-authored-by: monica-sekoyan <monica-sekoyan@users.noreply.github.com>
* Basic online dynamic FP8 quantization with vLLM (#10904)
* Basic online dynamic quantization with vLLM Signed-off-by: Jan Lasek <janek.lasek@gmail.com>
* Apply isort and black reformatting Signed-off-by: janekl <janekl@users.noreply.github.com>
* vllm 0.6.3 updates Signed-off-by: Jan Lasek <janek.lasek@gmail.com>
* Pass quantization param in deploy_vllm_triton.py script Signed-off-by: Jan Lasek <janek.lasek@gmail.com>
---------
Signed-off-by: Jan Lasek <janek.lasek@gmail.com>
Signed-off-by: janekl <janekl@users.noreply.github.com>
Co-authored-by: janekl <janekl@users.noreply.github.com>
* ci: Improve VM maintenance (#10758)
* ci: Improve VM maintenance Signed-off-by: Oliver Koenig <okoenig@nvidia.com>
* rename stuff Signed-off-by: Oliver Koenig <okoenig@nvidia.com>
* title Signed-off-by: Oliver Koenig <okoenig@nvidia.com>
* use team Signed-off-by: Oliver Koenig <okoenig@nvidia.com>
* run on failure too Signed-off-by: Oliver Koenig <okoenig@nvidia.com>
* fix Signed-off-by: Oliver Koenig <okoenig@nvidia.com>
* yrdy Signed-off-by: Oliver Koenig <okoenig@nvidia.com>
* f Signed-off-by: Oliver Koenig <okoenig@nvidia.com>
* test Signed-off-by: Oliver Koenig <okoenig@nvidia.com>
* fix Signed-off-by: Oliver Koenig <okoenig@nvidia.com>
* f Signed-off-by: Oliver Koenig <okoenig@nvidia.com>
* f Signed-off-by: Oliver Koenig <okoenig@nvidia.com>
* f Signed-off-by: Oliver Koenig <okoenig@nvidia.com>
---------
Signed-off-by: Oliver Koenig <okoenig@nvidia.com>
* neva update Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>
* Add comment for vision transpose
* update megatron_init.py inside lightning Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>
* Fix PP Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>
* add examples Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>
* fix test Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>
* try fix test Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>
* try fix test Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>
* Fix megatron megatron_init.py dp Signed-off-by: Yu Yao <54727607+yaoyu-33@users.noreply.github.com>
* Update lightning megatron_init.py dp Signed-off-by: Yu Yao <54727607+yaoyu-33@users.noreply.github.com>
* make it possible to update pre_preprocess and post_process for llm, required in vlm Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>
* Fixes for neva to run with PP Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>
* Add mcore vit support, and checkpoint conversion Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>
* fix checkpoint loading for epp Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>
* update script Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>
* rename llama to mllama folder name Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>
* update to attention bias Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>
* Apply isort and black reformatting Signed-off-by: yaoyu-33 <yaoyu-33@users.noreply.github.com>
* added datamodule for llava-next
* modified state dict transform
* neva model changes to support llava-next
* remove accidentally checked in files Signed-off-by: Yashaswi Karnati <ykarnati@login-eos01.eos.clusters.nvidia.com>
* Apply isort and black reformatting Signed-off-by: yashaswikarnati <yashaswikarnati@users.noreply.github.com>
* remove unused imports
* added io_init to not save task_encoder and image_processor
* Apply isort and black reformatting Signed-off-by: yashaswikarnati <yashaswikarnati@users.noreply.github.com>
* added scripts for pretrain and finetune Signed-off-by: Yashaswi Karnati <ykarnati@login-eos02.eos.clusters.nvidia.com>
* Apply isort and black reformatting Signed-off-by: yashaswikarnati <yashaswikarnati@users.noreply.github.com>
* generation example
* Apply isort and black reformatting Signed-off-by: yashaswikarnati <yashaswikarnati@users.noreply.github.com>
* small change in llava next example
* llava next end-end train
* Apply isort and black reformatting Signed-off-by: yashaswikarnati <yashaswikarnati@users.noreply.github.com>
* finetune changes
* Apply isort and black reformatting Signed-off-by: yashaswikarnati <yashaswikarnati@users.noreply.github.com>
* finetune debug changes
* update dropout to 0 Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>
* added example generation script
* added doc strings, formating, remove debug statemens and unsued imports
* remove example scripts
* fix attention bias Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>
* Apply isort and black reformatting Signed-off-by: yaoyu-33 <yaoyu-33@users.noreply.github.com>
* Apply isort and black reformatting Signed-off-by: yaoyu-33 <yaoyu-33@users.noreply.github.com>
* remove disable_vision_padding since we now have a fix Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>
* Apply isort and black reformatting Signed-off-by: yaoyu-33 <yaoyu-33@users.noreply.github.com>
* Update init for mllama Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>
* Apply isort and black reformatting Signed-off-by: yaoyu-33 <yaoyu-33@users.noreply.github.com>
* Address comments Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>
* Apply isort and black reformatting Signed-off-by: yaoyu-33 <yaoyu-33@users.noreply.github.com>
* fix copyright title Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>
* multiple fixes Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>
* bug fix Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>
* Apply isort and black reformatting Signed-off-by: yaoyu-33 <yaoyu-33@users.noreply.github.com>
* fix code scan Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>
* Fix for SP Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>
* Apply isort and black reformatting Signed-off-by: yaoyu-33 <yaoyu-33@users.noreply.github.com>
* update vision code Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>
* revert attention bias changes until latest MLM code got merged Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>
* fix warning Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>
* Turn off system message check, as it's "" now Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>
* Update layer spec and add siglip support Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>
* Apply isort and black reformatting Signed-off-by: yaoyu-33 <yaoyu-33@users.noreply.github.com>
* update pretrain script Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>
* Apply isort and black reformatting Signed-off-by: yaoyu-33 <yaoyu-33@users.noreply.github.com>
* Fix scripts Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>
* Apply isort and black reformatting Signed-off-by: yaoyu-33 <yaoyu-33@users.noreply.github.com>
* add neva training recipes Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>
* fix mllama mock ds Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>
* Apply isort and black reformatting Signed-off-by: yaoyu-33 <yaoyu-33@users.noreply.github.com>
* fix recipe Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>
* fix pp Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>
* scripts update Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>
* Apply isort and black reformatting Signed-off-by: yaoyu-33 <yaoyu-33@users.noreply.github.com>
* scripts update Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>
* Apply isort and black reformatting Signed-off-by: yaoyu-33 <yaoyu-33@users.noreply.github.com>
* update config api Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>
* Apply isort and black reformatting Signed-off-by: yaoyu-33 <yaoyu-33@users.noreply.github.com>
* few updates Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>
* Apply isort and black reformatting Signed-off-by: yaoyu-33 <yaoyu-33@users.noreply.github.com>
* update 70b Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>
* hide examples for pr Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>
* fix few issues Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>
* add docstring layer spec Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>
* add docstring to vit config Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>
* Apply isort and black reformatting Signed-off-by: yaoyu-33 <yaoyu-33@users.noreply.github.com>
* fix copyright Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>
* fix Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>
---------
Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>
Signed-off-by: yaoyu-33 <yaoyu-33@users.noreply.github.com>
Signed-off-by: cuichenx <cuichenx@users.noreply.github.com>
Signed-off-by: Chen Cui <chcui@nvidia.com>
Signed-off-by: artbataev <artbataev@users.noreply.github.com>
Signed-off-by: parthmannan <parthmannan@users.noreply.github.com>
Signed-off-by: meatybobby <meatybobby@users.noreply.github.com>
Signed-off-by: Yu Yao <54727607+yaoyu-33@users.noreply.github.com>
Signed-off-by: HuiyingLi <willwin.lee@gmail.com>
Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>
Signed-off-by: akoumpa <akoumpa@users.noreply.github.com>
Signed-off-by: Abhishree <abhishreetm@gmail.com>
Signed-off-by: Marc Romeijn <mromeijn@nvidia.com>
Signed-off-by: Marc Romeyn <mromeijn@nvidia.com>
Signed-off-by: marcromeyn <marcromeyn@users.noreply.github.com>
Signed-off-by: stevehuang52 <heh@nvidia.com>
Signed-off-by: Shriya Palsamudram <spalsamudram@nvidia.com>
Signed-off-by: ashors1 <ashors@nvidia.com>
Signed-off-by: Maanu Grover <maanug@nvidia.com>
Signed-off-by: Piotr Kaminski <pikaminski@nvidia.com>
Signed-off-by: Laplasjan107 <Laplasjan107@users.noreply.github.com>
Signed-off-by: Piotr Kamiński <67481570+Laplasjan107@users.noreply.github.com>
Signed-off-by: monica-sekoyan <msekoyan@nvidia.com>
Signed-off-by: monica-sekoyan <monica-sekoyan@users.noreply.github.com>
Signed-off-by: Jan Lasek <janek.lasek@gmail.com>
Signed-off-by: janekl <janekl@users.noreply.github.com>
Signed-off-by: Oliver Koenig <okoenig@nvidia.com>
Signed-off-by: Yashaswi Karnati <ykarnati@login-eos01.eos.clusters.nvidia.com>
Signed-off-by: yashaswikarnati <yashaswikarnati@users.noreply.github.com>
Signed-off-by: Yashaswi Karnati <ykarnati@login-eos02.eos.clusters.nvidia.com>
Co-authored-by: Chen Cui <chcui@nvidia.com>
Co-authored-by: Bobby Chen <bobchen@nvidia.com>
Co-authored-by: yaoyu-33 <yaoyu-33@users.noreply.github.com>
Co-authored-by: cuichenx <cuichenx@users.noreply.github.com>
Co-authored-by: Yashaswi Karnati <ykarnati@login-eos02.eos.clusters.nvidia.com>
Co-authored-by:
artbataev <artbataev@users.noreply.github.com> Co-authored-by: Parth Mannan <38387286+parthmannan@users.noreply.github.com> Co-authored-by: parthmannan <parthmannan@users.noreply.github.com> Co-authored-by: meatybobby <meatybobby@users.noreply.github.com> Co-authored-by: Huiying <willwin.lee@gmail.com> Co-authored-by: Alexandros Koumparoulis <153118171+akoumpa@users.noreply.github.com> Co-authored-by: akoumpa <akoumpa@users.noreply.github.com> Co-authored-by: Abhishree Thittenamane <47577437+athitten@users.noreply.github.com> Co-authored-by: Pablo Garay <pagaray@nvidia.com> Co-authored-by: Marc Romeyn <mromeijn@nvidia.com> Co-authored-by: Alexandros Koumparoulis <akoumparouli@nvidia.com> Co-authored-by: marcromeyn <marcromeyn@users.noreply.github.com> Co-authored-by: He Huang (Steve) <105218074+stevehuang52@users.noreply.github.com> Co-authored-by: Shriya Rishab <69161273+ShriyaPalsamudram@users.noreply.github.com> Co-authored-by: ataghibakhsh <ataghibakhsh@nvidia.com> Co-authored-by: Maanu Grover <109391026+maanug-nv@users.noreply.github.com> Co-authored-by: Anna Shors <71393111+ashors1@users.noreply.github.com> Co-authored-by: Piotr Kamiński <67481570+Laplasjan107@users.noreply.github.com> Co-authored-by: Piotr Kaminski <pikaminski@nvidia.com> Co-authored-by: Laplasjan107 <Laplasjan107@users.noreply.github.com> Co-authored-by: monica-sekoyan <166123533+monica-sekoyan@users.noreply.github.com> Co-authored-by: monica-sekoyan <monica-sekoyan@users.noreply.github.com> Co-authored-by: Jan Lasek <janek.lasek@gmail.com> Co-authored-by: janekl <janekl@users.noreply.github.com> Co-authored-by: oliver könig <okoenig@nvidia.com> Co-authored-by: Pablo Garay <palenq@gmail.com> Co-authored-by: ykarnati <ykarnati@nvidia.com> Co-authored-by: Yashaswi Karnati <ykarnati@login-eos01.eos.clusters.nvidia.com> Co-authored-by: yashaswikarnati <yashaswikarnati@users.noreply.github.com>
* add initial code for llama vlm
  Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>
* some restructure
  Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>
* add mock data placeholder
  Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>
* Fix some importing
  Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>
* add language component for vlm llama
* update code
  Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>
* now match num of params
* update language part and fix vision part
  Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>
* minor fix
  Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>
* model can now init
  Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>
* minor update for llama32 text config
  Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>
* make checkpoint loading work
* missing import
* match vision part tensor shapes with configs
  Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>
* solve some fwd issues and mismatch issues
  Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>
* add vision import
* fixes
  Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>
* update importer to convert both text and image weights
* importer typos and reduce clutter
* fix import qkv
* some fixes for LLM
  Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>
* Add embedding
* some updates
  Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>
* enable loading only text or only vision
* add example script
* TP fix
  Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>
* update
* upload examples
  Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>
* update generate
  Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>
* update to newer version
  Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>
* upload for sharing
* update to new pyt ckpt
* xattn_caches matches (except small differences due to TE RMSNorm)
* cleanup
* embeddings match
* match precision of weights
* update sharded state dict
  Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>
* change xattn layer num to 3 7 11 etc
* upload llama generation
* minor fix
* fix dummy layer input format
* fix vision qkv order
* fix shareded state dict
  Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>
* fix vision precision
* fix rope
* match cross attn layer
* remove nrep
* Remove cross attention in ImageTransformerLayer and fix _gate_ffn
* PP draft
  Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>
* Fix intermediate tensor
* temp save for pp2 is working
  Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>
* fix pp issues
  Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>
* merge
* update mcore parallelism initialization
  Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>
* small update to pretrain script
  Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>
* update mcore parallelism initialization
  Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>
* Apply isort and black reformatting
  Signed-off-by: yaoyu-33 <yaoyu-33@users.noreply.github.com>
* added energon dataloader for neva training (#10451)
* added energon dataloader for neva training
* Apply isort and black reformatting
  Signed-off-by: yashaswikarnati <yashaswikarnati@users.noreply.github.com>
* specify global batch size to support grad accumulation
* adding neva pretrain example
* Apply isort and black reformatting
  Signed-off-by: yashaswikarnati <yashaswikarnati@users.noreply.github.com>
* change pretraine example to handle new ckpt reloading
* fixed code quality warnings and unused imports
  Signed-off-by: ykarnati <ykarnati@nvidia.com>
* minor changes for PR comments
* Apply isort and black reformatting
  Signed-off-by: yashaswikarnati <yashaswikarnati@users.noreply.github.com>
* refactor conversation template config
* Apply isort and black reformatting
  Signed-off-by: yashaswikarnati <yashaswikarnati@users.noreply.github.com>
* remove optional import
  ---------
  Signed-off-by: yashaswikarnati <yashaswikarnati@users.noreply.github.com>
  Signed-off-by: ykarnati <ykarnati@nvidia.com>
  Co-authored-by: yashaswikarnati <yashaswikarnati@users.noreply.github.com>
  (cherry picked from commit 7354740)
* llama energon dataloader
* have tokenizer for base task encoder class
* Update megatron_init.py
  Signed-off-by: Yu Yao <54727607+yaoyu-33@users.noreply.github.com>
* Add simple inference
* evian3 update
  Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>
* add encoder parallel default config
  Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>
* add encoder parallel default config
  Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>
* clean up
  Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>
* add aspect ratio in model
* support energon dataloader
* some pp update
  Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>
* fixes
  Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>
* fix kv merging
  Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>
* fix get_key_value_tensors
  Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>
* rename files
  Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>
* update to HF style position embedding
  Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>
* fix energon dataloader and support batching
* update forward args
  Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>
* clean up and move to aspect_ratio_ids
  Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>
* rename back to language.py
  Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>
* fix loss function
  Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>
* update and fix energon
  Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>
* Add hf import
* Fix type
* Change config
* update energon pretrain
  Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>
* clean up
* clean up
* reformat
  Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>
* update inference files for new code
* update to instruct
* update to instruct
* update few names
  Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>
* update generation
  Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>
* fix importer embedding.weight
* few fixes
  Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>
* add hf script
  Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>
* fix kv import
* remove interleaved
* fixes and updates
  Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>
* lora fixes
  Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>
* some code clean ups
  Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>
* update training scripts
  Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>
* refactors
  Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>
* add LoRA finetuning
* fixes and nemo update
  Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>
* fix importer registering issue by adding 11B and 90B configs
* update `decoder_seq_len`
  Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>
* science vqa script
  Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>
* clean up script name
  Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>
* fix ckpt save serialization issue
* fix predefined config classes
* add num_chunks in input
  Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>
* fix format
  Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>
* update finetuning scripts for PEFT
* add 11b recipe (need #10645 to test)
* fix mask generation
  Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>
* minor fix code style
  Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>
* Apply isort and black reformatting
  Signed-off-by: yaoyu-33 <yaoyu-33@users.noreply.github.com>
* Support no image inference
* add llama svqa eval
* fix masking
  Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>
* Apply isort and black reformatting
  Signed-off-by: yaoyu-33 <yaoyu-33@users.noreply.github.com>
* fix generation
  Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>
* Apply isort and black reformatting
  Signed-off-by: yaoyu-33 <yaoyu-33@users.noreply.github.com>
* add 90b recipe and revise 11b recipe
* Apply isort and black reformatting
  Signed-off-by: cuichenx <cuichenx@users.noreply.github.com>
* clean up typing
* add option to disable vision padding
* Apply isort and black reformatting
  Signed-off-by: cuichenx <cuichenx@users.noreply.github.com>
* base model finetuning (does not work yet)
* Apply isort and black reformatting
  Signed-off-by: cuichenx <cuichenx@users.noreply.github.com>
* fixed default conversation template config for MLLama
* Update svqa
* add multinode
* bot happy
* Apply isort and black reformatting
  Signed-off-by: cuichenx <cuichenx@users.noreply.github.com>
* Apply isort and black reformatting
  Signed-off-by: yaoyu-33 <yaoyu-33@users.noreply.github.com>
* Apply isort and black reformatting
  Signed-off-by: yaoyu-33 <yaoyu-33@users.noreply.github.com>
* Apply isort and black reformatting
  Signed-off-by: artbataev <artbataev@users.noreply.github.com>
* Perf improvements. Mainly from XAttn mask calculation (#10901)
* Perf improvements. Mainly from XAttn mask calculation
* Apply isort and black reformatting
  Signed-off-by: parthmannan <parthmannan@users.noreply.github.com>
  ---------
  Signed-off-by: parthmannan <parthmannan@users.noreply.github.com>
  Co-authored-by: parthmannan <parthmannan@users.noreply.github.com>
* fix existing issues
  Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>
* fix scripts
  Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>
* Apply isort and black reformatting
  Signed-off-by: yaoyu-33 <yaoyu-33@users.noreply.github.com>
* fix lora
* few fixes for non image support
  Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>
* update masking gen
  Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>
* update lazy dataset
  Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>
* fix data sampler and loading issue
  Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>
* Add vlm generation
* Apply isort and black reformatting
  Signed-off-by: meatybobby <meatybobby@users.noreply.github.com>
* Apply isort and black reformatting
  Signed-off-by: yaoyu-33 <yaoyu-33@users.noreply.github.com>
* generation update
  Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>
* update lazy dataset
  Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>
* Fix _strategy_lib.py
  Signed-off-by: Yu Yao <54727607+yaoyu-33@users.noreply.github.com>
* Apply isort and black reformatting
  Signed-off-by: yaoyu-33 <yaoyu-33@users.noreply.github.com>
* fix warning
  Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>
* hide vlm examples
  Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>
* Revert "Add vlm generation"
  This reverts commit 4711c75
  Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>
* Fix VisionEncoder multi-batch bug
* update mcore parallelism initialization
  Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>
* Apply isort and black reformatting
  Signed-off-by: yaoyu-33 <yaoyu-33@users.noreply.github.com>
* Update megatron_init.py
  Signed-off-by: Yu Yao <54727607+yaoyu-33@users.noreply.github.com>
* add encoder parallel default config
  Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>
* Fix _strategy_lib.py
  Signed-off-by: Yu Yao <54727607+yaoyu-33@users.noreply.github.com>
* llm.generate fixes (#10983)
* fix context path, disable optimizer init, add tp
  Signed-off-by: HuiyingLi <willwin.lee@gmail.com>
* format
  Signed-off-by: HuiyingLi <willwin.lee@gmail.com>
* address comments, require user to provide trainer
  Signed-off-by: HuiyingLi <willwin.lee@gmail.com>
* minor fix
  Signed-off-by: HuiyingLi <willwin.lee@gmail.com>
* minor fixes
  Signed-off-by: HuiyingLi <willwin.lee@gmail.com>
  ---------
  Signed-off-by: HuiyingLi <willwin.lee@gmail.com>
* use __dict__ in check (#11012)
* check is_hf_model in leaf module
  Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>
* Apply isort and black reformatting
  Signed-off-by: akoumpa <akoumpa@users.noreply.github.com>
* disable getattr alternative path
  Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>
* fix
  Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>
* undo;
  Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>
  ---------
  Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>
  Signed-off-by: akoumpa <akoumpa@users.noreply.github.com>
  Co-authored-by: akoumpa <akoumpa@users.noreply.github.com>
* LoRA support for HF::AutoModelForCausalLM (#10982)
* add LinearAdapter
  Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>
* add hf lora example
  Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>
* remove unused imports
  Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>
* fix
  Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>
* fix
  Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>
* subclass mixin
  Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>
* remove stale imports
  Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>
* undo
  Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>
* fix scale
  Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>
* regex selector for peft
  Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>
* move lora
  Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>
* fmt
  Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>
* hf_auto_model_for_causal_lm finetune recipe
  Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>
* Apply isort and black reformatting
  Signed-off-by: akoumpa <akoumpa@users.noreply.github.com>
  ---------
  Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>
  Signed-off-by: akoumpa <akoumpa@users.noreply.github.com>
  Co-authored-by: akoumpa <akoumpa@users.noreply.github.com>
* Change default for always_save_context to True (#11014)
  Signed-off-by: Abhishree <abhishreetm@gmail.com>
  Co-authored-by: Pablo Garay <pagaray@nvidia.com>
* Add a build option to load_context (#10713)
* Add a build option to load_context
  Signed-off-by: Marc Romeijn <mromeijn@nvidia.com>
  Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>
* Adding test
  Signed-off-by: Marc Romeijn <mromeijn@nvidia.com>
  Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>
* Trying to fix failing CPU test
  Signed-off-by: Marc Romeijn <mromeijn@nvidia.com>
  Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>
* cherry-pick fix
  Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>
  ---------
  Signed-off-by: Marc Romeijn <mromeijn@nvidia.com>
  Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>
  Co-authored-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>
* Fix pip install (#11026)
* Move AutoTokenizer inline
  Signed-off-by: Marc Romeyn <mromeijn@nvidia.com>
* Move einops to common requirements
  Signed-off-by: Marc Romeyn <mromeijn@nvidia.com>
* Move AutoTokenizer import to top-level again in fine_tuning
  Signed-off-by: Marc Romeyn <mromeijn@nvidia.com>
* Move megatron init inside nemo.lightning
  Signed-off-by: Marc Romeyn <mromeijn@nvidia.com>
* Make megatron_lazy_init_context work when transformer-engine is not installed
  Signed-off-by: Marc Romeyn <mromeijn@nvidia.com>
* Only import get_nmt_tokenizer when needed
  Signed-off-by: Marc Romeyn <mromeijn@nvidia.com>
* Apply isort and black reformatting
  Signed-off-by: marcromeyn <marcromeyn@users.noreply.github.com>
  ---------
  Signed-off-by: Marc Romeyn <mromeijn@nvidia.com>
  Signed-off-by: marcromeyn <marcromeyn@users.noreply.github.com>
  Co-authored-by: marcromeyn <marcromeyn@users.noreply.github.com>
* [WIP] Add docs for NEST SSL (#10804)
* add docs
  Signed-off-by: stevehuang52 <heh@nvidia.com>
* update doc and fix missing param
  Signed-off-by: stevehuang52 <heh@nvidia.com>
  ---------
  Signed-off-by: stevehuang52 <heh@nvidia.com>
* Change dist ckpt defaults (#10913)
* Enable ckpt features by default (async ckpt), ckpt every 15mins and reduce preemption time to 1min
  Signed-off-by: Shriya Palsamudram <spalsamudram@nvidia.com>
* fix ssm tests
  Signed-off-by: Shriya Palsamudram <spalsamudram@nvidia.com>
* Make note that ckpt_async_save is disabled for SSMs
  Signed-off-by: Shriya Palsamudram <spalsamudram@nvidia.com>
* Enable async ckpt for SSMs with fix
  Signed-off-by: Shriya Palsamudram <spalsamudram@nvidia.com>
* Disable async ckpt in the peft test as it is a known bug, add note.
  Signed-off-by: Shriya Palsamudram <spalsamudram@nvidia.com>
* Fix failing unit tests
  Signed-off-by: Shriya Palsamudram <spalsamudram@nvidia.com>
* Ashors/peft async ckpt (#11010)
* [WIP] prototype for supporting async checkpointing with peft
  Signed-off-by: ashors1 <ashors@nvidia.com>
  Signed-off-by: Shriya Palsamudram <spalsamudram@nvidia.com>
* Enable async ckpt for the peft test
  Signed-off-by: Shriya Palsamudram <spalsamudram@nvidia.com>
* Fix peft setup test
  Signed-off-by: Shriya Palsamudram <spalsamudram@nvidia.com>
  ---------
  Signed-off-by: Shriya Palsamudram <spalsamudram@nvidia.com>
  Signed-off-by: ashors1 <ashors@nvidia.com>
  Co-authored-by: ataghibakhsh <ataghibakhsh@nvidia.com>
* Akoumparouli/mixtral recipe fix r2.0.0 (#10994)
* Mixtral TP8 EP1
  Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>
* Apply isort and black reformatting
  Signed-off-by: akoumpa <akoumpa@users.noreply.github.com>
  ---------
  Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>
  Signed-off-by: akoumpa <akoumpa@users.noreply.github.com>
  Co-authored-by: akoumpa <akoumpa@users.noreply.github.com>
* Fix _strategy_lib tests (#11033)
* fix world size and don't mock
  Signed-off-by: Maanu Grover <maanug@nvidia.com>
* cleanup global state
  Signed-off-by: Maanu Grover <maanug@nvidia.com>
* check app state instead
  Signed-off-by: Maanu Grover <maanug@nvidia.com>
* fix syntax nemo logger test
  Signed-off-by: Maanu Grover <maanug@nvidia.com>
  ---------
  Signed-off-by: Maanu Grover <maanug@nvidia.com>
* Update `BaseMegatronSampler` for compatibility with PTL's `_BatchProgress` (#11016)
* Revert "[NeMo-UX] Use custom `BatchProgress` class which does not restore states (#10383)"
  This reverts commit b5798de.
* make megatron sampler return the total number of batches in the dataset
  Signed-off-by: ashors1 <ashors@nvidia.com>
  ---------
  Signed-off-by: ashors1 <ashors@nvidia.com>
* PTQ example for NeMo 2.0 (#10642)
* initial commit
  Signed-off-by: Piotr Kaminski <pikaminski@nvidia.com>
* create Quantizer for NeMo 2.0
  Signed-off-by: Piotr Kaminski <pikaminski@nvidia.com>
* refactor
  Signed-off-by: Piotr Kaminski <pikaminski@nvidia.com>
* Call quantize on an unwrapped mcore model
  Signed-off-by: Piotr Kaminski <pikaminski@nvidia.com>
* Apply isort and black reformatting
  Signed-off-by: Laplasjan107 <Laplasjan107@users.noreply.github.com>
* Add tests, adjust unwrapping
  Signed-off-by: Piotr Kaminski <pikaminski@nvidia.com>
* Apply isort and black reformatting
  Signed-off-by: Laplasjan107 <Laplasjan107@users.noreply.github.com>
* fix export
  Signed-off-by: Piotr Kaminski <pikaminski@nvidia.com>
* Apply isort and black reformatting
  Signed-off-by: Laplasjan107 <Laplasjan107@users.noreply.github.com>
* Apply isort and black reformatting
  Signed-off-by: artbataev <artbataev@users.noreply.github.com>
* Fix output_path argument for HF import
  Signed-off-by: Piotr Kamiński <67481570+Laplasjan107@users.noreply.github.com>
* fix fabric ckpt loading
  Signed-off-by: Piotr Kaminski <pikaminski@nvidia.com>
* Apply isort and black reformatting
  Signed-off-by: Laplasjan107 <Laplasjan107@users.noreply.github.com>
* code review suggestions
  Signed-off-by: Piotr Kaminski <pikaminski@nvidia.com>
* Apply isort and black reformatting
  Signed-off-by: Laplasjan107 <Laplasjan107@users.noreply.github.com>
* remove unused import
  Signed-off-by: Piotr Kaminski <pikaminski@nvidia.com>
* use cnn dataset in github ci
  Signed-off-by: Piotr Kaminski <pikaminski@nvidia.com>
* applied code review
  Signed-off-by: Piotr Kaminski <pikaminski@nvidia.com>
* code review changes
  Signed-off-by: Piotr Kaminski <pikaminski@nvidia.com>
* Apply isort and black reformatting
  Signed-off-by: Laplasjan107 <Laplasjan107@users.noreply.github.com>
* simplify interface for data iterator
  Signed-off-by: Piotr Kaminski <pikaminski@nvidia.com>
* Apply isort and black reformatting
  Signed-off-by: Laplasjan107 <Laplasjan107@users.noreply.github.com>
* (partial) PP fix
  Signed-off-by: Piotr Kaminski <pikaminski@nvidia.com>
* Apply isort and black reformatting
  Signed-off-by: Laplasjan107 <Laplasjan107@users.noreply.github.com>
  ---------
  Signed-off-by: Piotr Kaminski <pikaminski@nvidia.com>
  Signed-off-by: Laplasjan107 <Laplasjan107@users.noreply.github.com>
  Signed-off-by: Piotr Kamiński <67481570+Laplasjan107@users.noreply.github.com>
  Signed-off-by: artbataev <artbataev@users.noreply.github.com>
  Co-authored-by: Piotr Kaminski <pikaminski@nvidia.com>
  Co-authored-by: Laplasjan107 <Laplasjan107@users.noreply.github.com>
  Co-authored-by: artbataev <artbataev@users.noreply.github.com>
* TDT compute timestamps option and Extra Whitespace handling for SPE (#10875)
* add token duration
  Signed-off-by: monica-sekoyan <msekoyan@nvidia.com>
* revert rnnt change
  Signed-off-by: monica-sekoyan <msekoyan@nvidia.com>
* add remove_extra_whitespaces arg to spe tokenizer
  Signed-off-by: monica-sekoyan <msekoyan@nvidia.com>
* add token duration retrieval
  Signed-off-by: monica-sekoyan <msekoyan@nvidia.com>
* add ignore_extra_whitespace to spe
  Signed-off-by: monica-sekoyan <msekoyan@nvidia.com>
* add compute_timestamp support for tdt
  Signed-off-by: monica-sekoyan <msekoyan@nvidia.com>
* fix config field name
  Signed-off-by: monica-sekoyan <msekoyan@nvidia.com>
* add refinement for tdt timestamps
  Signed-off-by: monica-sekoyan <msekoyan@nvidia.com>
* add segments timestamp support and refinement for ctc
  Signed-off-by: monica-sekoyan <msekoyan@nvidia.com>
* modify tests for ctc decoding timestamps
  Signed-off-by: monica-sekoyan <msekoyan@nvidia.com>
* add rnnt timestamp tests
  Signed-off-by: monica-sekoyan <msekoyan@nvidia.com>
* updated doc
  Signed-off-by: monica-sekoyan <msekoyan@nvidia.com>
* fix in test
  Signed-off-by: monica-sekoyan <msekoyan@nvidia.com>
* Apply isort and black reformatting
  Signed-off-by: monica-sekoyan <monica-sekoyan@users.noreply.github.com>
* fix of unicode char
  Signed-off-by: monica-sekoyan <msekoyan@nvidia.com>
* fix rnnt_decoding test
  Signed-off-by: monica-sekoyan <msekoyan@nvidia.com>
* workaround for tesst tokenizer
  Signed-off-by: monica-sekoyan <msekoyan@nvidia.com>
* Apply isort and black reformatting
  Signed-off-by: monica-sekoyan <monica-sekoyan@users.noreply.github.com>
* modify segments formation
  Signed-off-by: monica-sekoyan <msekoyan@nvidia.com>
* modify segments for ctc
  Signed-off-by: monica-sekoyan <msekoyan@nvidia.com>
* fix in ctc refinement
  Signed-off-by: monica-sekoyan <msekoyan@nvidia.com>
* Apply isort and black reformatting
  Signed-off-by: monica-sekoyan <monica-sekoyan@users.noreply.github.com>
* minor changes
  Signed-off-by: monica-sekoyan <msekoyan@nvidia.com>
* reverse offset change
  Signed-off-by: monica-sekoyan <msekoyan@nvidia.com>
* Apply isort and black reformatting
  Signed-off-by: monica-sekoyan <monica-sekoyan@users.noreply.github.com>
* warning mode=once
  Signed-off-by: monica-sekoyan <msekoyan@nvidia.com>
* Apply isort and black reformatting
  Signed-off-by: monica-sekoyan <monica-sekoyan@users.noreply.github.com>
* make ignore_extrawhitespaces false
  Signed-off-by: monica-sekoyan <msekoyan@nvidia.com>
* minor changes
  Signed-off-by: monica-sekoyan <msekoyan@nvidia.com>
* adjust changes to the tests
  Signed-off-by: monica-sekoyan <msekoyan@nvidia.com>
* modify prompt_formatter tests
  Signed-off-by: monica-sekoyan <msekoyan@nvidia.com>
* Apply isort and black reformatting
  Signed-off-by: monica-sekoyan <monica-sekoyan@users.noreply.github.com>
  ---------
  Signed-off-by: monica-sekoyan <msekoyan@nvidia.com>
  Signed-off-by: monica-sekoyan <monica-sekoyan@users.noreply.github.com>
  Co-authored-by: monica-sekoyan <monica-sekoyan@users.noreply.github.com>
* Basic online dynamic FP8 quantization with vLLM (#10904)
* Basic online dynamic quantization with vLLM
  Signed-off-by: Jan Lasek <janek.lasek@gmail.com>
* Apply isort and black reformatting
  Signed-off-by: janekl <janekl@users.noreply.github.com>
* vllm 0.6.3 updates
  Signed-off-by: Jan Lasek <janek.lasek@gmail.com>
* Pass quantization param in deploy_vllm_triton.py script
  Signed-off-by: Jan Lasek <janek.lasek@gmail.com>
  ---------
  Signed-off-by: Jan Lasek <janek.lasek@gmail.com>
  Signed-off-by: janekl <janekl@users.noreply.github.com>
  Co-authored-by: janekl <janekl@users.noreply.github.com>
* ci: Improve VM maintenance (#10758)
* ci: Improve VM maintenance
  Signed-off-by: Oliver Koenig <okoenig@nvidia.com>
* rename stuff
  Signed-off-by: Oliver Koenig <okoenig@nvidia.com>
* title
  Signed-off-by: Oliver Koenig <okoenig@nvidia.com>
* use team
  Signed-off-by: Oliver Koenig <okoenig@nvidia.com>
* run on failure too
  Signed-off-by: Oliver Koenig <okoenig@nvidia.com>
* fix
  Signed-off-by: Oliver Koenig <okoenig@nvidia.com>
* yrdy
  Signed-off-by: Oliver Koenig <okoenig@nvidia.com>
* f
  Signed-off-by: Oliver Koenig <okoenig@nvidia.com>
* test
  Signed-off-by: Oliver Koenig <okoenig@nvidia.com>
* fix
  Signed-off-by: Oliver Koenig <okoenig@nvidia.com>
* f
  Signed-off-by: Oliver Koenig <okoenig@nvidia.com>
* f
  Signed-off-by: Oliver Koenig <okoenig@nvidia.com>
* f
  Signed-off-by: Oliver Koenig <okoenig@nvidia.com>
  ---------
  Signed-off-by: Oliver Koenig <okoenig@nvidia.com>
* Add comment for vision transpose
* update megatron_init.py inside lightning
  Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>
* rename llama to mllama folder name
  Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>
* update to attention bias
  Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>
* Apply isort and black reformatting
  Signed-off-by: yaoyu-33 <yaoyu-33@users.noreply.github.com>
* update dropout to 0
  Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>
* fix attention bias
  Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>
* remove disable_vision_padding since we now have a fix
  Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>
* Apply isort and black reformatting
  Signed-off-by: yaoyu-33 <yaoyu-33@users.noreply.github.com>
* Update init for mllama
  Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>
* Apply isort and black reformatting
  Signed-off-by: yaoyu-33 <yaoyu-33@users.noreply.github.com>
* Address comments
  Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>
* Apply isort and black reformatting
  Signed-off-by: yaoyu-33 <yaoyu-33@users.noreply.github.com>
* fix copyright title
  Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>
* fix code scan
  Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>
* update vision code
  Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>
* revert attention bias changes until latest MLM code got merged
  Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>
* fix warning
  Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>
* Turn off system message check, as it's "" now
  Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>
* Rolllback megatron_parallel.py
  Signed-off-by: Yu Yao <54727607+yaoyu-33@users.noreply.github.com>
---------
Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>
Signed-off-by: yaoyu-33 <yaoyu-33@users.noreply.github.com>
Signed-off-by: Yu Yao <54727607+yaoyu-33@users.noreply.github.com>
Signed-off-by: cuichenx <cuichenx@users.noreply.github.com>
Signed-off-by: Chen Cui <chcui@nvidia.com>
Signed-off-by: artbataev <artbataev@users.noreply.github.com>
Signed-off-by: parthmannan <parthmannan@users.noreply.github.com>
Signed-off-by: meatybobby <meatybobby@users.noreply.github.com>
Signed-off-by: HuiyingLi <willwin.lee@gmail.com>
Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>
Signed-off-by: akoumpa <akoumpa@users.noreply.github.com>
Signed-off-by: Abhishree <abhishreetm@gmail.com>
Signed-off-by: Marc Romeijn <mromeijn@nvidia.com>
Signed-off-by: Marc Romeyn <mromeijn@nvidia.com>
Signed-off-by: marcromeyn <marcromeyn@users.noreply.github.com>
Signed-off-by: stevehuang52 <heh@nvidia.com>
Signed-off-by: Shriya Palsamudram <spalsamudram@nvidia.com>
Signed-off-by: ashors1 <ashors@nvidia.com>
Signed-off-by: Maanu Grover <maanug@nvidia.com>
Signed-off-by: Piotr Kaminski <pikaminski@nvidia.com>
Signed-off-by: Laplasjan107 <Laplasjan107@users.noreply.github.com>
Signed-off-by: Piotr Kamiński <67481570+Laplasjan107@users.noreply.github.com>
Signed-off-by: monica-sekoyan <msekoyan@nvidia.com>
Signed-off-by: monica-sekoyan <monica-sekoyan@users.noreply.github.com>
Signed-off-by: Jan Lasek <janek.lasek@gmail.com>
Signed-off-by: janekl <janekl@users.noreply.github.com>
Signed-off-by: Oliver Koenig <okoenig@nvidia.com>
Co-authored-by: Ao Tang <aot@nvidia.com>
Co-authored-by: Chen Cui <chcui@nvidia.com>
Co-authored-by: Bobby Chen <bobchen@nvidia.com>
Co-authored-by: yaoyu-33 <yaoyu-33@users.noreply.github.com>
Co-authored-by: Yashaswi Karnati <144376261+yashaswikarnati@users.noreply.github.com>
Co-authored-by: ykarnati <ykarnati@nvidia.com>
Co-authored-by: cuichenx <cuichenx@users.noreply.github.com>
Co-authored-by: Yashaswi Karnati <ykarnati@login-eos02.eos.clusters.nvidia.com>
Co-authored-by: artbataev <artbataev@users.noreply.github.com>
Co-authored-by: Parth Mannan <38387286+parthmannan@users.noreply.github.com>
Co-authored-by: parthmannan <parthmannan@users.noreply.github.com>
Co-authored-by: meatybobby <meatybobby@users.noreply.github.com>
Co-authored-by: Huiying <willwin.lee@gmail.com>
Co-authored-by: Alexandros Koumparoulis <153118171+akoumpa@users.noreply.github.com>
Co-authored-by: akoumpa <akoumpa@users.noreply.github.com>
Co-authored-by: Abhishree Thittenamane <47577437+athitten@users.noreply.github.com>
Co-authored-by: Pablo Garay <pagaray@nvidia.com>
Co-authored-by: Marc Romeyn <mromeijn@nvidia.com>
Co-authored-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>
Co-authored-by: marcromeyn <marcromeyn@users.noreply.github.com>
Co-authored-by: He Huang (Steve) <105218074+stevehuang52@users.noreply.github.com>
Co-authored-by: Shriya Rishab <69161273+ShriyaPalsamudram@users.noreply.github.com>
Co-authored-by: ataghibakhsh <ataghibakhsh@nvidia.com>
Co-authored-by: Maanu Grover <109391026+maanug-nv@users.noreply.github.com>
Co-authored-by: Anna Shors <71393111+ashors1@users.noreply.github.com>
Co-authored-by: Piotr Kamiński <67481570+Laplasjan107@users.noreply.github.com>
Co-authored-by: Piotr Kaminski <pikaminski@nvidia.com>
Co-authored-by: Laplasjan107 <Laplasjan107@users.noreply.github.com>
Co-authored-by: monica-sekoyan <166123533+monica-sekoyan@users.noreply.github.com>
Co-authored-by: monica-sekoyan <monica-sekoyan@users.noreply.github.com>
Co-authored-by: Jan Lasek <janek.lasek@gmail.com>
Co-authored-by: janekl <janekl@users.noreply.github.com>
Co-authored-by: oliver könig <okoenig@nvidia.com>
What does this PR do?
This PR adds an option to compute timestamps from the TDT decoder head, including segment-level timestamps. The same option is also enabled for RNNT and CTC decoding. The PR also introduces a workaround for obtaining tokens and token IDs for extra whitespace, even when the tokenizer was not trained to produce them. This is needed for computing alignments in future work.
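As a rough illustration of what token- and segment-level TDT timestamps mean, the sketch below assumes the decoder reports, for each non-blank token, the encoder frame at which it was emitted and the duration (in frames) predicted by the TDT duration head. It converts frames to seconds with an assumed frame shift of 0.08 s (a 10 ms window stride with 8x subsampling) and groups tokens into segments that end at `.`, `?` or `!`. The `Emission` tuple, `Stamp`, `token_stamps` and `segment_stamps` are hypothetical helpers for illustration only, not NeMo's API.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple

# Hypothetical decoder output (not NeMo's API): (token_text, emission_frame, duration_frames).
# "▁" marks the start of a word, as in SentencePiece vocabularies.
Emission = Tuple[str, int, int]


@dataclass
class Stamp:
    text: str
    start: float  # seconds
    end: float  # seconds


def token_stamps(emissions: Sequence[Emission], frame_shift: float) -> List[Stamp]:
    """A token starts at the frame where it was emitted and ends after the
    number of frames predicted by the TDT duration head."""
    return [
        Stamp(text, round(frame * frame_shift, 2), round((frame + duration) * frame_shift, 2))
        for text, frame, duration in emissions
    ]


def segment_stamps(tokens: Sequence[Stamp], delimiters: str = ".?!") -> List[Stamp]:
    """Group consecutive tokens into segments; a segment closes after a token
    ending in one of `delimiters` (assumed segment separators)."""
    segments: List[Stamp] = []
    current: List[Stamp] = []

    def flush() -> None:
        # Join subword tokens and turn SentencePiece word markers into spaces.
        text = "".join(t.text for t in current).replace("▁", " ").strip()
        segments.append(Stamp(text, current[0].start, current[-1].end))
        current.clear()

    for tok in tokens:
        current.append(tok)
        if tok.text.endswith(tuple(delimiters)):
            flush()
    if current:  # trailing tokens without a closing delimiter
        flush()
    return segments


# Assumed frame shift: 10 ms window stride x 8x encoder subsampling = 0.08 s.
emissions = [
    ("▁hello", 0, 3), ("▁world", 3, 4), (".", 7, 1),
    ("▁how", 10, 2), ("▁are", 12, 2), ("▁you", 14, 3), ("?", 17, 1),
]
for seg in segment_stamps(token_stamps(emissions, frame_shift=0.08)):
    print(f"{seg.start:.2f}-{seg.end:.2f}s: {seg.text}")
# prints:
# 0.00-0.64s: hello world.
# 0.80-1.44s: how are you?
```

In this sketch a token predicted with duration 0 simply gets a zero-length span; how the actual implementation treats such tokens, and how it maps extra whitespace to token IDs, is not modeled here.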
Collection: asr, common
Changelog
Usage
GitHub Actions CI
The Jenkins CI system has been replaced by GitHub Actions self-hosted runners.
The GitHub Actions CI will run automatically when the "Run CICD" label is added to the PR.
To re-run CI, remove and add the label again.
To run CI on an untrusted fork, a NeMo user with write access must first click "Approve and run".
Before your PR is "Ready for review"
Pre checks:
PR Type:
If you haven't finished some of the above items, you can still open a "Draft" PR.
Who can review?
Anyone in the community is free to review the PR once the checks have passed.
The contributor guidelines list specific people who can review PRs in various areas.
Additional Information