Add LLama32 Vision Model Support in Nemo 2.0 (#10763)
* add initial code for llama vlm Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com> * some restructure Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com> * add mock data placeholder Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com> * Fix some importing Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com> * add language component for vlm llama * update code Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com> * now match num of params * update language part and fix vision part Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com> * minor fix Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com> * model can now init Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com> * minor update for llama32 text config Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com> * make checkpoint loading work * missing import * match vision part tensor shapes with configs Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com> * solve some fwd issues and mismatch issues Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com> * add vision import * fixes Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com> * update importer to convert both text and image weights * importer typos and reduce clutter * fix import qkv * some fixes for LLM Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com> * Add embedding * some updates Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com> * enable loading only text or only vision * add example script * TP fix Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com> * update * upload examples Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com> * update generate Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com> * update to newer version Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com> * upload for sharing * update to new pyt ckpt * xattn_caches matches (except small differences due to TE RMSNorm) * cleanup * embeddings match * match precision of weights * update sharded state dict Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com> * change xattn layer num to 3 7 11 etc * upload llama generation * minor fix * fix dummy layer input format * fix vision qkv order * fix shareded state dict 
Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com> * fix vision precision * fix rope * match cross attn layer * remove nrep * Remove cross attention in ImageTransformerLayer and fix _gate_ffn * PP draft Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com> * Fix intermediate tensor * temp save for pp2 is working Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com> * fix pp issues Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com> * merge * update mcore parallelism initialization Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com> * small update to pretrain script Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com> * update mcore parallelism initialization Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com> * Apply isort and black reformatting Signed-off-by: yaoyu-33 <yaoyu-33@users.noreply.github.com> * added energon dataloader for neva training (#10451) * added energon dataloader for neva training * Apply isort and black reformatting Signed-off-by: yashaswikarnati <yashaswikarnati@users.noreply.github.com> * specify global batch size to support grad accumulation * adding neva pretrain example * Apply isort and black reformatting Signed-off-by: yashaswikarnati <yashaswikarnati@users.noreply.github.com> * change pretraine example to handle new ckpt reloading * fixed code quality warnings and unused imports Signed-off-by: ykarnati <ykarnati@nvidia.com> * minor changes for PR comments * Apply isort and black reformatting Signed-off-by: yashaswikarnati <yashaswikarnati@users.noreply.github.com> * refactor conversation template config * Apply isort and black reformatting Signed-off-by: yashaswikarnati <yashaswikarnati@users.noreply.github.com> * remove optional import --------- Signed-off-by: yashaswikarnati <yashaswikarnati@users.noreply.github.com> Signed-off-by: ykarnati <ykarnati@nvidia.com> Co-authored-by: yashaswikarnati <yashaswikarnati@users.noreply.github.com> (cherry picked from commit 7354740) * llama energon dataloader * have tokenizer for base task encoder class * Update megatron_init.py 
Signed-off-by: Yu Yao <54727607+yaoyu-33@users.noreply.github.com> * Add simple inference * evian3 update Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com> * add encoder parallel default config Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com> * add encoder parallel default config Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com> * clean up Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com> * add aspect ratio in model * support energon dataloader * some pp update Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com> * fixes Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com> * fix kv merging Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com> * fix get_key_value_tensors Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com> * rename files Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com> * update to HF style position embedding Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com> * fix energon dataloader and support batching * update forward args Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com> * clean up and move to aspect_ratio_ids Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com> * rename back to language.py Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com> * fix loss function Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com> * update and fix energon Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com> * Add hf import * Fix type * Change config * update energon pretrain Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com> * clean up * clean up * reformat Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com> * update inference files for new code * update to instruct * update to instruct * update few names Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com> * update generation Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com> * fix importer embedding.weight * few fixes Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com> * add hf script Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com> * fix kv import * remove interleaved * fixes and updates Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com> * lora fixes Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com> * some code clean ups 
Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com> * update training scripts Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com> * refactors Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com> * add LoRA finetuning * fixes and nemo update Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com> * fix importer registering issue by adding 11B and 90B configs * update `decoder_seq_len` Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com> * science vqa script Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com> * clean up script name Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com> * fix ckpt save serialization issue * fix predefined config classes * add num_chunks in input Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com> * fix format Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com> * update finetuning scripts for PEFT * add 11b recipe (need #10645 to test) * fix mask generation Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com> * minor fix code style Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com> * Apply isort and black reformatting Signed-off-by: yaoyu-33 <yaoyu-33@users.noreply.github.com> * Support no image inference * add llama svqa eval * fix masking Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com> * Apply isort and black reformatting Signed-off-by: yaoyu-33 <yaoyu-33@users.noreply.github.com> * fix generation Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com> * Apply isort and black reformatting Signed-off-by: yaoyu-33 <yaoyu-33@users.noreply.github.com> * add 90b recipe and revise 11b recipe * Apply isort and black reformatting Signed-off-by: cuichenx <cuichenx@users.noreply.github.com> * clean up typing * add option to disable vision padding * Apply isort and black reformatting Signed-off-by: cuichenx <cuichenx@users.noreply.github.com> * base model finetuning (does not work yet) * Apply isort and black reformatting Signed-off-by: cuichenx <cuichenx@users.noreply.github.com> * fixed default conversation template config for MLLama * Update svqa * add multinode * bot happy * Apply isort and black reformatting Signed-off-by: 
cuichenx <cuichenx@users.noreply.github.com> * Apply isort and black reformatting Signed-off-by: yaoyu-33 <yaoyu-33@users.noreply.github.com> * Apply isort and black reformatting Signed-off-by: yaoyu-33 <yaoyu-33@users.noreply.github.com> * Apply isort and black reformatting Signed-off-by: artbataev <artbataev@users.noreply.github.com> * Perf improvements. Mainly from XAttn mask calculation (#10901) * Perf improvements. Mainly from XAttn mask calculation * Apply isort and black reformatting Signed-off-by: parthmannan <parthmannan@users.noreply.github.com> --------- Signed-off-by: parthmannan <parthmannan@users.noreply.github.com> Co-authored-by: parthmannan <parthmannan@users.noreply.github.com> * fix existing issues Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com> * fix scripts Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com> * Apply isort and black reformatting Signed-off-by: yaoyu-33 <yaoyu-33@users.noreply.github.com> * fix lora * few fixes for non image support Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com> * update masking gen Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com> * update lazy dataset Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com> * fix data sampler and loading issue Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com> * Add vlm generation * Apply isort and black reformatting Signed-off-by: meatybobby <meatybobby@users.noreply.github.com> * Apply isort and black reformatting Signed-off-by: yaoyu-33 <yaoyu-33@users.noreply.github.com> * generation update Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com> * update lazy dataset Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com> * Fix _strategy_lib.py Signed-off-by: Yu Yao <54727607+yaoyu-33@users.noreply.github.com> * Apply isort and black reformatting Signed-off-by: yaoyu-33 <yaoyu-33@users.noreply.github.com> * fix warning Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com> * hide vlm examples Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com> * Revert "Add vlm generation" This reverts commit 4711c75 Signed-off-by: yaoyu-33 
<yaoyu.094@gmail.com> * Fix VisionEncoder multi-batch bug * update mcore parallelism initialization Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com> * Apply isort and black reformatting Signed-off-by: yaoyu-33 <yaoyu-33@users.noreply.github.com> * Update megatron_init.py Signed-off-by: Yu Yao <54727607+yaoyu-33@users.noreply.github.com> * add encoder parallel default config Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com> * Fix _strategy_lib.py Signed-off-by: Yu Yao <54727607+yaoyu-33@users.noreply.github.com> * llm.generate fixes (#10983) * fix context path, disable optimizer init, add tp Signed-off-by: HuiyingLi <willwin.lee@gmail.com> * format Signed-off-by: HuiyingLi <willwin.lee@gmail.com> * address comments, require user to provide trainer Signed-off-by: HuiyingLi <willwin.lee@gmail.com> * minor fix Signed-off-by: HuiyingLi <willwin.lee@gmail.com> * minor fixes Signed-off-by: HuiyingLi <willwin.lee@gmail.com> --------- Signed-off-by: HuiyingLi <willwin.lee@gmail.com> * use __dict__ in check (#11012) * check is_hf_model in leaf module Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com> * Apply isort and black reformatting Signed-off-by: akoumpa <akoumpa@users.noreply.github.com> * disable getattr alternative path Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com> * fix Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com> * undo; Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com> --------- Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com> Signed-off-by: akoumpa <akoumpa@users.noreply.github.com> Co-authored-by: akoumpa <akoumpa@users.noreply.github.com> * LoRA support for HF::AutoModelForCausalLM (#10982) * add LinearAdapter Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com> * add hf lora example Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com> * remove unused imports Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com> * fix Signed-off-by: Alexandros 
Koumparoulis <akoumparouli@nvidia.com> * fix Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com> * subclass mixin Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com> * remove stale imports Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com> * undo Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com> * fix scale Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com> * regex selector for peft Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com> * move lora Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com> * fmt Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com> * hf_auto_model_for_causal_lm finetune recipe Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com> * Apply isort and black reformatting Signed-off-by: akoumpa <akoumpa@users.noreply.github.com> --------- Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com> Signed-off-by: akoumpa <akoumpa@users.noreply.github.com> Co-authored-by: akoumpa <akoumpa@users.noreply.github.com> * Change default for always_save_context to True (#11014) Signed-off-by: Abhishree <abhishreetm@gmail.com> Co-authored-by: Pablo Garay <pagaray@nvidia.com> * Add a build option to load_context (#10713) * Add a build option to load_context Signed-off-by: Marc Romeijn <mromeijn@nvidia.com> Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com> * Adding test Signed-off-by: Marc Romeijn <mromeijn@nvidia.com> Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com> * Trying to fix failing CPU test Signed-off-by: Marc Romeijn <mromeijn@nvidia.com> Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com> * cherry-pick fix Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com> --------- Signed-off-by: Marc Romeijn <mromeijn@nvidia.com> Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com> Co-authored-by: Alexandros Koumparoulis <akoumparouli@nvidia.com> * Fix pip install (#11026) * 
Move AutoTokenizer inline Signed-off-by: Marc Romeyn <mromeijn@nvidia.com> * Move einops to common requirements Signed-off-by: Marc Romeyn <mromeijn@nvidia.com> * Move AutoTokenizer import to top-level again in fine_tuning Signed-off-by: Marc Romeyn <mromeijn@nvidia.com> * Move megatron init inside nemo.lightning Signed-off-by: Marc Romeyn <mromeijn@nvidia.com> * Make megatron_lazy_init_context work when transformer-engine is not installed Signed-off-by: Marc Romeyn <mromeijn@nvidia.com> * Only import get_nmt_tokenizer when needed Signed-off-by: Marc Romeyn <mromeijn@nvidia.com> * Apply isort and black reformatting Signed-off-by: marcromeyn <marcromeyn@users.noreply.github.com> --------- Signed-off-by: Marc Romeyn <mromeijn@nvidia.com> Signed-off-by: marcromeyn <marcromeyn@users.noreply.github.com> Co-authored-by: marcromeyn <marcromeyn@users.noreply.github.com> * [WIP] Add docs for NEST SSL (#10804) * add docs Signed-off-by: stevehuang52 <heh@nvidia.com> * update doc and fix missing param Signed-off-by: stevehuang52 <heh@nvidia.com> --------- Signed-off-by: stevehuang52 <heh@nvidia.com> * Change dist ckpt defaults (#10913) * Enable ckpt features by default (async ckpt), ckpt every 15mins and reduce preemption time to 1min Signed-off-by: Shriya Palsamudram <spalsamudram@nvidia.com> * fix ssm tests Signed-off-by: Shriya Palsamudram <spalsamudram@nvidia.com> * Make note that ckpt_async_save is disabled for SSMs Signed-off-by: Shriya Palsamudram <spalsamudram@nvidia.com> * Enable async ckpt for SSMs with fix Signed-off-by: Shriya Palsamudram <spalsamudram@nvidia.com> * Disable async ckpt in the peft test as it is a known bug, add note. 
Signed-off-by: Shriya Palsamudram <spalsamudram@nvidia.com> * Fix failing unit tests Signed-off-by: Shriya Palsamudram <spalsamudram@nvidia.com> * Ashors/peft async ckpt (#11010) * [WIP] prototype for supporting async checkpointing with peft Signed-off-by: ashors1 <ashors@nvidia.com> Signed-off-by: Shriya Palsamudram <spalsamudram@nvidia.com> * Enable async ckpt for the peft test Signed-off-by: Shriya Palsamudram <spalsamudram@nvidia.com> * Fix peft setup test Signed-off-by: Shriya Palsamudram <spalsamudram@nvidia.com> --------- Signed-off-by: Shriya Palsamudram <spalsamudram@nvidia.com> Signed-off-by: ashors1 <ashors@nvidia.com> Co-authored-by: ataghibakhsh <ataghibakhsh@nvidia.com> * Akoumparouli/mixtral recipe fix r2.0.0 (#10994) * Mixtral TP8 EP1 Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com> * Apply isort and black reformatting Signed-off-by: akoumpa <akoumpa@users.noreply.github.com> --------- Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com> Signed-off-by: akoumpa <akoumpa@users.noreply.github.com> Co-authored-by: akoumpa <akoumpa@users.noreply.github.com> * Fix _strategy_lib tests (#11033) * fix world size and don't mock Signed-off-by: Maanu Grover <maanug@nvidia.com> * cleanup global state Signed-off-by: Maanu Grover <maanug@nvidia.com> * check app state instead Signed-off-by: Maanu Grover <maanug@nvidia.com> * fix syntax nemo logger test Signed-off-by: Maanu Grover <maanug@nvidia.com> --------- Signed-off-by: Maanu Grover <maanug@nvidia.com> * Update `BaseMegatronSampler` for compatibility with PTL's `_BatchProgress` (#11016) * Revert "[NeMo-UX] Use custom `BatchProgress` class which does not restore states (#10383)" This reverts commit b5798de. 
* make megatron sampler return the total number of batches in the dataset Signed-off-by: ashors1 <ashors@nvidia.com> --------- Signed-off-by: ashors1 <ashors@nvidia.com> * PTQ example for NeMo 2.0 (#10642) * initial commit Signed-off-by: Piotr Kaminski <pikaminski@nvidia.com> * create Quantizer for NeMo 2.0 Signed-off-by: Piotr Kaminski <pikaminski@nvidia.com> * refactor Signed-off-by: Piotr Kaminski <pikaminski@nvidia.com> * Call quantize on an unwrapped mcore model Signed-off-by: Piotr Kaminski <pikaminski@nvidia.com> * Apply isort and black reformatting Signed-off-by: Laplasjan107 <Laplasjan107@users.noreply.github.com> * Add tests, adjust unwrapping Signed-off-by: Piotr Kaminski <pikaminski@nvidia.com> * Apply isort and black reformatting Signed-off-by: Laplasjan107 <Laplasjan107@users.noreply.github.com> * fix export Signed-off-by: Piotr Kaminski <pikaminski@nvidia.com> * Apply isort and black reformatting Signed-off-by: Laplasjan107 <Laplasjan107@users.noreply.github.com> * Apply isort and black reformatting Signed-off-by: artbataev <artbataev@users.noreply.github.com> * Fix output_path argument for HF import Signed-off-by: Piotr Kamiński <67481570+Laplasjan107@users.noreply.github.com> * fix fabric ckpt loading Signed-off-by: Piotr Kaminski <pikaminski@nvidia.com> * Apply isort and black reformatting Signed-off-by: Laplasjan107 <Laplasjan107@users.noreply.github.com> * code review suggestions Signed-off-by: Piotr Kaminski <pikaminski@nvidia.com> * Apply isort and black reformatting Signed-off-by: Laplasjan107 <Laplasjan107@users.noreply.github.com> * remove unused import Signed-off-by: Piotr Kaminski <pikaminski@nvidia.com> * use cnn dataset in github ci Signed-off-by: Piotr Kaminski <pikaminski@nvidia.com> * applied code review Signed-off-by: Piotr Kaminski <pikaminski@nvidia.com> * code review changes Signed-off-by: Piotr Kaminski <pikaminski@nvidia.com> * Apply isort and black reformatting Signed-off-by: Laplasjan107 
<Laplasjan107@users.noreply.github.com> * simplify interface for data iterator Signed-off-by: Piotr Kaminski <pikaminski@nvidia.com> * Apply isort and black reformatting Signed-off-by: Laplasjan107 <Laplasjan107@users.noreply.github.com> * (partial) PP fix Signed-off-by: Piotr Kaminski <pikaminski@nvidia.com> * Apply isort and black reformatting Signed-off-by: Laplasjan107 <Laplasjan107@users.noreply.github.com> --------- Signed-off-by: Piotr Kaminski <pikaminski@nvidia.com> Signed-off-by: Laplasjan107 <Laplasjan107@users.noreply.github.com> Signed-off-by: Piotr Kamiński <67481570+Laplasjan107@users.noreply.github.com> Signed-off-by: artbataev <artbataev@users.noreply.github.com> Co-authored-by: Piotr Kaminski <pikaminski@nvidia.com> Co-authored-by: Laplasjan107 <Laplasjan107@users.noreply.github.com> Co-authored-by: artbataev <artbataev@users.noreply.github.com> * TDT compute timestamps option and Extra Whitespace handling for SPE (#10875) * add token duration Signed-off-by: monica-sekoyan <msekoyan@nvidia.com> * revert rnnt change Signed-off-by: monica-sekoyan <msekoyan@nvidia.com> * add remove_extra_whitespaces arg to spe tokenizer Signed-off-by: monica-sekoyan <msekoyan@nvidia.com> * add token duration retrieval Signed-off-by: monica-sekoyan <msekoyan@nvidia.com> * add ignore_extra_whitespace to spe Signed-off-by: monica-sekoyan <msekoyan@nvidia.com> * add compute_timestamp support for tdt Signed-off-by: monica-sekoyan <msekoyan@nvidia.com> * fix config field name Signed-off-by: monica-sekoyan <msekoyan@nvidia.com> * add refinement for tdt timestamps Signed-off-by: monica-sekoyan <msekoyan@nvidia.com> * add segments timestamp support and refinement for ctc Signed-off-by: monica-sekoyan <msekoyan@nvidia.com> * modify tests for ctc decoding timestamps Signed-off-by: monica-sekoyan <msekoyan@nvidia.com> * add rnnt timestamp tests Signed-off-by: monica-sekoyan <msekoyan@nvidia.com> * updated doc Signed-off-by: monica-sekoyan <msekoyan@nvidia.com> * fix in test 
Signed-off-by: monica-sekoyan <msekoyan@nvidia.com> * Apply isort and black reformatting Signed-off-by: monica-sekoyan <monica-sekoyan@users.noreply.github.com> * fix of unicode char Signed-off-by: monica-sekoyan <msekoyan@nvidia.com> * fix rnnt_decoding test Signed-off-by: monica-sekoyan <msekoyan@nvidia.com> * workaround for tesst tokenizer Signed-off-by: monica-sekoyan <msekoyan@nvidia.com> * Apply isort and black reformatting Signed-off-by: monica-sekoyan <monica-sekoyan@users.noreply.github.com> * modify segments formation Signed-off-by: monica-sekoyan <msekoyan@nvidia.com> * modify segments for ctc Signed-off-by: monica-sekoyan <msekoyan@nvidia.com> * fix in ctc refinement Signed-off-by: monica-sekoyan <msekoyan@nvidia.com> * Apply isort and black reformatting Signed-off-by: monica-sekoyan <monica-sekoyan@users.noreply.github.com> * minor changes Signed-off-by: monica-sekoyan <msekoyan@nvidia.com> * reverse offset change Signed-off-by: monica-sekoyan <msekoyan@nvidia.com> * Apply isort and black reformatting Signed-off-by: monica-sekoyan <monica-sekoyan@users.noreply.github.com> * warning mode=once Signed-off-by: monica-sekoyan <msekoyan@nvidia.com> * Apply isort and black reformatting Signed-off-by: monica-sekoyan <monica-sekoyan@users.noreply.github.com> * make ignore_extrawhitespaces false Signed-off-by: monica-sekoyan <msekoyan@nvidia.com> * minor changes Signed-off-by: monica-sekoyan <msekoyan@nvidia.com> * adjust changes to the tests Signed-off-by: monica-sekoyan <msekoyan@nvidia.com> * modify prompt_formatter tests Signed-off-by: monica-sekoyan <msekoyan@nvidia.com> * Apply isort and black reformatting Signed-off-by: monica-sekoyan <monica-sekoyan@users.noreply.github.com> --------- Signed-off-by: monica-sekoyan <msekoyan@nvidia.com> Signed-off-by: monica-sekoyan <monica-sekoyan@users.noreply.github.com> Co-authored-by: monica-sekoyan <monica-sekoyan@users.noreply.github.com> * Basic online dynamic FP8 quantization with vLLM (#10904) * Basic online 
dynamic quantization with vLLM Signed-off-by: Jan Lasek <janek.lasek@gmail.com> * Apply isort and black reformatting Signed-off-by: janekl <janekl@users.noreply.github.com> * vllm 0.6.3 updates Signed-off-by: Jan Lasek <janek.lasek@gmail.com> * Pass quantization param in deploy_vllm_triton.py script Signed-off-by: Jan Lasek <janek.lasek@gmail.com> --------- Signed-off-by: Jan Lasek <janek.lasek@gmail.com> Signed-off-by: janekl <janekl@users.noreply.github.com> Co-authored-by: janekl <janekl@users.noreply.github.com> * ci: Improve VM maintenance (#10758) * ci: Improve VM maintenance Signed-off-by: Oliver Koenig <okoenig@nvidia.com> * rename stuff Signed-off-by: Oliver Koenig <okoenig@nvidia.com> * title Signed-off-by: Oliver Koenig <okoenig@nvidia.com> * use team Signed-off-by: Oliver Koenig <okoenig@nvidia.com> * run on failure too Signed-off-by: Oliver Koenig <okoenig@nvidia.com> * fix Signed-off-by: Oliver Koenig <okoenig@nvidia.com> * yrdy Signed-off-by: Oliver Koenig <okoenig@nvidia.com> * f Signed-off-by: Oliver Koenig <okoenig@nvidia.com> * test Signed-off-by: Oliver Koenig <okoenig@nvidia.com> * fix Signed-off-by: Oliver Koenig <okoenig@nvidia.com> * f Signed-off-by: Oliver Koenig <okoenig@nvidia.com> * f Signed-off-by: Oliver Koenig <okoenig@nvidia.com> * f Signed-off-by: Oliver Koenig <okoenig@nvidia.com> --------- Signed-off-by: Oliver Koenig <okoenig@nvidia.com> * Add comment for vision transpose * update megatron_init.py inside lightning Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com> * rename llama to mllama folder name Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com> * update to attention bias Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com> * Apply isort and black reformatting Signed-off-by: yaoyu-33 <yaoyu-33@users.noreply.github.com> * update dropout to 0 Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com> * fix attention bias Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com> * remove disable_vision_padding since we now have a fix Signed-off-by: yaoyu-33 
<yaoyu.094@gmail.com> * Apply isort and black reformatting Signed-off-by: yaoyu-33 <yaoyu-33@users.noreply.github.com> * Update init for mllama Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com> * Apply isort and black reformatting Signed-off-by: yaoyu-33 <yaoyu-33@users.noreply.github.com> * Address comments Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com> * Apply isort and black reformatting Signed-off-by: yaoyu-33 <yaoyu-33@users.noreply.github.com> * fix copyright title Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com> * fix code scan Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com> * update vision code Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com> * revert attention bias changes until latest MLM code got merged Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com> * fix warning Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com> * Turn off system message check, as it's "" now Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com> * Rolllback megatron_parallel.py Signed-off-by: Yu Yao <54727607+yaoyu-33@users.noreply.github.com> --------- Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com> Signed-off-by: yaoyu-33 <yaoyu-33@users.noreply.github.com> Signed-off-by: Yu Yao <54727607+yaoyu-33@users.noreply.github.com> Signed-off-by: cuichenx <cuichenx@users.noreply.github.com> Signed-off-by: Chen Cui <chcui@nvidia.com> Signed-off-by: artbataev <artbataev@users.noreply.github.com> Signed-off-by: parthmannan <parthmannan@users.noreply.github.com> Signed-off-by: meatybobby <meatybobby@users.noreply.github.com> Signed-off-by: HuiyingLi <willwin.lee@gmail.com> Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com> Signed-off-by: akoumpa <akoumpa@users.noreply.github.com> Signed-off-by: Abhishree <abhishreetm@gmail.com> Signed-off-by: Marc Romeijn <mromeijn@nvidia.com> Signed-off-by: Marc Romeyn <mromeijn@nvidia.com> Signed-off-by: marcromeyn <marcromeyn@users.noreply.github.com> Signed-off-by: stevehuang52 <heh@nvidia.com> Signed-off-by: Shriya Palsamudram <spalsamudram@nvidia.com> Signed-off-by: 
ashors1 <ashors@nvidia.com> Signed-off-by: Maanu Grover <maanug@nvidia.com> Signed-off-by: Piotr Kaminski <pikaminski@nvidia.com> Signed-off-by: Laplasjan107 <Laplasjan107@users.noreply.github.com> Signed-off-by: Piotr Kamiński <67481570+Laplasjan107@users.noreply.github.com> Signed-off-by: monica-sekoyan <msekoyan@nvidia.com> Signed-off-by: monica-sekoyan <monica-sekoyan@users.noreply.github.com> Signed-off-by: Jan Lasek <janek.lasek@gmail.com> Signed-off-by: janekl <janekl@users.noreply.github.com> Signed-off-by: Oliver Koenig <okoenig@nvidia.com> Co-authored-by: Ao Tang <aot@nvidia.com> Co-authored-by: Chen Cui <chcui@nvidia.com> Co-authored-by: Bobby Chen <bobchen@nvidia.com> Co-authored-by: yaoyu-33 <yaoyu-33@users.noreply.github.com> Co-authored-by: Yashaswi Karnati <144376261+yashaswikarnati@users.noreply.github.com> Co-authored-by: ykarnati <ykarnati@nvidia.com> Co-authored-by: cuichenx <cuichenx@users.noreply.github.com> Co-authored-by: Yashaswi Karnati <ykarnati@login-eos02.eos.clusters.nvidia.com> Co-authored-by: artbataev <artbataev@users.noreply.github.com> Co-authored-by: Parth Mannan <38387286+parthmannan@users.noreply.github.com> Co-authored-by: parthmannan <parthmannan@users.noreply.github.com> Co-authored-by: meatybobby <meatybobby@users.noreply.github.com> Co-authored-by: Huiying <willwin.lee@gmail.com> Co-authored-by: Alexandros Koumparoulis <153118171+akoumpa@users.noreply.github.com> Co-authored-by: akoumpa <akoumpa@users.noreply.github.com> Co-authored-by: Abhishree Thittenamane <47577437+athitten@users.noreply.github.com> Co-authored-by: Pablo Garay <pagaray@nvidia.com> Co-authored-by: Marc Romeyn <mromeijn@nvidia.com> Co-authored-by: Alexandros Koumparoulis <akoumparouli@nvidia.com> Co-authored-by: marcromeyn <marcromeyn@users.noreply.github.com> Co-authored-by: He Huang (Steve) <105218074+stevehuang52@users.noreply.github.com> Co-authored-by: Shriya Rishab <69161273+ShriyaPalsamudram@users.noreply.github.com> Co-authored-by: ataghibakhsh 
<ataghibakhsh@nvidia.com> Co-authored-by: Maanu Grover <109391026+maanug-nv@users.noreply.github.com> Co-authored-by: Anna Shors <71393111+ashors1@users.noreply.github.com> Co-authored-by: Piotr Kamiński <67481570+Laplasjan107@users.noreply.github.com> Co-authored-by: Piotr Kaminski <pikaminski@nvidia.com> Co-authored-by: Laplasjan107 <Laplasjan107@users.noreply.github.com> Co-authored-by: monica-sekoyan <166123533+monica-sekoyan@users.noreply.github.com> Co-authored-by: monica-sekoyan <monica-sekoyan@users.noreply.github.com> Co-authored-by: Jan Lasek <janek.lasek@gmail.com> Co-authored-by: janekl <janekl@users.noreply.github.com> Co-authored-by: oliver könig <okoenig@nvidia.com>