PTQ example for NeMo 2.0 #10642

Merged: 32 commits, Oct 25, 2024
Laplasjan107 (Collaborator) commented Sep 26, 2024

What does this PR do?

Adds an example post-training quantization (PTQ) workflow for a NeMo 2.0 checkpoint.

Collection: llm

Known Issues

  • Observed an accuracy regression on MMLU when exporting a non-quantized LLAMA3-7B: 0.654 -> 0.638
  • Half precision with pipeline parallelism (PP) > 1 does not work
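
For scale, the MMLU regression in the first bullet can be expressed in absolute and relative terms. This is a quick arithmetic sketch that uses only the two scores quoted there; the variable names are illustrative.

```python
# MMLU scores quoted in the Known Issues list.
baseline = 0.654  # score before export
exported = 0.638  # score after exporting the non-quantized model

# Absolute drop on the 0-1 accuracy scale, rounded to hide float noise.
absolute_drop = round(baseline - exported, 3)
# Relative drop as a percentage of the baseline score.
relative_drop_pct = round(100 * (baseline - exported) / baseline, 2)

print(f"absolute drop: {absolute_drop}")
print(f"relative drop: {relative_drop_pct}%")
```

That is a drop of 0.016 in absolute terms, or about 2.45% relative to the baseline.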

Usage

  • Minimal usage example:
from nemo.collections.llm import quantization

# Default quantization settings and the export destination.
quantization_config = quantization.QuantizationConfig()
export_config = quantization.ExportConfig('/tmp/trt_llama_engine')
quantizer = quantization.Quantizer(quantization_config, export_config)
# Load the checkpoint, quantize it, then export the result.
model = quantization.load_with_modelopt_layer_spec('/path/to/nemo/checkpoint')
model = quantizer.quantize(model)
quantizer.export(model)
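
The same steps can be wrapped in a small helper so that the checkpoint and output paths become parameters. This is only a sketch: `quantize_and_export` is a hypothetical name, it uses just the calls shown in the usage example with a default `QuantizationConfig()`, and the import is deferred so the function can be defined on a machine without NeMo installed. The merged API may differ in detail.

```python
def quantize_and_export(checkpoint_path: str, export_dir: str) -> None:
    """Quantize a NeMo 2.0 checkpoint and export it (hypothetical helper)."""
    # Deferred import: calling this requires a NeMo build that includes this PR.
    from nemo.collections.llm import quantization

    quantization_config = quantization.QuantizationConfig()  # library defaults
    export_config = quantization.ExportConfig(export_dir)
    quantizer = quantization.Quantizer(quantization_config, export_config)

    # Load the checkpoint, then quantize and export it.
    model = quantization.load_with_modelopt_layer_spec(checkpoint_path)
    model = quantizer.quantize(model)
    quantizer.export(model)
```

A call would then look like `quantize_and_export('/path/to/nemo/checkpoint', '/tmp/trt_llama_engine')`.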

GitHub Actions CI

The Jenkins CI system has been replaced by GitHub Actions self-hosted runners.

The GitHub Actions CI will run automatically when the "Run CICD" label is added to the PR.
To re-run CI, remove the label and add it again.
To run CI on an untrusted fork, a NeMo user with write access must first click "Approve and run".

Before your PR is "Ready for review"

Pre checks:

  • Make sure you read and followed Contributor guidelines
  • Did you write any new necessary tests?
  • Did you add or update any necessary documentation?
  • Does the PR affect components that are optional to install? (Ex: Numba, Pynini, Apex etc)
    • Reviewer: Does the PR have correct import guards for all optional libraries?

PR Type:

  • New Feature
  • Bugfix
  • Documentation

If you haven't finished some of the above items, you can still open a "Draft" PR.

Who can review?

Anyone in the community is free to review the PR once the checks have passed.
The Contributor guidelines list specific people who can review PRs in various areas.

Additional Information

  • Related to # (issue)

Signed-off-by: Piotr Kaminski <pikaminski@nvidia.com>
Signed-off-by: Piotr Kaminski <pikaminski@nvidia.com>
Piotr Kaminski and others added 3 commits September 30, 2024 10:07
Signed-off-by: Piotr Kaminski <pikaminski@nvidia.com>
Signed-off-by: Piotr Kaminski <pikaminski@nvidia.com>
Signed-off-by: Laplasjan107 <Laplasjan107@users.noreply.github.com>
@Laplasjan107 Laplasjan107 marked this pull request as ready for review October 4, 2024 14:28
Signed-off-by: Laplasjan107 <Laplasjan107@users.noreply.github.com>
nemo/lightning/io/api.py Fixed
tests/collections/llm/test_hf_import.py Fixed
Piotr Kaminski and others added 3 commits October 15, 2024 01:03
Signed-off-by: Piotr Kaminski <pikaminski@nvidia.com>
Signed-off-by: Piotr Kamiński <67481570+Laplasjan107@users.noreply.github.com>
Signed-off-by: Laplasjan107 <Laplasjan107@users.noreply.github.com>
Signed-off-by: artbataev <artbataev@users.noreply.github.com>
nemo/collections/common/parts/run_utils.py Dismissed
Signed-off-by: Piotr Kamiński <67481570+Laplasjan107@users.noreply.github.com>
@janekl janekl added Run CICD and removed Run CICD labels Oct 16, 2024
Comment on lines 5452 to +5454

L2_NeMo_2_PTQ_Llama2_FP8:
needs: [cicd-test-container-setup]
Collaborator

Just a friendly reminder to make sure it doesn't get lost once this PR is ready: this job name also needs to be added to CICD_Nemo_Test like the others

Collaborator Author

Added

@janekl janekl added the r2.0.0 label Oct 16, 2024


@dataclass
class QuantizationConfig:
Collaborator

Please add doc-string



@dataclass
class ExportConfig:
Collaborator

Please add doc-string
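
To illustrate the kind of doc-strings requested in the two review comments above, here is a sketch on stand-in dataclasses. The fields shown (`algorithm`, `path`) and their descriptions are assumptions for illustration only; they are not copied from the PR, and the real classes likely define more options.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class QuantizationConfig:
    """Settings that control post-training quantization (illustrative stand-in).

    Attributes:
        algorithm: Name of the quantization algorithm, for example "fp8".
            Hypothetical field; None is taken here to mean "do not quantize".
    """

    algorithm: Optional[str] = "fp8"


@dataclass
class ExportConfig:
    """Settings that control where the exported model is written (illustrative stand-in).

    Attributes:
        path: Output directory for the exported model.
    """

    path: str
```

With these stand-ins, `ExportConfig('/tmp/trt_llama_engine')` accepts the path positionally, as in the usage example from the PR description.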

Piotr Kaminski and others added 7 commits October 23, 2024 03:19
Signed-off-by: Piotr Kaminski <pikaminski@nvidia.com>
Signed-off-by: Laplasjan107 <Laplasjan107@users.noreply.github.com>
Signed-off-by: Piotr Kaminski <pikaminski@nvidia.com>
Signed-off-by: Laplasjan107 <Laplasjan107@users.noreply.github.com>
Signed-off-by: Piotr Kaminski <pikaminski@nvidia.com>
Signed-off-by: Laplasjan107 <Laplasjan107@users.noreply.github.com>
@janekl janekl added Run CICD and removed Run CICD labels Oct 23, 2024
@ko3n1g ko3n1g added Run CICD and removed Run CICD labels Oct 24, 2024
@janekl janekl enabled auto-merge (squash) October 24, 2024 15:39
@janekl janekl added Run CICD and removed Run CICD labels Oct 25, 2024
@janekl janekl merged commit 83eea56 into NVIDIA:main Oct 25, 2024
152 of 154 checks passed
yaoyu-33 pushed a commit that referenced this pull request Oct 25, 2024
* initial commit

Signed-off-by: Piotr Kaminski <pikaminski@nvidia.com>

* create Quantizer for NeMo 2.0

Signed-off-by: Piotr Kaminski <pikaminski@nvidia.com>

* refactor

Signed-off-by: Piotr Kaminski <pikaminski@nvidia.com>

* Call quantize on an unwrapped mcore model

Signed-off-by: Piotr Kaminski <pikaminski@nvidia.com>

* Apply isort and black reformatting

Signed-off-by: Laplasjan107 <Laplasjan107@users.noreply.github.com>

* Add tests, adjust unwrapping

Signed-off-by: Piotr Kaminski <pikaminski@nvidia.com>

* Apply isort and black reformatting

Signed-off-by: Laplasjan107 <Laplasjan107@users.noreply.github.com>

* fix export

Signed-off-by: Piotr Kaminski <pikaminski@nvidia.com>

* Apply isort and black reformatting

Signed-off-by: Laplasjan107 <Laplasjan107@users.noreply.github.com>

* Apply isort and black reformatting

Signed-off-by: artbataev <artbataev@users.noreply.github.com>

* Fix output_path argument for HF import

Signed-off-by: Piotr Kamiński <67481570+Laplasjan107@users.noreply.github.com>

* fix fabric ckpt loading

Signed-off-by: Piotr Kaminski <pikaminski@nvidia.com>

* Apply isort and black reformatting

Signed-off-by: Laplasjan107 <Laplasjan107@users.noreply.github.com>

* code review suggestions

Signed-off-by: Piotr Kaminski <pikaminski@nvidia.com>

* Apply isort and black reformatting

Signed-off-by: Laplasjan107 <Laplasjan107@users.noreply.github.com>

* remove unused import

Signed-off-by: Piotr Kaminski <pikaminski@nvidia.com>

* use cnn dataset in github ci

Signed-off-by: Piotr Kaminski <pikaminski@nvidia.com>

* applied code review

Signed-off-by: Piotr Kaminski <pikaminski@nvidia.com>

* code review changes

Signed-off-by: Piotr Kaminski <pikaminski@nvidia.com>

* Apply isort and black reformatting

Signed-off-by: Laplasjan107 <Laplasjan107@users.noreply.github.com>

* simplify interface for data iterator

Signed-off-by: Piotr Kaminski <pikaminski@nvidia.com>

* Apply isort and black reformatting

Signed-off-by: Laplasjan107 <Laplasjan107@users.noreply.github.com>

* (partial) PP fix

Signed-off-by: Piotr Kaminski <pikaminski@nvidia.com>

* Apply isort and black reformatting

Signed-off-by: Laplasjan107 <Laplasjan107@users.noreply.github.com>

---------

Signed-off-by: Piotr Kaminski <pikaminski@nvidia.com>
Signed-off-by: Laplasjan107 <Laplasjan107@users.noreply.github.com>
Signed-off-by: Piotr Kamiński <67481570+Laplasjan107@users.noreply.github.com>
Signed-off-by: artbataev <artbataev@users.noreply.github.com>
Co-authored-by: Piotr Kaminski <pikaminski@nvidia.com>
Co-authored-by: Laplasjan107 <Laplasjan107@users.noreply.github.com>
Co-authored-by: artbataev <artbataev@users.noreply.github.com>
titu1994 pushed a commit that referenced this pull request Oct 28, 2024
hainan-xv pushed a commit to hainan-xv/NeMo that referenced this pull request Nov 5, 2024
Signed-off-by: Hainan Xu <hainanx@nvidia.com>
yaoyu-33 added a commit that referenced this pull request Nov 8, 2024
* add initial code for llama vlm

Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>

* some restructure

Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>

* add mock data placeholder

Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>

* Fix some importing

Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>

* add language component for vlm llama

* update code

Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>

* now match num of params

* update language part and fix vision part

Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>

* minor fix

Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>

* model can now init

Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>

* minor update for llama32 text config

Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>

* make checkpoint loading work

* missing import

* match vision part tensor shapes with configs

Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>

* solve some fwd issues and mismatch issues

Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>

* add vision import

* fixes

Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>

* update importer to convert both text and image weights

* importer typos and reduce clutter

* fix import qkv

* some fixes for LLM

Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>

* Add embedding

* some updates

Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>

* enable loading only text or only vision

* add example script

* TP fix

Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>

* update

* upload examples

Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>

* update generate

Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>

* update to newer version

Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>

* upload for sharing

* update to new pyt ckpt

* xattn_caches matches (except small differences due to TE RMSNorm)

* cleanup

* embeddings match

* match precision of weights

* update sharded state dict

Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>

* change xattn layer num to 3 7 11 etc

* upload llama generation

* minor fix

* fix dummy layer input format

* fix vision qkv order

* fix shareded state dict

Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>

* fix vision precision

* fix rope

* match cross attn layer

* remove nrep

* Remove cross attention in ImageTransformerLayer and fix _gate_ffn

* PP draft

Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>

* Fix intermediate tensor

* temp save for pp2 is working

Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>

* fix pp issues

Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>

* merge

* update mcore parallelism initialization

Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>

* small update to pretrain script

Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>

* update mcore parallelism initialization

Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>

* Apply isort and black reformatting

Signed-off-by: yaoyu-33 <yaoyu-33@users.noreply.github.com>

* added energon dataloader for neva training (#10451)

* added energon dataloader for neva training

* Apply isort and black reformatting

Signed-off-by: yashaswikarnati <yashaswikarnati@users.noreply.github.com>

* specify global batch size to support grad accumulation

* adding neva pretrain example

* Apply isort and black reformatting

Signed-off-by: yashaswikarnati <yashaswikarnati@users.noreply.github.com>

* change pretraine example to handle new ckpt reloading

* fixed code quality warnings and unused imports

Signed-off-by: ykarnati <ykarnati@nvidia.com>

* minor changes for PR comments

* Apply isort and black reformatting

Signed-off-by: yashaswikarnati <yashaswikarnati@users.noreply.github.com>

* refactor conversation template config

* Apply isort and black reformatting

Signed-off-by: yashaswikarnati <yashaswikarnati@users.noreply.github.com>

* remove optional import

---------

Signed-off-by: yashaswikarnati <yashaswikarnati@users.noreply.github.com>
Signed-off-by: ykarnati <ykarnati@nvidia.com>
Co-authored-by: yashaswikarnati <yashaswikarnati@users.noreply.github.com>
(cherry picked from commit 7354740)

* llama energon dataloader

* have tokenizer for base task encoder class

* Update megatron_init.py

Signed-off-by: Yu Yao <54727607+yaoyu-33@users.noreply.github.com>

* Add simple inference

* evian3 update

Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>

* add encoder parallel default config

Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>

* add encoder parallel default config

Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>

* clean up

Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>

* add aspect ratio in model

* support energon dataloader

* some pp update

Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>

* fixes

Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>

* fix kv merging

Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>

* fix get_key_value_tensors

Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>

* rename files

Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>

* update to HF style position embedding

Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>

* fix energon dataloader and support batching

* update forward args

Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>

* clean up and move to aspect_ratio_ids

Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>

* rename back to language.py

Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>

* fix loss function

Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>

* update and fix energon

Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>

* Add hf import

* Fix type

* Change config

* update energon pretrain

Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>

* clean up

* clean up

* reformat

Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>

* update inference files for new code

* update to instruct

* update to instruct

* update few names

Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>

* update generation

Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>

* fix importer embedding.weight

* few fixes

Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>

* add hf script

Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>

* fix kv import

* remove interleaved

* fixes and updates

Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>

* lora fixes

Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>

* some code clean ups

Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>

* update training scripts

Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>

* refactors

Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>

* add LoRA finetuning

* fixes and nemo update

Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>

* fix importer registering issue by adding 11B and 90B configs

* update `decoder_seq_len`

Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>

* science vqa script

Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>

* clean up script name

Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>

* fix ckpt save serialization issue

* fix predefined config classes

* add num_chunks in input

Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>

* fix format

Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>

* update finetuning scripts for PEFT

* add 11b recipe (need #10645 to test)

* fix mask generation

Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>

* minor fix code style

Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>

* Apply isort and black reformatting

Signed-off-by: yaoyu-33 <yaoyu-33@users.noreply.github.com>

* Support no image inference

* add llama svqa eval

* fix masking

Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>

* Apply isort and black reformatting

Signed-off-by: yaoyu-33 <yaoyu-33@users.noreply.github.com>

* fix generation

Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>

* Apply isort and black reformatting

Signed-off-by: yaoyu-33 <yaoyu-33@users.noreply.github.com>

* add 90b recipe and revise 11b recipe

* Apply isort and black reformatting

Signed-off-by: cuichenx <cuichenx@users.noreply.github.com>

* clean up typing

* add option to disable vision padding

* Apply isort and black reformatting

Signed-off-by: cuichenx <cuichenx@users.noreply.github.com>

* base model finetuning (does not work yet)

* Apply isort and black reformatting

Signed-off-by: cuichenx <cuichenx@users.noreply.github.com>

* fixed default conversation template config for MLLama

* Update svqa

* add multinode

* bot happy

* Apply isort and black reformatting

Signed-off-by: cuichenx <cuichenx@users.noreply.github.com>

* Apply isort and black reformatting

Signed-off-by: yaoyu-33 <yaoyu-33@users.noreply.github.com>

* Apply isort and black reformatting

Signed-off-by: yaoyu-33 <yaoyu-33@users.noreply.github.com>

* Apply isort and black reformatting

Signed-off-by: artbataev <artbataev@users.noreply.github.com>

* Perf improvements. Mainly from XAttn mask calculation (#10901)

* Perf improvements. Mainly from XAttn mask calculation

* Apply isort and black reformatting

Signed-off-by: parthmannan <parthmannan@users.noreply.github.com>

---------

Signed-off-by: parthmannan <parthmannan@users.noreply.github.com>
Co-authored-by: parthmannan <parthmannan@users.noreply.github.com>

* fix existing issues

Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>

* fix scripts

Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>

* Apply isort and black reformatting

Signed-off-by: yaoyu-33 <yaoyu-33@users.noreply.github.com>

* fix lora

* few fixes for non image support

Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>

* update masking gen

Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>

* update lazy dataset

Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>

* fix data sampler and loading issue

Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>

* Add vlm generation

* Apply isort and black reformatting

Signed-off-by: meatybobby <meatybobby@users.noreply.github.com>

* Apply isort and black reformatting

Signed-off-by: yaoyu-33 <yaoyu-33@users.noreply.github.com>

* generation update

Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>

* update lazy dataset

Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>

* Fix _strategy_lib.py

Signed-off-by: Yu Yao <54727607+yaoyu-33@users.noreply.github.com>

* Apply isort and black reformatting

Signed-off-by: yaoyu-33 <yaoyu-33@users.noreply.github.com>

* fix warning

Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>

* hide vlm examples

Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>

* Revert "Add vlm generation"

This reverts commit 4711c75

Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>

* Fix VisionEncoder multi-batch bug

* update mcore parallelism initialization

Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>

* Apply isort and black reformatting

Signed-off-by: yaoyu-33 <yaoyu-33@users.noreply.github.com>

* Update megatron_init.py

Signed-off-by: Yu Yao <54727607+yaoyu-33@users.noreply.github.com>

* add encoder parallel default config

Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>

* Fix _strategy_lib.py

Signed-off-by: Yu Yao <54727607+yaoyu-33@users.noreply.github.com>

* llm.generate fixes (#10983)

* fix context path, disable optimizer init, add tp

Signed-off-by: HuiyingLi <willwin.lee@gmail.com>

* format

Signed-off-by: HuiyingLi <willwin.lee@gmail.com>

* address comments, require user to provide trainer

Signed-off-by: HuiyingLi <willwin.lee@gmail.com>

* minor fix

Signed-off-by: HuiyingLi <willwin.lee@gmail.com>

* minor fixes

Signed-off-by: HuiyingLi <willwin.lee@gmail.com>

---------

Signed-off-by: HuiyingLi <willwin.lee@gmail.com>

* use __dict__ in check (#11012)

* check is_hf_model in leaf module

Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>

* Apply isort and black reformatting

Signed-off-by: akoumpa <akoumpa@users.noreply.github.com>

* disable getattr alternative path

Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>

* fix

Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>

* undo;

Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>

---------

Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>
Signed-off-by: akoumpa <akoumpa@users.noreply.github.com>
Co-authored-by: akoumpa <akoumpa@users.noreply.github.com>

* LoRA support for HF::AutoModelForCausalLM (#10982)

* add LinearAdapter

Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>

* add hf lora example

Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>

* remove unused imports

Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>

* fix

Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>

* fix

Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>

* subclass mixin

Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>

* remove stale imports

Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>

* undo

Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>

* fix scale

Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>

* regex selector for peft

Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>

* move lora

Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>

* fmt

Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>

* hf_auto_model_for_causal_lm finetune recipe

Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>

* Apply isort and black reformatting

Signed-off-by: akoumpa <akoumpa@users.noreply.github.com>

---------

Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>
Signed-off-by: akoumpa <akoumpa@users.noreply.github.com>
Co-authored-by: akoumpa <akoumpa@users.noreply.github.com>

* Change default for always_save_context to True (#11014)

Signed-off-by: Abhishree <abhishreetm@gmail.com>
Co-authored-by: Pablo Garay <pagaray@nvidia.com>

* Add a build option to load_context (#10713)

* Add a build option to load_context

Signed-off-by: Marc Romeijn <mromeijn@nvidia.com>
Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>

* Adding test

Signed-off-by: Marc Romeijn <mromeijn@nvidia.com>
Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>

* Trying to fix failing CPU test

Signed-off-by: Marc Romeijn <mromeijn@nvidia.com>
Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>

* cherry-pick fix

Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>

---------

Signed-off-by: Marc Romeijn <mromeijn@nvidia.com>
Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>
Co-authored-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>

* Fix pip install (#11026)

* Move AutoTokenizer inline

Signed-off-by: Marc Romeyn <mromeijn@nvidia.com>

* Move einops to common requirements

Signed-off-by: Marc Romeyn <mromeijn@nvidia.com>

* Move AutoTokenizer import to top-level again in fine_tuning

Signed-off-by: Marc Romeyn <mromeijn@nvidia.com>

* Move megatron init inside nemo.lightning

Signed-off-by: Marc Romeyn <mromeijn@nvidia.com>

* Make megatron_lazy_init_context work when transformer-engine is not installed

Signed-off-by: Marc Romeyn <mromeijn@nvidia.com>

* Only import get_nmt_tokenizer when needed

Signed-off-by: Marc Romeyn <mromeijn@nvidia.com>

* Apply isort and black reformatting

Signed-off-by: marcromeyn <marcromeyn@users.noreply.github.com>

---------

Signed-off-by: Marc Romeyn <mromeijn@nvidia.com>
Signed-off-by: marcromeyn <marcromeyn@users.noreply.github.com>
Co-authored-by: marcromeyn <marcromeyn@users.noreply.github.com>

* [WIP] Add docs for NEST SSL (#10804)

* add docs

Signed-off-by: stevehuang52 <heh@nvidia.com>

* update doc and fix missing param

Signed-off-by: stevehuang52 <heh@nvidia.com>

---------

Signed-off-by: stevehuang52 <heh@nvidia.com>

* Change dist ckpt defaults (#10913)

* Enable ckpt features by default (async ckpt), ckpt every 15mins and reduce preemption time to 1min

Signed-off-by: Shriya Palsamudram <spalsamudram@nvidia.com>

* fix ssm tests

Signed-off-by: Shriya Palsamudram <spalsamudram@nvidia.com>

* Make note that ckpt_async_save is disabled for SSMs

Signed-off-by: Shriya Palsamudram <spalsamudram@nvidia.com>

* Enable async ckpt for SSMs with fix

Signed-off-by: Shriya Palsamudram <spalsamudram@nvidia.com>

* Disable async ckpt in the peft test as it is a known bug, add note.

Signed-off-by: Shriya Palsamudram <spalsamudram@nvidia.com>

* Fix failing unit tests

Signed-off-by: Shriya Palsamudram <spalsamudram@nvidia.com>

* Ashors/peft async ckpt (#11010)

* [WIP] prototype for supporting async checkpointing with peft

Signed-off-by: ashors1 <ashors@nvidia.com>
Signed-off-by: Shriya Palsamudram <spalsamudram@nvidia.com>

* Enable async ckpt for the peft test

Signed-off-by: Shriya Palsamudram <spalsamudram@nvidia.com>

* Fix peft setup test

Signed-off-by: Shriya Palsamudram <spalsamudram@nvidia.com>

---------

Signed-off-by: Shriya Palsamudram <spalsamudram@nvidia.com>
Signed-off-by: ashors1 <ashors@nvidia.com>
Co-authored-by: ataghibakhsh <ataghibakhsh@nvidia.com>

* Akoumparouli/mixtral recipe fix r2.0.0 (#10994)

* Mixtral TP8 EP1

Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>

* Apply isort and black reformatting

Signed-off-by: akoumpa <akoumpa@users.noreply.github.com>

---------

Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>
Signed-off-by: akoumpa <akoumpa@users.noreply.github.com>
Co-authored-by: akoumpa <akoumpa@users.noreply.github.com>

* Fix _strategy_lib tests (#11033)

* fix world size and don't mock

Signed-off-by: Maanu Grover <maanug@nvidia.com>

* cleanup global state

Signed-off-by: Maanu Grover <maanug@nvidia.com>

* check app state instead

Signed-off-by: Maanu Grover <maanug@nvidia.com>

* fix syntax nemo logger test

Signed-off-by: Maanu Grover <maanug@nvidia.com>

---------

Signed-off-by: Maanu Grover <maanug@nvidia.com>

* Update `BaseMegatronSampler` for compatibility with PTL's `_BatchProgress` (#11016)

* Revert "[NeMo-UX] Use custom `BatchProgress` class which does not restore states (#10383)"

This reverts commit b5798de.

* make megatron sampler return the total number of batches in the dataset

Signed-off-by: ashors1 <ashors@nvidia.com>

---------

Signed-off-by: ashors1 <ashors@nvidia.com>

* PTQ example for NeMo 2.0 (#10642)

Signed-off-by: Laplasjan107 <Laplasjan107@users.noreply.github.com>
Signed-off-by: Piotr Kamiński <67481570+Laplasjan107@users.noreply.github.com>
Signed-off-by: artbataev <artbataev@users.noreply.github.com>
Co-authored-by: Piotr Kaminski <pikaminski@nvidia.com>
Co-authored-by: Laplasjan107 <Laplasjan107@users.noreply.github.com>
Co-authored-by: artbataev <artbataev@users.noreply.github.com>

* TDT compute timestamps option and Extra Whitespace handling for SPE (#10875)

* add token duration

Signed-off-by: monica-sekoyan <msekoyan@nvidia.com>

* revert rnnt change

Signed-off-by: monica-sekoyan <msekoyan@nvidia.com>

* add remove_extra_whitespaces arg to spe tokenizer

Signed-off-by: monica-sekoyan <msekoyan@nvidia.com>

* add token duration retrieval

Signed-off-by: monica-sekoyan <msekoyan@nvidia.com>

* add ignore_extra_whitespace to spe

Signed-off-by: monica-sekoyan <msekoyan@nvidia.com>

* add compute_timestamp support for tdt

Signed-off-by: monica-sekoyan <msekoyan@nvidia.com>

* fix config field name

Signed-off-by: monica-sekoyan <msekoyan@nvidia.com>

* add refinement for tdt timestamps

Signed-off-by: monica-sekoyan <msekoyan@nvidia.com>

* add segments timestamp support and refinement for ctc

Signed-off-by: monica-sekoyan <msekoyan@nvidia.com>

* modify tests for ctc decoding timestamps

Signed-off-by: monica-sekoyan <msekoyan@nvidia.com>

* add rnnt timestamp tests

Signed-off-by: monica-sekoyan <msekoyan@nvidia.com>

* updated doc

Signed-off-by: monica-sekoyan <msekoyan@nvidia.com>

* fix in test

Signed-off-by: monica-sekoyan <msekoyan@nvidia.com>

* Apply isort and black reformatting

Signed-off-by: monica-sekoyan <monica-sekoyan@users.noreply.github.com>

* fix of unicode char

Signed-off-by: monica-sekoyan <msekoyan@nvidia.com>

* fix rnnt_decoding test

Signed-off-by: monica-sekoyan <msekoyan@nvidia.com>

* workaround for test tokenizer

Signed-off-by: monica-sekoyan <msekoyan@nvidia.com>

* Apply isort and black reformatting

Signed-off-by: monica-sekoyan <monica-sekoyan@users.noreply.github.com>

* modify segments formation

Signed-off-by: monica-sekoyan <msekoyan@nvidia.com>

* modify segments for ctc

Signed-off-by: monica-sekoyan <msekoyan@nvidia.com>

* fix in ctc refinement

Signed-off-by: monica-sekoyan <msekoyan@nvidia.com>

* Apply isort and black reformatting

Signed-off-by: monica-sekoyan <monica-sekoyan@users.noreply.github.com>

* minor changes

Signed-off-by: monica-sekoyan <msekoyan@nvidia.com>

* reverse offset change

Signed-off-by: monica-sekoyan <msekoyan@nvidia.com>

* Apply isort and black reformatting

Signed-off-by: monica-sekoyan <monica-sekoyan@users.noreply.github.com>

* warning mode=once

Signed-off-by: monica-sekoyan <msekoyan@nvidia.com>

* Apply isort and black reformatting

Signed-off-by: monica-sekoyan <monica-sekoyan@users.noreply.github.com>

* make ignore_extrawhitespaces false

Signed-off-by: monica-sekoyan <msekoyan@nvidia.com>

* minor changes

Signed-off-by: monica-sekoyan <msekoyan@nvidia.com>

* adjust changes to the tests

Signed-off-by: monica-sekoyan <msekoyan@nvidia.com>

* modify prompt_formatter tests

Signed-off-by: monica-sekoyan <msekoyan@nvidia.com>

* Apply isort and black reformatting

Signed-off-by: monica-sekoyan <monica-sekoyan@users.noreply.github.com>

---------

Signed-off-by: monica-sekoyan <msekoyan@nvidia.com>
Signed-off-by: monica-sekoyan <monica-sekoyan@users.noreply.github.com>
Co-authored-by: monica-sekoyan <monica-sekoyan@users.noreply.github.com>

* Basic online dynamic FP8 quantization with vLLM (#10904)

* Basic online dynamic quantization with vLLM

Signed-off-by: Jan Lasek <janek.lasek@gmail.com>

* Apply isort and black reformatting

Signed-off-by: janekl <janekl@users.noreply.github.com>

* vllm 0.6.3 updates

Signed-off-by: Jan Lasek <janek.lasek@gmail.com>

* Pass quantization param in deploy_vllm_triton.py script

Signed-off-by: Jan Lasek <janek.lasek@gmail.com>

---------

Signed-off-by: Jan Lasek <janek.lasek@gmail.com>
Signed-off-by: janekl <janekl@users.noreply.github.com>
Co-authored-by: janekl <janekl@users.noreply.github.com>

* ci: Improve VM maintenance (#10758)

* ci: Improve VM maintenance

Signed-off-by: Oliver Koenig <okoenig@nvidia.com>

* rename stuff

Signed-off-by: Oliver Koenig <okoenig@nvidia.com>

* title

Signed-off-by: Oliver Koenig <okoenig@nvidia.com>

* use team

Signed-off-by: Oliver Koenig <okoenig@nvidia.com>

* run on failure too

Signed-off-by: Oliver Koenig <okoenig@nvidia.com>

* fix

Signed-off-by: Oliver Koenig <okoenig@nvidia.com>

* yrdy

Signed-off-by: Oliver Koenig <okoenig@nvidia.com>

* f

Signed-off-by: Oliver Koenig <okoenig@nvidia.com>

* test

Signed-off-by: Oliver Koenig <okoenig@nvidia.com>

* fix

Signed-off-by: Oliver Koenig <okoenig@nvidia.com>

* f

Signed-off-by: Oliver Koenig <okoenig@nvidia.com>

* f

Signed-off-by: Oliver Koenig <okoenig@nvidia.com>

* f

Signed-off-by: Oliver Koenig <okoenig@nvidia.com>

---------

Signed-off-by: Oliver Koenig <okoenig@nvidia.com>

* Add comment for vision transpose

* update megatron_init.py inside lightning

Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>

* rename llama to mllama folder name

Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>

* update to attention bias

Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>

* Apply isort and black reformatting

Signed-off-by: yaoyu-33 <yaoyu-33@users.noreply.github.com>

* update dropout to 0

Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>

* fix attention bias

Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>

* remove disable_vision_padding since we now have a fix

Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>

* Apply isort and black reformatting

Signed-off-by: yaoyu-33 <yaoyu-33@users.noreply.github.com>

* Update init for mllama

Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>

* Apply isort and black reformatting

Signed-off-by: yaoyu-33 <yaoyu-33@users.noreply.github.com>

* Address comments

Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>

* Apply isort and black reformatting

Signed-off-by: yaoyu-33 <yaoyu-33@users.noreply.github.com>

* fix copyright title

Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>

* fix code scan

Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>

* update vision code

Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>

* revert attention bias changes until latest MLM code got merged

Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>

* fix warning

Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>

* Turn off system message check, as it's "" now

Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>

* Rollback megatron_parallel.py

Signed-off-by: Yu Yao <54727607+yaoyu-33@users.noreply.github.com>

---------

Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>
Signed-off-by: yaoyu-33 <yaoyu-33@users.noreply.github.com>
Signed-off-by: Yu Yao <54727607+yaoyu-33@users.noreply.github.com>
Signed-off-by: cuichenx <cuichenx@users.noreply.github.com>
Signed-off-by: Chen Cui <chcui@nvidia.com>
Signed-off-by: artbataev <artbataev@users.noreply.github.com>
Signed-off-by: parthmannan <parthmannan@users.noreply.github.com>
Signed-off-by: meatybobby <meatybobby@users.noreply.github.com>
Signed-off-by: HuiyingLi <willwin.lee@gmail.com>
Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>
Signed-off-by: akoumpa <akoumpa@users.noreply.github.com>
Signed-off-by: Abhishree <abhishreetm@gmail.com>
Signed-off-by: Marc Romeijn <mromeijn@nvidia.com>
Signed-off-by: Marc Romeyn <mromeijn@nvidia.com>
Signed-off-by: marcromeyn <marcromeyn@users.noreply.github.com>
Signed-off-by: stevehuang52 <heh@nvidia.com>
Signed-off-by: Shriya Palsamudram <spalsamudram@nvidia.com>
Signed-off-by: ashors1 <ashors@nvidia.com>
Signed-off-by: Maanu Grover <maanug@nvidia.com>
Signed-off-by: Piotr Kaminski <pikaminski@nvidia.com>
Signed-off-by: Laplasjan107 <Laplasjan107@users.noreply.github.com>
Signed-off-by: Piotr Kamiński <67481570+Laplasjan107@users.noreply.github.com>
Signed-off-by: monica-sekoyan <msekoyan@nvidia.com>
Signed-off-by: monica-sekoyan <monica-sekoyan@users.noreply.github.com>
Signed-off-by: Jan Lasek <janek.lasek@gmail.com>
Signed-off-by: janekl <janekl@users.noreply.github.com>
Signed-off-by: Oliver Koenig <okoenig@nvidia.com>
Co-authored-by: Ao Tang <aot@nvidia.com>
Co-authored-by: Chen Cui <chcui@nvidia.com>
Co-authored-by: Bobby Chen <bobchen@nvidia.com>
Co-authored-by: yaoyu-33 <yaoyu-33@users.noreply.github.com>
Co-authored-by: Yashaswi Karnati <144376261+yashaswikarnati@users.noreply.github.com>
Co-authored-by: ykarnati <ykarnati@nvidia.com>
Co-authored-by: cuichenx <cuichenx@users.noreply.github.com>
Co-authored-by: Yashaswi Karnati <ykarnati@login-eos02.eos.clusters.nvidia.com>
Co-authored-by: artbataev <artbataev@users.noreply.github.com>
Co-authored-by: Parth Mannan <38387286+parthmannan@users.noreply.github.com>
Co-authored-by: parthmannan <parthmannan@users.noreply.github.com>
Co-authored-by: meatybobby <meatybobby@users.noreply.github.com>
Co-authored-by: Huiying <willwin.lee@gmail.com>
Co-authored-by: Alexandros Koumparoulis <153118171+akoumpa@users.noreply.github.com>
Co-authored-by: akoumpa <akoumpa@users.noreply.github.com>
Co-authored-by: Abhishree Thittenamane <47577437+athitten@users.noreply.github.com>
Co-authored-by: Pablo Garay <pagaray@nvidia.com>
Co-authored-by: Marc Romeyn <mromeijn@nvidia.com>
Co-authored-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>
Co-authored-by: marcromeyn <marcromeyn@users.noreply.github.com>
Co-authored-by: He Huang (Steve) <105218074+stevehuang52@users.noreply.github.com>
Co-authored-by: Shriya Rishab <69161273+ShriyaPalsamudram@users.noreply.github.com>
Co-authored-by: ataghibakhsh <ataghibakhsh@nvidia.com>
Co-authored-by: Maanu Grover <109391026+maanug-nv@users.noreply.github.com>
Co-authored-by: Anna Shors <71393111+ashors1@users.noreply.github.com>
Co-authored-by: Piotr Kamiński <67481570+Laplasjan107@users.noreply.github.com>
Co-authored-by: Piotr Kaminski <pikaminski@nvidia.com>
Co-authored-by: Laplasjan107 <Laplasjan107@users.noreply.github.com>
Co-authored-by: monica-sekoyan <166123533+monica-sekoyan@users.noreply.github.com>
Co-authored-by: monica-sekoyan <monica-sekoyan@users.noreply.github.com>
Co-authored-by: Jan Lasek <janek.lasek@gmail.com>
Co-authored-by: janekl <janekl@users.noreply.github.com>
Co-authored-by: oliver könig <okoenig@nvidia.com>
lilyw97 pushed a commit to lilyw97/NeMo that referenced this pull request Nov 13, 2024
* add initial code for llama vlm

Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>

* some restructure

Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>

* add mock data placeholder

Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>

* Fix some importing

Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>

* add language component for vlm llama

* update code

Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>

* now match num of params

* update language part and fix vision part

Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>

* minor fix

Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>

* model can now init

Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>

* minor update for llama32 text config

Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>

* make checkpoint loading work

* missing import

* match vision part tensor shapes with configs

Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>

* solve some fwd issues and mismatch issues

Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>

* add vision import

* fixes

Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>

* update importer to convert both text and image weights

* importer typos and reduce clutter

* fix import qkv

* some fixes for LLM

Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>

* Add embedding

* some updates

Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>

* enable loading only text or only vision

* add example script

* TP fix

Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>

* update

* upload examples

Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>

* update generate

Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>

* update to newer version

Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>

* upload for sharing

* update to new pyt ckpt

* xattn_caches matches (except small differences due to TE RMSNorm)

* cleanup

* embeddings match

* match precision of weights

* update sharded state dict

Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>

* change xattn layer num to 3 7 11 etc

* upload llama generation

* minor fix

* fix dummy layer input format

* fix vision qkv order

* fix sharded state dict

Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>

* fix vision precision

* fix rope

* match cross attn layer

* remove nrep

* Remove cross attention in ImageTransformerLayer and fix _gate_ffn

* PP draft

Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>

* Fix intermediate tensor

* temp save for pp2 is working

Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>

* fix pp issues

Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>

* merge

* update mcore parallelism initialization

Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>

* small update to pretrain script

Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>

* update mcore parallelism initialization

Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>

* Apply isort and black reformatting

Signed-off-by: yaoyu-33 <yaoyu-33@users.noreply.github.com>

* added energon dataloader for neva training (NVIDIA#10451)

* added energon dataloader for neva training

* Apply isort and black reformatting

Signed-off-by: yashaswikarnati <yashaswikarnati@users.noreply.github.com>

* specify global batch size to support grad accumulation

* adding neva pretrain example

* Apply isort and black reformatting

Signed-off-by: yashaswikarnati <yashaswikarnati@users.noreply.github.com>

* change pretrain example to handle new ckpt reloading

* fixed code quality warnings and unused imports

Signed-off-by: ykarnati <ykarnati@nvidia.com>

* minor changes for PR comments

* Apply isort and black reformatting

Signed-off-by: yashaswikarnati <yashaswikarnati@users.noreply.github.com>

* refactor conversation template config

* Apply isort and black reformatting

Signed-off-by: yashaswikarnati <yashaswikarnati@users.noreply.github.com>

* remove optional import

---------

Signed-off-by: yashaswikarnati <yashaswikarnati@users.noreply.github.com>
Signed-off-by: ykarnati <ykarnati@nvidia.com>
Co-authored-by: yashaswikarnati <yashaswikarnati@users.noreply.github.com>
(cherry picked from commit 7354740)

* llama energon dataloader

* have tokenizer for base task encoder class

* Update megatron_init.py

Signed-off-by: Yu Yao <54727607+yaoyu-33@users.noreply.github.com>

* Add simple inference

* evian3 update

Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>

* add encoder parallel default config

Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>

* add encoder parallel default config

Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>

* clean up

Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>

* add aspect ratio in model

* support energon dataloader

* some pp update

Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>

* fixes

Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>

* fix kv merging

Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>

* fix get_key_value_tensors

Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>

* rename files

Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>

* update to HF style position embedding

Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>

* fix energon dataloader and support batching

* update forward args

Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>

* clean up and move to aspect_ratio_ids

Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>

* rename back to language.py

Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>

* fix loss function

Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>

* update and fix energon

Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>

* Add hf import

* Fix type

* Change config

* update energon pretrain

Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>

* clean up

* clean up

* reformat

Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>

* update inference files for new code

* update to instruct

* update to instruct

* update few names

Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>

* update generation

Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>

* fix importer embedding.weight

* few fixes

Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>

* add hf script

Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>

* fix kv import

* remove interleaved

* fixes and updates

Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>

* lora fixes

Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>

* some code clean ups

Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>

* update training scripts

Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>

* refactors

Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>

* add LoRA finetuning

* fixes and nemo update

Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>

* fix importer registering issue by adding 11B and 90B configs

* update `decoder_seq_len`

Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>

* science vqa script

Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>

* clean up script name

Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>

* fix ckpt save serialization issue

* fix predefined config classes

* add num_chunks in input

Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>

* fix format

Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>

* update finetuning scripts for PEFT

* add 11b recipe (need NVIDIA#10645 to test)

* fix mask generation

Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>

* minor fix code style

Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>

* Apply isort and black reformatting

Signed-off-by: yaoyu-33 <yaoyu-33@users.noreply.github.com>

* Support no image inference

* add llama svqa eval

* fix masking

Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>

* Apply isort and black reformatting

Signed-off-by: yaoyu-33 <yaoyu-33@users.noreply.github.com>

* fix generation

Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>

* Apply isort and black reformatting

Signed-off-by: yaoyu-33 <yaoyu-33@users.noreply.github.com>

* add 90b recipe and revise 11b recipe

* Apply isort and black reformatting

Signed-off-by: cuichenx <cuichenx@users.noreply.github.com>

* clean up typing

* add option to disable vision padding

* Apply isort and black reformatting

Signed-off-by: cuichenx <cuichenx@users.noreply.github.com>

* base model finetuning (does not work yet)

* Apply isort and black reformatting

Signed-off-by: cuichenx <cuichenx@users.noreply.github.com>

* fixed default conversation template config for MLLama

* Update svqa

* add multinode

* bot happy

* Apply isort and black reformatting

Signed-off-by: cuichenx <cuichenx@users.noreply.github.com>

* Apply isort and black reformatting

Signed-off-by: yaoyu-33 <yaoyu-33@users.noreply.github.com>

* Apply isort and black reformatting

Signed-off-by: yaoyu-33 <yaoyu-33@users.noreply.github.com>

* Apply isort and black reformatting

Signed-off-by: artbataev <artbataev@users.noreply.github.com>

* Perf improvements. Mainly from XAttn mask calculation (NVIDIA#10901)

* Perf improvements. Mainly from XAttn mask calculation

* Apply isort and black reformatting

Signed-off-by: parthmannan <parthmannan@users.noreply.github.com>

---------

Signed-off-by: parthmannan <parthmannan@users.noreply.github.com>
Co-authored-by: parthmannan <parthmannan@users.noreply.github.com>

* fix existing issues

Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>

* fix scripts

Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>

* Apply isort and black reformatting

Signed-off-by: yaoyu-33 <yaoyu-33@users.noreply.github.com>

* fix lora

* few fixes for non image support

Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>

* update masking gen

Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>

* update lazy dataset

Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>

* fix data sampler and loading issue

Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>

* Add vlm generation

* Apply isort and black reformatting

Signed-off-by: meatybobby <meatybobby@users.noreply.github.com>

* Apply isort and black reformatting

Signed-off-by: yaoyu-33 <yaoyu-33@users.noreply.github.com>

* generation update

Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>

* update lazy dataset

Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>

* Fix _strategy_lib.py

Signed-off-by: Yu Yao <54727607+yaoyu-33@users.noreply.github.com>

* Apply isort and black reformatting

Signed-off-by: yaoyu-33 <yaoyu-33@users.noreply.github.com>

* fix warning

Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>

* hide vlm examples

Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>

* Revert "Add vlm generation"

This reverts commit 4711c75

Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>

* Fix VisionEncoder multi-batch bug

* update mcore parallelism initialization

Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>

* Apply isort and black reformatting

Signed-off-by: yaoyu-33 <yaoyu-33@users.noreply.github.com>

* Update megatron_init.py

Signed-off-by: Yu Yao <54727607+yaoyu-33@users.noreply.github.com>

* add encoder parallel default config

Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>

* Fix _strategy_lib.py

Signed-off-by: Yu Yao <54727607+yaoyu-33@users.noreply.github.com>

* llm.generate fixes (NVIDIA#10983)

* fix context path, disable optimizer init, add tp

Signed-off-by: HuiyingLi <willwin.lee@gmail.com>

* format

Signed-off-by: HuiyingLi <willwin.lee@gmail.com>

* address comments, require user to provide trainer

Signed-off-by: HuiyingLi <willwin.lee@gmail.com>

* minor fix

Signed-off-by: HuiyingLi <willwin.lee@gmail.com>

* minor fixes

Signed-off-by: HuiyingLi <willwin.lee@gmail.com>

---------

Signed-off-by: HuiyingLi <willwin.lee@gmail.com>

* use __dict__ in check (NVIDIA#11012)

* check is_hf_model in leaf module

Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>

* Apply isort and black reformatting

Signed-off-by: akoumpa <akoumpa@users.noreply.github.com>

* disable getattr alternative path

Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>

* fix

Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>

* undo

Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>

---------

Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>
Signed-off-by: akoumpa <akoumpa@users.noreply.github.com>
Co-authored-by: akoumpa <akoumpa@users.noreply.github.com>

* LoRA support for HF::AutoModelForCausalLM (NVIDIA#10982)

* add LinearAdapter

Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>

* add hf lora example

Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>

* remove unused imports

Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>

* fix

Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>

* fix

Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>

* subclass mixin

Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>

* remove stale imports

Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>

* undo

Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>

* fix scale

Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>

* regex selector for peft

Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>

* move lora

Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>

* fmt

Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>

* hf_auto_model_for_causal_lm finetune recipe

Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>

* Apply isort and black reformatting

Signed-off-by: akoumpa <akoumpa@users.noreply.github.com>

---------

Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>
Signed-off-by: akoumpa <akoumpa@users.noreply.github.com>
Co-authored-by: akoumpa <akoumpa@users.noreply.github.com>

* Change default for always_save_context to True (NVIDIA#11014)

Signed-off-by: Abhishree <abhishreetm@gmail.com>
Co-authored-by: Pablo Garay <pagaray@nvidia.com>

* Add a build option to load_context (NVIDIA#10713)

* Add a build option to load_context

Signed-off-by: Marc Romeijn <mromeijn@nvidia.com>
Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>

* Adding test

Signed-off-by: Marc Romeijn <mromeijn@nvidia.com>
Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>

* Trying to fix failing CPU test

Signed-off-by: Marc Romeijn <mromeijn@nvidia.com>
Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>

* cherry-pick fix

Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>

---------

Signed-off-by: Marc Romeijn <mromeijn@nvidia.com>
Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>
Co-authored-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>

* Fix pip install (NVIDIA#11026)

* Move AutoTokenizer inline

Signed-off-by: Marc Romeyn <mromeijn@nvidia.com>

* Move einops to common requirements

Signed-off-by: Marc Romeyn <mromeijn@nvidia.com>

* Move AutoTokenizer import to top-level again in fine_tuning

Signed-off-by: Marc Romeyn <mromeijn@nvidia.com>

* Move megatron init inside nemo.lightning

Signed-off-by: Marc Romeyn <mromeijn@nvidia.com>

* Make megatron_lazy_init_context work when transformer-engine is not installed

Signed-off-by: Marc Romeyn <mromeijn@nvidia.com>

* Only import get_nmt_tokenizer when needed

Signed-off-by: Marc Romeyn <mromeijn@nvidia.com>

* Apply isort and black reformatting

Signed-off-by: marcromeyn <marcromeyn@users.noreply.github.com>

---------

Signed-off-by: Marc Romeyn <mromeijn@nvidia.com>
Signed-off-by: marcromeyn <marcromeyn@users.noreply.github.com>
Co-authored-by: marcromeyn <marcromeyn@users.noreply.github.com>

* [WIP] Add docs for NEST SSL (NVIDIA#10804)

* add docs

Signed-off-by: stevehuang52 <heh@nvidia.com>

* update doc and fix missing param

Signed-off-by: stevehuang52 <heh@nvidia.com>

---------

Signed-off-by: stevehuang52 <heh@nvidia.com>

* Change dist ckpt defaults (NVIDIA#10913)

* Enable ckpt features by default (async ckpt), ckpt every 15mins and reduce preemption time to 1min

Signed-off-by: Shriya Palsamudram <spalsamudram@nvidia.com>

* fix ssm tests

Signed-off-by: Shriya Palsamudram <spalsamudram@nvidia.com>

* Make note that ckpt_async_save is disabled for SSMs

Signed-off-by: Shriya Palsamudram <spalsamudram@nvidia.com>

* Enable async ckpt for SSMs with fix

Signed-off-by: Shriya Palsamudram <spalsamudram@nvidia.com>

* Disable async ckpt in the peft test as it is a known bug, add note.

Signed-off-by: Shriya Palsamudram <spalsamudram@nvidia.com>

* Fix failing unit tests

Signed-off-by: Shriya Palsamudram <spalsamudram@nvidia.com>

* Ashors/peft async ckpt (NVIDIA#11010)

* [WIP] prototype for supporting async checkpointing with peft

Signed-off-by: ashors1 <ashors@nvidia.com>
Signed-off-by: Shriya Palsamudram <spalsamudram@nvidia.com>

* Enable async ckpt for the peft test

Signed-off-by: Shriya Palsamudram <spalsamudram@nvidia.com>

* Fix peft setup test

Signed-off-by: Shriya Palsamudram <spalsamudram@nvidia.com>

---------

Signed-off-by: Shriya Palsamudram <spalsamudram@nvidia.com>
Signed-off-by: ashors1 <ashors@nvidia.com>
Co-authored-by: ataghibakhsh <ataghibakhsh@nvidia.com>

* Akoumparouli/mixtral recipe fix r2.0.0 (NVIDIA#10994)

* Mixtral TP8 EP1

Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>

* Apply isort and black reformatting

Signed-off-by: akoumpa <akoumpa@users.noreply.github.com>

---------

Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>
Signed-off-by: akoumpa <akoumpa@users.noreply.github.com>
Co-authored-by: akoumpa <akoumpa@users.noreply.github.com>

* Fix _strategy_lib tests (NVIDIA#11033)

* fix world size and don't mock

Signed-off-by: Maanu Grover <maanug@nvidia.com>

* cleanup global state

Signed-off-by: Maanu Grover <maanug@nvidia.com>

* check app state instead

Signed-off-by: Maanu Grover <maanug@nvidia.com>

* fix syntax nemo logger test

Signed-off-by: Maanu Grover <maanug@nvidia.com>

---------

Signed-off-by: Maanu Grover <maanug@nvidia.com>

* Update `BaseMegatronSampler` for compatibility with PTL's `_BatchProgress` (NVIDIA#11016)

* Revert "[NeMo-UX] Use custom `BatchProgress` class which does not restore states (NVIDIA#10383)"

This reverts commit b5798de.

* make megatron sampler return the total number of batches in the dataset

Signed-off-by: ashors1 <ashors@nvidia.com>

---------

Signed-off-by: ashors1 <ashors@nvidia.com>

* PTQ example for NeMo 2.0 (NVIDIA#10642)

* initial commit

Signed-off-by: Piotr Kaminski <pikaminski@nvidia.com>

* create Quantizer for NeMo 2.0

Signed-off-by: Piotr Kaminski <pikaminski@nvidia.com>

* refactor

Signed-off-by: Piotr Kaminski <pikaminski@nvidia.com>

* Call quantize on an unwrapped mcore model

Signed-off-by: Piotr Kaminski <pikaminski@nvidia.com>

* Apply isort and black reformatting

Signed-off-by: Laplasjan107 <Laplasjan107@users.noreply.github.com>

* Add tests, adjust unwrapping

Signed-off-by: Piotr Kaminski <pikaminski@nvidia.com>

* Apply isort and black reformatting

Signed-off-by: Laplasjan107 <Laplasjan107@users.noreply.github.com>

* fix export

Signed-off-by: Piotr Kaminski <pikaminski@nvidia.com>

* Apply isort and black reformatting

Signed-off-by: Laplasjan107 <Laplasjan107@users.noreply.github.com>

* Apply isort and black reformatting

Signed-off-by: artbataev <artbataev@users.noreply.github.com>

* Fix output_path argument for HF import

Signed-off-by: Piotr Kamiński <67481570+Laplasjan107@users.noreply.github.com>

* fix fabric ckpt loading

Signed-off-by: Piotr Kaminski <pikaminski@nvidia.com>

* Apply isort and black reformatting

Signed-off-by: Laplasjan107 <Laplasjan107@users.noreply.github.com>

* code review suggestions

Signed-off-by: Piotr Kaminski <pikaminski@nvidia.com>

* Apply isort and black reformatting

Signed-off-by: Laplasjan107 <Laplasjan107@users.noreply.github.com>

* remove unused import

Signed-off-by: Piotr Kaminski <pikaminski@nvidia.com>

* use cnn dataset in github ci

Signed-off-by: Piotr Kaminski <pikaminski@nvidia.com>

* applied code review

Signed-off-by: Piotr Kaminski <pikaminski@nvidia.com>

* code review changes

Signed-off-by: Piotr Kaminski <pikaminski@nvidia.com>

* Apply isort and black reformatting

Signed-off-by: Laplasjan107 <Laplasjan107@users.noreply.github.com>

* simplify interface for data iterator

Signed-off-by: Piotr Kaminski <pikaminski@nvidia.com>

* Apply isort and black reformatting

Signed-off-by: Laplasjan107 <Laplasjan107@users.noreply.github.com>

* (partial) PP fix

Signed-off-by: Piotr Kaminski <pikaminski@nvidia.com>

* Apply isort and black reformatting

Signed-off-by: Laplasjan107 <Laplasjan107@users.noreply.github.com>

---------

Signed-off-by: Piotr Kaminski <pikaminski@nvidia.com>
Signed-off-by: Laplasjan107 <Laplasjan107@users.noreply.github.com>
Signed-off-by: Piotr Kamiński <67481570+Laplasjan107@users.noreply.github.com>
Signed-off-by: artbataev <artbataev@users.noreply.github.com>
Co-authored-by: Piotr Kaminski <pikaminski@nvidia.com>
Co-authored-by: Laplasjan107 <Laplasjan107@users.noreply.github.com>
Co-authored-by: artbataev <artbataev@users.noreply.github.com>

* TDT compute timestamps option and Extra Whitespace handling for SPE (NVIDIA#10875)

* add token duration

Signed-off-by: monica-sekoyan <msekoyan@nvidia.com>

* revert rnnt change

Signed-off-by: monica-sekoyan <msekoyan@nvidia.com>

* add remove_extra_whitespaces arg to spe tokenizer

Signed-off-by: monica-sekoyan <msekoyan@nvidia.com>

* add token duration retrieval

Signed-off-by: monica-sekoyan <msekoyan@nvidia.com>

* add ignore_extra_whitespace to spe

Signed-off-by: monica-sekoyan <msekoyan@nvidia.com>

* add compute_timestamp support for tdt

Signed-off-by: monica-sekoyan <msekoyan@nvidia.com>

* fix config field name

Signed-off-by: monica-sekoyan <msekoyan@nvidia.com>

* add refinement for tdt timestamps

Signed-off-by: monica-sekoyan <msekoyan@nvidia.com>

* add segments timestamp support and refinement for ctc

Signed-off-by: monica-sekoyan <msekoyan@nvidia.com>

* modify tests for ctc decoding timestamps

Signed-off-by: monica-sekoyan <msekoyan@nvidia.com>

* add rnnt timestamp tests

Signed-off-by: monica-sekoyan <msekoyan@nvidia.com>

* updated doc

Signed-off-by: monica-sekoyan <msekoyan@nvidia.com>

* fix in test

Signed-off-by: monica-sekoyan <msekoyan@nvidia.com>

* Apply isort and black reformatting

Signed-off-by: monica-sekoyan <monica-sekoyan@users.noreply.github.com>

* fix of unicode char

Signed-off-by: monica-sekoyan <msekoyan@nvidia.com>

* fix rnnt_decoding test

Signed-off-by: monica-sekoyan <msekoyan@nvidia.com>

* workaround for test tokenizer

Signed-off-by: monica-sekoyan <msekoyan@nvidia.com>

* Apply isort and black reformatting

Signed-off-by: monica-sekoyan <monica-sekoyan@users.noreply.github.com>

* modify segments formation

Signed-off-by: monica-sekoyan <msekoyan@nvidia.com>

* modify segments for ctc

Signed-off-by: monica-sekoyan <msekoyan@nvidia.com>

* fix in ctc refinement

Signed-off-by: monica-sekoyan <msekoyan@nvidia.com>

* Apply isort and black reformatting

Signed-off-by: monica-sekoyan <monica-sekoyan@users.noreply.github.com>

* minor changes

Signed-off-by: monica-sekoyan <msekoyan@nvidia.com>

* reverse offset change

Signed-off-by: monica-sekoyan <msekoyan@nvidia.com>

* Apply isort and black reformatting

Signed-off-by: monica-sekoyan <monica-sekoyan@users.noreply.github.com>

* warning mode=once

Signed-off-by: monica-sekoyan <msekoyan@nvidia.com>

* Apply isort and black reformatting

Signed-off-by: monica-sekoyan <monica-sekoyan@users.noreply.github.com>

* make ignore_extrawhitespaces false

Signed-off-by: monica-sekoyan <msekoyan@nvidia.com>

* minor changes

Signed-off-by: monica-sekoyan <msekoyan@nvidia.com>

* adjust changes to the tests

Signed-off-by: monica-sekoyan <msekoyan@nvidia.com>

* modify prompt_formatter tests

Signed-off-by: monica-sekoyan <msekoyan@nvidia.com>

* Apply isort and black reformatting

Signed-off-by: monica-sekoyan <monica-sekoyan@users.noreply.github.com>

---------

Signed-off-by: monica-sekoyan <msekoyan@nvidia.com>
Signed-off-by: monica-sekoyan <monica-sekoyan@users.noreply.github.com>
Co-authored-by: monica-sekoyan <monica-sekoyan@users.noreply.github.com>

* Basic online dynamic FP8 quantization with vLLM (NVIDIA#10904)

* Basic online dynamic quantization with vLLM

Signed-off-by: Jan Lasek <janek.lasek@gmail.com>

* Apply isort and black reformatting

Signed-off-by: janekl <janekl@users.noreply.github.com>

* vllm 0.6.3 updates

Signed-off-by: Jan Lasek <janek.lasek@gmail.com>

* Pass quantization param in deploy_vllm_triton.py script

Signed-off-by: Jan Lasek <janek.lasek@gmail.com>

---------

Signed-off-by: Jan Lasek <janek.lasek@gmail.com>
Signed-off-by: janekl <janekl@users.noreply.github.com>
Co-authored-by: janekl <janekl@users.noreply.github.com>

* ci: Improve VM maintenance (NVIDIA#10758)

* ci: Improve VM maintenance

Signed-off-by: Oliver Koenig <okoenig@nvidia.com>

* rename stuff

Signed-off-by: Oliver Koenig <okoenig@nvidia.com>

* title

Signed-off-by: Oliver Koenig <okoenig@nvidia.com>

* use team

Signed-off-by: Oliver Koenig <okoenig@nvidia.com>

* run on failure too

Signed-off-by: Oliver Koenig <okoenig@nvidia.com>

* fix

Signed-off-by: Oliver Koenig <okoenig@nvidia.com>

* yrdy

Signed-off-by: Oliver Koenig <okoenig@nvidia.com>

* f

Signed-off-by: Oliver Koenig <okoenig@nvidia.com>

* test

Signed-off-by: Oliver Koenig <okoenig@nvidia.com>

* fix

Signed-off-by: Oliver Koenig <okoenig@nvidia.com>

* f

Signed-off-by: Oliver Koenig <okoenig@nvidia.com>

* f

Signed-off-by: Oliver Koenig <okoenig@nvidia.com>

* f

Signed-off-by: Oliver Koenig <okoenig@nvidia.com>

---------

Signed-off-by: Oliver Koenig <okoenig@nvidia.com>

* Add comment for vision transpose

* update megatron_init.py inside lightning

Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>

* rename llama to mllama folder name

Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>

* update to attention bias

Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>

* Apply isort and black reformatting

Signed-off-by: yaoyu-33 <yaoyu-33@users.noreply.github.com>

* update dropout to 0

Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>

* fix attention bias

Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>

* remove disable_vision_padding since we now have a fix

Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>

* Apply isort and black reformatting

Signed-off-by: yaoyu-33 <yaoyu-33@users.noreply.github.com>

* Update init for mllama

Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>

* Apply isort and black reformatting

Signed-off-by: yaoyu-33 <yaoyu-33@users.noreply.github.com>

* Address comments

Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>

* Apply isort and black reformatting

Signed-off-by: yaoyu-33 <yaoyu-33@users.noreply.github.com>

* fix copyright title

Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>

* fix code scan

Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>

* update vision code

Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>

* revert attention bias changes until latest MLM code got merged

Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>

* fix warning

Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>

* Turn off system message check, as it's "" now

Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>

* Rollback megatron_parallel.py

Signed-off-by: Yu Yao <54727607+yaoyu-33@users.noreply.github.com>

---------

Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>
Signed-off-by: yaoyu-33 <yaoyu-33@users.noreply.github.com>
Signed-off-by: Yu Yao <54727607+yaoyu-33@users.noreply.github.com>
Signed-off-by: cuichenx <cuichenx@users.noreply.github.com>
Signed-off-by: Chen Cui <chcui@nvidia.com>
Signed-off-by: artbataev <artbataev@users.noreply.github.com>
Signed-off-by: parthmannan <parthmannan@users.noreply.github.com>
Signed-off-by: meatybobby <meatybobby@users.noreply.github.com>
Signed-off-by: HuiyingLi <willwin.lee@gmail.com>
Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>
Signed-off-by: akoumpa <akoumpa@users.noreply.github.com>
Signed-off-by: Abhishree <abhishreetm@gmail.com>
Signed-off-by: Marc Romeijn <mromeijn@nvidia.com>
Signed-off-by: Marc Romeyn <mromeijn@nvidia.com>
Signed-off-by: marcromeyn <marcromeyn@users.noreply.github.com>
Signed-off-by: stevehuang52 <heh@nvidia.com>
Signed-off-by: Shriya Palsamudram <spalsamudram@nvidia.com>
Signed-off-by: ashors1 <ashors@nvidia.com>
Signed-off-by: Maanu Grover <maanug@nvidia.com>
Signed-off-by: Piotr Kaminski <pikaminski@nvidia.com>
Signed-off-by: Laplasjan107 <Laplasjan107@users.noreply.github.com>
Signed-off-by: Piotr Kamiński <67481570+Laplasjan107@users.noreply.github.com>
Signed-off-by: monica-sekoyan <msekoyan@nvidia.com>
Signed-off-by: monica-sekoyan <monica-sekoyan@users.noreply.github.com>
Signed-off-by: Jan Lasek <janek.lasek@gmail.com>
Signed-off-by: janekl <janekl@users.noreply.github.com>
Signed-off-by: Oliver Koenig <okoenig@nvidia.com>
Co-authored-by: Ao Tang <aot@nvidia.com>
Co-authored-by: Chen Cui <chcui@nvidia.com>
Co-authored-by: Bobby Chen <bobchen@nvidia.com>
Co-authored-by: yaoyu-33 <yaoyu-33@users.noreply.github.com>
Co-authored-by: Yashaswi Karnati <144376261+yashaswikarnati@users.noreply.github.com>
Co-authored-by: ykarnati <ykarnati@nvidia.com>
Co-authored-by: cuichenx <cuichenx@users.noreply.github.com>
Co-authored-by: Yashaswi Karnati <ykarnati@login-eos02.eos.clusters.nvidia.com>
Co-authored-by: artbataev <artbataev@users.noreply.github.com>
Co-authored-by: Parth Mannan <38387286+parthmannan@users.noreply.github.com>
Co-authored-by: parthmannan <parthmannan@users.noreply.github.com>
Co-authored-by: meatybobby <meatybobby@users.noreply.github.com>
Co-authored-by: Huiying <willwin.lee@gmail.com>
Co-authored-by: Alexandros Koumparoulis <153118171+akoumpa@users.noreply.github.com>
Co-authored-by: akoumpa <akoumpa@users.noreply.github.com>
Co-authored-by: Abhishree Thittenamane <47577437+athitten@users.noreply.github.com>
Co-authored-by: Pablo Garay <pagaray@nvidia.com>
Co-authored-by: Marc Romeyn <mromeijn@nvidia.com>
Co-authored-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>
Co-authored-by: marcromeyn <marcromeyn@users.noreply.github.com>
Co-authored-by: He Huang (Steve) <105218074+stevehuang52@users.noreply.github.com>
Co-authored-by: Shriya Rishab <69161273+ShriyaPalsamudram@users.noreply.github.com>
Co-authored-by: ataghibakhsh <ataghibakhsh@nvidia.com>
Co-authored-by: Maanu Grover <109391026+maanug-nv@users.noreply.github.com>
Co-authored-by: Anna Shors <71393111+ashors1@users.noreply.github.com>
Co-authored-by: Piotr Kamiński <67481570+Laplasjan107@users.noreply.github.com>
Co-authored-by: Piotr Kaminski <pikaminski@nvidia.com>
Co-authored-by: Laplasjan107 <Laplasjan107@users.noreply.github.com>
Co-authored-by: monica-sekoyan <166123533+monica-sekoyan@users.noreply.github.com>
Co-authored-by: monica-sekoyan <monica-sekoyan@users.noreply.github.com>
Co-authored-by: Jan Lasek <janek.lasek@gmail.com>
Co-authored-by: janekl <janekl@users.noreply.github.com>
Co-authored-by: oliver könig <okoenig@nvidia.com>
HuiyingLi pushed a commit to HuiyingLi/NeMo that referenced this pull request Nov 15, 2024
* initial commit

Signed-off-by: Piotr Kaminski <pikaminski@nvidia.com>

* create Quantizer for NeMo 2.0

Signed-off-by: Piotr Kaminski <pikaminski@nvidia.com>

* refactor

Signed-off-by: Piotr Kaminski <pikaminski@nvidia.com>

* Call quantize on an unwrapped mcore model

Signed-off-by: Piotr Kaminski <pikaminski@nvidia.com>

* Apply isort and black reformatting

Signed-off-by: Laplasjan107 <Laplasjan107@users.noreply.github.com>

* Add tests, adjust unwrapping

Signed-off-by: Piotr Kaminski <pikaminski@nvidia.com>

* Apply isort and black reformatting

Signed-off-by: Laplasjan107 <Laplasjan107@users.noreply.github.com>

* fix export

Signed-off-by: Piotr Kaminski <pikaminski@nvidia.com>

* Apply isort and black reformatting

Signed-off-by: Laplasjan107 <Laplasjan107@users.noreply.github.com>

* Apply isort and black reformatting

Signed-off-by: artbataev <artbataev@users.noreply.github.com>

* Fix output_path argument for HF import

Signed-off-by: Piotr Kamiński <67481570+Laplasjan107@users.noreply.github.com>

* fix fabric ckpt loading

Signed-off-by: Piotr Kaminski <pikaminski@nvidia.com>

* Apply isort and black reformatting

Signed-off-by: Laplasjan107 <Laplasjan107@users.noreply.github.com>

* code review suggestions

Signed-off-by: Piotr Kaminski <pikaminski@nvidia.com>

* Apply isort and black reformatting

Signed-off-by: Laplasjan107 <Laplasjan107@users.noreply.github.com>

* remove unused import

Signed-off-by: Piotr Kaminski <pikaminski@nvidia.com>

* use cnn dataset in github ci

Signed-off-by: Piotr Kaminski <pikaminski@nvidia.com>

* applied code review

Signed-off-by: Piotr Kaminski <pikaminski@nvidia.com>

* code review changes

Signed-off-by: Piotr Kaminski <pikaminski@nvidia.com>

* Apply isort and black reformatting

Signed-off-by: Laplasjan107 <Laplasjan107@users.noreply.github.com>

* simplify interface for data iterator

Signed-off-by: Piotr Kaminski <pikaminski@nvidia.com>

* Apply isort and black reformatting

Signed-off-by: Laplasjan107 <Laplasjan107@users.noreply.github.com>

* (partial) PP fix

Signed-off-by: Piotr Kaminski <pikaminski@nvidia.com>

* Apply isort and black reformatting

Signed-off-by: Laplasjan107 <Laplasjan107@users.noreply.github.com>

---------

Signed-off-by: Piotr Kaminski <pikaminski@nvidia.com>
Signed-off-by: Laplasjan107 <Laplasjan107@users.noreply.github.com>
Signed-off-by: Piotr Kamiński <67481570+Laplasjan107@users.noreply.github.com>
Signed-off-by: artbataev <artbataev@users.noreply.github.com>
Co-authored-by: Piotr Kaminski <pikaminski@nvidia.com>
Co-authored-by: Laplasjan107 <Laplasjan107@users.noreply.github.com>
Co-authored-by: artbataev <artbataev@users.noreply.github.com>
HuiyingLi added a commit to HuiyingLi/NeMo that referenced this pull request Nov 15, 2024
* add initial code for llama vlm

Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>

* some restructure

Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>

* add mock data placeholder

Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>

* Fix some importing

Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>

* add language component for vlm llama

* update code

Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>

* now match num of params

* update language part and fix vision part

Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>

* minor fix

Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>

* model can now init

Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>

* minor update for llama32 text config

Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>

* make checkpoint loading work

* missing import

* match vision part tensor shapes with configs

Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>

* solve some fwd issues and mismatch issues

Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>

* add vision import

* fixes

Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>

* update importer to convert both text and image weights

* importer typos and reduce clutter

* fix import qkv

* some fixes for LLM

Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>

* Add embedding

* some updates

Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>

* enable loading only text or only vision

* add example script

* TP fix

Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>

* update

* upload examples

Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>

* update generate

Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>

* update to newer version

Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>

* upload for sharing

* update to new pyt ckpt

* xattn_caches matches (except small differences due to TE RMSNorm)

* cleanup

* embeddings match

* match precision of weights

* update sharded state dict

Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>

* change xattn layer num to 3 7 11 etc

* upload llama generation

* minor fix

* fix dummy layer input format

* fix vision qkv order

* fix sharded state dict

Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>

* fix vision precision

* fix rope

* match cross attn layer

* remove nrep

* Remove cross attention in ImageTransformerLayer and fix _gate_ffn

* PP draft

Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>

* Fix intermediate tensor

* temp save for pp2 is working

Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>

* fix pp issues

Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>

* merge

* update mcore parallelism initialization

Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>

* small update to pretrain script

Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>

* update mcore parallelism initialization

Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>

* Apply isort and black reformatting

Signed-off-by: yaoyu-33 <yaoyu-33@users.noreply.github.com>

* added energon dataloader for neva training (NVIDIA#10451)

* added energon dataloader for neva training

* Apply isort and black reformatting

Signed-off-by: yashaswikarnati <yashaswikarnati@users.noreply.github.com>

* specify global batch size to support grad accumulation

* adding neva pretrain example

* Apply isort and black reformatting

Signed-off-by: yashaswikarnati <yashaswikarnati@users.noreply.github.com>

* change pretrain example to handle new ckpt reloading

* fixed code quality warnings and unused imports

Signed-off-by: ykarnati <ykarnati@nvidia.com>

* minor changes for PR comments

* Apply isort and black reformatting

Signed-off-by: yashaswikarnati <yashaswikarnati@users.noreply.github.com>

* refactor conversation template config

* Apply isort and black reformatting

Signed-off-by: yashaswikarnati <yashaswikarnati@users.noreply.github.com>

* remove optional import

---------

Signed-off-by: yashaswikarnati <yashaswikarnati@users.noreply.github.com>
Signed-off-by: ykarnati <ykarnati@nvidia.com>
Co-authored-by: yashaswikarnati <yashaswikarnati@users.noreply.github.com>
(cherry picked from commit 7354740)

* llama energon dataloader

* have tokenizer for base task encoder class

* Update megatron_init.py

Signed-off-by: Yu Yao <54727607+yaoyu-33@users.noreply.github.com>

* Add simple inference

* evian3 update

Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>

* add encoder parallel default config

Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>

* add encoder parallel default config

Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>

* clean up

Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>

* add aspect ratio in model

* support energon dataloader

* some pp update

Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>

* fixes

Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>

* fix kv merging

Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>

* fix get_key_value_tensors

Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>

* rename files

Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>

* update to HF style position embedding

Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>

* fix energon dataloader and support batching

* update forward args

Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>

* clean up and move to aspect_ratio_ids

Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>

* rename back to language.py

Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>

* fix loss function

Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>

* update and fix energon

Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>

* Add hf import

* Fix type

* Change config

* update energon pretrain

Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>

* clean up

* clean up

* reformat

Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>

* update inference files for new code

* update to instruct

* update to instruct

* update few names

Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>

* update generation

Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>

* fix importer embedding.weight

* few fixes

Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>

* add hf script

Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>

* fix kv import

* remove interleaved

* fixes and updates

Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>

* lora fixes

Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>

* some code clean ups

Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>

* update training scripts

Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>

* refactors

Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>

* add LoRA finetuning

* fixes and nemo update

Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>

* fix importer registering issue by adding 11B and 90B configs

* update `decoder_seq_len`

Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>

* science vqa script

Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>

* clean up script name

Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>

* fix ckpt save serialization issue

* fix predefined config classes

* add num_chunks in input

Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>

* fix format

Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>

* update finetuning scripts for PEFT

* add 11b recipe (needs NVIDIA#10645 to test)

* fix mask generation

Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>

* minor fix code style

Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>

* Apply isort and black reformatting

Signed-off-by: yaoyu-33 <yaoyu-33@users.noreply.github.com>

* Support no image inference

* add llama svqa eval

* fix masking

Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>

* Apply isort and black reformatting

Signed-off-by: yaoyu-33 <yaoyu-33@users.noreply.github.com>

* fix generation

Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>

* Apply isort and black reformatting

Signed-off-by: yaoyu-33 <yaoyu-33@users.noreply.github.com>

* add 90b recipe and revise 11b recipe

* Apply isort and black reformatting

Signed-off-by: cuichenx <cuichenx@users.noreply.github.com>

* clean up typing

* add option to disable vision padding

* Apply isort and black reformatting

Signed-off-by: cuichenx <cuichenx@users.noreply.github.com>

* base model finetuning (does not work yet)

* Apply isort and black reformatting

Signed-off-by: cuichenx <cuichenx@users.noreply.github.com>

* fixed default conversation template config for MLLama

* Update svqa

* add multinode

* bot happy

* Apply isort and black reformatting

Signed-off-by: cuichenx <cuichenx@users.noreply.github.com>

* Apply isort and black reformatting

Signed-off-by: yaoyu-33 <yaoyu-33@users.noreply.github.com>

* Apply isort and black reformatting

Signed-off-by: yaoyu-33 <yaoyu-33@users.noreply.github.com>

* Apply isort and black reformatting

Signed-off-by: artbataev <artbataev@users.noreply.github.com>

* Perf improvements. Mainly from XAttn mask calculation (NVIDIA#10901)

* Perf improvements. Mainly from XAttn mask calculation

* Apply isort and black reformatting

Signed-off-by: parthmannan <parthmannan@users.noreply.github.com>

---------

Signed-off-by: parthmannan <parthmannan@users.noreply.github.com>
Co-authored-by: parthmannan <parthmannan@users.noreply.github.com>

* fix existing issues

Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>

* fix scripts

Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>

* Apply isort and black reformatting

Signed-off-by: yaoyu-33 <yaoyu-33@users.noreply.github.com>

* fix lora

* few fixes for non image support

Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>

* update masking gen

Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>

* update lazy dataset

Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>

* fix data sampler and loading issue

Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>

* Add vlm generation

* Apply isort and black reformatting

Signed-off-by: meatybobby <meatybobby@users.noreply.github.com>

* Apply isort and black reformatting

Signed-off-by: yaoyu-33 <yaoyu-33@users.noreply.github.com>

* generation update

Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>

* update lazy dataset

Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>

* Fix _strategy_lib.py

Signed-off-by: Yu Yao <54727607+yaoyu-33@users.noreply.github.com>

* Apply isort and black reformatting

Signed-off-by: yaoyu-33 <yaoyu-33@users.noreply.github.com>

* fix warning

Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>

* hide vlm examples

Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>

* Revert "Add vlm generation"

This reverts commit 4711c75

Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>

* Fix VisionEncoder multi-batch bug

* update mcore parallelism initialization

Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>

* Apply isort and black reformatting

Signed-off-by: yaoyu-33 <yaoyu-33@users.noreply.github.com>

* Update megatron_init.py

Signed-off-by: Yu Yao <54727607+yaoyu-33@users.noreply.github.com>

* add encoder parallel default config

Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>

* Fix _strategy_lib.py

Signed-off-by: Yu Yao <54727607+yaoyu-33@users.noreply.github.com>

* llm.generate fixes (NVIDIA#10983)

* fix context path, disable optimizer init, add tp

Signed-off-by: HuiyingLi <willwin.lee@gmail.com>

* format

Signed-off-by: HuiyingLi <willwin.lee@gmail.com>

* address comments, require user to provide trainer

Signed-off-by: HuiyingLi <willwin.lee@gmail.com>

* minor fix

Signed-off-by: HuiyingLi <willwin.lee@gmail.com>

* minor fixes

Signed-off-by: HuiyingLi <willwin.lee@gmail.com>

---------

Signed-off-by: HuiyingLi <willwin.lee@gmail.com>

* use __dict__ in check (NVIDIA#11012)

* check is_hf_model in leaf module

Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>

* Apply isort and black reformatting

Signed-off-by: akoumpa <akoumpa@users.noreply.github.com>

* disable getattr alternative path

Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>

* fix

Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>

* undo

Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>

---------

Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>
Signed-off-by: akoumpa <akoumpa@users.noreply.github.com>
Co-authored-by: akoumpa <akoumpa@users.noreply.github.com>

* LoRA support for HF::AutoModelForCausalLM (NVIDIA#10982)

* add LinearAdapter

Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>

* add hf lora example

Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>

* remove unused imports

Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>

* fix

Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>

* fix

Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>

* subclass mixin

Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>

* remove stale imports

Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>

* undo

Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>

* fix scale

Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>

* regex selector for peft

Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>

* move lora

Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>

* fmt

Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>

* hf_auto_model_for_causal_lm finetune recipe

Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>

* Apply isort and black reformatting

Signed-off-by: akoumpa <akoumpa@users.noreply.github.com>

---------

Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>
Signed-off-by: akoumpa <akoumpa@users.noreply.github.com>
Co-authored-by: akoumpa <akoumpa@users.noreply.github.com>

* Change default for always_save_context to True (NVIDIA#11014)

Signed-off-by: Abhishree <abhishreetm@gmail.com>
Co-authored-by: Pablo Garay <pagaray@nvidia.com>

* Add a build option to load_context (NVIDIA#10713)

* Add a build option to load_context

Signed-off-by: Marc Romeijn <mromeijn@nvidia.com>
Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>

* Adding test

Signed-off-by: Marc Romeijn <mromeijn@nvidia.com>
Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>

* Trying to fix failing CPU test

Signed-off-by: Marc Romeijn <mromeijn@nvidia.com>
Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>

* cherry-pick fix

Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>

---------

Signed-off-by: Marc Romeijn <mromeijn@nvidia.com>
Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>
Co-authored-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>

* Fix pip install (NVIDIA#11026)

* Move AutoTokenizer inline

Signed-off-by: Marc Romeyn <mromeijn@nvidia.com>

* Move einops to common requirements

Signed-off-by: Marc Romeyn <mromeijn@nvidia.com>

* Move AutoTokenizer import to top-level again in fine_tuning

Signed-off-by: Marc Romeyn <mromeijn@nvidia.com>

* Move megatron init inside nemo.lightning

Signed-off-by: Marc Romeyn <mromeijn@nvidia.com>

* Make megatron_lazy_init_context work when transformer-engine is not installed

Signed-off-by: Marc Romeyn <mromeijn@nvidia.com>

* Only import get_nmt_tokenizer when needed

Signed-off-by: Marc Romeyn <mromeijn@nvidia.com>

* Apply isort and black reformatting

Signed-off-by: marcromeyn <marcromeyn@users.noreply.github.com>

---------

Signed-off-by: Marc Romeyn <mromeijn@nvidia.com>
Signed-off-by: marcromeyn <marcromeyn@users.noreply.github.com>
Co-authored-by: marcromeyn <marcromeyn@users.noreply.github.com>

* [WIP] Add docs for NEST SSL (NVIDIA#10804)

* add docs

Signed-off-by: stevehuang52 <heh@nvidia.com>

* update doc and fix missing param

Signed-off-by: stevehuang52 <heh@nvidia.com>

---------

Signed-off-by: stevehuang52 <heh@nvidia.com>

* Change dist ckpt defaults (NVIDIA#10913)

* Enable ckpt features by default (async ckpt), ckpt every 15mins and reduce preemption time to 1min

Signed-off-by: Shriya Palsamudram <spalsamudram@nvidia.com>

* fix ssm tests

Signed-off-by: Shriya Palsamudram <spalsamudram@nvidia.com>

* Make note that ckpt_async_save is disabled for SSMs

Signed-off-by: Shriya Palsamudram <spalsamudram@nvidia.com>

* Enable async ckpt for SSMs with fix

Signed-off-by: Shriya Palsamudram <spalsamudram@nvidia.com>

* Disable async ckpt in the peft test as it is a known bug, add note.

Signed-off-by: Shriya Palsamudram <spalsamudram@nvidia.com>

* Fix failing unit tests

Signed-off-by: Shriya Palsamudram <spalsamudram@nvidia.com>

* Ashors/peft async ckpt (NVIDIA#11010)

* [WIP] prototype for supporting async checkpointing with peft

Signed-off-by: ashors1 <ashors@nvidia.com>
Signed-off-by: Shriya Palsamudram <spalsamudram@nvidia.com>

* Enable async ckpt for the peft test

Signed-off-by: Shriya Palsamudram <spalsamudram@nvidia.com>

* Fix peft setup test

Signed-off-by: Shriya Palsamudram <spalsamudram@nvidia.com>

---------

Signed-off-by: Shriya Palsamudram <spalsamudram@nvidia.com>
Signed-off-by: ashors1 <ashors@nvidia.com>
Co-authored-by: ataghibakhsh <ataghibakhsh@nvidia.com>

* Akoumparouli/mixtral recipe fix r2.0.0 (NVIDIA#10994)

* Mixtral TP8 EP1

Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>

* Apply isort and black reformatting

Signed-off-by: akoumpa <akoumpa@users.noreply.github.com>

---------

Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>
Signed-off-by: akoumpa <akoumpa@users.noreply.github.com>
Co-authored-by: akoumpa <akoumpa@users.noreply.github.com>

* Fix _strategy_lib tests (NVIDIA#11033)

* fix world size and don't mock

Signed-off-by: Maanu Grover <maanug@nvidia.com>

* cleanup global state

Signed-off-by: Maanu Grover <maanug@nvidia.com>

* check app state instead

Signed-off-by: Maanu Grover <maanug@nvidia.com>

* fix syntax nemo logger test

Signed-off-by: Maanu Grover <maanug@nvidia.com>

---------

Signed-off-by: Maanu Grover <maanug@nvidia.com>

* Update `BaseMegatronSampler` for compatibility with PTL's `_BatchProgress` (NVIDIA#11016)

* Revert "[NeMo-UX] Use custom `BatchProgress` class which does not restore states (NVIDIA#10383)"

This reverts commit b5798de.

* make megatron sampler return the total number of batches in the dataset

Signed-off-by: ashors1 <ashors@nvidia.com>

---------

Signed-off-by: ashors1 <ashors@nvidia.com>

* PTQ example for NeMo 2.0 (NVIDIA#10642)

* initial commit

Signed-off-by: Piotr Kaminski <pikaminski@nvidia.com>

* create Quantizer for NeMo 2.0

Signed-off-by: Piotr Kaminski <pikaminski@nvidia.com>

* refactor

Signed-off-by: Piotr Kaminski <pikaminski@nvidia.com>

* Call quantize on an unwrapped mcore model

Signed-off-by: Piotr Kaminski <pikaminski@nvidia.com>

* Apply isort and black reformatting

Signed-off-by: Laplasjan107 <Laplasjan107@users.noreply.github.com>

* Add tests, adjust unwrapping

Signed-off-by: Piotr Kaminski <pikaminski@nvidia.com>

* Apply isort and black reformatting

Signed-off-by: Laplasjan107 <Laplasjan107@users.noreply.github.com>

* fix export

Signed-off-by: Piotr Kaminski <pikaminski@nvidia.com>

* Apply isort and black reformatting

Signed-off-by: Laplasjan107 <Laplasjan107@users.noreply.github.com>

* Apply isort and black reformatting

Signed-off-by: artbataev <artbataev@users.noreply.github.com>

* Fix output_path argument for HF import

Signed-off-by: Piotr Kamiński <67481570+Laplasjan107@users.noreply.github.com>

* fix fabric ckpt loading

Signed-off-by: Piotr Kaminski <pikaminski@nvidia.com>

* Apply isort and black reformatting

Signed-off-by: Laplasjan107 <Laplasjan107@users.noreply.github.com>

* code review suggestions

Signed-off-by: Piotr Kaminski <pikaminski@nvidia.com>

* Apply isort and black reformatting

Signed-off-by: Laplasjan107 <Laplasjan107@users.noreply.github.com>

* remove unused import

Signed-off-by: Piotr Kaminski <pikaminski@nvidia.com>

* use cnn dataset in github ci

Signed-off-by: Piotr Kaminski <pikaminski@nvidia.com>

* applied code review

Signed-off-by: Piotr Kaminski <pikaminski@nvidia.com>

* code review changes

Signed-off-by: Piotr Kaminski <pikaminski@nvidia.com>

* Apply isort and black reformatting

Signed-off-by: Laplasjan107 <Laplasjan107@users.noreply.github.com>

* simplify interface for data iterator

Signed-off-by: Piotr Kaminski <pikaminski@nvidia.com>

* Apply isort and black reformatting

Signed-off-by: Laplasjan107 <Laplasjan107@users.noreply.github.com>

* (partial) PP fix

Signed-off-by: Piotr Kaminski <pikaminski@nvidia.com>

* Apply isort and black reformatting

Signed-off-by: Laplasjan107 <Laplasjan107@users.noreply.github.com>

---------

Signed-off-by: Piotr Kaminski <pikaminski@nvidia.com>
Signed-off-by: Laplasjan107 <Laplasjan107@users.noreply.github.com>
Signed-off-by: Piotr Kamiński <67481570+Laplasjan107@users.noreply.github.com>
Signed-off-by: artbataev <artbataev@users.noreply.github.com>
Co-authored-by: Piotr Kaminski <pikaminski@nvidia.com>
Co-authored-by: Laplasjan107 <Laplasjan107@users.noreply.github.com>
Co-authored-by: artbataev <artbataev@users.noreply.github.com>

* TDT compute timestamps option and Extra Whitespace handling for SPE (NVIDIA#10875)

* add token duration

Signed-off-by: monica-sekoyan <msekoyan@nvidia.com>

* revert rnnt change

Signed-off-by: monica-sekoyan <msekoyan@nvidia.com>

* add remove_extra_whitespaces arg to spe tokenizer

Signed-off-by: monica-sekoyan <msekoyan@nvidia.com>

* add token duration retrieval

Signed-off-by: monica-sekoyan <msekoyan@nvidia.com>

* add ignore_extra_whitespace to spe

Signed-off-by: monica-sekoyan <msekoyan@nvidia.com>

* add compute_timestamp support for tdt

Signed-off-by: monica-sekoyan <msekoyan@nvidia.com>

* fix config field name

Signed-off-by: monica-sekoyan <msekoyan@nvidia.com>

* add refinement for tdt timestamps

Signed-off-by: monica-sekoyan <msekoyan@nvidia.com>

* add segments timestamp support and refinement for ctc

Signed-off-by: monica-sekoyan <msekoyan@nvidia.com>

* modify tests for ctc decoding timestamps

Signed-off-by: monica-sekoyan <msekoyan@nvidia.com>

* add rnnt timestamp tests

Signed-off-by: monica-sekoyan <msekoyan@nvidia.com>

* updated doc

Signed-off-by: monica-sekoyan <msekoyan@nvidia.com>

* fix in test

Signed-off-by: monica-sekoyan <msekoyan@nvidia.com>

* Apply isort and black reformatting

Signed-off-by: monica-sekoyan <monica-sekoyan@users.noreply.github.com>

* fix of unicode char

Signed-off-by: monica-sekoyan <msekoyan@nvidia.com>

* fix rnnt_decoding test

Signed-off-by: monica-sekoyan <msekoyan@nvidia.com>

* workaround for test tokenizer

Signed-off-by: monica-sekoyan <msekoyan@nvidia.com>

* Apply isort and black reformatting

Signed-off-by: monica-sekoyan <monica-sekoyan@users.noreply.github.com>

* modify segments formation

Signed-off-by: monica-sekoyan <msekoyan@nvidia.com>

* modify segments for ctc

Signed-off-by: monica-sekoyan <msekoyan@nvidia.com>

* fix in ctc refinement

Signed-off-by: monica-sekoyan <msekoyan@nvidia.com>

* Apply isort and black reformatting

Signed-off-by: monica-sekoyan <monica-sekoyan@users.noreply.github.com>

* minor changes

Signed-off-by: monica-sekoyan <msekoyan@nvidia.com>

* reverse offset change

Signed-off-by: monica-sekoyan <msekoyan@nvidia.com>

* Apply isort and black reformatting

Signed-off-by: monica-sekoyan <monica-sekoyan@users.noreply.github.com>

* warning mode=once

Signed-off-by: monica-sekoyan <msekoyan@nvidia.com>

* Apply isort and black reformatting

Signed-off-by: monica-sekoyan <monica-sekoyan@users.noreply.github.com>

* make ignore_extrawhitespaces false

Signed-off-by: monica-sekoyan <msekoyan@nvidia.com>

* minor changes

Signed-off-by: monica-sekoyan <msekoyan@nvidia.com>

* adjust changes to the tests

Signed-off-by: monica-sekoyan <msekoyan@nvidia.com>

* modify prompt_formatter tests

Signed-off-by: monica-sekoyan <msekoyan@nvidia.com>

* Apply isort and black reformatting

Signed-off-by: monica-sekoyan <monica-sekoyan@users.noreply.github.com>

---------

Signed-off-by: monica-sekoyan <msekoyan@nvidia.com>
Signed-off-by: monica-sekoyan <monica-sekoyan@users.noreply.github.com>
Co-authored-by: monica-sekoyan <monica-sekoyan@users.noreply.github.com>

* Basic online dynamic FP8 quantization with vLLM (NVIDIA#10904)

* Basic online dynamic quantization with vLLM

Signed-off-by: Jan Lasek <janek.lasek@gmail.com>

* Apply isort and black reformatting

Signed-off-by: janekl <janekl@users.noreply.github.com>

* vllm 0.6.3 updates

Signed-off-by: Jan Lasek <janek.lasek@gmail.com>

* Pass quantization param in deploy_vllm_triton.py script

Signed-off-by: Jan Lasek <janek.lasek@gmail.com>

---------

Signed-off-by: Jan Lasek <janek.lasek@gmail.com>
Signed-off-by: janekl <janekl@users.noreply.github.com>
Co-authored-by: janekl <janekl@users.noreply.github.com>

* ci: Improve VM maintenance (NVIDIA#10758)

* ci: Improve VM maintenance

Signed-off-by: Oliver Koenig <okoenig@nvidia.com>

* rename stuff

Signed-off-by: Oliver Koenig <okoenig@nvidia.com>

* title

Signed-off-by: Oliver Koenig <okoenig@nvidia.com>

* use team

Signed-off-by: Oliver Koenig <okoenig@nvidia.com>

* run on failure too

Signed-off-by: Oliver Koenig <okoenig@nvidia.com>

* fix

Signed-off-by: Oliver Koenig <okoenig@nvidia.com>

* yrdy

Signed-off-by: Oliver Koenig <okoenig@nvidia.com>

* f

Signed-off-by: Oliver Koenig <okoenig@nvidia.com>

* test

Signed-off-by: Oliver Koenig <okoenig@nvidia.com>

* fix

Signed-off-by: Oliver Koenig <okoenig@nvidia.com>

* f

Signed-off-by: Oliver Koenig <okoenig@nvidia.com>

* f

Signed-off-by: Oliver Koenig <okoenig@nvidia.com>

* f

Signed-off-by: Oliver Koenig <okoenig@nvidia.com>

---------

Signed-off-by: Oliver Koenig <okoenig@nvidia.com>

* Add comment for vision transpose

* update megatron_init.py inside lightning

Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>

* rename llama to mllama folder name

Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>

* update to attention bias

Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>

* Apply isort and black reformatting

Signed-off-by: yaoyu-33 <yaoyu-33@users.noreply.github.com>

* update dropout to 0

Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>

* fix attention bias

Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>

* remove disable_vision_padding since we now have a fix

Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>

* Apply isort and black reformatting

Signed-off-by: yaoyu-33 <yaoyu-33@users.noreply.github.com>

* Update init for mllama

Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>

* Apply isort and black reformatting

Signed-off-by: yaoyu-33 <yaoyu-33@users.noreply.github.com>

* Address comments

Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>

* Apply isort and black reformatting

Signed-off-by: yaoyu-33 <yaoyu-33@users.noreply.github.com>

* fix copyright title

Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>

* fix code scan

Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>

* update vision code

Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>

* revert attention bias changes until latest MLM code got merged

Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>

* fix warning

Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>

* Turn off system message check, as it's "" now

Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>

* Rollback megatron_parallel.py

Signed-off-by: Yu Yao <54727607+yaoyu-33@users.noreply.github.com>

---------

Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>
Signed-off-by: yaoyu-33 <yaoyu-33@users.noreply.github.com>
Signed-off-by: Yu Yao <54727607+yaoyu-33@users.noreply.github.com>
Signed-off-by: cuichenx <cuichenx@users.noreply.github.com>
Signed-off-by: Chen Cui <chcui@nvidia.com>
Signed-off-by: artbataev <artbataev@users.noreply.github.com>
Signed-off-by: parthmannan <parthmannan@users.noreply.github.com>
Signed-off-by: meatybobby <meatybobby@users.noreply.github.com>
Signed-off-by: HuiyingLi <willwin.lee@gmail.com>
Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>
Signed-off-by: akoumpa <akoumpa@users.noreply.github.com>
Signed-off-by: Abhishree <abhishreetm@gmail.com>
Signed-off-by: Marc Romeijn <mromeijn@nvidia.com>
Signed-off-by: Marc Romeyn <mromeijn@nvidia.com>
Signed-off-by: marcromeyn <marcromeyn@users.noreply.github.com>
Signed-off-by: stevehuang52 <heh@nvidia.com>
Signed-off-by: Shriya Palsamudram <spalsamudram@nvidia.com>
Signed-off-by: ashors1 <ashors@nvidia.com>
Signed-off-by: Maanu Grover <maanug@nvidia.com>
Signed-off-by: Piotr Kaminski <pikaminski@nvidia.com>
Signed-off-by: Laplasjan107 <Laplasjan107@users.noreply.github.com>
Signed-off-by: Piotr Kamiński <67481570+Laplasjan107@users.noreply.github.com>
Signed-off-by: monica-sekoyan <msekoyan@nvidia.com>
Signed-off-by: monica-sekoyan <monica-sekoyan@users.noreply.github.com>
Signed-off-by: Jan Lasek <janek.lasek@gmail.com>
Signed-off-by: janekl <janekl@users.noreply.github.com>
Signed-off-by: Oliver Koenig <okoenig@nvidia.com>
Co-authored-by: Ao Tang <aot@nvidia.com>
Co-authored-by: Chen Cui <chcui@nvidia.com>
Co-authored-by: Bobby Chen <bobchen@nvidia.com>
Co-authored-by: yaoyu-33 <yaoyu-33@users.noreply.github.com>
Co-authored-by: Yashaswi Karnati <144376261+yashaswikarnati@users.noreply.github.com>
Co-authored-by: ykarnati <ykarnati@nvidia.com>
Co-authored-by: cuichenx <cuichenx@users.noreply.github.com>
Co-authored-by: Yashaswi Karnati <ykarnati@login-eos02.eos.clusters.nvidia.com>
Co-authored-by: artbataev <artbataev@users.noreply.github.com>
Co-authored-by: Parth Mannan <38387286+parthmannan@users.noreply.github.com>
Co-authored-by: parthmannan <parthmannan@users.noreply.github.com>
Co-authored-by: meatybobby <meatybobby@users.noreply.github.com>
Co-authored-by: Huiying <willwin.lee@gmail.com>
Co-authored-by: Alexandros Koumparoulis <153118171+akoumpa@users.noreply.github.com>
Co-authored-by: akoumpa <akoumpa@users.noreply.github.com>
Co-authored-by: Abhishree Thittenamane <47577437+athitten@users.noreply.github.com>
Co-authored-by: Pablo Garay <pagaray@nvidia.com>
Co-authored-by: Marc Romeyn <mromeijn@nvidia.com>
Co-authored-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>
Co-authored-by: marcromeyn <marcromeyn@users.noreply.github.com>
Co-authored-by: He Huang (Steve) <105218074+stevehuang52@users.noreply.github.com>
Co-authored-by: Shriya Rishab <69161273+ShriyaPalsamudram@users.noreply.github.com>
Co-authored-by: ataghibakhsh <ataghibakhsh@nvidia.com>
Co-authored-by: Maanu Grover <109391026+maanug-nv@users.noreply.github.com>
Co-authored-by: Anna Shors <71393111+ashors1@users.noreply.github.com>
Co-authored-by: Piotr Kamiński <67481570+Laplasjan107@users.noreply.github.com>
Co-authored-by: Piotr Kaminski <pikaminski@nvidia.com>
Co-authored-by: Laplasjan107 <Laplasjan107@users.noreply.github.com>
Co-authored-by: monica-sekoyan <166123533+monica-sekoyan@users.noreply.github.com>
Co-authored-by: monica-sekoyan <monica-sekoyan@users.noreply.github.com>
Co-authored-by: Jan Lasek <janek.lasek@gmail.com>
Co-authored-by: janekl <janekl@users.noreply.github.com>
Co-authored-by: oliver könig <okoenig@nvidia.com>
yaoyu-33 added a commit that referenced this pull request Nov 21, 2024
* evian3 update

Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>

* add encoder parallel default config

Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>

* add encoder parallel default config

Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>

* clean up

Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>

* add aspect ratio in model

* support energon dataloader

* some pp update

Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>

* fixes

Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>

* fix kv merging

Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>

* fix get_key_value_tensors

Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>

* rename files

Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>

* update to HF style position embedding

Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>

* fix energon dataloader and support batching

* update forward args

Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>

* clean up and move to aspect_ratio_ids

Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>

* rename back to language.py

Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>

* fix loss function

Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>

* update and fix energon

Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>

* Add hf import

* Fix type

* Change config

* update energon pretrain

Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>

* clean up

* clean up

* reformat

Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>

* update inference files for new code

* update to instruct

* update to instruct

* update few names

Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>

* update generation

Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>

* fix importer embedding.weight

* few fixes

Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>

* add hf script

Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>

* fix kv import

* remove interleaved

* fixes and updates

Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>

* lora fixes

Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>

* some code clean ups

Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>

* update training scripts

Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>

* refactors

Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>

* add LoRA finetuning

* fixes and nemo update

Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>

* fix importer registering issue by adding 11B and 90B configs

* update `decoder_seq_len`

Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>

* science vqa script

Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>

* clean up script name

Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>

* fix ckpt save serialization issue

* fix predefined config classes

* add num_chunks in input

Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>

* fix format

Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>

* update finetuning scripts for PEFT

* add 11b recipe (need #10645 to test)

* fix mask generation

Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>

* minor fix code style

Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>

* Apply isort and black reformatting

Signed-off-by: yaoyu-33 <yaoyu-33@users.noreply.github.com>

* Support no image inference

* add llama svqa eval

* fix masking

Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>

* Apply isort and black reformatting

Signed-off-by: yaoyu-33 <yaoyu-33@users.noreply.github.com>

* fix generation

Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>

* Apply isort and black reformatting

Signed-off-by: yaoyu-33 <yaoyu-33@users.noreply.github.com>

* add 90b recipe and revise 11b recipe

* Apply isort and black reformatting

Signed-off-by: cuichenx <cuichenx@users.noreply.github.com>

* clean up typing

* add option to disable vision padding

* Apply isort and black reformatting

Signed-off-by: cuichenx <cuichenx@users.noreply.github.com>

* base model finetuning (does not work yet)

* Apply isort and black reformatting

Signed-off-by: cuichenx <cuichenx@users.noreply.github.com>

* fixed default conversation template config for MLLama

* Update svqa

* add multinode

* bot happy

* Apply isort and black reformatting

Signed-off-by: cuichenx <cuichenx@users.noreply.github.com>

* Apply isort and black reformatting

Signed-off-by: yaoyu-33 <yaoyu-33@users.noreply.github.com>

* Apply isort and black reformatting

Signed-off-by: yaoyu-33 <yaoyu-33@users.noreply.github.com>

* Apply isort and black reformatting

Signed-off-by: artbataev <artbataev@users.noreply.github.com>

* Perf improvements. Mainly from XAttn mask calculation (#10901)

* Perf improvements. Mainly from XAttn mask calculation

* Apply isort and black reformatting

Signed-off-by: parthmannan <parthmannan@users.noreply.github.com>

---------

Signed-off-by: parthmannan <parthmannan@users.noreply.github.com>
Co-authored-by: parthmannan <parthmannan@users.noreply.github.com>

* fix existing issues

Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>

* fix scripts

Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>

* Apply isort and black reformatting

Signed-off-by: yaoyu-33 <yaoyu-33@users.noreply.github.com>

* fix lora

* few fixes for non image support

Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>

* update masking gen

Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>

* update lazy dataset

Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>

* fix data sampler and loading issue

Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>

* Add vlm generation

* Apply isort and black reformatting

Signed-off-by: meatybobby <meatybobby@users.noreply.github.com>

* Apply isort and black reformatting

Signed-off-by: yaoyu-33 <yaoyu-33@users.noreply.github.com>

* generation update

Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>

* update lazy dataset

Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>

* Fix _strategy_lib.py

Signed-off-by: Yu Yao <54727607+yaoyu-33@users.noreply.github.com>

* Apply isort and black reformatting

Signed-off-by: yaoyu-33 <yaoyu-33@users.noreply.github.com>

* fix warning

Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>

* hide vlm examples

Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>

* Revert "Add vlm generation"

This reverts commit 4711c75

Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>

* Fix VisionEncoder multi-batch bug

* update mcore parallelism initialization

Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>

* Apply isort and black reformatting

Signed-off-by: yaoyu-33 <yaoyu-33@users.noreply.github.com>

* Update megatron_init.py

Signed-off-by: Yu Yao <54727607+yaoyu-33@users.noreply.github.com>

* add encoder parallel default config

Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>

* Fix _strategy_lib.py

Signed-off-by: Yu Yao <54727607+yaoyu-33@users.noreply.github.com>

* llm.generate fixes (#10983)

* fix context path, disable optimizer init, add tp

Signed-off-by: HuiyingLi <willwin.lee@gmail.com>

* format

Signed-off-by: HuiyingLi <willwin.lee@gmail.com>

* address comments, require user to provide trainer

Signed-off-by: HuiyingLi <willwin.lee@gmail.com>

* minor fix

Signed-off-by: HuiyingLi <willwin.lee@gmail.com>

* minor fixes

Signed-off-by: HuiyingLi <willwin.lee@gmail.com>

---------

Signed-off-by: HuiyingLi <willwin.lee@gmail.com>

* use __dict__ in check (#11012)

* check is_hf_model in leaf module

Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>

* Apply isort and black reformatting

Signed-off-by: akoumpa <akoumpa@users.noreply.github.com>

* disable getattr alternative path

Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>

* fix

Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>

* undo;

Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>

---------

Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>
Signed-off-by: akoumpa <akoumpa@users.noreply.github.com>
Co-authored-by: akoumpa <akoumpa@users.noreply.github.com>

* LoRA support for HF::AutoModelForCausalLM (#10982)

* add LinearAdapter

Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>

* add hf lora example

Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>

* remove unused imports

Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>

* fix

Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>

* fix

Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>

* subclass mixin

Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>

* remove stale imports

Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>

* undo

Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>

* fix scale

Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>

* regex selector for peft

Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>

* move lora

Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>

* fmt

Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>

* hf_auto_model_for_causal_lm finetune recipe

Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>

* Apply isort and black reformatting

Signed-off-by: akoumpa <akoumpa@users.noreply.github.com>

---------

Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>
Signed-off-by: akoumpa <akoumpa@users.noreply.github.com>
Co-authored-by: akoumpa <akoumpa@users.noreply.github.com>

* Change default for always_save_context to True (#11014)

Signed-off-by: Abhishree <abhishreetm@gmail.com>
Co-authored-by: Pablo Garay <pagaray@nvidia.com>

* Add a build option to load_context (#10713)

* Add a build option to load_context

Signed-off-by: Marc Romeijn <mromeijn@nvidia.com>
Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>

* Adding test

Signed-off-by: Marc Romeijn <mromeijn@nvidia.com>
Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>

* Trying to fix failing CPU test

Signed-off-by: Marc Romeijn <mromeijn@nvidia.com>
Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>

* cherry-pick fix

Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>

---------

Signed-off-by: Marc Romeijn <mromeijn@nvidia.com>
Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>
Co-authored-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>

* Fix pip install (#11026)

* Move AutoTokenizer inline

Signed-off-by: Marc Romeyn <mromeijn@nvidia.com>

* Move einops to common requirements

Signed-off-by: Marc Romeyn <mromeijn@nvidia.com>

* Move AutoTokenizer import to top-level again in fine_tuning

Signed-off-by: Marc Romeyn <mromeijn@nvidia.com>

* Move megatron init inside nemo.lightning

Signed-off-by: Marc Romeyn <mromeijn@nvidia.com>

* Make megatron_lazy_init_context work when transformer-engine is not installed

Signed-off-by: Marc Romeyn <mromeijn@nvidia.com>

* Only import get_nmt_tokenizer when needed

Signed-off-by: Marc Romeyn <mromeijn@nvidia.com>

* Apply isort and black reformatting

Signed-off-by: marcromeyn <marcromeyn@users.noreply.github.com>

---------

Signed-off-by: Marc Romeyn <mromeijn@nvidia.com>
Signed-off-by: marcromeyn <marcromeyn@users.noreply.github.com>
Co-authored-by: marcromeyn <marcromeyn@users.noreply.github.com>

* [WIP] Add docs for NEST SSL (#10804)

* add docs

Signed-off-by: stevehuang52 <heh@nvidia.com>

* update doc and fix missing param

Signed-off-by: stevehuang52 <heh@nvidia.com>

---------

Signed-off-by: stevehuang52 <heh@nvidia.com>

* Change dist ckpt defaults (#10913)

* Enable ckpt features by default (async ckpt), ckpt every 15mins and reduce preemption time to 1min

Signed-off-by: Shriya Palsamudram <spalsamudram@nvidia.com>

* fix ssm tests

Signed-off-by: Shriya Palsamudram <spalsamudram@nvidia.com>

* Make note that ckpt_async_save is disabled for SSMs

Signed-off-by: Shriya Palsamudram <spalsamudram@nvidia.com>

* Enable async ckpt for SSMs with fix

Signed-off-by: Shriya Palsamudram <spalsamudram@nvidia.com>

* Disable async ckpt in the peft test as it is a known bug, add note.

Signed-off-by: Shriya Palsamudram <spalsamudram@nvidia.com>

* Fix failing unit tests

Signed-off-by: Shriya Palsamudram <spalsamudram@nvidia.com>

* Ashors/peft async ckpt (#11010)

* [WIP] prototype for supporting async checkpointing with peft

Signed-off-by: ashors1 <ashors@nvidia.com>
Signed-off-by: Shriya Palsamudram <spalsamudram@nvidia.com>

* Enable async ckpt for the peft test

Signed-off-by: Shriya Palsamudram <spalsamudram@nvidia.com>

* Fix peft setup test

Signed-off-by: Shriya Palsamudram <spalsamudram@nvidia.com>

---------

Signed-off-by: Shriya Palsamudram <spalsamudram@nvidia.com>
Signed-off-by: ashors1 <ashors@nvidia.com>
Co-authored-by: ataghibakhsh <ataghibakhsh@nvidia.com>

* Akoumparouli/mixtral recipe fix r2.0.0 (#10994)

* Mixtral TP8 EP1

Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>

* Apply isort and black reformatting

Signed-off-by: akoumpa <akoumpa@users.noreply.github.com>

---------

Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>
Signed-off-by: akoumpa <akoumpa@users.noreply.github.com>
Co-authored-by: akoumpa <akoumpa@users.noreply.github.com>

* Fix _strategy_lib tests (#11033)

* fix world size and don't mock

Signed-off-by: Maanu Grover <maanug@nvidia.com>

* cleanup global state

Signed-off-by: Maanu Grover <maanug@nvidia.com>

* check app state instead

Signed-off-by: Maanu Grover <maanug@nvidia.com>

* fix syntax nemo logger test

Signed-off-by: Maanu Grover <maanug@nvidia.com>

---------

Signed-off-by: Maanu Grover <maanug@nvidia.com>

* Update `BaseMegatronSampler` for compatibility with PTL's `_BatchProgress` (#11016)

* Revert "[NeMo-UX] Use custom `BatchProgress` class which does not restore states (#10383)"

This reverts commit b5798de.

* make megatron sampler return the total number of batches in the dataset

Signed-off-by: ashors1 <ashors@nvidia.com>

---------

Signed-off-by: ashors1 <ashors@nvidia.com>

* PTQ example for NeMo 2.0 (#10642)

* initial commit

Signed-off-by: Piotr Kaminski <pikaminski@nvidia.com>

* create Quantizer for NeMo 2.0

Signed-off-by: Piotr Kaminski <pikaminski@nvidia.com>

* refactor

Signed-off-by: Piotr Kaminski <pikaminski@nvidia.com>

* Call quantize on an unwrapped mcore model

Signed-off-by: Piotr Kaminski <pikaminski@nvidia.com>

* Apply isort and black reformatting

Signed-off-by: Laplasjan107 <Laplasjan107@users.noreply.github.com>

* Add tests, adjust unwrapping

Signed-off-by: Piotr Kaminski <pikaminski@nvidia.com>

* Apply isort and black reformatting

Signed-off-by: Laplasjan107 <Laplasjan107@users.noreply.github.com>

* fix export

Signed-off-by: Piotr Kaminski <pikaminski@nvidia.com>

* Apply isort and black reformatting

Signed-off-by: Laplasjan107 <Laplasjan107@users.noreply.github.com>

* Apply isort and black reformatting

Signed-off-by: artbataev <artbataev@users.noreply.github.com>

* Fix output_path argument for HF import

Signed-off-by: Piotr Kamiński <67481570+Laplasjan107@users.noreply.github.com>

* fix fabric ckpt loading

Signed-off-by: Piotr Kaminski <pikaminski@nvidia.com>

* Apply isort and black reformatting

Signed-off-by: Laplasjan107 <Laplasjan107@users.noreply.github.com>

* code review suggestions

Signed-off-by: Piotr Kaminski <pikaminski@nvidia.com>

* Apply isort and black reformatting

Signed-off-by: Laplasjan107 <Laplasjan107@users.noreply.github.com>

* remove unused import

Signed-off-by: Piotr Kaminski <pikaminski@nvidia.com>

* use cnn dataset in github ci

Signed-off-by: Piotr Kaminski <pikaminski@nvidia.com>

* applied code review

Signed-off-by: Piotr Kaminski <pikaminski@nvidia.com>

* code review changes

Signed-off-by: Piotr Kaminski <pikaminski@nvidia.com>

* Apply isort and black reformatting

Signed-off-by: Laplasjan107 <Laplasjan107@users.noreply.github.com>

* simplify interface for data iterator

Signed-off-by: Piotr Kaminski <pikaminski@nvidia.com>

* Apply isort and black reformatting

Signed-off-by: Laplasjan107 <Laplasjan107@users.noreply.github.com>

* (partial) PP fix

Signed-off-by: Piotr Kaminski <pikaminski@nvidia.com>

* Apply isort and black reformatting

Signed-off-by: Laplasjan107 <Laplasjan107@users.noreply.github.com>

---------

Signed-off-by: Piotr Kaminski <pikaminski@nvidia.com>
Signed-off-by: Laplasjan107 <Laplasjan107@users.noreply.github.com>
Signed-off-by: Piotr Kamiński <67481570+Laplasjan107@users.noreply.github.com>
Signed-off-by: artbataev <artbataev@users.noreply.github.com>
Co-authored-by: Piotr Kaminski <pikaminski@nvidia.com>
Co-authored-by: Laplasjan107 <Laplasjan107@users.noreply.github.com>
Co-authored-by: artbataev <artbataev@users.noreply.github.com>

* TDT compute timestamps option and Extra Whitespace handling for SPE (#10875)

* add token duration

Signed-off-by: monica-sekoyan <msekoyan@nvidia.com>

* revert rnnt change

Signed-off-by: monica-sekoyan <msekoyan@nvidia.com>

* add remove_extra_whitespaces arg to spe tokenizer

Signed-off-by: monica-sekoyan <msekoyan@nvidia.com>

* add token duration retrieval

Signed-off-by: monica-sekoyan <msekoyan@nvidia.com>

* add ignore_extra_whitespace to spe

Signed-off-by: monica-sekoyan <msekoyan@nvidia.com>

* add compute_timestamp support for tdt

Signed-off-by: monica-sekoyan <msekoyan@nvidia.com>

* fix config field name

Signed-off-by: monica-sekoyan <msekoyan@nvidia.com>

* add refinement for tdt timestamps

Signed-off-by: monica-sekoyan <msekoyan@nvidia.com>

* add segments timestamp support and refinement for ctc

Signed-off-by: monica-sekoyan <msekoyan@nvidia.com>

* modify tests for ctc decoding timestamps

Signed-off-by: monica-sekoyan <msekoyan@nvidia.com>

* add rnnt timestamp tests

Signed-off-by: monica-sekoyan <msekoyan@nvidia.com>

* updated doc

Signed-off-by: monica-sekoyan <msekoyan@nvidia.com>

* fix in test

Signed-off-by: monica-sekoyan <msekoyan@nvidia.com>

* Apply isort and black reformatting

Signed-off-by: monica-sekoyan <monica-sekoyan@users.noreply.github.com>

* fix of unicode char

Signed-off-by: monica-sekoyan <msekoyan@nvidia.com>

* fix rnnt_decoding test

Signed-off-by: monica-sekoyan <msekoyan@nvidia.com>

* workaround for test tokenizer

Signed-off-by: monica-sekoyan <msekoyan@nvidia.com>

* Apply isort and black reformatting

Signed-off-by: monica-sekoyan <monica-sekoyan@users.noreply.github.com>

* modify segments formation

Signed-off-by: monica-sekoyan <msekoyan@nvidia.com>

* modify segments for ctc

Signed-off-by: monica-sekoyan <msekoyan@nvidia.com>

* fix in ctc refinement

Signed-off-by: monica-sekoyan <msekoyan@nvidia.com>

* Apply isort and black reformatting

Signed-off-by: monica-sekoyan <monica-sekoyan@users.noreply.github.com>

* minor changes

Signed-off-by: monica-sekoyan <msekoyan@nvidia.com>

* reverse offset change

Signed-off-by: monica-sekoyan <msekoyan@nvidia.com>

* Apply isort and black reformatting

Signed-off-by: monica-sekoyan <monica-sekoyan@users.noreply.github.com>

* warning mode=once

Signed-off-by: monica-sekoyan <msekoyan@nvidia.com>

* Apply isort and black reformatting

Signed-off-by: monica-sekoyan <monica-sekoyan@users.noreply.github.com>

* make ignore_extrawhitespaces false

Signed-off-by: monica-sekoyan <msekoyan@nvidia.com>

* minor changes

Signed-off-by: monica-sekoyan <msekoyan@nvidia.com>

* adjust changes to the tests

Signed-off-by: monica-sekoyan <msekoyan@nvidia.com>

* modify prompt_formatter tests

Signed-off-by: monica-sekoyan <msekoyan@nvidia.com>

* Apply isort and black reformatting

Signed-off-by: monica-sekoyan <monica-sekoyan@users.noreply.github.com>

---------

Signed-off-by: monica-sekoyan <msekoyan@nvidia.com>
Signed-off-by: monica-sekoyan <monica-sekoyan@users.noreply.github.com>
Co-authored-by: monica-sekoyan <monica-sekoyan@users.noreply.github.com>

* Basic online dynamic FP8 quantization with vLLM (#10904)

* Basic online dynamic quantization with vLLM

Signed-off-by: Jan Lasek <janek.lasek@gmail.com>

* Apply isort and black reformatting

Signed-off-by: janekl <janekl@users.noreply.github.com>

* vllm 0.6.3 updates

Signed-off-by: Jan Lasek <janek.lasek@gmail.com>

* Pass quantization param in deploy_vllm_triton.py script

Signed-off-by: Jan Lasek <janek.lasek@gmail.com>

---------

Signed-off-by: Jan Lasek <janek.lasek@gmail.com>
Signed-off-by: janekl <janekl@users.noreply.github.com>
Co-authored-by: janekl <janekl@users.noreply.github.com>

* ci: Improve VM maintenance (#10758)

* ci: Improve VM maintenance

Signed-off-by: Oliver Koenig <okoenig@nvidia.com>

* rename stuff

Signed-off-by: Oliver Koenig <okoenig@nvidia.com>

* title

Signed-off-by: Oliver Koenig <okoenig@nvidia.com>

* use team

Signed-off-by: Oliver Koenig <okoenig@nvidia.com>

* run on failure too

Signed-off-by: Oliver Koenig <okoenig@nvidia.com>

* fix

Signed-off-by: Oliver Koenig <okoenig@nvidia.com>

* yrdy

Signed-off-by: Oliver Koenig <okoenig@nvidia.com>

* f

Signed-off-by: Oliver Koenig <okoenig@nvidia.com>

* test

Signed-off-by: Oliver Koenig <okoenig@nvidia.com>

* fix

Signed-off-by: Oliver Koenig <okoenig@nvidia.com>

* f

Signed-off-by: Oliver Koenig <okoenig@nvidia.com>

* f

Signed-off-by: Oliver Koenig <okoenig@nvidia.com>

* f

Signed-off-by: Oliver Koenig <okoenig@nvidia.com>

---------

Signed-off-by: Oliver Koenig <okoenig@nvidia.com>

* neva update

Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>

* Add comment for vision transpose

* update megatron_init.py inside lightning

Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>

* Fix PP

Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>

* add examples

Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>

* fix test

Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>

* try fix test

Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>

* try fix test

Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>

* Fix megatron megatron_init.py dp

Signed-off-by: Yu Yao <54727607+yaoyu-33@users.noreply.github.com>

* Update lightning megatron_init.py dp

Signed-off-by: Yu Yao <54727607+yaoyu-33@users.noreply.github.com>

* make it possible to update pre_process and post_process for llm, required in vlm

Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>

* Fixes for neva to run with PP

Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>

* Add mcore vit support, and checkpoint conversion

Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>

* fix checkpoint loading for epp

Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>

* update script

Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>

* rename llama to mllama folder name

Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>

* update to attention bias

Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>

* Apply isort and black reformatting

Signed-off-by: yaoyu-33 <yaoyu-33@users.noreply.github.com>

* added datamodule for llava-next

* modified state dict transform

* neva model changes to support llava-next

* remove accidentally checked in files

Signed-off-by: Yashaswi Karnati <ykarnati@login-eos01.eos.clusters.nvidia.com>

* Apply isort and black reformatting

Signed-off-by: yashaswikarnati <yashaswikarnati@users.noreply.github.com>

* remove unused imports

* added io_init to not save task_encoder and image_processor

* Apply isort and black reformatting

Signed-off-by: yashaswikarnati <yashaswikarnati@users.noreply.github.com>

* added scripts for pretrain and finetune

Signed-off-by: Yashaswi Karnati <ykarnati@login-eos02.eos.clusters.nvidia.com>

* Apply isort and black reformatting

Signed-off-by: yashaswikarnati <yashaswikarnati@users.noreply.github.com>

* generation example

* Apply isort and black reformatting

Signed-off-by: yashaswikarnati <yashaswikarnati@users.noreply.github.com>

* small change in llava next example

* llava next end-to-end train

* Apply isort and black reformatting

Signed-off-by: yashaswikarnati <yashaswikarnati@users.noreply.github.com>

* finetune changes

* Apply isort and black reformatting

Signed-off-by: yashaswikarnati <yashaswikarnati@users.noreply.github.com>

* finetune debug changes

* update dropout to 0

Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>

* added example generation script

* added docstrings, formatting, removed debug statements and unused imports

* remove example scripts

* fix attention bias

Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>

* Apply isort and black reformatting

Signed-off-by: yaoyu-33 <yaoyu-33@users.noreply.github.com>

* Apply isort and black reformatting

Signed-off-by: yaoyu-33 <yaoyu-33@users.noreply.github.com>

* remove disable_vision_padding since we now have a fix

Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>

* Apply isort and black reformatting

Signed-off-by: yaoyu-33 <yaoyu-33@users.noreply.github.com>

* Update init for mllama

Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>

* Apply isort and black reformatting

Signed-off-by: yaoyu-33 <yaoyu-33@users.noreply.github.com>

* Address comments

Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>

* Apply isort and black reformatting

Signed-off-by: yaoyu-33 <yaoyu-33@users.noreply.github.com>

* fix copyright title

Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>

* multiple fixes

Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>

* bug fix

Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>

* Apply isort and black reformatting

Signed-off-by: yaoyu-33 <yaoyu-33@users.noreply.github.com>

* fix code scan

Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>

* Fix for SP

Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>

* Apply isort and black reformatting

Signed-off-by: yaoyu-33 <yaoyu-33@users.noreply.github.com>

* update vision code

Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>

* revert attention bias changes until latest MLM code got merged

Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>

* fix warning

Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>

* Turn off system message check, as it's "" now

Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>

* Update layer spec and add siglip support

Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>

* Apply isort and black reformatting

Signed-off-by: yaoyu-33 <yaoyu-33@users.noreply.github.com>

* update pretrain script

Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>

* Apply isort and black reformatting

Signed-off-by: yaoyu-33 <yaoyu-33@users.noreply.github.com>

* Fix scripts

Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>

* Apply isort and black reformatting

Signed-off-by: yaoyu-33 <yaoyu-33@users.noreply.github.com>

* add neva training recipes

Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>

* fix mllama mock ds

Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>

* Apply isort and black reformatting

Signed-off-by: yaoyu-33 <yaoyu-33@users.noreply.github.com>

* fix recipe

Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>

* fix pp

Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>

* scripts update

Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>

* Apply isort and black reformatting

Signed-off-by: yaoyu-33 <yaoyu-33@users.noreply.github.com>

* scripts update

Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>

* Apply isort and black reformatting

Signed-off-by: yaoyu-33 <yaoyu-33@users.noreply.github.com>

* update config api

Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>

* Apply isort and black reformatting

Signed-off-by: yaoyu-33 <yaoyu-33@users.noreply.github.com>

* few updates

Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>

* Apply isort and black reformatting

Signed-off-by: yaoyu-33 <yaoyu-33@users.noreply.github.com>

* update 70b

Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>

* hide examples for pr

Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>

* fix few issues

Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>

* add docstring layer spec

Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>

* add docstring to vit config

Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>

* Apply isort and black reformatting

Signed-off-by: yaoyu-33 <yaoyu-33@users.noreply.github.com>

* fix copyright

Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>

* fix

Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>

---------

Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>
Signed-off-by: yaoyu-33 <yaoyu-33@users.noreply.github.com>
Signed-off-by: cuichenx <cuichenx@users.noreply.github.com>
Signed-off-by: Chen Cui <chcui@nvidia.com>
Signed-off-by: artbataev <artbataev@users.noreply.github.com>
Signed-off-by: parthmannan <parthmannan@users.noreply.github.com>
Signed-off-by: meatybobby <meatybobby@users.noreply.github.com>
Signed-off-by: Yu Yao <54727607+yaoyu-33@users.noreply.github.com>
Signed-off-by: HuiyingLi <willwin.lee@gmail.com>
Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>
Signed-off-by: akoumpa <akoumpa@users.noreply.github.com>
Signed-off-by: Abhishree <abhishreetm@gmail.com>
Signed-off-by: Marc Romeijn <mromeijn@nvidia.com>
Signed-off-by: Marc Romeyn <mromeijn@nvidia.com>
Signed-off-by: marcromeyn <marcromeyn@users.noreply.github.com>
Signed-off-by: stevehuang52 <heh@nvidia.com>
Signed-off-by: Shriya Palsamudram <spalsamudram@nvidia.com>
Signed-off-by: ashors1 <ashors@nvidia.com>
Signed-off-by: Maanu Grover <maanug@nvidia.com>
Signed-off-by: Piotr Kaminski <pikaminski@nvidia.com>
Signed-off-by: Laplasjan107 <Laplasjan107@users.noreply.github.com>
Signed-off-by: Piotr Kamiński <67481570+Laplasjan107@users.noreply.github.com>
Signed-off-by: monica-sekoyan <msekoyan@nvidia.com>
Signed-off-by: monica-sekoyan <monica-sekoyan@users.noreply.github.com>
Signed-off-by: Jan Lasek <janek.lasek@gmail.com>
Signed-off-by: janekl <janekl@users.noreply.github.com>
Signed-off-by: Oliver Koenig <okoenig@nvidia.com>
Signed-off-by: Yashaswi Karnati <ykarnati@login-eos01.eos.clusters.nvidia.com>
Signed-off-by: yashaswikarnati <yashaswikarnati@users.noreply.github.com>
Signed-off-by: Yashaswi Karnati <ykarnati@login-eos02.eos.clusters.nvidia.com>
Co-authored-by: Chen Cui <chcui@nvidia.com>
Co-authored-by: Bobby Chen <bobchen@nvidia.com>
Co-authored-by: yaoyu-33 <yaoyu-33@users.noreply.github.com>
Co-authored-by: cuichenx <cuichenx@users.noreply.github.com>
Co-authored-by: Yashaswi Karnati <ykarnati@login-eos02.eos.clusters.nvidia.com>
Co-authored-by: artbataev <artbataev@users.noreply.github.com>
Co-authored-by: Parth Mannan <38387286+parthmannan@users.noreply.github.com>
Co-authored-by: parthmannan <parthmannan@users.noreply.github.com>
Co-authored-by: meatybobby <meatybobby@users.noreply.github.com>
Co-authored-by: Huiying <willwin.lee@gmail.com>
Co-authored-by: Alexandros Koumparoulis <153118171+akoumpa@users.noreply.github.com>
Co-authored-by: akoumpa <akoumpa@users.noreply.github.com>
Co-authored-by: Abhishree Thittenamane <47577437+athitten@users.noreply.github.com>
Co-authored-by: Pablo Garay <pagaray@nvidia.com>
Co-authored-by: Marc Romeyn <mromeijn@nvidia.com>
Co-authored-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>
Co-authored-by: marcromeyn <marcromeyn@users.noreply.github.com>
Co-authored-by: He Huang (Steve) <105218074+stevehuang52@users.noreply.github.com>
Co-authored-by: Shriya Rishab <69161273+ShriyaPalsamudram@users.noreply.github.com>
Co-authored-by: ataghibakhsh <ataghibakhsh@nvidia.com>
Co-authored-by: Maanu Grover <109391026+maanug-nv@users.noreply.github.com>
Co-authored-by: Anna Shors <71393111+ashors1@users.noreply.github.com>
Co-authored-by: Piotr Kamiński <67481570+Laplasjan107@users.noreply.github.com>
Co-authored-by: Piotr Kaminski <pikaminski@nvidia.com>
Co-authored-by: Laplasjan107 <Laplasjan107@users.noreply.github.com>
Co-authored-by: monica-sekoyan <166123533+monica-sekoyan@users.noreply.github.com>
Co-authored-by: monica-sekoyan <monica-sekoyan@users.noreply.github.com>
Co-authored-by: Jan Lasek <janek.lasek@gmail.com>
Co-authored-by: janekl <janekl@users.noreply.github.com>
Co-authored-by: oliver könig <okoenig@nvidia.com>
Co-authored-by: Pablo Garay <palenq@gmail.com>
Co-authored-by: ykarnati <ykarnati@nvidia.com>
Co-authored-by: Yashaswi Karnati <ykarnati@login-eos01.eos.clusters.nvidia.com>
Co-authored-by: yashaswikarnati <yashaswikarnati@users.noreply.github.com>
yashaswikarnati added a commit that referenced this pull request Nov 21, 2024
* add initial code for llama vlm

Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>

* some restructure

Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>

* add mock data placeholder

Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>

* Fix some importing

Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>

* add language component for vlm llama

* update code

Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>

* now match num of params

* update language part and fix vision part

Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>

* minor fix

Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>

* model can now init

Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>

* minor update for llama32 text config

Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>

* make checkpoint loading work

* missing import

* match vision part tensor shapes with configs

Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>

* solve some fwd issues and mismatch issues

Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>

* add vision import

* fixes

Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>

* update importer to convert both text and image weights

* importer typos and reduce clutter

* fix import qkv

* some fixes for LLM

Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>

* Add embedding

* some updates

Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>

* enable loading only text or only vision

* add example script

* TP fix

Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>

* update

* upload examples

Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>

* update generate

Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>

* update to newer version

Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>

* upload for sharing

* update to new pyt ckpt

* xattn_caches matches (except small differences due to TE RMSNorm)

* cleanup

* embeddings match

* match precision of weights

* update sharded state dict

Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>

* change xattn layer num to 3 7 11 etc

* upload llama generation

* minor fix

* fix dummy layer input format

* fix vision qkv order

* fix sharded state dict

Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>

* fix vision precision

* fix rope

* match cross attn layer

* remove nrep

* Remove cross attention in ImageTransformerLayer and fix _gate_ffn

* PP draft

Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>

* Fix intermediate tensor

* temp save for pp2 is working

Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>

* fix pp issues

Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>

* merge

* update mcore parallelism initialization

Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>

* small update to pretrain script

Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>

* update mcore parallelism initialization

Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>

* Apply isort and black reformatting

Signed-off-by: yaoyu-33 <yaoyu-33@users.noreply.github.com>

* added energon dataloader for neva training (#10451)

* added energon dataloader for neva training

* Apply isort and black reformatting

Signed-off-by: yashaswikarnati <yashaswikarnati@users.noreply.github.com>

* specify global batch size to support grad accumulation

* adding neva pretrain example

* Apply isort and black reformatting

Signed-off-by: yashaswikarnati <yashaswikarnati@users.noreply.github.com>

* change pretrain example to handle new ckpt reloading

* fixed code quality warnings and unused imports

Signed-off-by: ykarnati <ykarnati@nvidia.com>

* minor changes for PR comments

* Apply isort and black reformatting

Signed-off-by: yashaswikarnati <yashaswikarnati@users.noreply.github.com>

* refactor conversation template config

* Apply isort and black reformatting

Signed-off-by: yashaswikarnati <yashaswikarnati@users.noreply.github.com>

* remove optional import

---------

Signed-off-by: yashaswikarnati <yashaswikarnati@users.noreply.github.com>
Signed-off-by: ykarnati <ykarnati@nvidia.com>
Co-authored-by: yashaswikarnati <yashaswikarnati@users.noreply.github.com>
(cherry picked from commit 7354740)

* llama energon dataloader

* have tokenizer for base task encoder class

* Update megatron_init.py

Signed-off-by: Yu Yao <54727607+yaoyu-33@users.noreply.github.com>

* Add simple inference

* evian3 update

Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>

* add encoder parallel default config

Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>

* add encoder parallel default config

Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>

* clean up

Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>

* add aspect ratio in model

* support energon dataloader

* some pp update

Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>

* fixes

Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>

* fix kv merging

Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>

* fix get_key_value_tensors

Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>

* rename files

Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>

* update to HF style position embedding

Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>

* fix energon dataloader and support batching

* update forward args

Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>

* clean up and move to aspect_ratio_ids

Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>

* rename back to language.py

Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>

* fix loss function

Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>

* update and fix energon

Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>

* Add hf import

* Fix type

* Change config

* update energon pretrain

Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>

* clean up

* clean up

* reformat

Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>

* update inference files for new code

* update to instruct

* update to instruct

* update few names

Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>

* update generation

Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>

* fix importer embedding.weight

* few fixes

Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>

* add hf script

Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>

* fix kv import

* remove interleaved

* fixes and updates

Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>

* lora fixes

Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>

* some code clean ups

Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>

* update training scripts

Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>

* refactors

Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>

* add LoRA finetuning

* fixes and nemo update

Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>

* fix importer registering issue by adding 11B and 90B configs

* update `decoder_seq_len`

Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>

* science vqa script

Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>

* clean up script name

Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>

* fix ckpt save serialization issue

* fix predefined config classes

* add num_chunks in input

Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>

* fix format

Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>

* update finetuning scripts for PEFT

* add 11b recipe (need #10645 to test)

* fix mask generation

Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>

* minor fix code style

Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>

* Apply isort and black reformatting

Signed-off-by: yaoyu-33 <yaoyu-33@users.noreply.github.com>

* Support no image inference

* add llama svqa eval

* fix masking

Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>

* Apply isort and black reformatting

Signed-off-by: yaoyu-33 <yaoyu-33@users.noreply.github.com>

* fix generation

Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>

* Apply isort and black reformatting

Signed-off-by: yaoyu-33 <yaoyu-33@users.noreply.github.com>

* add 90b recipe and revise 11b recipe

* Apply isort and black reformatting

Signed-off-by: cuichenx <cuichenx@users.noreply.github.com>

* clean up typing

* add option to disable vision padding

* Apply isort and black reformatting

Signed-off-by: cuichenx <cuichenx@users.noreply.github.com>

* base model finetuning (does not work yet)

* Apply isort and black reformatting

Signed-off-by: cuichenx <cuichenx@users.noreply.github.com>

* fixed default conversation template config for MLLama

* Update svqa

* add multinode

* bot happy

* Apply isort and black reformatting

Signed-off-by: cuichenx <cuichenx@users.noreply.github.com>

* Apply isort and black reformatting

Signed-off-by: yaoyu-33 <yaoyu-33@users.noreply.github.com>

* Apply isort and black reformatting

Signed-off-by: yaoyu-33 <yaoyu-33@users.noreply.github.com>

* Apply isort and black reformatting

Signed-off-by: artbataev <artbataev@users.noreply.github.com>

* Perf improvements. Mainly from XAttn mask calculation (#10901)

* Perf improvements. Mainly from XAttn mask calculation

* Apply isort and black reformatting

Signed-off-by: parthmannan <parthmannan@users.noreply.github.com>

---------

Signed-off-by: parthmannan <parthmannan@users.noreply.github.com>
Co-authored-by: parthmannan <parthmannan@users.noreply.github.com>

* fix existing issues

Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>

* fix scripts

Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>

* Apply isort and black reformatting

Signed-off-by: yaoyu-33 <yaoyu-33@users.noreply.github.com>

* fix lora

* few fixes for non image support

Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>

* update masking gen

Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>

* update lazy dataset

Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>

* fix data sampler and loading issue

Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>

* Add vlm generation

* Apply isort and black reformatting

Signed-off-by: meatybobby <meatybobby@users.noreply.github.com>

* Apply isort and black reformatting

Signed-off-by: yaoyu-33 <yaoyu-33@users.noreply.github.com>

* generation update

Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>

* update lazy dataset

Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>

* Fix _strategy_lib.py

Signed-off-by: Yu Yao <54727607+yaoyu-33@users.noreply.github.com>

* Apply isort and black reformatting

Signed-off-by: yaoyu-33 <yaoyu-33@users.noreply.github.com>

* fix warning

Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>

* hide vlm examples

Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>

* Revert "Add vlm generation"

This reverts commit 4711c75

Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>

* Fix VisionEncoder multi-batch bug

* update mcore parallelism initialization

Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>

* Apply isort and black reformatting

Signed-off-by: yaoyu-33 <yaoyu-33@users.noreply.github.com>

* Update megatron_init.py

Signed-off-by: Yu Yao <54727607+yaoyu-33@users.noreply.github.com>

* add encoder parallel default config

Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>

* Fix _strategy_lib.py

Signed-off-by: Yu Yao <54727607+yaoyu-33@users.noreply.github.com>

* llm.generate fixes (#10983)

* fix context path, disable optimizer init, add tp

Signed-off-by: HuiyingLi <willwin.lee@gmail.com>

* format

Signed-off-by: HuiyingLi <willwin.lee@gmail.com>

* address comments, require user to provide trainer

Signed-off-by: HuiyingLi <willwin.lee@gmail.com>

* minor fix

Signed-off-by: HuiyingLi <willwin.lee@gmail.com>

* minor fixes

Signed-off-by: HuiyingLi <willwin.lee@gmail.com>

---------

Signed-off-by: HuiyingLi <willwin.lee@gmail.com>

* use __dict__ in check (#11012)

* check is_hf_model in leaf module

Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>

* Apply isort and black reformatting

Signed-off-by: akoumpa <akoumpa@users.noreply.github.com>

* disable getattr alternative path

Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>

* fix

Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>

* undo

Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>

---------

Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>
Signed-off-by: akoumpa <akoumpa@users.noreply.github.com>
Co-authored-by: akoumpa <akoumpa@users.noreply.github.com>

* LoRA support for HF::AutoModelForCausalLM (#10982)

* add LinearAdapter

Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>

* add hf lora example

Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>

* remove unused imports

Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>

* fix

Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>

* fix

Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>

* subclass mixin

Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>

* remove stale imports

Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>

* undo

Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>

* fix scale

Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>

* regex selector for peft

Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>

* move lora

Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>

* fmt

Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>

* hf_auto_model_for_causal_lm finetune recipe

Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>

* Apply isort and black reformatting

Signed-off-by: akoumpa <akoumpa@users.noreply.github.com>

---------

Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>
Signed-off-by: akoumpa <akoumpa@users.noreply.github.com>
Co-authored-by: akoumpa <akoumpa@users.noreply.github.com>

* Change default for always_save_context to True (#11014)

Signed-off-by: Abhishree <abhishreetm@gmail.com>
Co-authored-by: Pablo Garay <pagaray@nvidia.com>

* Add a build option to load_context (#10713)

* Add a build option to load_context

Signed-off-by: Marc Romeijn <mromeijn@nvidia.com>
Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>

* Adding test

Signed-off-by: Marc Romeijn <mromeijn@nvidia.com>
Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>

* Trying to fix failing CPU test

Signed-off-by: Marc Romeijn <mromeijn@nvidia.com>
Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>

* cherry-pick fix

Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>

---------

Signed-off-by: Marc Romeijn <mromeijn@nvidia.com>
Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>
Co-authored-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>

* Fix pip install (#11026)

* Move AutoTokenizer inline

Signed-off-by: Marc Romeyn <mromeijn@nvidia.com>

* Move einops to common requirements

Signed-off-by: Marc Romeyn <mromeijn@nvidia.com>

* Move AutoTokenizer import to top-level again in fine_tuning

Signed-off-by: Marc Romeyn <mromeijn@nvidia.com>

* Move megatron init inside nemo.lightning

Signed-off-by: Marc Romeyn <mromeijn@nvidia.com>

* Make megatron_lazy_init_context work when transformer-engine is not installed

Signed-off-by: Marc Romeyn <mromeijn@nvidia.com>

* Only import get_nmt_tokenizer when needed

Signed-off-by: Marc Romeyn <mromeijn@nvidia.com>

* Apply isort and black reformatting

Signed-off-by: marcromeyn <marcromeyn@users.noreply.github.com>

---------

Signed-off-by: Marc Romeyn <mromeijn@nvidia.com>
Signed-off-by: marcromeyn <marcromeyn@users.noreply.github.com>
Co-authored-by: marcromeyn <marcromeyn@users.noreply.github.com>

* [WIP] Add docs for NEST SSL (#10804)

* add docs

Signed-off-by: stevehuang52 <heh@nvidia.com>

* update doc and fix missing param

Signed-off-by: stevehuang52 <heh@nvidia.com>

---------

Signed-off-by: stevehuang52 <heh@nvidia.com>

* Change dist ckpt defaults (#10913)

* Enable ckpt features by default (async ckpt), ckpt every 15mins and reduce preemption time to 1min

Signed-off-by: Shriya Palsamudram <spalsamudram@nvidia.com>

* fix ssm tests

Signed-off-by: Shriya Palsamudram <spalsamudram@nvidia.com>

* Make note that ckpt_async_save is disabled for SSMs

Signed-off-by: Shriya Palsamudram <spalsamudram@nvidia.com>

* Enable async ckpt for SSMs with fix

Signed-off-by: Shriya Palsamudram <spalsamudram@nvidia.com>

* Disable async ckpt in the peft test as it is a known bug, add note.

Signed-off-by: Shriya Palsamudram <spalsamudram@nvidia.com>

* Fix failing unit tests

Signed-off-by: Shriya Palsamudram <spalsamudram@nvidia.com>

* Ashors/peft async ckpt (#11010)

* [WIP] prototype for supporting async checkpointing with peft

Signed-off-by: ashors1 <ashors@nvidia.com>
Signed-off-by: Shriya Palsamudram <spalsamudram@nvidia.com>

* Enable async ckpt for the peft test

Signed-off-by: Shriya Palsamudram <spalsamudram@nvidia.com>

* Fix peft setup test

Signed-off-by: Shriya Palsamudram <spalsamudram@nvidia.com>

---------

Signed-off-by: Shriya Palsamudram <spalsamudram@nvidia.com>
Signed-off-by: ashors1 <ashors@nvidia.com>
Co-authored-by: ataghibakhsh <ataghibakhsh@nvidia.com>

* Akoumparouli/mixtral recipe fix r2.0.0 (#10994)

* Mixtral TP8 EP1

Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>

* Apply isort and black reformatting

Signed-off-by: akoumpa <akoumpa@users.noreply.github.com>

---------

Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>
Signed-off-by: akoumpa <akoumpa@users.noreply.github.com>
Co-authored-by: akoumpa <akoumpa@users.noreply.github.com>

* Fix _strategy_lib tests (#11033)

* fix world size and don't mock

Signed-off-by: Maanu Grover <maanug@nvidia.com>

* cleanup global state

Signed-off-by: Maanu Grover <maanug@nvidia.com>

* check app state instead

Signed-off-by: Maanu Grover <maanug@nvidia.com>

* fix syntax nemo logger test

Signed-off-by: Maanu Grover <maanug@nvidia.com>

---------

Signed-off-by: Maanu Grover <maanug@nvidia.com>

* Update `BaseMegatronSampler` for compatibility with PTL's `_BatchProgress` (#11016)

* Revert "[NeMo-UX] Use custom `BatchProgress` class which does not restore states (#10383)"

This reverts commit b5798de.

* make megatron sampler return the total number of batches in the dataset

Signed-off-by: ashors1 <ashors@nvidia.com>

---------

Signed-off-by: ashors1 <ashors@nvidia.com>

* PTQ example for NeMo 2.0 (#10642)

* initial commit

Signed-off-by: Piotr Kaminski <pikaminski@nvidia.com>

* create Quantizer for NeMo 2.0

Signed-off-by: Piotr Kaminski <pikaminski@nvidia.com>

* refactor

Signed-off-by: Piotr Kaminski <pikaminski@nvidia.com>

* Call quantize on an unwrapped mcore model

Signed-off-by: Piotr Kaminski <pikaminski@nvidia.com>

* Apply isort and black reformatting

Signed-off-by: Laplasjan107 <Laplasjan107@users.noreply.github.com>

* Add tests, adjust unwrapping

Signed-off-by: Piotr Kaminski <pikaminski@nvidia.com>

* Apply isort and black reformatting

Signed-off-by: Laplasjan107 <Laplasjan107@users.noreply.github.com>

* fix export

Signed-off-by: Piotr Kaminski <pikaminski@nvidia.com>

* Apply isort and black reformatting

Signed-off-by: Laplasjan107 <Laplasjan107@users.noreply.github.com>

* Apply isort and black reformatting

Signed-off-by: artbataev <artbataev@users.noreply.github.com>

* Fix output_path argument for HF import

Signed-off-by: Piotr Kamiński <67481570+Laplasjan107@users.noreply.github.com>

* fix fabric ckpt loading

Signed-off-by: Piotr Kaminski <pikaminski@nvidia.com>

* Apply isort and black reformatting

Signed-off-by: Laplasjan107 <Laplasjan107@users.noreply.github.com>

* code review suggestions

Signed-off-by: Piotr Kaminski <pikaminski@nvidia.com>

* Apply isort and black reformatting

Signed-off-by: Laplasjan107 <Laplasjan107@users.noreply.github.com>

* remove unused import

Signed-off-by: Piotr Kaminski <pikaminski@nvidia.com>

* use cnn dataset in github ci

Signed-off-by: Piotr Kaminski <pikaminski@nvidia.com>

* applied code review

Signed-off-by: Piotr Kaminski <pikaminski@nvidia.com>

* code review changes

Signed-off-by: Piotr Kaminski <pikaminski@nvidia.com>

* Apply isort and black reformatting

Signed-off-by: Laplasjan107 <Laplasjan107@users.noreply.github.com>

* simplify interface for data iterator

Signed-off-by: Piotr Kaminski <pikaminski@nvidia.com>

* Apply isort and black reformatting

Signed-off-by: Laplasjan107 <Laplasjan107@users.noreply.github.com>

* (partial) PP fix

Signed-off-by: Piotr Kaminski <pikaminski@nvidia.com>

* Apply isort and black reformatting

Signed-off-by: Laplasjan107 <Laplasjan107@users.noreply.github.com>

---------

Signed-off-by: Piotr Kaminski <pikaminski@nvidia.com>
Signed-off-by: Laplasjan107 <Laplasjan107@users.noreply.github.com>
Signed-off-by: Piotr Kamiński <67481570+Laplasjan107@users.noreply.github.com>
Signed-off-by: artbataev <artbataev@users.noreply.github.com>
Co-authored-by: Piotr Kaminski <pikaminski@nvidia.com>
Co-authored-by: Laplasjan107 <Laplasjan107@users.noreply.github.com>
Co-authored-by: artbataev <artbataev@users.noreply.github.com>

* TDT compute timestamps option and Extra Whitespace handling for SPE (#10875)

* add token duration

Signed-off-by: monica-sekoyan <msekoyan@nvidia.com>

* revert rnnt change

Signed-off-by: monica-sekoyan <msekoyan@nvidia.com>

* add remove_extra_whitespaces arg to spe tokenizer

Signed-off-by: monica-sekoyan <msekoyan@nvidia.com>

* add token duration retrieval

Signed-off-by: monica-sekoyan <msekoyan@nvidia.com>

* add ignore_extra_whitespace to spe

Signed-off-by: monica-sekoyan <msekoyan@nvidia.com>

* add compute_timestamp support for tdt

Signed-off-by: monica-sekoyan <msekoyan@nvidia.com>

* fix config field name

Signed-off-by: monica-sekoyan <msekoyan@nvidia.com>

* add refinement for tdt timestamps

Signed-off-by: monica-sekoyan <msekoyan@nvidia.com>

* add segments timestamp support and refinement for ctc

Signed-off-by: monica-sekoyan <msekoyan@nvidia.com>

* modify tests for ctc decoding timestamps

Signed-off-by: monica-sekoyan <msekoyan@nvidia.com>

* add rnnt timestamp tests

Signed-off-by: monica-sekoyan <msekoyan@nvidia.com>

* updated doc

Signed-off-by: monica-sekoyan <msekoyan@nvidia.com>

* fix in test

Signed-off-by: monica-sekoyan <msekoyan@nvidia.com>

* Apply isort and black reformatting

Signed-off-by: monica-sekoyan <monica-sekoyan@users.noreply.github.com>

* fix of unicode char

Signed-off-by: monica-sekoyan <msekoyan@nvidia.com>

* fix rnnt_decoding test

Signed-off-by: monica-sekoyan <msekoyan@nvidia.com>

* workaround for test tokenizer

Signed-off-by: monica-sekoyan <msekoyan@nvidia.com>

* Apply isort and black reformatting

Signed-off-by: monica-sekoyan <monica-sekoyan@users.noreply.github.com>

* modify segments formation

Signed-off-by: monica-sekoyan <msekoyan@nvidia.com>

* modify segments for ctc

Signed-off-by: monica-sekoyan <msekoyan@nvidia.com>

* fix in ctc refinement

Signed-off-by: monica-sekoyan <msekoyan@nvidia.com>

* Apply isort and black reformatting

Signed-off-by: monica-sekoyan <monica-sekoyan@users.noreply.github.com>

* minor changes

Signed-off-by: monica-sekoyan <msekoyan@nvidia.com>

* reverse offset change

Signed-off-by: monica-sekoyan <msekoyan@nvidia.com>

* Apply isort and black reformatting

Signed-off-by: monica-sekoyan <monica-sekoyan@users.noreply.github.com>

* warning mode=once

Signed-off-by: monica-sekoyan <msekoyan@nvidia.com>

* Apply isort and black reformatting

Signed-off-by: monica-sekoyan <monica-sekoyan@users.noreply.github.com>

* make ignore_extrawhitespaces false

Signed-off-by: monica-sekoyan <msekoyan@nvidia.com>

* minor changes

Signed-off-by: monica-sekoyan <msekoyan@nvidia.com>

* adjust changes to the tests

Signed-off-by: monica-sekoyan <msekoyan@nvidia.com>

* modify prompt_formatter tests

Signed-off-by: monica-sekoyan <msekoyan@nvidia.com>

* Apply isort and black reformatting

Signed-off-by: monica-sekoyan <monica-sekoyan@users.noreply.github.com>

---------

Signed-off-by: monica-sekoyan <msekoyan@nvidia.com>
Signed-off-by: monica-sekoyan <monica-sekoyan@users.noreply.github.com>
Co-authored-by: monica-sekoyan <monica-sekoyan@users.noreply.github.com>

* Basic online dynamic FP8 quantization with vLLM (#10904)

* Basic online dynamic quantization with vLLM

Signed-off-by: Jan Lasek <janek.lasek@gmail.com>

* Apply isort and black reformatting

Signed-off-by: janekl <janekl@users.noreply.github.com>

* vllm 0.6.3 updates

Signed-off-by: Jan Lasek <janek.lasek@gmail.com>

* Pass quantization param in deploy_vllm_triton.py script

Signed-off-by: Jan Lasek <janek.lasek@gmail.com>

---------

Signed-off-by: Jan Lasek <janek.lasek@gmail.com>
Signed-off-by: janekl <janekl@users.noreply.github.com>
Co-authored-by: janekl <janekl@users.noreply.github.com>

* ci: Improve VM maintenance (#10758)

* ci: Improve VM maintenance

Signed-off-by: Oliver Koenig <okoenig@nvidia.com>

* rename stuff

Signed-off-by: Oliver Koenig <okoenig@nvidia.com>

* title

Signed-off-by: Oliver Koenig <okoenig@nvidia.com>

* use team

Signed-off-by: Oliver Koenig <okoenig@nvidia.com>

* run on failure too

Signed-off-by: Oliver Koenig <okoenig@nvidia.com>

* fix

Signed-off-by: Oliver Koenig <okoenig@nvidia.com>

* yrdy

Signed-off-by: Oliver Koenig <okoenig@nvidia.com>

* f

Signed-off-by: Oliver Koenig <okoenig@nvidia.com>

* test

Signed-off-by: Oliver Koenig <okoenig@nvidia.com>

* fix

Signed-off-by: Oliver Koenig <okoenig@nvidia.com>

* f

Signed-off-by: Oliver Koenig <okoenig@nvidia.com>

* f

Signed-off-by: Oliver Koenig <okoenig@nvidia.com>

* f

Signed-off-by: Oliver Koenig <okoenig@nvidia.com>

---------

Signed-off-by: Oliver Koenig <okoenig@nvidia.com>

* Add comment for vision transpose

* update megatron_init.py inside lightning

Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>

* rename llama to mllama folder name

Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>

* update to attention bias

Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>

* Apply isort and black reformatting

Signed-off-by: yaoyu-33 <yaoyu-33@users.noreply.github.com>

* update dropout to 0

Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>

* fix attention bias

Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>

* remove disable_vision_padding since we now have a fix

Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>

* Apply isort and black reformatting

Signed-off-by: yaoyu-33 <yaoyu-33@users.noreply.github.com>

* Update init for mllama

Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>

* Apply isort and black reformatting

Signed-off-by: yaoyu-33 <yaoyu-33@users.noreply.github.com>

* Address comments

Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>

* Apply isort and black reformatting

Signed-off-by: yaoyu-33 <yaoyu-33@users.noreply.github.com>

* fix copyright title

Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>

* fix code scan

Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>

* update vision code

Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>

* revert attention bias changes until the latest MLM code is merged

Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>

* fix warning

Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>

* Turn off system message check, as it's "" now

Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>

* Rollback megatron_parallel.py

Signed-off-by: Yu Yao <54727607+yaoyu-33@users.noreply.github.com>

---------

Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>
Signed-off-by: yaoyu-33 <yaoyu-33@users.noreply.github.com>
Signed-off-by: Yu Yao <54727607+yaoyu-33@users.noreply.github.com>
Signed-off-by: cuichenx <cuichenx@users.noreply.github.com>
Signed-off-by: Chen Cui <chcui@nvidia.com>
Signed-off-by: artbataev <artbataev@users.noreply.github.com>
Signed-off-by: parthmannan <parthmannan@users.noreply.github.com>
Signed-off-by: meatybobby <meatybobby@users.noreply.github.com>
Signed-off-by: HuiyingLi <willwin.lee@gmail.com>
Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>
Signed-off-by: akoumpa <akoumpa@users.noreply.github.com>
Signed-off-by: Abhishree <abhishreetm@gmail.com>
Signed-off-by: Marc Romeijn <mromeijn@nvidia.com>
Signed-off-by: Marc Romeyn <mromeijn@nvidia.com>
Signed-off-by: marcromeyn <marcromeyn@users.noreply.github.com>
Signed-off-by: stevehuang52 <heh@nvidia.com>
Signed-off-by: Shriya Palsamudram <spalsamudram@nvidia.com>
Signed-off-by: ashors1 <ashors@nvidia.com>
Signed-off-by: Maanu Grover <maanug@nvidia.com>
Signed-off-by: Piotr Kaminski <pikaminski@nvidia.com>
Signed-off-by: Laplasjan107 <Laplasjan107@users.noreply.github.com>
Signed-off-by: Piotr Kamiński <67481570+Laplasjan107@users.noreply.github.com>
Signed-off-by: monica-sekoyan <msekoyan@nvidia.com>
Signed-off-by: monica-sekoyan <monica-sekoyan@users.noreply.github.com>
Signed-off-by: Jan Lasek <janek.lasek@gmail.com>
Signed-off-by: janekl <janekl@users.noreply.github.com>
Signed-off-by: Oliver Koenig <okoenig@nvidia.com>
Co-authored-by: Ao Tang <aot@nvidia.com>
Co-authored-by: Chen Cui <chcui@nvidia.com>
Co-authored-by: Bobby Chen <bobchen@nvidia.com>
Co-authored-by: yaoyu-33 <yaoyu-33@users.noreply.github.com>
Co-authored-by: Yashaswi Karnati <144376261+yashaswikarnati@users.noreply.github.com>
Co-authored-by: ykarnati <ykarnati@nvidia.com>
Co-authored-by: cuichenx <cuichenx@users.noreply.github.com>
Co-authored-by: Yashaswi Karnati <ykarnati@login-eos02.eos.clusters.nvidia.com>
Co-authored-by: artbataev <artbataev@users.noreply.github.com>
Co-authored-by: Parth Mannan <38387286+parthmannan@users.noreply.github.com>
Co-authored-by: parthmannan <parthmannan@users.noreply.github.com>
Co-authored-by: meatybobby <meatybobby@users.noreply.github.com>
Co-authored-by: Huiying <willwin.lee@gmail.com>
Co-authored-by: Alexandros Koumparoulis <153118171+akoumpa@users.noreply.github.com>
Co-authored-by: akoumpa <akoumpa@users.noreply.github.com>
Co-authored-by: Abhishree Thittenamane <47577437+athitten@users.noreply.github.com>
Co-authored-by: Pablo Garay <pagaray@nvidia.com>
Co-authored-by: Marc Romeyn <mromeijn@nvidia.com>
Co-authored-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>
Co-authored-by: marcromeyn <marcromeyn@users.noreply.github.com>
Co-authored-by: He Huang (Steve) <105218074+stevehuang52@users.noreply.github.com>
Co-authored-by: Shriya Rishab <69161273+ShriyaPalsamudram@users.noreply.github.com>
Co-authored-by: ataghibakhsh <ataghibakhsh@nvidia.com>
Co-authored-by: Maanu Grover <109391026+maanug-nv@users.noreply.github.com>
Co-authored-by: Anna Shors <71393111+ashors1@users.noreply.github.com>
Co-authored-by: Piotr Kamiński <67481570+Laplasjan107@users.noreply.github.com>
Co-authored-by: Piotr Kaminski <pikaminski@nvidia.com>
Co-authored-by: Laplasjan107 <Laplasjan107@users.noreply.github.com>
Co-authored-by: monica-sekoyan <166123533+monica-sekoyan@users.noreply.github.com>
Co-authored-by: monica-sekoyan <monica-sekoyan@users.noreply.github.com>
Co-authored-by: Jan Lasek <janek.lasek@gmail.com>
Co-authored-by: janekl <janekl@users.noreply.github.com>
Co-authored-by: oliver könig <okoenig@nvidia.com>
7 participants