Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Virtual pipeline parallel support for LoRA in NLPAdapterModelMixin #11128

Merged
merged 5 commits into from
Nov 1, 2024

Conversation

vysarge
Copy link
Collaborator

@vysarge vysarge commented Nov 1, 2024

What does this PR do ?

Corrects handling of model.virtual_pipeline_model_parallel_size > 1 alongside model.peft.peft_scheme=lora; before this fix adapters are attached only to the first virtual pipeline chunk.

Collection: nlp

Changelog

  • Replace parameter-based check for first pipeline stage with parallel_state.is_pipeline_first_stage() to account for virtual pipeline stage
  • Alter various methods (_get_all_keys, _check_and_add_peft_cfg, setup_optimizer_param_groups, set_tunable_base_params, tie_weights, get_peft_state_dict, load_state_dict) to apply LoRA changes to all layers rather than only those in the first chunk and to write and read checkpoints correctly when virtual pipeline parallel is used

Usage

  • You can potentially add a usage example below
# Add a code snippet demonstrating how to use this 

GitHub Actions CI

The Jenkins CI system has been replaced by GitHub Actions self-hosted runners.

The GitHub Actions CI will run automatically when the "Run CICD" label is added to the PR.
To re-run CI remove and add the label again.
To run CI on an untrusted fork, a NeMo user with write access must first click "Approve and run".

Before your PR is "Ready for review"

Pre checks:

  • Make sure you read and followed Contributor guidelines
  • Did you write any new necessary tests?
  • Did you add or update any necessary documentation?
  • Does the PR affect components that are optional to install? (Ex: Numba, Pynini, Apex etc)
    • Reviewer: Does the PR have correct import guards for all optional libraries?

PR Type:

  • New Feature
  • Bugfix
  • Documentation

If you haven't finished some of the above items you can still open "Draft" PR.

Who can review?

Anyone in the community is free to review the PR once the checks have passed.
Contributor guidelines contains specific people who can review PRs to various areas.

Additional Information

  • Related to # (issue)

…eline parallel + LoRA

Signed-off-by: Valerie Sarge <vsarge@nvidia.com>
Signed-off-by: Valerie Sarge <vsarge@nvidia.com>
Signed-off-by: Valerie Sarge <vsarge@nvidia.com>
@github-actions github-actions bot added the NLP label Nov 1, 2024
Signed-off-by: vysarge <vysarge@users.noreply.github.com>
Comment on lines +468 to +469
and "model_0" in state_dict
and len(state_dict["model_0"]) == 0
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you explain the logic of this code by comment?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done!

Signed-off-by: Valerie Sarge <vsarge@nvidia.com>
@vysarge vysarge added Run CICD and removed Run CICD labels Nov 1, 2024
@pablo-garay pablo-garay enabled auto-merge (squash) November 1, 2024 16:51
@pablo-garay pablo-garay merged commit 2c42fc3 into NVIDIA:main Nov 1, 2024
159 of 160 checks passed
ko3n1g pushed a commit that referenced this pull request Nov 1, 2024
…11128)

* Update NLPAdapterModelMixin to handle model structure for virtual pipeline parallel + LoRA

Signed-off-by: Valerie Sarge <vsarge@nvidia.com>

* Clean up assert guard

Signed-off-by: Valerie Sarge <vsarge@nvidia.com>

* Clean up ValueError raise

Signed-off-by: Valerie Sarge <vsarge@nvidia.com>

* Apply isort and black reformatting

Signed-off-by: vysarge <vysarge@users.noreply.github.com>

* documentation

Signed-off-by: Valerie Sarge <vsarge@nvidia.com>

---------

Signed-off-by: Valerie Sarge <vsarge@nvidia.com>
Signed-off-by: vysarge <vysarge@users.noreply.github.com>
Co-authored-by: vysarge <vysarge@users.noreply.github.com>
vysarge added a commit that referenced this pull request Nov 1, 2024
…11128) (#11135)

* Update NLPAdapterModelMixin to handle model structure for virtual pipeline parallel + LoRA



* Clean up assert guard



* Clean up ValueError raise



* Apply isort and black reformatting



* documentation



---------

Signed-off-by: Valerie Sarge <vsarge@nvidia.com>
Signed-off-by: vysarge <vysarge@users.noreply.github.com>
Co-authored-by: Valerie Sarge <vsarge@nvidia.com>
Co-authored-by: vysarge <vysarge@users.noreply.github.com>
hainan-xv pushed a commit to hainan-xv/NeMo that referenced this pull request Nov 5, 2024
…VIDIA#11128)

* Update NLPAdapterModelMixin to handle model structure for virtual pipeline parallel + LoRA

Signed-off-by: Valerie Sarge <vsarge@nvidia.com>

* Clean up assert guard

Signed-off-by: Valerie Sarge <vsarge@nvidia.com>

* Clean up ValueError raise

Signed-off-by: Valerie Sarge <vsarge@nvidia.com>

* Apply isort and black reformatting

Signed-off-by: vysarge <vysarge@users.noreply.github.com>

* documentation

Signed-off-by: Valerie Sarge <vsarge@nvidia.com>

---------

Signed-off-by: Valerie Sarge <vsarge@nvidia.com>
Signed-off-by: vysarge <vysarge@users.noreply.github.com>
Co-authored-by: vysarge <vysarge@users.noreply.github.com>
Signed-off-by: Hainan Xu <hainanx@nvidia.com>
lilyw97 pushed a commit to lilyw97/NeMo that referenced this pull request Nov 13, 2024
…VIDIA#11128)

* Update NLPAdapterModelMixin to handle model structure for virtual pipeline parallel + LoRA

Signed-off-by: Valerie Sarge <vsarge@nvidia.com>

* Clean up assert guard

Signed-off-by: Valerie Sarge <vsarge@nvidia.com>

* Clean up ValueError raise

Signed-off-by: Valerie Sarge <vsarge@nvidia.com>

* Apply isort and black reformatting

Signed-off-by: vysarge <vysarge@users.noreply.github.com>

* documentation

Signed-off-by: Valerie Sarge <vsarge@nvidia.com>

---------

Signed-off-by: Valerie Sarge <vsarge@nvidia.com>
Signed-off-by: vysarge <vysarge@users.noreply.github.com>
Co-authored-by: vysarge <vysarge@users.noreply.github.com>
HuiyingLi pushed a commit to HuiyingLi/NeMo that referenced this pull request Nov 15, 2024
…VIDIA#11128)

* Update NLPAdapterModelMixin to handle model structure for virtual pipeline parallel + LoRA

Signed-off-by: Valerie Sarge <vsarge@nvidia.com>

* Clean up assert guard

Signed-off-by: Valerie Sarge <vsarge@nvidia.com>

* Clean up ValueError raise

Signed-off-by: Valerie Sarge <vsarge@nvidia.com>

* Apply isort and black reformatting

Signed-off-by: vysarge <vysarge@users.noreply.github.com>

* documentation

Signed-off-by: Valerie Sarge <vsarge@nvidia.com>

---------

Signed-off-by: Valerie Sarge <vsarge@nvidia.com>
Signed-off-by: vysarge <vysarge@users.noreply.github.com>
Co-authored-by: vysarge <vysarge@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants