Lora ckpt in HF format for NeMo AutoModel #11712

Merged
merged 27 commits into from
Jan 10, 2025

oyilmaz-nvidia
Collaborator

What does this PR do?

Adds support for saving LoRA checkpoints in Hugging Face (HF) format for NeMo AutoModel.
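
For context, a LoRA checkpoint in Hugging Face (PEFT) format is conventionally a directory holding an `adapter_config.json` next to an `adapter_model.safetensors` file. The sketch below is not the code from this PR; it only illustrates the general idea of separating adapter weights from base weights and writing a config beside them. The function names, the `lora_` key filter, and the small set of config fields are assumptions, and the weights are plain Python lists written as JSON so the example runs with the standard library alone (a real implementation would write tensors with the `safetensors` library).

```python
import json
from pathlib import Path


def extract_lora_weights(state_dict):
    """Keep only entries whose key contains 'lora_' (assumed naming convention)."""
    return {k: v for k, v in state_dict.items() if "lora_" in k}


def save_adapter_dir(state_dict, out_dir, rank=8, alpha=16):
    """Write a config file and the adapter-only weights into out_dir."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    adapter = extract_lora_weights(state_dict)
    # Illustrative subset of fields; a real adapter_config.json carries more.
    config = {"peft_type": "LORA", "r": rank, "lora_alpha": alpha}
    (out_dir / "adapter_config.json").write_text(json.dumps(config, indent=2))
    # Stand-in for adapter_model.safetensors so the sketch stays stdlib-only.
    (out_dir / "adapter_weights.json").write_text(json.dumps(adapter))
    return sorted(adapter)
```

Because only the adapter entries are written, the resulting directory stays small compared with a full model checkpoint, which is the usual motivation for saving LoRA weights separately.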

oyilmaz-nvidia and others added 9 commits December 23, 2024 13:58
Signed-off-by: Onur Yilmaz <oyilmaz@nvidia.com>
Signed-off-by: Onur Yilmaz <oyilmaz@nvidia.com>
Signed-off-by: oyilmaz-nvidia <oyilmaz-nvidia@users.noreply.github.com>
Signed-off-by: Onur Yilmaz <oyilmaz@nvidia.com>
Signed-off-by: Onur Yilmaz <oyilmaz@nvidia.com>
Signed-off-by: Onur Yilmaz <oyilmaz@nvidia.com>
Signed-off-by: oyilmaz-nvidia <oyilmaz-nvidia@users.noreply.github.com>
Signed-off-by: Onur Yilmaz <oyilmaz@nvidia.com>
nemo/lightning/pytorch/strategies/utils.py
@@ -117,7 +117,15 @@ def ckpt_to_dir(filepath: Union[str, Path]) -> Path:


def create_checkpoint_io(wrapping_ckpt_io=None, **kwargs):
Member

Can you add a test that creates a checkpoint with NeMo and restores it in Hugging Face? We now have tests for LLM and VLM.

Also, checkpoint saving is currently disabled in the tests; can you turn it on (a minor change in the test command)?

Collaborator Author

Enabled checkpoint saving in those tests. I'll need to address the restore in a separate PR using AutoResume right after this one.

oyilmaz-nvidia and others added 2 commits January 6, 2025 14:29
Signed-off-by: Onur Yilmaz <oyilmaz@nvidia.com>
Signed-off-by: Onur Yilmaz <35306097+oyilmaz-nvidia@users.noreply.github.com>
@github-actions github-actions bot added the CI label Jan 6, 2025
@oyilmaz-nvidia oyilmaz-nvidia marked this pull request as ready for review January 6, 2025 22:35
ko3n1g
ko3n1g previously approved these changes Jan 7, 2025
Signed-off-by: Onur Yilmaz <oyilmaz@nvidia.com>
Signed-off-by: oyilmaz-nvidia <oyilmaz-nvidia@users.noreply.github.com>
@oyilmaz-nvidia oyilmaz-nvidia enabled auto-merge (squash) January 8, 2025 08:38
oyilmaz-nvidia and others added 5 commits January 8, 2025 11:50
Signed-off-by: Onur Yilmaz <oyilmaz@nvidia.com>
Signed-off-by: oyilmaz-nvidia <oyilmaz-nvidia@users.noreply.github.com>
Updating peft test name

Signed-off-by: Onur Yilmaz <35306097+oyilmaz-nvidia@users.noreply.github.com>

        checkpoint_io = HuggingFaceCheckpointIO(lora=kwargs["lora"])
    else:
        from nemo.lightning.io.pl import MegatronCheckpointIO

Check notice: Code scanning / CodeQL

Cyclic import (Note): Import of module nemo.lightning.io.pl begins an import cycle.
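
The notice concerns the import placed inside the `else:` branch above. Importing inside a function body defers module loading until the function is called, which is a common way to keep a module-level cycle from failing at import time (static analyzers may still report the cycle). Here is a minimal sketch of the pattern, using standard-library modules as stand-ins; `make_dumper` and its `binary` parameter are hypothetical and not part of NeMo:

```python
def make_dumper(binary=False):
    """Return a serializer, importing its backend only when it is needed."""
    if binary:
        import pickle  # deferred: loaded on first call, not at module import

        return pickle.dumps
    else:
        import json  # deferred in the same way

        return json.dumps


# Nothing is imported until the factory runs.
print(make_dumper()({"lora": True}))  # prints {"lora": true}
```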
Signed-off-by: Onur Yilmaz <oyilmaz@nvidia.com>
oyilmaz-nvidia and others added 3 commits January 9, 2025 10:15
Signed-off-by: Onur Yilmaz <oyilmaz@nvidia.com>
changing the hf vlm test name

Signed-off-by: Onur Yilmaz <35306097+oyilmaz-nvidia@users.noreply.github.com>
Contributor

github-actions bot commented Jan 9, 2025

beep boop 🤖: 🙏 The following files have warnings. In case you are familiar with these, please try helping us to improve the code base.


Your code was analyzed with PyLint. The following annotations have been identified:

************* Module hf
examples/llm/peft/hf.py:28:0: C0301: Line too long (184/119) (line-too-long)
examples/llm/peft/hf.py:24:0: C0116: Missing function or method docstring (missing-function-docstring)
examples/llm/peft/hf.py:63:0: C0116: Missing function or method docstring (missing-function-docstring)
************* Module nemo.export.vllm_hf_exporter
nemo/export/vllm_hf_exporter.py:62:4: C0116: Missing function or method docstring (missing-function-docstring)
nemo/export/vllm_hf_exporter.py:104:4: C0116: Missing function or method docstring (missing-function-docstring)
************* Module nemo.lightning.io.pl
nemo/lightning/io/pl.py:82:0: C0301: Line too long (130/119) (line-too-long)
nemo/lightning/io/pl.py:58:0: C0115: Missing class docstring (missing-class-docstring)
nemo/lightning/io/pl.py:64:4: C0116: Missing function or method docstring (missing-function-docstring)
nemo/lightning/io/pl.py:73:4: C0116: Missing function or method docstring (missing-function-docstring)
nemo/lightning/io/pl.py:300:4: C0116: Missing function or method docstring (missing-function-docstring)
nemo/lightning/io/pl.py:305:4: C0116: Missing function or method docstring (missing-function-docstring)
************* Module nemo.lightning.pytorch.strategies.utils
nemo/lightning/pytorch/strategies/utils.py:40:0: C0115: Missing class docstring (missing-class-docstring)
nemo/lightning/pytorch/strategies/utils.py:49:0: C0116: Missing function or method docstring (missing-function-docstring)
nemo/lightning/pytorch/strategies/utils.py:57:0: C0116: Missing function or method docstring (missing-function-docstring)
nemo/lightning/pytorch/strategies/utils.py:69:0: C0116: Missing function or method docstring (missing-function-docstring)
nemo/lightning/pytorch/strategies/utils.py:85:0: C0116: Missing function or method docstring (missing-function-docstring)
nemo/lightning/pytorch/strategies/utils.py:120:0: C0116: Missing function or method docstring (missing-function-docstring)
nemo/lightning/pytorch/strategies/utils.py:142:0: C0116: Missing function or method docstring (missing-function-docstring)
nemo/lightning/pytorch/strategies/utils.py:197:0: C0116: Missing function or method docstring (missing-function-docstring)

-----------------------------------
Your code has been rated at 9.73/10

Mitigation guide:

  • Add sensible and useful docstrings to functions and methods
  • For trivial methods like getters/setters, consider adding # pylint: disable=C0116 inside the function itself
  • To disable multiple functions/methods at once, put a # pylint: disable=C0116 before the first and a # pylint: enable=C0116 after the last.
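
As an illustration of the three options in the guide above, the sketch below shows a per-function disable, a disable/enable pair around several functions, and a function with a proper docstring. The function names and the config dictionary are hypothetical and do not come from the NeMo code base.

```python
def get_rank(config):
    # pylint: disable=C0116
    return config["rank"]


# pylint: disable=C0116
def get_alpha(config):
    return config["alpha"]


def get_dropout(config):
    return config["dropout"]


# pylint: enable=C0116


def scale(config):
    """Return the LoRA scaling factor, alpha divided by rank."""
    return get_alpha(config) / get_rank(config)
```

The comments only affect the linter; the functions behave the same with or without them.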

By applying these rules, we reduce the occurrence of this message in the future.

Thank you for improving NeMo's documentation!

Contributor

[🤖]: Hi @oyilmaz-nvidia 👋,

We wanted to let you know that a CICD pipeline for this PR just finished successfully.

So it might be time to merge this PR or get some approvals.

I'm just a bot, so I'll leave it up to you what to do next.

//cc @pablo-garay @ko3n1g

@oyilmaz-nvidia oyilmaz-nvidia merged commit 9799051 into main Jan 10, 2025
201 of 204 checks passed
@oyilmaz-nvidia oyilmaz-nvidia deleted the onur/auto-model-peft-ckpt branch January 10, 2025 16:51
@@ -3675,7 +3676,7 @@ jobs:
       with:
         RUNNER: self-hosted-azure
         SCRIPT: |
-          TRANSFORMERS_OFFLINE=1 python tests/collections/llm/hf/peft.py --model /home/TestData/nlp/hf_gemma/hf_gemma_2b --max-steps 10 --devices 2 --strategy ddp --disable-ckpt
+          TRANSFORMERS_OFFLINE=1 python tests/collections/llm/hf/peft_hf.py --model /home/TestData/nlp/hf_gemma/hf_gemma_2b --max-steps 10 --devices 2 --strategy ddp --disable-ckpt
Member

@oyilmaz-nvidia this still has --disable-ckpt
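
For illustration, a flag like `--disable-ckpt` is typically a boolean switch that turns checkpoint saving off, so leaving it in the CI command means the save path is never exercised. The sketch below is hypothetical and is not the test script from this PR; `build_parser` and `checkpointing_enabled` are assumed names, and only the two flags shown are modeled.

```python
import argparse


def build_parser():
    """Build a parser with a subset of the flags seen in the CI command."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--max-steps", type=int, default=10)
    # store_true: the flag is False unless it is present on the command line.
    parser.add_argument("--disable-ckpt", action="store_true", help="skip checkpoint saving")
    return parser


def checkpointing_enabled(argv):
    """Return True when the given arguments leave checkpoint saving on."""
    args = build_parser().parse_args(argv)
    return not args.disable_ckpt
```

Under these assumptions, dropping the flag from the command is enough to re-enable saving, which matches the "minor change in the test command" requested earlier in the review.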

BoxiangW pushed a commit that referenced this pull request Jan 14, 2025
* Save lora ckpt in safetensor and a config

Signed-off-by: Onur Yilmaz <oyilmaz@nvidia.com>

* remove hf variable from peft

Signed-off-by: Onur Yilmaz <oyilmaz@nvidia.com>

* vllm with automodel peft working

* Apply isort and black reformatting

Signed-off-by: oyilmaz-nvidia <oyilmaz-nvidia@users.noreply.github.com>

* revert changes

Signed-off-by: Onur Yilmaz <oyilmaz@nvidia.com>

* update examples

Signed-off-by: Onur Yilmaz <oyilmaz@nvidia.com>

* Apply isort and black reformatting

Signed-off-by: oyilmaz-nvidia <oyilmaz-nvidia@users.noreply.github.com>

* removed unused import

Signed-off-by: Onur Yilmaz <oyilmaz@nvidia.com>

* enable ckpt saving

Signed-off-by: Onur Yilmaz <35306097+oyilmaz-nvidia@users.noreply.github.com>

* remove unused import

Signed-off-by: Onur Yilmaz <oyilmaz@nvidia.com>

* Apply isort and black reformatting

Signed-off-by: oyilmaz-nvidia <oyilmaz-nvidia@users.noreply.github.com>

* fix minor bug

Signed-off-by: Onur Yilmaz <oyilmaz@nvidia.com>

---------

Signed-off-by: Onur Yilmaz <oyilmaz@nvidia.com>
Signed-off-by: oyilmaz-nvidia <oyilmaz-nvidia@users.noreply.github.com>
Signed-off-by: Onur Yilmaz <35306097+oyilmaz-nvidia@users.noreply.github.com>
Co-authored-by: oyilmaz-nvidia <oyilmaz-nvidia@users.noreply.github.com>
abhinavg4 pushed a commit that referenced this pull request Jan 30, 2025
Signed-off-by: Abhinav Garg <abhgarg@nvidia.com>
youngeunkwon0405 pushed a commit to youngeunkwon0405/NeMo that referenced this pull request Feb 10, 2025
Signed-off-by: Youngeun Kwon <youngeunk@nvidia.com>
4 participants