fix ptuning residuals bug #6866
Conversation
…T inference Signed-off-by: arendu <adithya.r@gmail.com>
for more information, see https://pre-commit.ci
…nto adithyare/lora_fix
Signed-off-by: arendu <adithya.r@gmail.com>
for more information, see https://pre-commit.ci
nemo/collections/nlp/models/language_modeling/megatron_gpt_peft_models.py
nemo/collections/nlp/modules/common/megatron/adapters/parallel_adapters.py
-    def forward(self, batch_size):
+    def _forward(self,):
Better not to make this private for subclasses; rename it to forward_inner().
good point!
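A minimal sketch of the suggested rename, assuming a p-tuning-style adapter module (the class below is illustrative, not the actual NeMo PromptEncoderAdapter): forward() stays the public entry point and the shared embedding lookup lives in forward_inner(), which subclasses can override without reaching into a leading-underscore method.

```python
import torch
import torch.nn as nn


class PTuningAdapterSketch(nn.Module):
    """Illustrative p-tuning-style adapter; not the actual NeMo class."""

    def __init__(self, total_virtual_tokens: int, hidden_size: int):
        super().__init__()
        self.total_virtual_tokens = total_virtual_tokens
        self.embedding = nn.Embedding(total_virtual_tokens, hidden_size)

    def forward_inner(self) -> torch.Tensor:
        # Shared lookup of the virtual-token embeddings; subclasses can override
        # this without touching a "private" _forward() method.
        indices = torch.arange(self.total_virtual_tokens, device=self.embedding.weight.device)
        return self.embedding(indices)  # [T, H]

    def forward(self, batch_size: int) -> torch.Tensor:
        # Public entry point: expand the virtual tokens to the batch dimension.
        return self.forward_inner().unsqueeze(0).expand(batch_size, -1, -1)  # [B, T, H]


print(PTuningAdapterSketch(8, 16)(4).shape)  # torch.Size([4, 8, 16])
```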
-        virtual_embeddings = self.forward_single_enabled_adapter_(
-            _bs, ptuning_adapter, adapter_name=AdapterName.PTUNING_ADAPTER, adapter_strategy=strategy,
-        )
+        virtual_embeddings = ptuning_adapter(_bs)
Each adapter type has its own strategy; by removing this you're hard-coding the logic and sidestepping the strategy. It's not necessary to do that, is it?
Agreed, it's not necessary, but it makes readability and maintainability easier. It's clearer to read the code and see the residual connection directly than to go track down where a default strategy is coming from.
Oh, one other limitation, I think, is that I can't pass additional args? For example, the ptuning_adapter forward now accepts an additional arg like used_cached_reps.
After discussion, it seems this is never going to be needed by the NLP domain, so it's fine to directly call the adapter.
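To make the trade-off concrete, here is a small, self-contained sketch of the two call paths being weighed. forward_single_enabled_adapter_, AdapterName.PTUNING_ADAPTER, and the strategy object come from the diff above; the toy classes below are illustrative assumptions, not the actual NeMo implementation.

```python
import torch
import torch.nn as nn


class ToyPTuningAdapter(nn.Module):
    """Illustrative adapter: maps a batch size to a block of virtual-token embeddings."""

    def __init__(self, num_tokens: int = 8, hidden: int = 16):
        super().__init__()
        self.virtual = nn.Parameter(torch.randn(num_tokens, hidden))

    def forward(self, batch_size: int) -> torch.Tensor:
        return self.virtual.unsqueeze(0).expand(batch_size, -1, -1)  # [B, T, H]


adapter = ToyPTuningAdapter()

# Strategy-dispatched path (schematic): the caller routes the adapter through a
# registered strategy object, so the reader has to find out which strategy is in
# effect to know what comes back.
strategy = lambda bs, a: a(bs)  # stand-in for a registered adapter strategy
via_strategy = strategy(4, adapter)

# Direct path, as in this PR: the call (and any residual handling around it) is
# visible right where it happens.
virtual_embeddings = adapter(4)

assert torch.equal(via_strategy, virtual_embeddings)
```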
@@ -70,7 +70,7 @@ def __init__(
         self.prompt_embeddings.weight.requires_grad = False

         # Set fixed indicies for forward pass
-        self.register_buffer('indices', torch.LongTensor(list(range(self.total_virtual_tokens))))
+        self.register_buffer("indices", torch.LongTensor(list(range(self.total_virtual_tokens))), persistent=False)
Probably a breaking change, since older PEFT modules will have this buffer but newer ones won't. Need to check.
good catch! will check!
OK, this will indeed break older p-tuning checkpoints. However, it is a nice "feature" because older checkpoints will need to be converted anyway to a new param naming format. In that conversion step (which still needs to be written) I will remove the indices.
The persistent part will be the issue, but if that's ok then good enough
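For reference, a minimal plain-PyTorch sketch (no NeMo specifics; both modules are illustrative) of why persistent=False is the compatibility concern: the buffer disappears from state_dict(), so an older checkpoint that still carries 'indices' fails strict loading unless the key is dropped during the conversion step mentioned above.

```python
import torch
import torch.nn as nn


class OldModule(nn.Module):
    def __init__(self, total_virtual_tokens: int = 4):
        super().__init__()
        # Old behaviour: the buffer is persistent, so it is saved into the checkpoint.
        self.register_buffer("indices", torch.arange(total_virtual_tokens))


class NewModule(nn.Module):
    def __init__(self, total_virtual_tokens: int = 4):
        super().__init__()
        # New behaviour: persistent=False keeps the buffer out of state_dict().
        self.register_buffer("indices", torch.arange(total_virtual_tokens), persistent=False)


old_ckpt = OldModule().state_dict()
print(list(old_ckpt.keys()))                  # ['indices']
print(list(NewModule().state_dict().keys()))  # [] -- nothing saved any more

# Strict loading of the old checkpoint into the new module raises
# "Unexpected key(s) in state_dict: 'indices'" ...
try:
    NewModule().load_state_dict(old_ckpt)
except RuntimeError as err:
    print(err)

# ... so a conversion step can simply drop the stale key.
old_ckpt.pop("indices")
NewModule().load_state_dict(old_ckpt)
```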
…nto adithyare/lora_fix
What does this PR do?
Collection: [NLP]
Changelog
Usage
# Add a code snippet demonstrating how to use this
Before your PR is "Ready for review"
Pre checks:
PR Type:
If you haven't finished some of the above items you can still open a "Draft" PR.
Who can review?
Anyone in the community is free to review the PR once the checks have passed.
The Contributor guidelines list specific people who can review PRs in various areas.
Additional Information