Fixed base model class name extraction from PeftModels #27162
Conversation
Thanks a lot!
I tried to reproduce, and it is indeed a bug. However, your fix does not properly take care of the case where users combine DDP + PEFT. I propose to first create a dummy variable unwrapped_model that simply unwraps the model in case it is DDP- or FSDP-wrapped, and then perform the checks you suggested. What do you think?
src/transformers/trainer.py
Outdated
@@ -2687,7 +2687,7 @@ def compute_loss(self, model, inputs, return_outputs=False):
        if labels is not None:
            if is_peft_available() and isinstance(model, PeftModel):
Suggested change:
-            if is_peft_available() and isinstance(model, PeftModel):
+            unwrapped_model = unwrap_model(model)
+            if is_peft_available() and isinstance(unwrapped_model, PeftModel):
+                model_name = unwrapped_model.base_model.model._get_name()
+            else:
+                model_name = unwrapped_model._get_name()
+            if model_name in MODEL_FOR_CAUSAL_LM_MAPPING_NAMES.values():
+                loss = self.label_smoother(outputs, labels, shift_labels=True)
+            else:
+                loss = self.label_smoother(outputs, labels)
+            del unwrapped_model
Don't accept this suggestion, otherwise it will create a weird diff, but I added it so that you can see what I meant.
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint.
Thanks for your great work on this!
Thanks for fixing this!
Just a small question - once clarified I think we're good to merge :)
-            model_name = unwrap_model(model.base_model)._get_name()
+            unwrapped_model = unwrap_model(model)
+            if is_peft_available() and isinstance(unwrapped_model, PeftModel):
+                model_name = unwrapped_model.base_model.model._get_name()
For my own understanding, was unwrap_model(model.base_model)._get_name() ever working?
I'm asking to understand whether we would still need the case model_name = unwrapped_model.base_model._get_name()
> was unwrap_model(model.base_model)._get_name() ever working?

If model is a DistributedDataParallel or an FSDP-wrapped module, model.base_model would fail (you need to unwrap model first), and unwrap_model(model.base_model)._get_name() would also fail because the model is stored in model.base_model.model. Let me know if you want more clarifications.
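To make the nesting concrete, here is a minimal sketch of the attribute chain being discussed (assuming peft is installed; the gpt2 checkpoint and default LoRA config are purely illustrative, not what this PR tests):

```python
# Minimal sketch of the wrapping layers, assuming `peft` is installed; "gpt2" is only illustrative.
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM
from transformers.modeling_utils import unwrap_model

base = AutoModelForCausalLM.from_pretrained("gpt2")
peft_model = get_peft_model(base, LoraConfig(task_type="CAUSAL_LM"))

print(peft_model._get_name())                   # PeftModelForCausalLM
print(peft_model.base_model._get_name())        # LoraModel  (the PEFT wrapper, not the base model)
print(peft_model.base_model.model._get_name())  # GPT2LMHeadModel  (the true base model)

# Under DDP/FSDP the Trainer receives the wrapper, not the PeftModel, e.g.:
#   ddp_model = torch.nn.parallel.DistributedDataParallel(peft_model)
#   ddp_model.base_model                                   -> AttributeError (must unwrap first)
#   unwrap_model(ddp_model)                                -> peft_model
#   unwrap_model(ddp_model).base_model.model._get_name()   -> "GPT2LMHeadModel"
```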
OK, but what if the model isn't distributed? Is it possible for those models to hit this logic branch?
If the model isn't distributed, is a PeftModel, and label smoothing is non-zero, unwrap_model(model.base_model)._get_name() would not fail, but it would return the wrong model_name for the reason mentioned in #27161.

IMO, the change @younesbelkada suggested fixes a different bug that I did not explicitly mention in the original issue, i.e., a PeftModel with DDP/FSDP + label smoothing would throw an error. In other words, without DDP/FSDP, the fix of just replacing model.base_model with model.base_model.model would have worked.
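For illustration, a hypothetical sketch of what the old and new lookups resolve to in the non-distributed case (again assuming a gpt2 LoRA model as the example, and peft installed):

```python
# Hypothetical illustration of the non-distributed bug; names shown in comments
# are for a gpt2-based LoRA model, purely as an example.
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM
from transformers.modeling_utils import unwrap_model
from transformers.models.auto.modeling_auto import MODEL_FOR_CAUSAL_LM_MAPPING_NAMES

model = get_peft_model(
    AutoModelForCausalLM.from_pretrained("gpt2"), LoraConfig(task_type="CAUSAL_LM")
)

# Old code path: does not raise, but resolves to the PEFT wrapper class name.
old_name = unwrap_model(model.base_model)._get_name()          # "LoraModel"
print(old_name in MODEL_FOR_CAUSAL_LM_MAPPING_NAMES.values())  # False -> shift_labels=False

# Fixed code path: resolves to the real causal-LM class name.
new_name = unwrap_model(model).base_model.model._get_name()    # "GPT2LMHeadModel"
print(new_name in MODEL_FOR_CAUSAL_LM_MAPPING_NAMES.values())  # True -> shift_labels=True
```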
> I'm asking to understand whether we would still need the case model_name = unwrapped_model.base_model._get_name()

I think in peft, the abstraction around the true base model is always PeftModel.base_model.model in all configurations, from what I can tell (refer here and here). So this case may not be needed anymore.
That sounds like a good idea. Let me try to do that, if that is okay. I see two potential changes in the neft activate and deactivate functions. I will try to do a more thorough scan and push another commit for review.
@younesbelkada Great work on integrating NEFT so quickly, thank you so much!
Sounds great, thank you @kkteru!
Just pushed the changes. I don't think there is any other place where this needs to be changed. One place that came close was this prefix definition when loading the adapter state_dict, but I think that was correctly declared.
> If model is a DistributedDataParallel or an FSDP-wrapped module, model.base_model would fail (you need to unwrap model first), and unwrap_model(model.base_model)._get_name() would also fail because the model is stored in model.base_model.model. Let me know if you want more clarifications.

I actually noticed a similar potential issue in the sft_trainer/neft support of the trl package here and here. Happy to push a PR there, or leave it to you to clean up after.
Ah yes, if you could submit PRs on TRL that would be great as well! Thanks @kkteru!
Thanks again for your great work!
Thanks for fixing and iterating!
Fixed base model class name extraction from PeftModels (#27162)

* Fixed base model class name extraction from PeftModels
* Changes to first unwrap the model, then extract the base model name
* Changed base_model to base_model.model to stay consistent with peft model abstractions
What does this PR do?
Fixes #27161
Who can review?
@pacman100, @muellerzr