Add an option 'ALL' to include all linear layers as target modules #1295
Conversation
Thanks a lot for this addition! Great tests!

One thing here is that this API won't generalize to all models, as some architectures use a different name for the `lm_head`. I've proposed a suggestion below: copy over the method `get_keys_to_not_convert` into PEFT and use that method here.

GPTNeoX uses `embed_out` for the name of the LM head: https://github.com/huggingface/transformers/blob/main/src/transformers/models/gpt_neox/modeling_gpt_neox.py#L964 - you can use this model for testing: https://huggingface.co/hf-internal-testing/tiny-random-GPTNeoXForCausalLM
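(For illustration, a minimal sketch of how the head module's name could be looked up programmatically rather than hard-coded as "lm_head"; this uses `get_output_embeddings()` and is not the `get_keys_to_not_convert` helper itself.)

```python
# Sketch: find the name of the output (LM head) module instead of assuming
# it is called "lm_head". For GPTNeoX this is "embed_out".
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "hf-internal-testing/tiny-random-GPTNeoXForCausalLM"
)
output_embedding = model.get_output_embeddings()
head_names = [name for name, module in model.named_modules() if module is output_embedding]
print(head_names)  # ['embed_out']
```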
@younesbelkada Do take another look! I've pushed the updates.
Very nice work, thanks very much for working on this @SumanthRH !
Thanks for this PR. I haven't done a detailed review, but have some comments:

- How useful do you think the feature is? Do you have use cases where all linear layers have to be replaced? Note that we could already do something like `target_modules = [name for name, module in model.named_modules() if isinstance(module, linear_classes)]`.
- As a user, I could be confused by the `"ALL"` option, since it does not target all possible layers, only all linear layers. I assume you didn't want to call the option `"Linear"` because that could be ambiguous if there is a module with the name `"Linear"`. But we could use something like `"<Linear>"` or so, since that would not be a legal name for a module, so it's unambiguous. WDYT?
- IIUC, this method works by modifying the peft config and overriding the value of `target_modules`. I wonder if that is the best way. As a user, if I pass a certain config, I would generally assume that it stays as it is. Instead of overriding the config, would it be better to leave it alone and then, when we check whether we should add the LoRA adapters to a certain layer, change the logic to match on the type instead of the name?
@BenjaminBossan thanks for the review!
I think this is a popular use case, because QLoRA-style training is popular and effective. This configuration, as of 2023 at least, is the best one for LoRA models to match full finetuning: the QLoRA paper found that adding LoRA to all the linear layers is what closes the gap with full finetuning. As such, there are still some experiments to be made to see if we can push this further (for example, adding LoRA to the last output layer as well), but I feel PEFT should support this since it's the best LoRA setting right now. In terms of modifications, it isn't as simple as the one-liner you provided, because 1) the user needs to provide all linear classes (including the quantized bitsandbytes variants) and 2) the output layer has to be excluded.

Right as always :) I was only focused on language models with Linear layers, in which case "ALL" is a good name. I'm not a fan of mixed case or having an uncommon symbol like `<`, so I've changed it to "all_linear" now, hope that works!

Hmm, this is a good point. If the user passed in "all_linear", it would get secretly rewritten with a new list of target modules, and, for example, when they save the config/model, the saved `target_modules` would not be what they originally passed. (Also, this might end up being a behavior for others to replicate later on: leaving the config untouched and matching on layer type instead.)
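(For context, a hedged sketch of what "provide all linear classes" would mean in practice; the exact class list is an assumption, covering plain and quantized linear layers plus transformers' Conv1D, and assumes bitsandbytes is installed.)

```python
# Assumed set of "linear" classes a user would otherwise have to enumerate by hand.
import torch
import bitsandbytes as bnb
from transformers.pytorch_utils import Conv1D  # GPT-2 style linear layer

linear_classes = (torch.nn.Linear, Conv1D, bnb.nn.Linear8bitLt, bnb.nn.Linear4bit)
```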
> I think this is a popular use case, because QLoRA-style training is popular and effective. This configuration, as of 2023 at least, is the best one for LoRA models to match full finetuning.

Makes sense.
> In terms of modifications, it isn't as simple as the one-liner you provided

Yes, true, the snippet would rather have to be:

```
linear_classes = ...
target_modules = [
    name for name, module in model.named_modules()
    if isinstance(module, linear_classes) and name != <NAME-HEAD>
]
```
Still, I'm fine with adding this convenience function.
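(For concreteness, a runnable version of the sketch above; the tiny checkpoint, the class tuple, and the head name `embed_out` are illustrative assumptions rather than anything PEFT prescribes.)

```python
import torch
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "hf-internal-testing/tiny-random-GPTNeoXForCausalLM"
)

linear_classes = (torch.nn.Linear,)  # extend with quantized classes if needed
head_name = "embed_out"              # GPTNeoX's output layer; "lm_head" elsewhere

target_modules = [
    name
    for name, module in model.named_modules()
    if isinstance(module, linear_classes) and name != head_name
]
print(target_modules)  # e.g. ['gpt_neox.layers.0.attention.query_key_value', ...]
```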
> I'm not a fan of mixed case or having an uncommon symbol like `<`, so I've changed it to "all_linear" now, hope that works!

Not a fan of special chars either, but the issue is still that if a model has a module with the name "all_linear", this would be ambiguous; only special chars can prevent this. How about "all-linear"? Off the top of my head, I can't think of any potential conflicts here.
Regarding whether to modify the config or not:

> this change mirrors what `_prepare_adapter_config` does, which updates the peft_config directly if `target_modules` is None

Yes, there is precedent, but I don't like that precedent either :)
Let's think this design through. If we modify the config as proposed in this PR, these points come to mind:

- There is a bit of a surprise because the saved config is not the same as the one indicated by the user.
- The saved config transparently shows what layers were adapted.
- If we fix something in this logic (say, there is a bug with some layers matching incorrectly), a user who loads the config (with the explicit `target_modules`) will get different results from a user who uses a "fresh" config (with implicit `target_modules`). This can be bad (bug is not fixed) or good (reproducibility).
- It works directly for other tuner methods like IA³ because we don't need to touch `_create_and_replace`, which is different for each tuner method.
In sum, there are pros and cons for both. I'm fine with the current approach (but let's document it).
I agree with your statements on testing and checking for PreTrainedModel. The PR looks pretty good. I found some minor issues still, but hopefully those can be fixed quickly. Thanks for your continued work.
PS: Optionally, having an entry in the docs (for instance here) could make this feature easier to discover.
@BenjaminBossan thanks for the detailed reviews! Hopefully this can be merged now!
Following discussions in #1318, I've simplified the logic for bnb layers! Code is much cleaner now. The tests seem like overkill now, although I think the bnb tests can stay just to be sure.
Thanks so much @SumanthRH, the PR looks pretty good, nothing to add from my side. Before merging, let's wait for @pacman100 (he's OOO this week), because of the considerable overlap with his PR.
> The tests seem like overkill now, although I think the bnb tests can stay just to be sure.

They should stay for sure. This could help us in the future (e.g. if they decide that bnb layers no longer inherit from Linear, the tests would catch it).
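(A quick illustration of what such a test guards against: assuming bitsandbytes is available, the inheritance that isinstance-based discovery relies on can be checked directly.)

```python
import torch
import bitsandbytes as bnb

# Both quantized linear layers currently subclass torch.nn.Linear; if that ever
# changed, isinstance-based discovery of linear layers would silently miss them.
assert issubclass(bnb.nn.Linear8bitLt, torch.nn.Linear)
assert issubclass(bnb.nn.Linear4bit, torch.nn.Linear)
```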
Thank you @SumanthRH for adding the all-linear option to target the linear layers following the QLoRA recipe, very useful! Also, comprehensive tests! 🚀
Description

When the option to specify target_modules="all-linear" was introduced in PEFT (#1295), a restriction was added to only allow it for instances of PreTrainedModel. This was because we want to exclude the output layer from being targeted, even if it is a linear layer, and we can't determine this layer well except by convention. This PR lifts the restriction to PreTrainedModels. Thus, users can now target other models like diffusers models or custom models. The caveat is to use this "at your own risk", since all linear layers will be targeted, whether they are output layers or not.

Bugfix

While working on this, I found a potential bug. The logic for updating target_modules was that only the last part of the linear module's name was used. So e.g. if the module was named "foo.bar.baz", then "baz" was added to target_modules. This will lead to problems if there is another "baz" module that is not a linear layer. This bug was fixed by adding the full name ("foo.bar.baz" in this example) to the updated target_modules. This can potentially lead to a big target_modules list with a lot of near repetitions, but it's worth it to avoid targeting the wrong module. It is not clear to me why only the last part was added. The PR that added this to PEFT copied that part from here: https://github.com/artidoro/qlora/blob/7f4e95a68dc076bea9b3a413d2b512eca6d004e5/qlora.py#L248 but it's not clear why that repo did it that way. Maybe it was just to keep the set size smaller. The bug was uncovered by the unet test that is already present. Still, I extended this test, as well as another one, to better cover this potential issue, by ensuring that the number of target layers is as expected.

Backwards compatibility

Technically, this change breaks backwards compatibility. To go back to the previous example, let's say we have a module that is called "conv.baz" and that is a Conv2d layer. With the old behavior, since "baz" was added to the target_modules, we would also target this Conv2d layer, which is supported by LoRA. After merging this PR, the Conv2d layer would no longer be targeted. I'd argue this is the correct behavior and thus worth changing. Also, note that since we override target_modules, this is reflected in the adapter_config.json. Therefore, if a user loads an adapter that had this "baz" target, it will still work as it did previously.
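(To make the matching issue concrete, a small sketch with hypothetical module names; the suffix matching shown is a simplification of PEFT's actual name-matching logic.)

```python
import torch


class Toy(torch.nn.Module):
    def __init__(self):
        super().__init__()
        # A linear layer whose full name is "foo.bar.baz" ...
        self.foo = torch.nn.ModuleDict(
            {"bar": torch.nn.ModuleDict({"baz": torch.nn.Linear(4, 4)})}
        )
        # ... and an unrelated non-linear module sharing the suffix "baz".
        self.conv = torch.nn.ModuleDict({"baz": torch.nn.Conv2d(1, 1, 1)})


model = Toy()

# Old behavior: only the suffix is stored, so "conv.baz" matches too.
suffix_targets = {"baz"}
print([n for n, _ in model.named_modules() if n.split(".")[-1] in suffix_targets])
# ['foo.bar.baz', 'conv.baz']  <- the Conv2d is wrongly targeted

# Fixed behavior: the full name is stored, so only the linear layer matches.
full_targets = {"foo.bar.baz"}
print([n for n, _ in model.named_modules() if n in full_targets])
# ['foo.bar.baz']
```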
What does this PR do?
Adds a shorthand 'ALL' to include all linear/Conv1D layers as target modules. Code adapted from the QLoRA repository.
Why is this needed?
Currently, when `target_modules` is None, PEFT will choose specific layers for different models based on the original paper for that adapter. For example, this is the query and value layers in the case of LoRA. However, this is far from the best configuration for LoRA. More recently, in QLoRA, it was found that adding LoRA to ALL the linear layers is important to match full fine-tuning performance. QLoRA-style training is very popular, so I believe this feature will be helpful.

Changes made

I've added a helper function in `tuners_utils.py` which will preprocess the `target_modules` argument specified by the user. I've also added some tests in `test_tuners_utils.py`.
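(For reference, a minimal usage sketch with the option name the discussion settled on, "all-linear"; the tiny checkpoint is just an example.)

```python
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "hf-internal-testing/tiny-random-GPTNeoXForCausalLM"
)

# Target every linear layer (except the output head) instead of listing names by hand.
config = LoraConfig(target_modules="all-linear")
peft_model = get_peft_model(model, config)
peft_model.print_trainable_parameters()
```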