
Add an option 'ALL' to include all linear layers as target modules #1295

Merged
28 commits merged into huggingface:main on Jan 9, 2024

Conversation

SumanthRH (Contributor)

What does this PR do?

Adds a shorthand 'ALL' to include all linear/Conv1D layers as target modules. Code adapted from the QLoRA repository.

Why is this needed?

Currently, when target_modules is None, PEFT chooses specific layers for each model based on the original paper for that adapter; in the case of LoRA, for example, these are the query and value layers. However, this is far from the best configuration for LoRA. More recently, QLoRA found that adding LoRA to all the linear layers is important to match full fine-tuning performance. QLoRA-style training is very popular, so I believe this feature will be helpful.

Changes made

I've added a helper function in tuners_utils.py that preprocesses the target_modules argument specified by the user. I've also added some tests in test_tuners_utils.py.
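For illustration, here is a minimal usage sketch of the feature. It uses the shorthand spelling "all-linear" that the PR eventually settled on during review (the description above still says 'ALL'); the checkpoint name and LoRA hyperparameters are arbitrary examples, not part of this PR.

# Minimal usage sketch; "all-linear" is the shorthand the PR settled on,
# the checkpoint and LoRA hyperparameters are arbitrary examples.
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

model = AutoModelForCausalLM.from_pretrained("facebook/opt-125m")

# Instead of listing module names, ask PEFT to wrap every linear/Conv1D
# layer (except the output head) with a LoRA adapter.
config = LoraConfig(target_modules="all-linear", r=8, lora_alpha=16)
peft_model = get_peft_model(model, config)
peft_model.print_trainable_parameters()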

@younesbelkada (Contributor) left a comment:

Thanks a lot for this addition! Great tests!
One thing here is that this API won't generalize to all models, as some architectures use a different name for lm_head. I've proposed a suggestion below.
I suggest copying over the method get_keys_to_not_convert into PEFT and using it.
GPTNeoX uses embed_out as the name of the LM head: https://github.com/huggingface/transformers/blob/main/src/transformers/models/gpt_neox/modeling_gpt_neox.py#L964 - you can use this model for testing: https://huggingface.co/hf-internal-testing/tiny-random-GPTNeoXForCausalLM
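For context, a hedged sketch of one generic way to find the output-head module name rather than hardcoding "lm_head" or "embed_out". This is only illustrative and not necessarily the exact approach taken in the PR; it relies on PreTrainedModel.get_output_embeddings() from transformers.

# Illustrative only: locate the output-head module name generically via
# get_output_embeddings(); not necessarily the exact logic used in this PR.
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "hf-internal-testing/tiny-random-GPTNeoXForCausalLM"
)
output_emb = model.get_output_embeddings()
head_name = next(
    name for name, module in model.named_modules() if module is output_emb
)
print(head_name)  # "embed_out" for GPTNeoX; most other causal LMs use "lm_head"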

@SumanthRH (Contributor, Author) commented on Dec 26, 2023:

@younesbelkada Do take another look!

Updates:

  • I've used some code from get_keys_to_not_convert so that different key values for the output layer are supported.
  • I added the current shorthand value "ALL" in constants.py - not sure if there's a better way to handle this.
  • Added GPTNeoX to tests with some refactoring. Should be good now!

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@younesbelkada (Contributor) left a comment:

Very nice work, thanks very much for working on this @SumanthRH!

@BenjaminBossan (Member) left a comment:

Thanks for this PR. I haven't done a detailed review, but have some comments:

  • How useful do you think this feature is? Do you have use cases where all linear layers have to be replaced? Note that we could already do something like target_modules = [name for name, module in model.named_modules() if isinstance(module, linear_classes)].
  • As a user, I could be confused by the "ALL" option, since it does not target all possible layers, only all linear layers. I assume you didn't want to call the option "Linear" because that could be ambiguous if there is a module with the name "Linear". But we could use something like "<Linear>" or so, since that would not be a legal name for a module, so it's unambiguous. WDYT?
  • IIUC, this method works by modifying the peft config and overriding the value of target_modules. I wonder if that is the best way. As a user, if I pass a certain config, I would generally assume that it stays as it is. Instead of overriding the config, would it be better to leave it alone and, when we check whether we should add the LoRA adapters to a certain layer, change the logic to match on the type instead of the name?

@SumanthRH (Contributor, Author) commented on Dec 29, 2023:

@BenjaminBossan thanks for the review!

  • How useful do you think this feature is? Do you have use cases where all linear layers have to be replaced? Note that we could already do something like target_modules = [name for name, module in model.named_modules() if isinstance(module, linear_classes)].

I think this is a popular use case, because QLoRA-style training is popular and effective. This configuration, as of 2023 at least, is the best one for LoRA models to match full fine-tuning. From the QLoRA paper:
"When using the standard practice of applying LoRA to query and value attention projection matrices ... we are not able to replicate full finetuning performance for large base models ... we find that the most critical LoRA hyperparameter is how many LoRA adapters are used in total and that LoRA on all linear transformer block layers are required to match full finetuning performance."

That said, there are still some experiments to be done to see if we can push this further (for example, adding LoRA to the last output layer as well), but I feel PEFT should support this since it's the best LoRA setting right now.

In terms of modifications, it isn't as simple as the one-liner you provided, because (1) the user needs to provide all the linear classes (including bnb and Conv1D layers if needed), and (2) the user should also exclude the final output layer if it exists. This is basically what the current function does (with one extra step to get the base name of the module).
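For reference, a rough sketch of that logic under the assumptions described in this thread (nn.Linear and Conv1D as the linear classes, the output embedding excluded, and only the last part of each module name kept). This is not the actual PEFT helper, and the function name is made up.

# Rough sketch only; not the actual PEFT helper. Assumes nn.Linear/Conv1D
# as the linear classes (bnb layers subclass nn.Linear, so they match too).
import torch.nn as nn
from transformers.pytorch_utils import Conv1D

def guess_all_linear_target_modules(model):
    linear_classes = (nn.Linear, Conv1D)
    output_emb = model.get_output_embeddings() if hasattr(model, "get_output_embeddings") else None
    names = set()
    for name, module in model.named_modules():
        if module is output_emb:
            continue  # skip the final output layer (e.g. lm_head / embed_out)
        if isinstance(module, linear_classes):
            names.add(name.split(".")[-1])  # keep only the base name of the module
    return names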

  • As a user, I could be confused by the "ALL" option, since it does not target all possible layers, only all linear layers. I assume you didn't want to call the option "Linear" because that could be ambiguous if there is a module with the name "Linear". But we could use something like "<Linear>" or so, since that would not be a legal name for a module, so it's unambiguous. WDYT?

Right as always :) I was only focused on language models with Linear layers, in which case "ALL" is a good name. I'm not a fan of mixed case or having an uncommon symbol like <, so I've changed it to "all_linear" now, hope that works!

  • IIUC, this method works by modifying the peft config and overriding the value of target_modules. I wonder if that is the best way. As a user, if I pass a certain config, I would generally assume that it stays as it is. Instead of overriding the config, would it be better to leave it alone and, when we check whether we should add the LoRA adapters to a certain layer, change the logic to match on the type instead of the name?

Hmm, this is a good point. If the user passes in "all_linear", it will get silently rewritten with a new list of target modules, and, for example, when they save the config/model with model.save_pretrained, they might see an unexpected entry. I'm not sure how problematic that is, though, since this change mirrors _prepare_adapter_config, which updates the peft_config directly when target_modules is None.

(Also, this might end up helping others replicate results later on: leaving target_modules as None or all_linear is vague, while updating target_modules in place gives anyone trying to replicate a run the exact list of target modules that was used. For example, our default configs in TRANSFORMERS_MODELS_TO_XXX_TARGET_MODULES_MAPPING, or the list of modules matched in _maybe_include_all_linear_layers, might change in the future, but the target_modules used for an old run will be saved in its config.)

@BenjaminBossan (Member) left a comment:

I think this is a popular use case, because QLoRA-style training is popular and effective. This configuration, as of 2023 at least, is the best one for LoRA models to match full fine-tuning.

Makes sense.

In terms of modifications, it isn't as simple as the one-liner you provided

Yes, true, the snippet would have to be more like:

linear_classes = ...
target_modules = [name for name, module in model.named_modules() if isinstance(module, linear_classes) and name != <NAME-HEAD>]

Still, I'm fine with adding this convenience function.

I'm not a fan of mixed case or having an uncommon symbol like <, so I've changed it to "all_linear" now, hope that works!

Not a fan of special chars either, but the issue remains that if a model has a module with the name "all_linear", this would be ambiguous; only special chars can prevent this. How about "all-linear"? Off the top of my head, I can't think of any potential conflicts here.

Regarding whether to modify the config or not:

this change mirrors _prepare_adapter_config, which updates the peft_config directly when target_modules is None

Yes, there is precedent, but I don't like the precedent either :)

Let's think this design through. If we modify the config as proposed in this PR, these points come to mind:

  1. There is a bit of a surprise because the saved config is not the same as the one indicated by the user.
  2. The saved config transparently shows what layers were adapted.
  3. If we fix something in this logic (say, there is a bug with some layers matching incorrectly), a user who loads the config (with the explicit target_modules) will get different results from a user who uses a "fresh" config (with implicit target_modules). This can be bad (bug is not fixed) or good (reproducibility).
  4. It works directly for other tuner methods like IA³ because we don't need to touch _create_and_replace, which is different for each tuner method.

In sum, there are pros and cons for both. I'm fine with the current approach (but let's document it).

@BenjaminBossan (Member) left a comment:

I agree with your statements on testing and checking for PreTrainedModel. The PR looks pretty good. I found some minor issues still, but hopefully those can be fixed quickly. Thanks for your continued work.

PS: Optionally, having an entry in the docs (for instance here) could make this feature easier to discover.

SumanthRH and others added 4 commits January 3, 2024 08:23
@SumanthRH (Contributor, Author) commented on Jan 3, 2024:

@BenjaminBossan thanks for the detailed reviews! Hopefully this can be merged now!

@SumanthRH (Contributor, Author):

Following discussions in #1318, I've simplified the logic for bnb layers! Code is much cleaner now. The tests seem like overkill now, although I think the bnb tests can stay just to be sure.

@BenjaminBossan (Member) left a comment:

Thanks so much @SumanthRH, the PR looks pretty good, nothing to add from my side. Before merging, let's wait for @pacman100 (he's OOO this week), because of the considerable overlap with his PR.

The tests seem like overkill now, although I think the bnb tests can stay just to be sure.

They should stay for sure. This could help us in the future (e.g. if bitsandbytes ever changes its layers so that they no longer inherit from Linear, the tests would catch it).
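For context, a hedged illustration of the invariant those bnb tests rely on: as of current bitsandbytes releases, its quantized linear layers subclass torch.nn.Linear, so an isinstance check on nn.Linear also covers them. A one-off sanity check could look like this (illustrative only, not one of the tests in this PR).

# Illustrative sanity check, not a test from this PR: bitsandbytes'
# quantized linear layers currently subclass torch.nn.Linear.
import torch.nn as nn
import bitsandbytes as bnb

assert issubclass(bnb.nn.Linear8bitLt, nn.Linear)
assert issubclass(bnb.nn.Linear4bit, nn.Linear)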

@pacman100 (Contributor) left a comment:

Thank you @SumanthRH for adding the all-linear option to target all the linear layers following the QLoRA recipe, very useful! Also, comprehensive tests! 🚀

@pacman100 merged commit cbd783b into huggingface:main on Jan 9, 2024
14 checks passed
BenjaminBossan added a commit to BenjaminBossan/peft that referenced this pull request Dec 9, 2024
Description

When the option to specify target_modules="all-linear" was introduced in
PEFT (huggingface#1295), a restriction was added to only allow it for
instances of PreTrainedModel. This was because we want to exclude the
output layer from being targeted, even if it is a linear layer, and we
cannot reliably determine this layer except by convention.

This PR lifts the restriction that the model must be a PreTrainedModel.
Thus, users can now target other models, such as diffusers models or
custom models. The caveat is that this is used "at your own risk", since
all linear layers will be targeted, whether they are output layers or not.
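A hedged sketch of what this enables: targeting all linear layers of a plain nn.Module that is not a PreTrainedModel. The module names and sizes are made up, and note that the linear output layer gets wrapped too, which is exactly the "at your own risk" part.

# Hedged sketch; module names/sizes are made up. With the restriction
# lifted, "all-linear" also works on a plain nn.Module, and any linear
# output layer is wrapped as well ("at your own risk").
import torch.nn as nn
from peft import LoraConfig, get_peft_model

class TinyMLP(nn.Module):
    def __init__(self):
        super().__init__()
        self.lin0 = nn.Linear(16, 32)
        self.act = nn.ReLU()
        self.lin1 = nn.Linear(32, 4)  # output layer, also targeted here

    def forward(self, x):
        return self.lin1(self.act(self.lin0(x)))

config = LoraConfig(target_modules="all-linear")
peft_model = get_peft_model(TinyMLP(), config)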

Bugfix

While working on this, I found a potential bug. The logic for updating
target_modules was that only the last part of the linear module's name
was used. So e.g. if the module was named "foo.bar.baz", then "baz" was
added to target_modules. This will lead to problems if there is another
"baz" module that is not a linear layer.

This bug was fixed by adding the full name ("foo.bar.baz" in this
example) to the updated target_modules. This can potentially lead to a
big target_modules list with a lot of near-repetitions, but it's worth
it to avoid targeting the wrong module.
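A small, hypothetical illustration of the ambiguity, reusing the module names from the example above (this is not code from the PR):

# Hypothetical illustration of the name clash described above; not code
# from this PR. Matching only the last name part ("baz") would also hit
# the second, non-linear "baz" module; the full name "foo.bar.baz" would not.
import torch.nn as nn

class Inner(nn.Module):
    def __init__(self):
        super().__init__()
        self.baz = nn.Linear(8, 8)  # reachable as "foo.bar.baz": should be targeted

class Outer(nn.Module):
    def __init__(self):
        super().__init__()
        self.foo = nn.ModuleDict({"bar": Inner()})
        self.baz = nn.Conv2d(3, 3, 1)  # a different "baz" that is not a linear layer

model = Outer()
print([name for name, _ in model.named_modules() if name.split(".")[-1] == "baz"])
# ['foo.bar.baz', 'baz']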

It is not clear to me why only the last part was added. The PR that
added this to PEFT copied that part from here:

https://github.com/artidoro/qlora/blob/7f4e95a68dc076bea9b3a413d2b512eca6d004e5/qlora.py#L248

but it's not clear why that repo did it that way. Maybe it was just to
keep the set size smaller.

The bug was uncovered by the unet test that is already present. Still, I
extended this test, as well as another one, to better cover this
potential issue, by ensuring that the number of target layers is as
expected.

Backwards compatibility

Technically, this change breaks backwards compatibility. To go back
to the previous example, let's say we have a module that is called
"conv.baz" and that is a Conv2d layer. With the old behavior, since
"baz" is added to the target_modules, we would now also target this
Conv2d layer, which is supported by LoRA. After merging this PR, the
Conv2d layer would no longer be targeted.

I'd argue this is the correct behavior and thus worth changing. Also,
note that since we override target_modules, this is reflected in the
adapter_config.json. Therefore, if a user loads an adapter that had this
"baz" target, it will still work as it did previously.
BenjaminBossan added a commit that referenced this pull request Dec 13, 2024