ENH FIX Allow "all-linear" to target custom models (#2267)
Description

When the option to specify target_modules="all-linear" was introduced in PEFT (#1295), it was restricted to instances of PreTrainedModel. The reason is that we want to exclude the output layer from being targeted, even if it is a linear layer, and we cannot reliably identify that layer except by convention. This PR lifts that restriction, so users can now target other models, such as diffusers models or custom models. The caveat is that this is used "at your own risk": all linear layers will be targeted, whether they are output layers or not. A short usage sketch is included at the end of this description.

Bugfix

While working on this, I found a potential bug. The logic for updating target_modules kept only the last part of each linear module's name. For example, if a module was named "foo.bar.baz", then "baz" was added to target_modules. This leads to problems if there is another module named "baz" that is not a linear layer. The bug was fixed by adding the full name ("foo.bar.baz" in this example) to the updated target_modules. This can result in a large target_modules with many near-repetitions, but it is worth it to avoid targeting the wrong module. The issue is illustrated in the second sketch at the end of this description.

It is not clear to me why only the last part was added. The PR that added this to PEFT copied that part from here: https://github.com/artidoro/qlora/blob/7f4e95a68dc076bea9b3a413d2b512eca6d004e5/qlora.py#L248, but it is not clear why that repo did it that way. Maybe it was just to keep the set size smaller.

The bug was uncovered by the unet test that is already present. Still, I extended this test, as well as another one, to better cover this potential issue by checking that the number of targeted layers is as expected.

Backwards compatibility

Technically, this change breaks backwards compatibility. To return to the previous example, say we have a module named "conv.baz" that is a Conv2d layer. With the old behavior, since "baz" was added to target_modules, this Conv2d layer would also be targeted (Conv2d is supported by LoRA). After merging this PR, the Conv2d layer is no longer targeted. I would argue this is the correct behavior and therefore worth the change. Also note that since we override target_modules, the override is reflected in adapter_config.json. Therefore, if a user loads an adapter that had this "baz" target, it will still work as it did previously.
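As a rough usage sketch (the MLP class below is purely illustrative and not part of the PR), applying "all-linear" to a custom model now looks roughly like this; note that the output layer lin1 is also wrapped, which is exactly the "at your own risk" caveat:

```python
import torch.nn as nn
from peft import LoraConfig, get_peft_model


class MLP(nn.Module):
    def __init__(self):
        super().__init__()
        self.lin0 = nn.Linear(10, 20)
        self.relu = nn.ReLU()
        self.lin1 = nn.Linear(20, 2)  # output layer; "all-linear" targets it too

    def forward(self, x):
        return self.lin1(self.relu(self.lin0(x)))


# before this PR, this raised an error because MLP is not a PreTrainedModel
config = LoraConfig(target_modules="all-linear")
peft_model = get_peft_model(MLP(), config)
peft_model.print_trainable_parameters()
```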
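To illustrate the name-collision issue from the bugfix, here is a small sketch (module names are made up for the example) contrasting the old suffix-based bookkeeping with the new full-name bookkeeping:

```python
import torch.nn as nn

# "foo.bar.baz" is a Linear layer, "conv.baz" is an unrelated Conv2d layer
model = nn.ModuleDict({
    "foo": nn.ModuleDict({"bar": nn.ModuleDict({"baz": nn.Linear(8, 8)})}),
    "conv": nn.ModuleDict({"baz": nn.Conv2d(3, 3, kernel_size=3)}),
})

linear_names = [
    name for name, module in model.named_modules() if isinstance(module, nn.Linear)
]

# old behavior: only the last part of the name was kept ...
old_target_modules = {name.rsplit(".", 1)[-1] for name in linear_names}  # {"baz"}
# ... so suffix matching against "baz" also hits the Conv2d at "conv.baz"

# new behavior: the full name is kept, so only the intended Linear is targeted
new_target_modules = set(linear_names)  # {"foo.bar.baz"}
```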