
Add LoRA and Prefix-Tuning as Modeling Options for Improved Memory Efficiency + performance (potentially) #2840

Merged: 8 commits into LAION-AI:main on May 26, 2023

Conversation

@jordiclive (Collaborator) commented on Apr 22, 2023

This PR adds LoRA and prefix-tuning as modelling options (training and sampling code).

Both have shown strong performance and can outperform fine-tuning. They can also help protect against catastrophic forgetting, which is important for chatbots. Because the whole base language model stays frozen, the trained adapter weights can be distributed freely, independently of the base model.

They also make training much more memory-efficient, since optimizer states are needed only for the small set of trainable adapter parameters rather than for the whole base model.
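
As a rough illustration of why (a minimal sketch, not this PR's exact code; the model path and LoRA hyperparameters are placeholders):

    # Wrap a frozen base model with a LoRA adapter via peft.
    from transformers import AutoModelForCausalLM
    from peft import LoraConfig, get_peft_model

    base = AutoModelForCausalLM.from_pretrained("path-to-llama-base")  # placeholder path
    lora_config = LoraConfig(
        r=16,
        lora_alpha=32,
        lora_dropout=0.05,
        target_modules=["q_proj", "v_proj"],  # illustrative choice of projection layers
        task_type="CAUSAL_LM",
    )
    model = get_peft_model(base, lora_config)
    model.print_trainable_parameters()  # typically well under 1% of all parameters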

Benefits of LoRA

  • Can run a 30B model with only DeepSpeed ZeRO stage 2.
  • Can run a 65B model without intense CPU usage.
  • Less overfitting; performance is maintained across datasets.
  • Only the adapter component needs to be shared or pushed to the Hub (a small file of a few MB; see the sketch below).

— See Andrej Karpathy's (OpenAI) comment
— See the purported Google leak
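
Continuing the sketch above, only the small adapter files need to be saved or uploaded; the repo id and paths are placeholders:

    # Only the adapter weights and config are written; the frozen base checkpoint is untouched.
    model.save_pretrained("adapter")                    # a few MB on disk
    model.push_to_hub("your-org/llama-30b-oasst-lora")  # placeholder repo id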

Implementation Details:

  • Explicitly set input_ids as a keyword argument in the sampling code to ensure proper functionality with PEFT's generate (see the sketch after this list).
  • Manually enable gradients on the inputs so that gradient checkpointing works; with frozen embeddings, the checkpointed activations would otherwise have no gradient to attach to (also covered in the sketch below).
  • Include code to save the special-token embeddings (normally part of pytorch_model.bin). Although these tokens are randomly initialized, they must be stored and saved as an external module, since the PEFT parameters learn to make use of them. Making them trainable is an option, but it is unlikely to make a significant difference.
  • Integrate prefix-tuning, a powerful technique that prepends trainable key and value prefixes to the attention layers, by incorporating custom LLaMA modelling code. (Training speed was lower than LoRA in my initial tests, and performance was similar or worse.)
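
A sketch of the first two points, assuming a peft-wrapped model and tokenizer as in the setup above (illustrative, not the PR's exact code):

    # Gradient checkpointing with a frozen base model: the inputs must require gradients,
    # otherwise nothing flows back to the trainable adapter weights.
    model.enable_input_require_grads()   # or an equivalent forward hook on the embeddings
    model.gradient_checkpointing_enable()

    # The sampling code passes input_ids explicitly as a keyword argument to PEFT's generate:
    inputs = tokenizer("Hello, how are you?", return_tensors="pt")
    outputs = model.generate(
        input_ids=inputs["input_ids"],
        attention_mask=inputs["attention_mask"],
        max_new_tokens=256,
    )
    print(tokenizer.decode(outputs[0], skip_special_tokens=True))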

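For the special-token point, a hypothetical helper along these lines could store the newly added embedding rows next to the adapter (the function name and file path are made up for illustration):

    import torch

    def save_special_token_embeddings(model, num_new_tokens, path="adapter/extra_embeddings.pt"):
        # The rows appended by resize_token_embeddings live in the frozen base model's
        # embedding matrix (normally part of pytorch_model.bin), so they are not captured
        # by the PEFT adapter state dict and have to be stored as an external module.
        embeddings = model.get_input_embeddings().weight.detach()
        new_rows = embeddings[-num_new_tokens:].clone()
        torch.save({"new_token_embeddings": new_rows}, path)
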
@github-actions

pre-commit failed.
Please run pre-commit run --all-files locally and commit the changes.
Find more information in the repository's CONTRIBUTING.md

2 similar comments

@smeyerhot

@jordiclive would this PR make it possible to load a PEFT model for inference in the chat?

@jordiclive (Collaborator, Author) commented on Apr 26, 2023

@smeyerhot This code is currently just for model training and evaluation, but it should be trivial to load it for inference since it uses the same HF generate method.
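
A rough sketch of what that could look like with peft (the paths are placeholders):

    from transformers import AutoModelForCausalLM, AutoTokenizer
    from peft import PeftModel

    base = AutoModelForCausalLM.from_pretrained("path-to-llama-base", torch_dtype="auto")
    model = PeftModel.from_pretrained(base, "adapter")  # directory written during training
    tokenizer = AutoTokenizer.from_pretrained("path-to-llama-base")

    inputs = tokenizer("Hello, who are you?", return_tensors="pt")
    output = model.generate(input_ids=inputs["input_ids"], max_new_tokens=128)
    print(tokenizer.decode(output[0], skip_special_tokens=True))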

@jordiclive (Collaborator, Author) commented:

@andreaskoepf I am going to run a 30B LoRA model just on the SFT datasets and will post the sampling report.

@github-actions bot commented on May 6, 2023

pre-commit failed.
Please run pre-commit run --all-files locally and commit the changes.
Find more information in the repository's CONTRIBUTING.md

@github-actions bot commented on May 6, 2023

pre-commit failed.
Please run pre-commit run --all-files locally and commit the changes.
Find more information in the repository's CONTRIBUTING.md

@jordiclive changed the title from "Add PEFT Modeling code." to "Add LoRA and Prefix-Tuning as Modeling Options for Improved Memory Efficiency + performance (potentially)" on May 6, 2023
@jordiclive marked this pull request as ready for review on May 6, 2023 at 10:36
    freeze_layer: bool = False
    residual_dropout: float = 0
    use_flash_attention: bool = False
    adapter_save_path: str = "adapter_13B_new"


Does this only work on the 13B model? Should it be hardcoded?

jordiclive (Collaborator, Author) replied:

It's the output path for a directory containing the adapter weights. I've changed it to just "adapter".

jordiclive (Collaborator, Author) replied:

@smeyerhot I've updated the configs.

jordiclive and others added 3 commits on May 6, 2023 at 18:10, including:
  • change log dir names and model name for lora 13B in config
  • change output path for adapter
@andreaskoepf (Collaborator) left a comment:

Super great to have this!

@andreaskoepf merged commit e059d86 into LAION-AI:main on May 26, 2023