
Add LoRA and Prefix-Tuning as Modeling Options for Improved Memory Efficiency + performance (potentially) #2840

Merged: 8 commits into LAION-AI:main on May 26, 2023

Conversation

@jordiclive (Collaborator) commented on Apr 22, 2023

This PR adds LoRA and prefix-tuning as modelling options (training and sampling code).

Both have shown strong performance and can outperform fine-tuning. They can also help protect against catastrophic forgetting, which is important for chatbots. Because the whole base language model stays frozen, the trained adapter weights can be distributed freely, independently of the base model.

They also make training much more memory-efficient, since optimizer states are needed only for the small set of trainable adapter parameters rather than for the whole base model.
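
As a rough illustration of why (a minimal sketch, not this PR's exact code; the model path and LoRA hyperparameters are placeholders):

    # Wrap a frozen base model with a LoRA adapter via peft.
    from transformers import AutoModelForCausalLM
    from peft import LoraConfig, get_peft_model

    base = AutoModelForCausalLM.from_pretrained("path-to-llama-base")  # placeholder path
    lora_config = LoraConfig(
        r=16,
        lora_alpha=32,
        lora_dropout=0.05,
        target_modules=["q_proj", "v_proj"],  # illustrative choice of projection layers
        task_type="CAUSAL_LM",
    )
    model = get_peft_model(base, lora_config)
    model.print_trainable_parameters()  # typically well under 1% of all parameters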

Benefits of LoRA

  • Can run a 30B model with only DeepSpeed ZeRO stage 2.
  • Can run a 65B model without intense CPU usage.
  • Less overfitting; performance is maintained across datasets.
  • Only the adapter component needs to be shared or pushed to the Hub (a small file of a few MB; see the sketch below).

— See Andrej Karpathy's (OpenAI) comment
— See the purported Google leak
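
Continuing the sketch above, only the small adapter files need to be saved or uploaded; the repo id and paths are placeholders:

    # Only the adapter weights and config are written; the frozen base checkpoint is untouched.
    model.save_pretrained("adapter")                    # a few MB on disk
    model.push_to_hub("your-org/llama-30b-oasst-lora")  # placeholder repo id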

Implementation Details:

  • Explicitly set input_ids as a keyword argument in the sampling code to ensure proper functionality with PEFT's generate (see the sketch after this list).
  • Manually enable gradients on the inputs so that gradient checkpointing works; with frozen embeddings, the checkpointed activations would otherwise have no gradient to attach to (also covered in the sketch below).
  • Include code to save the special-token embeddings (normally part of pytorch_model.bin). Although these tokens are randomly initialized, they must be stored and saved as an external module, since the PEFT parameters learn to make use of them. Making them trainable is an option, but it is unlikely to make a significant difference.
  • Integrate prefix-tuning, a powerful technique that prepends trainable key and value prefixes to the attention layers, by incorporating custom LLaMA modelling code. (Training speed was lower than LoRA in my initial tests, and performance was similar or worse.)
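
A sketch of the first two points, assuming a peft-wrapped model and tokenizer as in the setup above (illustrative, not the PR's exact code):

    # Gradient checkpointing with a frozen base model: the inputs must require gradients,
    # otherwise nothing flows back to the trainable adapter weights.
    model.enable_input_require_grads()   # or an equivalent forward hook on the embeddings
    model.gradient_checkpointing_enable()

    # The sampling code passes input_ids explicitly as a keyword argument to PEFT's generate:
    inputs = tokenizer("Hello, how are you?", return_tensors="pt")
    outputs = model.generate(
        input_ids=inputs["input_ids"],
        attention_mask=inputs["attention_mask"],
        max_new_tokens=256,
    )
    print(tokenizer.decode(outputs[0], skip_special_tokens=True))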

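For the special-token point, a hypothetical helper along these lines could store the newly added embedding rows next to the adapter (the function name and file path are made up for illustration):

    import torch

    def save_special_token_embeddings(model, num_new_tokens, path="adapter/extra_embeddings.pt"):
        # The rows appended by resize_token_embeddings live in the frozen base model's
        # embedding matrix (normally part of pytorch_model.bin), so they are not captured
        # by the PEFT adapter state dict and have to be stored as an external module.
        embeddings = model.get_input_embeddings().weight.detach()
        new_rows = embeddings[-num_new_tokens:].clone()
        torch.save({"new_token_embeddings": new_rows}, path)
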
@github-actions

pre-commit failed.
Please run pre-commit run --all-files locally and commit the changes.
Find more information in the repository's CONTRIBUTING.md

2 similar comments

@smeyerhot

@jordiclive would this PR make it possible to load a PEFT model for inference in the chat?

@jordiclive (Collaborator, Author) commented on Apr 26, 2023

@smeyerhot This code is currently just for model training and evaluation, but it should be trivial to load it for inference since it uses the same HF generate method.
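
A rough sketch of what that could look like with peft (the paths are placeholders):

    from transformers import AutoModelForCausalLM, AutoTokenizer
    from peft import PeftModel

    base = AutoModelForCausalLM.from_pretrained("path-to-llama-base", torch_dtype="auto")
    model = PeftModel.from_pretrained(base, "adapter")  # directory written during training
    tokenizer = AutoTokenizer.from_pretrained("path-to-llama-base")

    inputs = tokenizer("Hello, who are you?", return_tensors="pt")
    output = model.generate(input_ids=inputs["input_ids"], max_new_tokens=128)
    print(tokenizer.decode(output[0], skip_special_tokens=True))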

@jordiclive (Collaborator, Author) commented:

@andreaskoepf I am going to run a 30B LoRA model just on the SFT datasets and will post the sampling report.

@github-actions bot commented on May 6, 2023

pre-commit failed.
Please run pre-commit run --all-files locally and commit the changes.
Find more information in the repository's CONTRIBUTING.md

@github-actions bot commented on May 6, 2023

pre-commit failed.
Please run pre-commit run --all-files locally and commit the changes.
Find more information in the repository's CONTRIBUTING.md

@jordiclive changed the title from "Add PEFT Modeling code." to "Add LoRA and Prefix-Tuning as Modeling Options for Improved Memory Efficiency + performance (potentially)" on May 6, 2023
@jordiclive marked this pull request as ready for review on May 6, 2023 at 10:36
    freeze_layer: bool = False
    residual_dropout: float = 0
    use_flash_attention: bool = False
    adapter_save_path: str = "adapter_13B_new"


Does this only work on the 13B model? Should it be hardcoded?

jordiclive (Collaborator, Author) replied:

It's the output path for a directory containing the adapter weights. I've changed it to just "adapter".

jordiclive (Collaborator, Author) replied:

@smeyerhot I've updated the configs.

jordiclive and others added 3 commits on May 6, 2023 at 18:10, including:
  • change log dir names and model name for lora 13B in config
  • change output path for adapter
@andreaskoepf (Collaborator) left a comment:

Super great to have this!

@andreaskoepf merged commit e059d86 into LAION-AI:main on May 26, 2023