Add support for QLoRA / QAdapter training via bitsandbytes #663
Conversation
Looks good, just one small question about something that is unclear to me.
    # result shape: <batch_size> x <seq_len> x <head_dim>
    layer_output = F.linear(input_states, weight, bias=self.bias)
else:
    layer_output = super().forward(input_states)
Which forward method is called here, since this does not inherit from nn.Linear anymore?
The subclasses of this (LoRALinearTorch, LoRALinear4bit, LoRALinear8bitLt) inherit from different types of linear layers.
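For illustration, a rough sketch of how such a hierarchy resolves `super().forward()` via Python's MRO; only the three class names from this thread and the bitsandbytes layer types are taken from the source, the bodies are placeholders rather than the PR's actual code:

```python
import bitsandbytes as bnb
import torch.nn as nn


class LoRALinear:
    """Mixin holding the shared LoRA logic; not itself an nn.Linear."""

    def forward(self, input_states):
        # (LoRA-specific branches elided)
        # Falls through to the next class in the MRO, i.e. the concrete
        # linear layer that the instantiated subclass inherits from.
        return super().forward(input_states)


# Each subclass mixes the shared logic into a different linear base,
# so super().forward() dispatches to a different implementation:
class LoRALinearTorch(LoRALinear, nn.Linear):             # -> nn.Linear.forward
    pass


class LoRALinear4bit(LoRALinear, bnb.nn.Linear4bit):      # -> Linear4bit.forward
    pass


class LoRALinear8bitLt(LoRALinear, bnb.nn.Linear8bitLt):  # -> Linear8bitLt.forward
    pass
```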
…678) Adapters currently does not work correctly when passing `device_map="auto"` to a model's `from_pretrained()`. Device auto-mapping is handled by HF Accelerate, which wraps the original module forward method. This PR fixes compatibility of Adapters' post-hoc model wrapping with Accelerate's device auto-mapping by wrapping the forward pass. Fixing this is required for enabling quantized training of adapters (bottleneck & prefix-tuning) in #663.
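For illustration only (none of the names below appear in the PR): replacing `module.forward` outright would discard the wrapper that Accelerate installs for device placement, whereas wrapping whatever `forward` currently is keeps it in the call chain:

```python
import functools

import torch.nn as nn


def wrap_forward(module: nn.Module, post_forward_fn):
    """Hypothetical helper: wrap the module's current forward (which may
    already be Accelerate's device-placement wrapper) instead of
    replacing it, so existing hooks keep running."""
    original_forward = module.forward  # possibly already wrapped by Accelerate

    @functools.wraps(original_forward)
    def wrapped_forward(*args, **kwargs):
        output = original_forward(*args, **kwargs)  # Accelerate hooks still fire
        return post_forward_fn(module, output)      # additional logic on top

    module.forward = wrapped_forward
```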
This PR adds support for wrapping bitsandbytes' Linear4bit and Linear8bitLt quantization layers with our LoRA implementation, enabling training of LoRA adapters on quantized models in QLoRA style. The implementation is loosely similar to HF peft's approach, which can be found here: https://github.com/huggingface/peft/blob/v0.10.0/src/peft/tuners/lora/bnb.py.
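For context, a hedged sketch of the intended usage pattern: loading a 4-bit quantized model and training a LoRA adapter on top of it. The model ID, adapter name, and LoRA hyperparameters below are placeholders; the notebook's exact configuration may differ.

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

import adapters
from adapters import LoRAConfig

# Load the base model quantized to 4 bit via bitsandbytes.
model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-7b-hf",  # placeholder model ID
    quantization_config=BitsAndBytesConfig(
        load_in_4bit=True,
        bnb_4bit_quant_type="nf4",
        bnb_4bit_compute_dtype=torch.bfloat16,
    ),
    device_map="auto",
    torch_dtype=torch.bfloat16,
)

# Enable adapter support on the plain transformers model, then add and
# activate a LoRA adapter; only the LoRA weights are trained (QLoRA-style).
adapters.init(model)
model.add_adapter("assistant_adapter", config=LoRAConfig())
model.train_adapter("assistant_adapter")
```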
Demo
I've added a new notebook here: https://github.com/calpt/adapter-transformers/blob/dev/qlora/notebooks/QLoRA_Llama_Finetuning.ipynb.
The notebook showcases this feature by finetuning a 4bit-quantized Llama 2 7B on an instruction tuning dataset (similar to Guanaco in the QLoRA paper).
Tested that it runs without errors in the provided notebook; other setups are not extensively tested yet.
Pre-trained checkpoints
Adapters trained with the notebook code can be found here:
- Llama-2 7B: https://huggingface.co/AdapterHub/llama2-7b-qlora-openassistant
- Llama-2 13B: https://huggingface.co/AdapterHub/llama2-13b-qlora-openassistant
Current limitations