- alpaca-lora is a great project that makes it possible to run instruct-tuning on a single RTX 4090 within hours. After instruct-tuning, an instruct model of similar quality to text-davinci-003 can be obtained.
- However, the larger the foundation model, the better the instruction-following results, and we hope that everyone can enjoy this benefit. Therefore, we provide alpaca-qlora, which quantizes the backbone to 4-bit while keeping the LoRA parameters in fp16.
- In alpaca-qlora, GPU memory amounting to about half the model size is released (for example, llama-7B releases about 3.5GB). When computing resources are insufficient, this alleviates the demand; even with sufficient resources, alpaca-qlora lets you enlarge CUTOFF_LEN, which may improve your instruct-tuning results, or increase the micro-batch size to reduce training time.
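- As a rough back-of-the-envelope check (illustrative only; it ignores activations, optimizer state and the fp16 LoRA weights, and assumes 1 byte per weight for an 8-bit backbone versus 0.5 byte for a 4-bit one):

```python
# Approximate backbone memory for LLaMA-7B: 8-bit (as loaded by alpaca-lora)
# vs. 4-bit (as in alpaca-qlora). Purely illustrative numbers.
params = 7e9                          # ~7B weights
int8_gb = params * 1.0 / 1024**3      # 8-bit backbone: 1 byte per weight
int4_gb = params * 0.5 / 1024**3      # 4-bit backbone: 0.5 byte per weight
print(f"8-bit: {int8_gb:.1f} GB, 4-bit: {int4_gb:.1f} GB, "
      f"released: {int8_gb - int4_gb:.1f} GB")   # roughly 3.3 GB released
```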
-
- Install dependencies
pip install -r requirements.txt
-
- If bitsandbytes doesn't work, install it from source. Windows users can follow these instructions.
-
- Install CUDA cutlass
git clone https://github.com/NVIDIA/cutlass
cd /path/to/repo/cuda/
ln -s /path/to/cutlass/ .
./build_cutlass.sh
source environment.sh
- Go to qllama to get the quantized backbone.
- You can also download a llama-7B checkpoint as the quantized backbone (pack32).
- Convert the weight dtype of the quantized backbone from torch.int32 to torch.int8:
python3 convert_pack32topack8.py /path/to/quant-backbone-pack32 /path/to/output-quant-backbone-pack8
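- For reference, here is a minimal sketch of the idea behind this repacking, assuming eight 4-bit weights per int32 word and two per int8 byte with the low nibble first; the actual layout handled by convert_pack32topack8.py may differ:

```python
import torch

# Conceptual pack32 -> pack8 repacking (assumed layout: low nibble first).
def repack32_to_8(packed32: torch.Tensor) -> torch.Tensor:
    # unpack the eight 4-bit values stored in each int32 word
    nibbles = torch.stack(
        [(packed32 >> (4 * i)) & 0xF for i in range(8)], dim=-1
    ).flatten().to(torch.uint8)
    # repack two 4-bit values into each byte
    lo, hi = nibbles[0::2], nibbles[1::2]
    packed8 = (hi << 4) | lo
    return packed8.view(torch.int8)   # reinterpret the bytes as torch.int8
```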
python3 finetune.py
python3 finetune_pp.py decapoda-research/llama-65b-hf /path/to/llama65b-pack8 --chunks 16 --pp_checkpoint except_last --micro_batch_size 32
python3 generate.py --load-qlora --llama-config /path/to/llama/config.json --qllama-checkpoint /path/to/quant-backbone-pack8 --qlora-dir /path/to/save/adapter --port 7860
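- For orientation, the training commands above follow the standard PEFT-style LoRA setup: the backbone is frozen and only the fp16 adapter weights are trained. The sketch below uses a plain Hugging Face model and made-up hyperparameters purely to illustrate the idea; the project's own scripts attach the adapters to the 4-bit qllama backbone instead:

```python
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM

# Illustrative base model and LoRA hyperparameters (assumptions, not the
# project's exact configuration).
base = AutoModelForCausalLM.from_pretrained("decapoda-research/llama-7b-hf")
lora = LoraConfig(
    r=8, lora_alpha=16, lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],
    task_type="CAUSAL_LM",
)
model = get_peft_model(base, lora)    # freezes the backbone, adds LoRA layers
model.print_trainable_parameters()    # only the LoRA weights require grad
```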
- GPU-memory figures below are taken from nvidia-smi.
| method | GPU memory | micro-batch size | GPU hours |
|---|---|---|---|
| alpaca-lora | 8.71G | 4 | 14.25h |
| alpaca-qlora (ours) | 5.63G | 4 | 16h |
| alpaca-qlora (ours) | 8.09G | 16 | 11.5h |
| LLaMA-7B chunks | 1 | 2 | 4 | 8 | 16 | 32 |
|---|---|---|---|---|---|---|
| micro_batch=8 | 3.0h | 2.2h | 2.3h | 4.3h | - | - |
| micro_batch=16 | 2.6h | 1.8h | 1.6h | 2.3h | 4.5h | - |
| micro_batch=32 | OOM | 1.6h | 1.25h | 1.45h | 2.4h | 4.5h |
| LLaMA-65B chunks | 2 | 4 | 8 | 16 | 32 |
|---|---|---|---|---|---|
| micro_batch=8 | OOM | 23h | 27h | - | - |
| micro_batch=16 | OOM | OOM | 16h | 23h | - |
| micro_batch=32 | OOM | OOM | OOM | 13h | 21.5h |
- Instruct: Tell me about alpaca.
- alpaca-qlora output:
Alpacas are members of the camelid family and are native to the Andes Mountains of South America. They are prized for their soft, luxurious fleece, which is used to make clothing, blankets, and other items. Alpacas are herbivores and graze on grasses, shrubs, and other vegetation. Alpacas are social animals and live in herds of up to 20 individuals. They are known for their docile nature and gentle temperament.
- alpaca-lora output:
Alpacas are members of the camelid family and are native to the Andes Mountains of South America. They are known for their soft, luxurious fleece, which is used to make clothing, blankets, and other items. Alpacas are herbivores and graze on grasses and other plants. They are social animals and live in herds of up to 20 individuals.
- Instruct: List all Canadian provinces in alphabetical order
- alpaca-qlora output:
Alberta, British Columbia, Manitoba, New Brunswick, Newfoundland and Labrador, Northwest Territories, Nova Scotia, Ontario, Prince Edward Island, Saskatchewan, Yukon.
- alpaca-lora output:
Alberta, British Columbia, Manitoba, New Brunswick, Newfoundland and Labrador, Nova Scotia, Ontario, Prince Edward Island, Quebec, Saskatchewan.
- Instruct: Write a Python program that prints the first 10 Fibonacci numbers.
- alpaca-qlora output:
"""
This program prints the first 10 Fibonacci numbers.
"""
def fibonacci(n):
if n == 0:
return 0
elif n == 1:
return 1
else:
return fibonacci(n - 1) + fibonacci(n - 2)
print(fibonacci(10))
- alpaca-lora output:
def fibonacci(n):
    if n == 0:
        return 0
    elif n == 1:
        return 1
    else:
        return fibonacci(n-1) + fibonacci(n-2)
print(fibonacci(10))
- We are grateful for these excellent projects and list them as follows: