
Add Quantization code. #107

Closed · wants to merge 19 commits

Conversation

@arnocandel (Member) commented May 2, 2023

Add quantization logic #88

@arnocandel force-pushed the quantization branch 7 times, most recently from 9a811b5 to f78c78d on May 2, 2023 at 22:16
@arnocandel (Member Author) commented May 3, 2023

Output is still garbage, but quantization is at least integrated:

generate.py --base_model=h2oai/h2ogpt-oig-oasst1-512-6.9b --quant_model=h2ogpt-oig-oasst1-512-6.9b-8bit.pt

[screenshot of model output]
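For orientation, the core idea behind an 8-bit quantized checkpoint like the one loaded via --quant_model above can be sketched as symmetric absmax quantization: scale each weight tensor so its largest magnitude maps to 127, round to int8, and keep the scale for dequantization. This is a toy illustration only, not this PR's quantize.py, which works per-block and handles outliers differently.

```python
# Toy sketch of symmetric absmax int8 quantization (illustrative only; the
# real pipeline quantizes per-block with outlier handling).

def quantize_int8(weights):
    """Map floats to int8 codes in [-127, 127] plus one scale per tensor."""
    scale = max(abs(w) for w in weights) / 127.0
    codes = [round(w / scale) for w in weights]
    return codes, scale

def dequantize_int8(codes, scale):
    """Recover approximate floats from int8 codes and the stored scale."""
    return [c * scale for c in codes]

weights = [0.5, -1.27, 0.03, 1.0]
codes, scale = quantize_int8(weights)
restored = dequantize_int8(codes, scale)
# Rounding error is at most half a quantization step per weight.
max_err = max(abs(a - b) for a, b in zip(weights, restored))
assert max_err <= scale / 2
```

The 8-bit checkpoint then stores one byte per weight plus the per-tensor (or per-block) scales, instead of 16/32-bit floats.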

@arnocandel (Member Author) commented:
So we may need https://github.com/PanQiWei/AutoGPTQ.
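One reason GPTQ-style 4-bit storage halves memory again versus int8 is that two 4-bit codes fit in each byte. The packing can be sketched as below; this is a toy illustration only, and AutoGPTQ's actual kernels and its error-compensating quantization algorithm are far more involved.

```python
# Toy sketch of 4-bit weight packing: two codes (0..15) per byte.
# Not AutoGPTQ's format; just the storage idea.

def pack_4bit(codes):
    """Pack an even-length list of 4-bit codes into bytes (high nibble first)."""
    assert len(codes) % 2 == 0 and all(0 <= c < 16 for c in codes)
    return bytes((hi << 4) | lo for hi, lo in zip(codes[::2], codes[1::2]))

def unpack_4bit(data):
    """Recover the list of 4-bit codes from packed bytes."""
    out = []
    for b in data:
        out.extend((b >> 4, b & 0x0F))
    return out

codes = [3, 15, 0, 7]
packed = pack_4bit(codes)
assert len(packed) == 2            # half the storage of one byte per code
assert unpack_4bit(packed) == codes
```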

@arnocandel (Member Author) commented:
python quantize.py
CUDA_VISIBLE_DEVICES=0 python generate.py --base_model=h2oai/h2ogpt-oig-oasst1-512-6.9b --quant_model=h2ogpt-oig-oasst1-512-6.9b-4bit

@arnocandel (Member Author) commented:
Two quantization approaches:

1. bitsandbytes (https://modal.com/docs/guide/ex/falcon_bitsandbytes):

   python generate.py --base_model=h2oai/h2ogpt-oasst1-falcon-40b --load_4bit=True

2. GPTQ (https://modal.com/docs/guide/ex/falcon_gptq): this PR (probably slower at inference).
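One tradeoff between the two approaches is precision: at a fixed value range, 4-bit uniform quantization has roughly 16x the step size of 8-bit, so its round-trip error is correspondingly larger. A minimal sketch (general illustration, not code from this PR):

```python
# Compare round-trip error of symmetric absmax quantization at different
# bit widths. Illustrative only; real 4-bit schemes (GPTQ, NF4) use
# grouping or non-uniform codes to claw back accuracy.

def roundtrip_error(weights, bits):
    """Max |dequantize(quantize(w)) - w| for symmetric uniform quantization."""
    levels = 2 ** (bits - 1) - 1              # e.g. 127 for int8, 7 for int4
    scale = max(abs(w) for w in weights) / levels
    return max(abs(round(w / scale) * scale - w) for w in weights)

weights = [0.11, -0.97, 0.42, 0.73, -0.28]
err8 = roundtrip_error(weights, 8)
err4 = roundtrip_error(weights, 4)
assert err4 > err8                             # 4-bit is coarser
```

GPTQ's whole point is to compensate for that coarser grid by choosing codes that minimize layer output error rather than per-weight error.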
