Replies: 1 comment 3 replies
-
For LLaMA + instruct LoRAs, only 8-bit is functional currently. Maybe you are just comparing them inaccurately? If you're using a converted 4-bit model of https://huggingface.co/chavinlo/alpaca-native/discussions
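
As a rough illustration (not from this thread), loading the base model in 8-bit with bitsandbytes and layering an instruct LoRA on top might look like the sketch below. The repo names are placeholders I've assumed for the example, not checkpoints mentioned above:

```python
# Minimal sketch: LLaMA base model in 8-bit (bitsandbytes) with an instruct
# LoRA applied via PEFT. Repo names below are assumed placeholders.
from transformers import LlamaForCausalLM, LlamaTokenizer
from peft import PeftModel

base_model_id = "decapoda-research/llama-7b-hf"  # assumed base checkpoint
lora_id = "tloen/alpaca-lora-7b"                 # assumed instruct LoRA

tokenizer = LlamaTokenizer.from_pretrained(base_model_id)

# load_in_8bit uses bitsandbytes under the hood; device_map="auto" places
# the weights across available GPUs/CPU automatically.
model = LlamaForCausalLM.from_pretrained(
    base_model_id,
    load_in_8bit=True,
    device_map="auto",
)

# Attach the LoRA adapter on top of the 8-bit base model.
model = PeftModel.from_pretrained(model, lora_id)
```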
-
I've read that the 4-bit version shouldn't be noticeably different from the original 16-bit version. However, it seems significantly worse, at least for the 7B version, which I tested.
8-bit (bitsandbytes):
4-bit (GPTQ):
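
For reference, a minimal sketch (not from the thread) of how such a side-by-side comparison might be run so that quality differences aren't just sampling noise: the same prompt and generation settings against whichever quantized model is loaded. `model` and `tokenizer` are assumed to already be loaded for the 8-bit (bitsandbytes) or 4-bit (GPTQ) variant being tested:

```python
# Run one fixed prompt with deterministic decoding against the loaded model,
# so 8-bit and 4-bit outputs can be compared under identical settings.
import torch

def generate(model, tokenizer, prompt: str) -> str:
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    with torch.no_grad():
        output_ids = model.generate(
            **inputs,
            max_new_tokens=128,
            do_sample=False,  # greedy decoding keeps the comparison deterministic
        )
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)

prompt = (
    "### Instruction:\n"
    "Explain what quantization does to a language model.\n\n"
    "### Response:\n"
)
print(generate(model, tokenizer, prompt))
```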