
QWEN int4 bad performance #360

Closed
chunniunai220ml opened this issue Feb 23, 2024 · 7 comments

Comments

@chunniunai220ml

When I test the Qwen model, I get normal results:

[screenshot: FP16 baseline evaluation results]

But after AutoAWQ quantization, the performance degrades badly:

wikitext-ppl=13.560

[screenshot: post-quantization evaluation results]

Is this normal? When I tested the official AWQ repo on Llama 2, the ppl did not degrade much.
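For anyone trying to reproduce these numbers, here is a generic sliding-window wikitext-2 perplexity sketch (my reconstruction with Hugging Face `transformers`/`datasets`, not necessarily the reporter's exact script; the model id is illustrative):

```python
import torch
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Qwen/Qwen1.5-7B"  # illustrative; substitute the checkpoint under test
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.float16, device_map="auto"
)

# Concatenate the raw test split and tokenize it once.
text = "\n\n".join(load_dataset("wikitext", "wikitext-2-raw-v1", split="test")["text"])
ids = tokenizer(text, return_tensors="pt").input_ids

# Non-overlapping 2048-token windows; each window's mean NLL is re-weighted
# by its token count so the final average is per-token.
window = 2048
nll_sum, n_tokens = 0.0, 0
for begin in range(0, ids.size(1) - 1, window):
    chunk = ids[:, begin : begin + window].to(model.device)
    if chunk.size(1) < 2:  # a 1-token tail yields no shifted labels
        break
    with torch.no_grad():
        loss = model(chunk, labels=chunk).loss  # mean NLL over this window
    n = chunk.size(1) - 1  # labels are shifted by one
    nll_sum += loss.item() * n
    n_tokens += n

print(f"wikitext-2 ppl = {torch.exp(torch.tensor(nll_sum / n_tokens)).item():.3f}")
```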

By the way, I modified the code to support zero=True (symmetric mode), as in the screenshot (see the sketch after the results table for what that change amounts to):

[screenshot: code modification for symmetric mode]

With that change I got a much worse ppl = 1056944.250:

| stem | humanities | other | social | avg |
|------|------------|-------|--------|-----|
| 26.26 | 27.12 | 24.01 | 23.79 | 25.51 |
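For context on the symmetric-mode change, here is my own minimal sketch of the two rounding schemes (not the reporter's patch and not AutoAWQ's internal code). AWQ's default path is asymmetric with a per-group zero point; a symmetric path drops the offset, and if the rest of the pipeline (scale search, packing) still assumes asymmetric ranges, a perplexity blow-up like the one above is plausible:

```python
import torch

def fake_quantize_group(w: torch.Tensor, n_bit: int = 4, zero_point: bool = True) -> torch.Tensor:
    """Quantize-dequantize one weight group; illustrates the two schemes."""
    if zero_point:
        # Asymmetric: map [min, max] onto [0, 2^n - 1] with a zero offset.
        scale = (w.max() - w.min()).clamp(min=1e-5) / (2**n_bit - 1)
        zero = (-w.min() / scale).round()
        q = (w / scale + zero).round().clamp(0, 2**n_bit - 1)
        return (q - zero) * scale
    # Symmetric: map [-|w|max, |w|max] onto [-(2^(n-1)-1), 2^(n-1)-1], no offset.
    qmax = 2 ** (n_bit - 1) - 1
    scale = w.abs().max().clamp(min=1e-5) / qmax
    return (w / scale).round().clamp(-qmax, qmax) * scale

w = torch.randn(128)  # one q_group_size=128 group
for zp in (True, False):
    err = (w - fake_quantize_group(w, zero_point=zp)).abs().mean()
    print(f"zero_point={zp}: mean abs error {err:.4f}")
```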

Finally, are there any differences between the three kernel versions (GEMM, GEMV, Marlin) in theory and in use?

@casper-hansen
Owner

Hi @chunniunai220ml, this is not normal performance. Quantization error measured by perplexity is usually 1-2%. Did you use a custom dataset?
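(As a rough worked example with a hypothetical baseline, since the FP16 number above is only in a screenshot: if the FP16 model scored ppl ≈ 8.0 on wikitext, a 1-2% quantization error would put the AWQ model around 8.08-8.16, nowhere near the 13.560 reported above.)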

@benjamin-marie

I have also observed a similar performance drop. For instance, on winogrande, arc challenge, and hellaswag, Qwen-1.5 7B quantized with AWQ scores 10 or more accuracy points lower than Qwen-1.5 quantized with GPTQ 4-bit.

Here is my config:
```python
quant_config = { "zero_point": True, "q_group_size": 128, "w_bit": 4, "version": "GEMM" }
```

I use the latest version of AutoAWQ.
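For reproducibility, this is roughly the flow that config goes through, following the standard AutoAWQ README example (paths are illustrative):

```python
from awq import AutoAWQForCausalLM
from transformers import AutoTokenizer

model_path = "Qwen/Qwen1.5-7B"  # illustrative; any Qwen checkpoint
quant_path = "qwen1.5-7b-awq"
quant_config = {"zero_point": True, "q_group_size": 128, "w_bit": 4, "version": "GEMM"}

# Load the FP16 model and its tokenizer.
model = AutoAWQForCausalLM.from_pretrained(model_path)
tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)

# Quantize with the default calibration set, then save.
model.quantize(tokenizer, quant_config=quant_config)
model.save_quantized(quant_path)
tokenizer.save_pretrained(quant_path)
```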

@chunniunai220ml
Author

> Hi @chunniunai220ml, this is not normal performance. Quantization error measured by perplexity is usually 1-2%. Did you use a custom dataset?

Not a custom dataset, no code changes, and I followed the examples.

@bratao

bratao commented Feb 26, 2024

+1 here. Qwen/Qwen1.5-14B-Chat-GPTQ-Int4 produces much better results than Qwen/Qwen1.5-14B-Chat-AWQ. Way closer to the original model.

@chunniunai220ml
Author

@casper-hansen I disabled the clipping step and got reasonable results, but how do you explain this?

@Relissc

Relissc commented Apr 4, 2024

Hello @chunniunai220ml, when I use AutoAWQ to quantize Qwen, I get an error: "RuntimeError: Failed to import transformers.generation.utils because of the following error (look up to see its traceback):". What is your environment (especially the version of transformers)? Can you share it?

@casper-hansen
Owner

You can now use apply_clip=False on the quantize() method. I didn't find that it improved the model much, but the option is there now.
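A one-line sketch, reusing the names from the quantization example earlier in the thread:

```python
# Same flow as the README example above, but skipping the clipping search.
model.quantize(tokenizer, quant_config=quant_config, apply_clip=False)
```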

For reference, I am not able to reproduce the bad performance of Qwen in my testing:

[screenshot: perplexity comparison]

That's roughly a 1% quantization error.
