QWEN int4 bad performance #360
Comments
Hi @chunniunai220ml, this is not normal performance. Quantization error measured by perplexity is usually 1-2%. Did you use a custom dataset?
I have also observed a similar performance drop. For instance, on winogrande, arc challenge, and hellaswag, Qwen-1.5 7B quantized with AWQ scores 10 or more accuracy points lower than Qwen-1.5 quantized with GPTQ 4-bit. Here is my config: I use the latest version of AutoAWQ.
+1 here. Qwen/Qwen1.5-14B-Chat-GPTQ-Int4 produces much better results than Qwen/Qwen1.5-14B-Chat-AWQ, much closer to the original model.
@casper-hansen I disabled the clip step and got reasonable results, but how can this be explained?
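For context on what the clip step does: AWQ searches a per-channel clipping ratio that shrinks the quantization range, trading clipping error on outliers against rounding error on small weights. The sketch below is a hypothetical illustration of that search, not AutoAWQ's actual implementation; the function names and grid size are made up for clarity.

```python
# Hypothetical sketch of an AWQ-style clip search (not AutoAWQ internals).

def quantize_dequantize(w, w_max, n_bits=4):
    """Round-trip quantize a list of floats, clipped to [-w_max, w_max]."""
    qmax = 2 ** (n_bits - 1) - 1               # e.g. 7 for int4
    scale = w_max / qmax if w_max > 0 else 1.0
    out = []
    for x in w:
        q = round(max(-w_max, min(w_max, x)) / scale)
        q = max(-qmax - 1, min(qmax, q))       # clamp to the int4 grid
        out.append(q * scale)
    return out

def search_clip(w, n_bits=4, grid=20):
    """Try shrinking the max value; keep the ratio with the lowest MSE."""
    abs_max = max(abs(x) for x in w)
    best_ratio, best_err = 1.0, float("inf")
    for i in range(grid):
        ratio = 1.0 - i / (2 * grid)           # 1.0 down to 0.525
        deq = quantize_dequantize(w, abs_max * ratio, n_bits)
        err = sum((a - b) ** 2 for a, b in zip(w, deq)) / len(w)
        if err < best_err:
            best_ratio, best_err = ratio, err
    return best_ratio

# One outlier dominates the range of this toy weight group:
print(search_clip([0.01, -0.02, 0.03, -0.015, 0.9]))
```

If the clip search is mis-calibrated for a model's weight distribution (e.g. because the calibration data is a poor match), it can clip weights that actually matter, which would be consistent with disabling it restoring accuracy.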
Hello @chunniunai220ml, when I use AutoAWQ to quantize Qwen, I get an error: "RuntimeError: Failed to import transformers.generation.utils because of the following error (look up to see its traceback)". What is your environment (especially the version of transformers)? Can you share it?
You can now use [...]

For reference, I am not able to reproduce the bad performance of Qwen in my testing:
That's roughly a 1% quantization error. |
When I test the QWEN model, I get normal results,
but after AutoAWQ quantization the performance degrades badly:
wikitext ppl = 13.560
Is this normal? When I tested the official AWQ on llama2, the ppl did not degrade much.
By the way, I modified the code to support zero=True (symmetric mode) as:
and then got a much worse ppl = 1056944.250,
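As a reference point for reading these numbers: wikitext perplexity is the exponential of the mean per-token negative log-likelihood, so small relative ppl gaps correspond to small loss gaps, while a ppl in the millions means the model is effectively broken. A minimal sketch (the 2.607 nats/token figure is chosen only to match a ppl near 13.56):

```python
import math

def perplexity(nlls, n_tokens):
    """Perplexity = exp(mean negative log-likelihood per token)."""
    return math.exp(sum(nlls) / n_tokens)

# A mean NLL of ~2.607 nats/token corresponds to ppl ~ 13.56
print(perplexity([2.607] * 100, 100))
```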
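A perplexity blow-up of that size is what you would expect if the symmetric change does not match the rest of the pipeline. One general reason symmetric 4-bit hurts (a hypothetical illustration, not AutoAWQ internals): asymmetric quantization maps [w_min, w_max] onto the full integer grid, while a symmetric scheme centred at zero wastes half the grid on skewed (e.g. all-positive) weight groups:

```python
# Compare round-trip MSE of asymmetric vs symmetric 4-bit quantization
# on a skewed weight group (toy sketch, not AutoAWQ code).

def quant_error(w, symmetric, n_bits=4):
    levels = 2 ** n_bits - 1                   # 15 steps for int4
    if symmetric:
        w_max = max(abs(x) for x in w)
        lo, scale = -w_max, 2 * w_max / levels
    else:
        lo, scale = min(w), (max(w) - min(w)) / levels
    err = 0.0
    for x in w:
        q = min(levels, max(0, round((x - lo) / scale)))
        err += (x - (q * scale + lo)) ** 2
    return err / len(w)

w = [0.5 + 0.01 * i for i in range(16)]        # all-positive, skewed group
print(quant_error(w, symmetric=False))         # near zero
print(quant_error(w, symmetric=True))          # much larger
```

This alone would raise ppl somewhat, not to a million; a result that extreme usually means the modified packing/dequantization no longer agrees with the kernel's expected format.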
| stem | humanities | other | social | avg |
|------|------------|-------|--------|-----|
| 26.26 | 27.12 | 24.01 | 23.79 | 25.51 |
Finally, are there any differences between the 3 kernel versions (GEMM, GEMV, Marlin) in theory and in use?
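For reference, the kernel is selected through the `version` field of the AutoAWQ quant config. The field names below follow the AutoAWQ README; the trade-off notes in the comments are general guidance about these kernel families, not measured benchmarks:

```python
# Example AutoAWQ quantization config; only "version" changes the kernel.
quant_config = {
    "zero_point": True,    # asymmetric quantization (the usual default)
    "q_group_size": 128,   # group size for the 4-bit scales
    "w_bit": 4,
    # "GEMM":   packed matmul kernel, better for batch > 1 / prefill
    # "GEMV":   tends to be faster for batch-size-1 decoding
    # "Marlin": optimized int4 kernel that requires recent NVIDIA GPUs
    "version": "GEMM",
}
print(sorted(quant_config))
```

All three produce the same quantized weights numerically; the choice mainly affects inference speed on your hardware and batch size.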