8bit quantization #3261

rghosh08 · 2024-03-07T21:46:34Z

Does vLLM support 8-bit quantization? We need to use vLLM with a large context window (>1K tokens). We tried AWQ, but the generation quality was not good. Any pointers would be greatly appreciated.

simon-mo · 2024-03-08T06:24:03Z

Try GPT-Q? We support 2/3/4/8 bits.
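For anyone wanting to try this, here is a minimal sketch of loading a pre-quantized GPTQ checkpoint through vLLM's Python API. The model ID is a hypothetical placeholder, and the `max_model_len` and sampling values are assumptions chosen for illustration; the bit width comes from the checkpoint's own quantization config rather than from an argument here.

```python
"""Sketch: load a pre-quantized GPTQ checkpoint with vLLM."""

# Hypothetical placeholder: substitute a real GPTQ-quantized checkpoint.
MODEL_ID = "your-org/your-model-GPTQ-8bit"


def gptq_engine_kwargs(model_id: str, max_model_len: int = 4096) -> dict:
    """Build keyword arguments for vllm.LLM (values are illustrative)."""
    return {
        "model": model_id,
        "quantization": "gptq",          # select the GPTQ weight loader
        "max_model_len": max_model_len,  # cap the context length (assumed value)
    }


def generate(prompt: str, model_id: str = MODEL_ID) -> str:
    """Run one greedy completion; requires vLLM and a supported GPU."""
    # Imported lazily so the helper above works without vLLM installed.
    from vllm import LLM, SamplingParams

    llm = LLM(**gptq_engine_kwargs(model_id))
    params = SamplingParams(temperature=0.0, max_tokens=128)
    outputs = llm.generate([prompt], params)
    return outputs[0].outputs[0].text
```

Calling `generate("Hello")` needs a real GPTQ checkpoint in place of the placeholder.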

andysalerno · 2024-03-08T21:22:54Z

> Try GPT-Q? We support 2/3/4/8 bits.

@simon-mo is it possible to support EETQ, like huggingface/text-generation-inference does?

https://github.com/NetEase-FuXi/EETQ

It's super useful because you don't need an offline quantization step: you just point it at a normal unquantized model and pass --quantize eetq. You then use about half the VRAM and get very fast inference with very little quality impact.

Here's the PR where they added it in TGI:
https://github.com/huggingface/text-generation-inference/pull/1068/files
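To illustrate why no offline step is needed and where the "half the VRAM" figure comes from, here is a rough sketch of weight-only INT8 quantization with one scale per output row, applied to plain float weights at load time. It assumes a symmetric per-channel scheme; it is not EETQ's actual kernel, and the function names are made up for this example.

```python
def quantize_per_channel(weights):
    """Symmetric int8 quantization with one scale per row (output channel)."""
    q_rows, scales = [], []
    for row in weights:
        max_abs = max(abs(w) for w in row)
        # Map the largest magnitude in the row to 127; avoid dividing by zero.
        scale = max_abs / 127.0 if max_abs > 0 else 1.0
        q_rows.append([max(-127, min(127, round(w / scale))) for w in row])
        scales.append(scale)
    return q_rows, scales


def dequantize(q_rows, scales):
    """Recover approximate float weights from int8 values and row scales."""
    return [[q * s for q in row] for row, s in zip(q_rows, scales)]


def weight_bytes(num_params, bits_per_weight):
    """Bytes needed for the weights alone (ignores scales and activations)."""
    return num_params * bits_per_weight // 8


if __name__ == "__main__":
    w = [[0.5, -1.0, 0.25], [2.0, 0.0, -0.5]]
    q, s = quantize_per_channel(w)
    print(q, dequantize(q, s))
    # Weights of a 7B-parameter model: 16-bit versus 8-bit storage.
    print(weight_bytes(7_000_000_000, 16), weight_bytes(7_000_000_000, 8))
```

Only the weights shrink in this scheme. Activations and the KV cache stay in 16-bit, so total memory use drops by somewhat less than half.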

shiqingzhangCSU · 2024-03-11T06:26:34Z

> > Try GPT-Q? We support 2/3/4/8 bits.
>
> @simon-mo is it possible to support eetq, like huggingface/text-generation-inference?
>
> https://github.com/NetEase-FuXi/EETQ
>
> It's super useful because you don't even need an offline quantization step, you just point it at a normal unquantized model and pass --quantize eetq and then magically you use half the vram and get super fast inference with very little quality impact.
>
> Here's the PR where they added it in TGI: https://github.com/huggingface/text-generation-inference/pull/1068/files

Good idea. Would it also be possible to integrate the W4A16 kernel optimization from TensorRT-LLM?

SidaZh · 2024-03-14T06:55:24Z

That's a good idea. EETQ works out of the box and we'd like to integrate it into vLLM.

github-actions · 2024-10-30T02:00:34Z

This issue has been automatically marked as stale because it has not had any activity within 90 days. It will be automatically closed if no further activity occurs within 30 days. Leave a comment if you feel this issue should remain open. Thank you!

github-actions · 2024-11-30T02:01:58Z

This issue has been automatically closed due to inactivity. Please feel free to reopen if you feel it is still relevant. Thank you!

dtlzhuangz mentioned this issue Mar 25, 2024: [Misc] feat: add eetq quantization #3614 (Closed)

hibukipanim mentioned this issue Jul 3, 2024: [Roadmap] vLLM Roadmap Q3 2024 #5805 (Closed, 46 tasks)

github-actions bot added the stale label Oct 30, 2024

github-actions bot closed this as not planned Nov 30, 2024
