[Feature]: bitsandbytes support #4033
Comments
BNB 4-bit is a very useful feature. Many models don't have GPTQ or AWQ quantized versions, and quantizing a large model with post-training methods takes real work. Everyone knows post-training quantization gives better performance, but many people like me don't care about the small quality loss when trying out a demo.
After the release of Llama 3, I can only run the 8B version with vLLM; I have to switch to Ollama to run the 70B version.
want +1
+1
want +1
+1 Would be great to run CohereForAI/c4ai-command-r-plus-4bit.
+1
+1
+1
It would be very useful for QLoRA fine-tuned models. Is there a roadmap for this addition?
+1
+1
+1
+1
Please stop commenting.
Refer to: #4776
want +1
Related to #3339.
What's required to implement this? FP4 and NF4 support? It seems like bnb uses a 2-exponent-bit, 1-mantissa-bit format for FP4.
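For anyone curious about the two 4-bit formats, here is a quick sketch using bitsandbytes' functional API to compare them; it assumes a CUDA GPU with bitsandbytes and torch installed, and the tensor shape is illustrative:

```python
# Minimal sketch: round-trip a weight tensor through bnb's blockwise
# 4-bit quantization for both quant types (fp4 and nf4).
import torch
import bitsandbytes.functional as F

w = torch.randn(1024, 1024, device="cuda", dtype=torch.float16)

for quant_type in ("fp4", "nf4"):
    # quantize_4bit packs two 4-bit codes per byte and returns per-block
    # absmax statistics in `state`, which are needed to dequantize later.
    w_4bit, state = F.quantize_4bit(w, blocksize=64, quant_type=quant_type)
    w_restored = F.dequantize_4bit(w_4bit, state)
    err = (w - w_restored).abs().mean().item()
    print(f"{quant_type}: mean abs error = {err:.4f}")
```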
+1
Hi, those who need this feature should check out what @chenqianfzh is working on here: #4776
Hi team, when can we expect this feature?
+1, any update on this? It seems @chenqianfzh's #4776 is not working with Llama 3.
|
It's not working for Llama 3: take https://github.com/bd-iaas-us/vllm/blob/e16bcb69495540b21a3bd9423cdd5df8a78405ea/tests/quantization/test_bitsandbytes.py and replace the model with Llama 3 8B, and the tests fail. @hmellor @chenqianfzh
@hmellor, how do you load in 8-bit? This version seems to only be able to load in 4-bit via
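Not a vLLM answer, but for reference, the 8-bit vs. 4-bit switch on the Hugging Face transformers side looks roughly like this (a sketch assuming a recent transformers with bitsandbytes installed; the model id is illustrative, and vLLM's PR may expose different options):

```python
# Sketch of the Hugging Face transformers loading path (not vLLM's API):
# BitsAndBytesConfig selects between bnb's 8-bit (LLM.int8()) and 4-bit
# (fp4/nf4) code paths.
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

# 8-bit: weights stored as int8 with LLM.int8() outlier handling.
cfg_8bit = BitsAndBytesConfig(load_in_8bit=True)

# 4-bit: weights stored as packed fp4/nf4 codes with per-block absmax.
cfg_4bit = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",          # or "fp4"
    bnb_4bit_compute_dtype=torch.bfloat16,
)

model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Meta-Llama-3-8B",       # illustrative model id
    quantization_config=cfg_8bit,       # or cfg_4bit
    device_map="auto",
)
```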
🚀 The feature, motivation and pitch
Bitsandbytes 4-bit quantization support.
I know many people want this, and it has been discussed before and marked as unplanned, but then I looked at how TGI implemented it:
https://github.com/huggingface/text-generation-inference/blob/main/server/text_generation_server/utils/layers.py#L285
And TGI is based on vLLM, of course.
Alternatives
I know that GPTQ gives better quantization quality than b&b 4-bit, but b&b is great for QLoRA-merged PEFT models, while it is almost impossible to GPTQ/AWQ-quantize a b&b 4-bit model (and I'm not even getting into the nf4-vs-fp4 perplexity question), since that isn't officially supported (others sometimes do successfully quantize a merged b&b QLoRA model to GPTQ or AWQ, but I, for example, don't).
Additional context
As I mentioned above,
https://github.com/huggingface/text-generation-inference/blob/main/server/text_generation_server/utils/layers.py#L285
looks like a very simple implementation of the Linear4bit class for b&b. I could open a PR to vLLM myself; I just wondered why it hasn't been added yet. Maybe I'm missing something?
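For reference, the TGI layer linked above boils down to something like the following simplified sketch (based on TGI's Linear4bit, not vLLM code; it relies on bitsandbytes' public Params4bit and matmul_4bit API and assumes the weight already lives on a CUDA device):

```python
# Simplified sketch of a TGI-style Linear4bit wrapper around bitsandbytes.
from typing import Optional

import torch
import bitsandbytes as bnb
from bitsandbytes.nn import Params4bit


class Linear4bit(torch.nn.Module):
    def __init__(self, weight: torch.Tensor, bias: Optional[torch.Tensor],
                 quant_type: str = "nf4"):
        super().__init__()
        # Params4bit packs the weight into 4-bit codes (with per-block
        # absmax statistics) when it is moved onto a CUDA device.
        self.weight = Params4bit(weight.data, requires_grad=False,
                                 quant_type=quant_type)
        self.weight.cuda(weight.device)
        self.bias = bias

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # matmul_4bit dequantizes blockwise on the fly using quant_state.
        return bnb.matmul_4bit(x, self.weight.t(), bias=self.bias,
                               quant_state=self.weight.quant_state)
```

The heavy lifting (packing, absmax bookkeeping, fused dequant-matmul) is all inside bitsandbytes, which is why the TGI layer is so short; the real work for vLLM would presumably be wiring this into its tensor-parallel linear layers and weight loader.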