
Is there an example of w4a4 matmul that can be used as a reference? #283

Closed
hyx1999 opened this issue Jan 17, 2025 · 3 comments

Comments


hyx1999 commented Jan 17, 2025

Thanks for your great work!

I want to use BitBLAS to implement a W4A4 GEMM. The README shows that BitBLAS supports this operation, but the QuickStart suggests that int4 is not supported as A_dtype?

Also, I have some questions about the BitBLAS API. The W4A4 matrix multiplication I need to implement quantizes both the weights and the activations, accumulates in int32, and finally dequantizes to fp16. The dequantization requires the scales of both the weights and the activations, but it seems that the current API only supports providing the weight scales?
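
For concreteness, a rough PyTorch sketch of the computation in question (all names are illustrative; integer accumulation is simulated with int64 here for a CPU-runnable reference):

```python
import torch

# Illustrative sketch of the intended W4A4 pipeline; all names are hypothetical.
def w4a4_reference(A_q, W_q, s_a, s_w):
    # A_q: (M, K) int4 activations, W_q: (N, K) int4 weights (values in [-8, 7],
    # stored in a wider integer dtype for this simulation).
    # s_a: (M, 1) per-token activation scales, s_w: (1, N) per-channel weight scales.
    acc = A_q.to(torch.int64) @ W_q.to(torch.int64).T  # real kernels accumulate in int32
    # Dequantize with BOTH scales, producing an fp16 result.
    return (acc.to(torch.float32) * s_a * s_w).to(torch.float16)
```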

@LeiWang1999 (Contributor)

Yeah, we absolutely have implementations; check out the test case: https://github.com/microsoft/BitBLAS/blob/main/testing/python/operators/test_general_matmul_ops_int4.py
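
For reference, a minimal sketch of what a W4A4 config might look like, following the MatmulConfig pattern from the BitBLAS README. The int4 dtype strings and shapes below are assumptions; the linked test file is the authoritative reference, including for how int4 inputs are packed:

```python
import bitblas

# Assumed W4A4 configuration: int4 x int4 GEMM with int32 accumulation.
matmul_config = bitblas.MatmulConfig(
    M=16,                 # assumed example shape
    N=1024,
    K=1024,
    A_dtype="int4",       # 4-bit activations
    W_dtype="int4",       # 4-bit weights
    accum_dtype="int32",  # accumulate in int32
    out_dtype="int32",    # raw int32 output; rescale to fp16 afterwards
    layout="nt",
)
matmul = bitblas.Matmul(config=matmul_config)
```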

@LeiWang1999 (Contributor)

@hyx1999 About the rescaling: you can check out our integration of BitNet into vLLM, which fuses the rescale stage via torch.compile :)
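
Roughly, the idea is to apply both scales to the int32 GEMM output inside a function wrapped with torch.compile, so the rescale is fused with the surrounding elementwise ops. An illustrative sketch, not the actual vLLM/BitNet integration code; all names are hypothetical:

```python
import torch

@torch.compile
def rescale(acc_int32, act_scale, weight_scale):
    # acc_int32: (M, N) int32 GEMM output; act_scale: (M, 1); weight_scale: (1, N).
    # torch.compile can fuse these elementwise ops into a single kernel.
    return (acc_int32.to(torch.float32) * act_scale * weight_scale).to(torch.float16)
```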


hyx1999 commented Jan 18, 2025

Thank you for your reply!!! 🫶

hyx1999 closed this as completed Jan 18, 2025