
Is there an example of w4a4 matmul that can be used as a reference? #283

Closed
hyx1999 opened this issue Jan 17, 2025 · 3 comments

Comments


hyx1999 commented Jan 17, 2025

Thanks for your great work!

I want to use BitBLAS to implement a W4A4 GEMM. The README shows that BitBLAS supports this operation, but the QuickStart suggests that int4 is not supported as A_dtype?

Also, I have some questions about the BitBLAS API. The W4A4 matrix multiplication I need to implement quantizes both the weights and the activations, accumulates in int32, and finally dequantizes to fp16. The dequantization requires the scales of both the weights and the activations, but it seems that the current API only supports providing the weight scales?
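
For concreteness, a rough PyTorch sketch of the computation in question (all names are illustrative; integer accumulation is simulated with int64 here for a CPU-runnable reference):

```python
import torch

# Illustrative sketch of the intended W4A4 pipeline; all names are hypothetical.
def w4a4_reference(A_q, W_q, s_a, s_w):
    # A_q: (M, K) int4 activations, W_q: (N, K) int4 weights (values in [-8, 7],
    # stored in a wider integer dtype for this simulation).
    # s_a: (M, 1) per-token activation scales, s_w: (1, N) per-channel weight scales.
    acc = A_q.to(torch.int64) @ W_q.to(torch.int64).T  # real kernels accumulate in int32
    # Dequantize with BOTH scales, producing an fp16 result.
    return (acc.to(torch.float32) * s_a * s_w).to(torch.float16)
```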

@LeiWang1999 (Contributor)

Yeah, we absolutely have implementations; check out the test case: https://github.com/microsoft/BitBLAS/blob/main/testing/python/operators/test_general_matmul_ops_int4.py
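
For reference, a minimal sketch of what a W4A4 config might look like, following the MatmulConfig pattern from the BitBLAS README. The int4 dtype strings and shapes below are assumptions; the linked test file is the authoritative reference, including for how int4 inputs are packed:

```python
import bitblas

# Assumed W4A4 configuration: int4 x int4 GEMM with int32 accumulation.
matmul_config = bitblas.MatmulConfig(
    M=16,                 # assumed example shape
    N=1024,
    K=1024,
    A_dtype="int4",       # 4-bit activations
    W_dtype="int4",       # 4-bit weights
    accum_dtype="int32",  # accumulate in int32
    out_dtype="int32",    # raw int32 output; rescale to fp16 afterwards
    layout="nt",
)
matmul = bitblas.Matmul(config=matmul_config)
```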

@LeiWang1999 (Contributor)

@hyx1999 About the rescaling: you can check out our integration of BitNet into vLLM, which fuses the rescale stage via torch.compile :)
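
Roughly, the idea is to apply both scales to the int32 GEMM output inside a function wrapped with torch.compile, so the rescale is fused with the surrounding elementwise ops. An illustrative sketch, not the actual vLLM/BitNet integration code; all names are hypothetical:

```python
import torch

@torch.compile
def rescale(acc_int32, act_scale, weight_scale):
    # acc_int32: (M, N) int32 GEMM output; act_scale: (M, 1); weight_scale: (1, N).
    # torch.compile can fuse these elementwise ops into a single kernel.
    return (acc_int32.to(torch.float32) * act_scale * weight_scale).to(torch.float16)
```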


hyx1999 commented Jan 18, 2025

Thank you for your reply!!! 🫶

hyx1999 closed this as completed Jan 18, 2025