Add blockwise quantized dot support #7605
Conversation
We should also have a doc to explain how to use these quantized ops under https://github.com/pytorch/xla/tree/master/docs
It's in https://github.com/pytorch/xla/blob/master/docs/quantized_ops.md
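For quick reference, a minimal per-channel usage sketch in the spirit of that doc (the exact signature of `torch.ops.xla.quantized_matmul`, in particular the blockwise and int4 arguments this PR adds, should be verified against quantized_ops.md):

```python
import torch
import torch_xla.core.xla_model as xm
# Importing this module registers torch.ops.xla.quantized_matmul.
import torch_xla.experimental.xla_quantized_matmul

x = torch.randn(3, 10, dtype=torch.bfloat16)
# Per-channel int8 weight (out_features=20, in_features=10) and scales.
w_int = torch.randint(-128, 127, (20, 10), dtype=torch.int8)
scaler = torch.randn(20, dtype=torch.bfloat16)

device = xm.xla_device()
out = torch.ops.xla.quantized_matmul(
    x.to(device), w_int.to(device), scaler.to(device))
```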
# Dot with int4 weight is only supported on TPU
if not (n_bit == 4 and xr.device_type() != 'TPU'):
  m = m.to(device)
What's the behavior on CUDA and CPU devices?
Because int4 only runs on XLA:TPU
Offline discussion summary: int4 works only on XLA:TPU today; XLA:CPU does not support int4, and the level of XLA:GPU support is unclear since it is not currently tested.
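In test code, that support matrix can be reflected with a skip guard rather than an inline branch; a sketch (the test class and test name below are hypothetical, `xr` is `torch_xla.runtime`):

```python
import unittest
import torch_xla.runtime as xr

class QuantizedDotTest(unittest.TestCase):  # hypothetical test class

  @unittest.skipIf(xr.device_type() != 'TPU',
                   'int4 weight dot is only supported on XLA:TPU')
  def test_int4_blockwise_dot(self):  # hypothetical test name
    ...
```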
Add blockwise quantized dot support for 8-bit and 4-bit weights
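For context, blockwise quantization splits the contraction dimension of the weight into fixed-size blocks, each with its own scale. A reference sketch of the dequantize-then-matmul semantics (the scale layout here is an assumption for illustration, not the torch_xla implementation):

```python
import torch

def blockwise_dequant_matmul(x, w_q, scales, block_size):
  """Reference semantics only.

  x:      (batch, in_features) float activations
  w_q:    (out_features, in_features) quantized values stored as int8
  scales: (in_features // block_size, out_features) per-block scales
          (layout assumed for illustration)
  """
  out_f, in_f = w_q.shape
  n_blocks = in_f // block_size
  # Split the contraction dim into blocks and apply each block's scale.
  w_blocks = w_q.to(x.dtype).reshape(out_f, n_blocks, block_size)
  w_deq = w_blocks * scales.t().unsqueeze(-1).to(x.dtype)
  return x @ w_deq.reshape(out_f, in_f).t()
```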
Test: