Add blockwise quantized dot support #7605

Merged · 5 commits into master on Jul 8, 2024

Conversation

lsy323 (Collaborator) commented Jul 2, 2024

Add blockwise quantized dot support for 8-bit and 4-bit weights.

Test:

  • Added unit tests
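For readers new to the scheme, here is a minimal sketch of what a blockwise quantized dot means, written in plain PyTorch. It illustrates the semantics only; it is not the torch_xla implementation, and the helper names (`blockwise_quantize`, `blockwise_quantized_dot`) and the blocksize of 128 are hypothetical.

```python
import torch

def blockwise_quantize(w, n_bit=8, blocksize=128):
    # w: (out_features, in_features). Split the contraction dimension into
    # blocks and compute one symmetric scale per (out_feature, block) pair.
    out_f, in_f = w.shape
    assert in_f % blocksize == 0
    w_blocks = w.reshape(out_f, in_f // blocksize, blocksize)
    max_q = 2 ** (n_bit - 1) - 1  # 127 for int8, 7 for int4
    scale = w_blocks.abs().amax(dim=-1, keepdim=True) / max_q
    w_int = torch.clamp(torch.round(w_blocks / scale), -max_q - 1, max_q)
    # int4 values are also stored in an int8 container in this sketch.
    return w_int.to(torch.int8), scale

def blockwise_quantized_dot(x, w_int, scale):
    # x: (batch, in_features). Dequantize each block with its own scale,
    # then contract over both the block and in-block dimensions.
    batch, in_f = x.shape
    blocksize = w_int.shape[-1]
    x_blocks = x.reshape(batch, in_f // blocksize, blocksize)
    w_deq = w_int.to(x.dtype) * scale  # (out_f, n_blocks, blocksize)
    return torch.einsum('bnk,onk->bo', x_blocks, w_deq)

x = torch.randn(4, 256)
w = torch.randn(64, 256)
w_int, scale = blockwise_quantize(w, n_bit=8, blocksize=128)
y = blockwise_quantized_dot(x, w_int, scale)
ref = x @ w.t()
print((y - ref).abs().max() / ref.abs().max())  # small relative quantization error
```

The difference from per-channel quantization is that each block along the contraction dimension carries its own scale, so quantization error is bounded per block rather than per whole channel.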

lsy323 changed the title from "Add blockwise quant" to "Add blockwise quant support" on Jul 2, 2024
lsy323 changed the title from "Add blockwise quant support" to "Add blockwise quantized op support" on Jul 2, 2024
lsy323 changed the title from "Add blockwise quantized op support" to "Add blockwise quantized dot support" on Jul 2, 2024
lsy323 marked this pull request as ready for review on July 2, 2024 16:18
lsy323 requested review from miladm, JackCaoG and ManfeiBai on July 2, 2024 16:18
JackCaoG (Collaborator) commented Jul 8, 2024

We should also have a doc explaining how to use these quantized ops under https://github.com/pytorch/xla/tree/master/docs

lsy323 (Collaborator, Author) commented Jul 8, 2024

> We should also have a doc explaining how to use these quantized ops under https://github.com/pytorch/xla/tree/master/docs

It's in https://github.com/pytorch/xla/blob/master/docs/quantized_ops.md
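For convenience, a minimal usage sketch along the lines of that doc. The module path and `XlaQuantizedLinear` come from the linked doc; the `load_quantized_weight` method name and the per-channel scaler shape are recollections of it and should be treated as assumptions — check docs/quantized_ops.md for the authoritative API.

```python
import torch
import torch_xla.core.xla_model as xm
# Module path as given in docs/quantized_ops.md.
from torch_xla.experimental.xla_quantized_matmul import XlaQuantizedLinear

device = xm.xla_device()
in_features, out_features = 256, 64

# Pre-quantized int8 weight plus a per-channel scaler (assumed shapes).
w_int = torch.randint(-128, 128, (out_features, in_features), dtype=torch.int8)
scaler = torch.rand(out_features) + 0.01

q_linear = XlaQuantizedLinear(in_features, out_features)
q_linear.load_quantized_weight(w_int, scaler)  # assumed method name, per the doc
q_linear = q_linear.to(device)

x = torch.randn(4, in_features).to(device)
y = q_linear(x)  # quantized matmul lowered through XLA
```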

lsy323 merged commit 88bcb45 into master on Jul 8, 2024
23 checks passed
lsy323 deleted the lsiyuan/blockwise-quant branch on July 8, 2024 22:45

# Dot with int4 weight is only supported on TPU
if not (n_bit == 4 and xr.device_type() != 'TPU'):
  # Move the module to the device only in supported cases; the int4 case
  # is skipped on non-TPU backends because int4 dot runs only on XLA:TPU.
  m = m.to(device)
miladm (Collaborator) commented Jul 9, 2024

What's the behavior on CUDA and CPU devices?

lsy323 (Collaborator, Author) replied:

Because int4 only runs on XLA:TPU.

miladm (Collaborator) commented Jul 9, 2024

Offline discussion summary: int4 works only on XLA:TPU today; XLA:CPU does not support int4. XLA:GPU's level of support is unclear, as it is not currently tested.
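For anyone hitting this in practice, the guard pattern from the snippet above can be applied generically; `xr.device_type()` from torch_xla.runtime returns the backend name (e.g. 'TPU', 'CPU', 'CUDA'). A small sketch; the helper name is hypothetical.

```python
import torch_xla.runtime as xr

def int4_dot_supported():
    # Per the discussion above: int4 dot is only exercised on XLA:TPU today.
    return xr.device_type() == 'TPU'

n_bit = 4
if n_bit == 4 and not int4_dot_supported():
    print(f"Skipping int4 quantized dot on XLA:{xr.device_type()}")
```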
