Add blockwise quantized dot support #7605
Conversation
We should also have a doc to explain how to use these quantized ops under https://github.com/pytorch/xla/tree/master/docs
It's in https://github.com/pytorch/xla/blob/master/docs/quantized_ops.md
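For quick reference, a minimal per-channel usage sketch in the spirit of that doc (the exact signature of `torch.ops.xla.quantized_matmul`, in particular the blockwise and int4 arguments this PR adds, should be verified against quantized_ops.md):

```python
import torch
import torch_xla.core.xla_model as xm
# Importing this module registers torch.ops.xla.quantized_matmul.
import torch_xla.experimental.xla_quantized_matmul

x = torch.randn(3, 10, dtype=torch.bfloat16)
# Per-channel int8 weight (out_features=20, in_features=10) and scales.
w_int = torch.randint(-128, 127, (20, 10), dtype=torch.int8)
scaler = torch.randn(20, dtype=torch.bfloat16)

device = xm.xla_device()
out = torch.ops.xla.quantized_matmul(
    x.to(device), w_int.to(device), scaler.to(device))
```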
# Dot with int4 weight is only supported on TPU
if not (n_bit == 4 and xr.device_type() != 'TPU'):
  m = m.to(device)
What's the behavior on CUDA and CPU devices?
Because int4 only runs on XLA:TPU
Offline discussion summary: int4 works only on XLA:TPU today; XLA:CPU does not support int4, and the level of XLA:GPU support is unclear since it is not currently tested.
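In test code, that support matrix can be reflected with a skip guard rather than an inline branch; a sketch (the test class and test name below are hypothetical, `xr` is `torch_xla.runtime`):

```python
import unittest
import torch_xla.runtime as xr

class QuantizedDotTest(unittest.TestCase):  # hypothetical test class

  @unittest.skipIf(xr.device_type() != 'TPU',
                   'int4 weight dot is only supported on XLA:TPU')
  def test_int4_blockwise_dot(self):  # hypothetical test name
    ...
```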
Add blockwise quantized dot support for 8-bit and 4-bit weights
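For context, blockwise quantization splits the contraction dimension of the weight into fixed-size blocks, each with its own scale. A reference sketch of the dequantize-then-matmul semantics (the scale layout here is an assumption for illustration, not the torch_xla implementation):

```python
import torch

def blockwise_dequant_matmul(x, w_q, scales, block_size):
  """Reference semantics only.

  x:      (batch, in_features) float activations
  w_q:    (out_features, in_features) quantized values stored as int8
  scales: (in_features // block_size, out_features) per-block scales
          (layout assumed for illustration)
  """
  out_f, in_f = w_q.shape
  n_blocks = in_f // block_size
  # Split the contraction dim into blocks and apply each block's scale.
  w_blocks = w_q.to(x.dtype).reshape(out_f, n_blocks, block_size)
  w_deq = w_blocks * scales.t().unsqueeze(-1).to(x.dtype)
  return x @ w_deq.reshape(out_f, in_f).t()
```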
Test: