Spin Quant in TorchAO #579
I'd love to work on this @HDCharles
One question about this @HDCharles. The SpinQuant repo depends on the CUDA fast Hadamard transform package for doing the actual Hadamard transform. Would it be acceptable to include this dependency in torch/ao?
Merging a custom CUDA kernel is fine. Although, as a baseline, I'd much prefer we see if we can match the performance using the compiler. Granted, I also recall @swolchok working on a fast CPU Hadamard transform, so maybe he has some ideas on where the compiler was not doing its job.
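For context on what the kernel being discussed computes: the fast (Walsh–)Hadamard transform applies an n×n Hadamard matrix to a length-n vector in O(n log n) instead of O(n²). A minimal pure-Python sketch of the butterfly algorithm, purely illustrative and unrelated to the optimized CUDA/CPU kernels mentioned above:

```python
def fwht(x):
    """Unnormalized fast Walsh-Hadamard transform of a list whose
    length is a power of two. Equivalent to multiplying by the n x n
    Hadamard matrix, but in O(n log n) butterfly stages."""
    x = list(x)
    n = len(x)
    assert n & (n - 1) == 0, "length must be a power of two"
    h = 1
    while h < n:
        # Each stage combines pairs (j, j + h) with a +/- butterfly.
        for i in range(0, n, h * 2):
            for j in range(i, i + h):
                a, b = x[j], x[j + h]
                x[j] = a + b
                x[j + h] = a - b
        h *= 2
    return x
```

Applying the transform twice returns n times the input, since the unnormalized Hadamard matrix satisfies H·H = n·I.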
@tobiasvanderwerff thanks for the PR. We have had some internal discussion on how to implement this as well, so we can work together on this.
We have other tools, like auto round or the llama eval stuff, that require an external library and just don't work without the installed package, which I think is fine. That's probably a good place to start, and if this gets a lot of usage or we want to take this out of prototype, we can figure out whether to add a dependency or add the kernel in at that time.
@jerryzh168 I'd be happy to work together on this. For now, I'll be implementing the rotation matrices one by one and documenting the results in the PR. Let me know if this works for you or if you prefer a different approach.
This sounds good, please go ahead. Internally we are interested in QAT as well; we can discuss how this fits in when QAT is introduced later.
Background:
The SpinQuant paper introduces a method of improving quantization accuracy by applying additional rotation matrices to the model weights.
While SpinQuant is a fairly sophisticated technique, some independent pieces could be implemented modularly to get incremental improvements on a smaller scale.
(see image)
https://imgur.com/jU60Iqs
In the above image, each rotation in both the (a) and (b) parts of the figure could be independently implemented to improve the quantization accuracy of the model. These rotations fall into three groups:

- Rotations which can be fully absorbed by the weight matrices of the linear ops and don't introduce additional ops: R2
- Rotations which need a constant number of additional ops per model: R1
- Rotations which require additional ops per block: R3, R4

While the second and third groups require adding additional ops to the model, the R2 rotation requires only a small change to the model weights and no additional ops.
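The reason a rotation can be "fully absorbed" is computational invariance: for any orthogonal R (with R Rᵀ = I), rotating the activations by R and the adjacent weight matrix by Rᵀ leaves the layer output unchanged, so the rotation can be folded into the weights offline. A minimal NumPy sketch of this identity (variable names are illustrative, not from the paper or torchao):

```python
import numpy as np

rng = np.random.default_rng(0)
d = 8

# A random orthogonal rotation R, taken from the QR decomposition
# of a Gaussian matrix.
R, _ = np.linalg.qr(rng.standard_normal((d, d)))

W = rng.standard_normal((d, d))   # linear layer weight (computes x @ W)
x = rng.standard_normal((4, d))   # a batch of activations

# Fold R into the weight: (x @ R) @ (R.T @ W) == x @ (R @ R.T) @ W == x @ W.
W_rot = R.T @ W
y_ref = x @ W
y_rot = (x @ R) @ W_rot
assert np.allclose(y_ref, y_rot)
```

Quantization then operates on the rotated weight `W_rot`, whose flatter value distribution is what the paper exploits, while the float network's output is mathematically unchanged.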
Task:
Start by implementing the R2 rotation with a random Hadamard matrix (the paper indicates they perform fairly well) and demonstrate the improved quantization accuracy for int8 dynamic/weight-only and int4 weight-only quantization. Ideally we'd like to see improved eval performance compared to the non-SpinQuant version. Code would ideally go into a new file, torchao/quantization/spin_quant.py.
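One common way to build a "random Hadamard" rotation, sketched below, is a Hadamard matrix scaled to be orthogonal and multiplied by a random diagonal ±1 matrix; the result is orthogonal, so it folds into the weights without changing the float output. This is a hedged sketch under that assumption; the function name is hypothetical, not part of torchao:

```python
import numpy as np

def random_hadamard(n, rng):
    """Random Hadamard rotation for a power-of-two dimension n:
    the Sylvester Hadamard matrix H_n, scaled by 1/sqrt(n) so it is
    orthogonal, with its columns multiplied by random +/-1 signs."""
    assert n & (n - 1) == 0, "n must be a power of two"
    H = np.array([[1.0]])
    while H.shape[0] < n:
        # Sylvester construction: H_{2k} = [[H_k, H_k], [H_k, -H_k]].
        H = np.block([[H, H], [H, -H]])
    signs = rng.choice([-1.0, 1.0], size=n)
    return (H / np.sqrt(n)) * signs  # column-wise random sign flips

rng = np.random.default_rng(0)
Q = random_hadamard(16, rng)
assert np.allclose(Q @ Q.T, np.eye(16))  # orthogonal, so absorbable
```

Because `Q @ Q.T == I`, this matrix can be folded into a weight pair exactly like any other orthogonal rotation, while its dense ±1/√n structure tends to spread outlier weight values across many coordinates before quantization.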
Adding further rotations (and the necessary additional ops), or a rotation-optimization procedure for R2 as used in SpinQuant, can follow afterwards.