
Unify scaled INT8 matmul #862

Open
gau-nernst opened this issue Sep 10, 2024 · 1 comment

Comments

@gau-nernst (Collaborator)

With the recent addition of INT8 mixed-precision training, there are now two implementations of scaled INT8 matmul (INT8 matmul followed by dequantization).

I have identified the key differences:

| | intmm_triton.py | int8_mm.py |
| --- | --- | --- |
| Scale fusion | Fuses only the activation scale | Fuses both activation and weight scales |
| Scale step | `acc_i32 * scale` on the INT32 accumulator | Cast to FP32 first: `acc_i32.to(f32) * scale.to(f32)` |
| Autotune configs | Its own set of configs | A different set of configs |

Ideally we should keep only one. The tedious part is validating that there is no accuracy or speed regression, regardless of which implementation we ultimately adopt.
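In plain NumPy, the two scaling styles in the table look roughly like this (a sketch of the math only, not the actual Triton kernels; the helper names are made up):

```python
import numpy as np

def scaled_mm_fuse_both(a_i8, b_i8, scale_a, scale_b):
    """int8_mm.py style: accumulate in INT32, cast accumulator and
    scales to FP32, and fuse both activation and weight scales."""
    acc_i32 = a_i8.astype(np.int32) @ b_i8.astype(np.int32)
    return (acc_i32.astype(np.float32)
            * scale_a.astype(np.float32)
            * scale_b.astype(np.float32))

def scaled_mm_fuse_act(a_i8, b_i8, scale_a):
    """intmm_triton.py style: only the activation scale is fused;
    it multiplies the INT32 accumulator directly, and the weight
    scale must be applied by the caller afterwards."""
    acc_i32 = a_i8.astype(np.int32) @ b_i8.astype(np.int32)
    return acc_i32 * scale_a
```

With per-row `scale_a` (shape `[M, 1]`) and per-column `scale_b` (shape `[1, N]`), `scaled_mm_fuse_both(a, b, sa, sb)` should match `scaled_mm_fuse_act(a, b, sa) * sb` up to FP32 rounding, which is exactly what a regression check between the two kernels needs to confirm.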

Here are the places that use intmm_triton.py

-> Basically: ensure the INT8 dynamic quantization Llama and SAM benchmarks don't regress

Here are the places that use int8_mm.py

-> Ensure INT8 mixed-precision training doesn't regress

One more question: is it OK to change the int_scaled_matmul() signature to accept scales for both A and B, instead of only for A?
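One possible shape for that unified signature (a hypothetical NumPy sketch, not the real kernel; making `scale_b` optional would keep existing call sites that only pass the activation scale working):

```python
import numpy as np
from typing import Optional

def int_scaled_matmul(a_i8: np.ndarray, b_i8: np.ndarray,
                      scale_a: np.ndarray,
                      scale_b: Optional[np.ndarray] = None) -> np.ndarray:
    """Sketch of the proposed signature: scale_b defaults to None, so
    old one-scale callers keep working while new callers fuse both."""
    acc = (a_i8.astype(np.int32) @ b_i8.astype(np.int32)).astype(np.float32)
    out = acc * scale_a.astype(np.float32)
    if scale_b is not None:
        out = out * scale_b.astype(np.float32)  # also fuse the weight scale
    return out
```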

@jerryzh168 (Contributor)

Btw, main/torchao/quantization/utils.py contains a lot of util q/dq ops that call the more versatile quant primitive ops (quantize_affine/dequantize_affine/choose_qparams_affine); many of these are convenience functions that hold the configurations for those primitive ops (e.g. dtype, block_size, symmetric/asymmetric, quant_min/quant_max, eps, etc.).
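That pattern, a convenience function that pins the configuration and delegates to a versatile primitive, looks roughly like this (a generic Python sketch; these are not the actual torchao signatures):

```python
import numpy as np

def quantize_affine(x: np.ndarray, scale: float, zero_point: int,
                    quant_min: int, quant_max: int) -> np.ndarray:
    """Versatile primitive: every affine-quant parameter is explicit."""
    q = np.round(x / scale) + zero_point
    return np.clip(q, quant_min, quant_max).astype(np.int8)

def int8_symmetric_quantize(x: np.ndarray):
    """Convenience wrapper: holds the INT8 symmetric config
    (zero_point=0, quant_min=-127, quant_max=127) and derives the
    scale, so callers never touch the primitive's full parameter list."""
    scale = float(np.abs(x).max()) / 127.0 or 1.0  # guard all-zero input
    return quantize_affine(x, scale, 0, -127, 127), scale
```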

yanbing-j pushed a commit to yanbing-j/ao that referenced this issue Dec 9, 2024
…h#862)

* Updating torch nightly to pick up aoti improvements in 128339

* Update the torch version to 2.5

* Updating lm_eval version (pytorch#865)

Fixing CI related to EleutherAI/wikitext_document_level change requirements from using HF Datasets

* Pinning numpy to under 2.0 (pytorch#867)
yanbing-j pushed a commit to yanbing-j/ao that referenced this issue Dec 9, 2024
* Initial Creation of a quantization directory

* Moving qops

* updating import

* Update Quant call using llama.cpp (pytorch#868)

llama.cpp did a BC breaking refactor: ggerganov/llama.cpp@1c641e6
resulting in some of our CI breaking

This updates our CI to match llama.cpp's schema
yanbing-j pushed a commit to yanbing-j/ao that referenced this issue Dec 9, 2024
* Removing all references to HQQ

* Creating an initial Quantization Directory (pytorch#863)
yanbing-j pushed a commit to yanbing-j/ao that referenced this issue Dec 9, 2024
* Removing GPTQ from all of torchchat

* Rebase + Add back accidental deletion

* Removing all references to HQQ (pytorch#869)