
Unify scaled INT8 matmul #862

Open
gau-nernst opened this issue Sep 10, 2024 · 1 comment

Comments

@gau-nernst (Collaborator)

With the recent addition of INT8 mixed-precision training, there are now two implementations of scaled INT8 matmul (INT8 matmul followed by dequantization).

I have identified the key differences:

| | intmm_triton.py | int8_mm.py |
| --- | --- | --- |
| Scale fusion | Fuses only the activation scale | Fuses both activation and weight scales |
| Scale step | `acc_i32 * scale` on the INT32 accumulator | Cast to FP32 first: `acc_i32.to(f32) * scale.to(f32)` |
| Autotune configs | Its own set of configs | A different set of configs |

Ideally we should keep only one. The tedious part is validating that there is no accuracy or speed regression, regardless of which implementation we ultimately adopt.
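In plain NumPy, the two scaling styles in the table look roughly like this (a sketch of the math only, not the actual Triton kernels; the helper names are made up):

```python
import numpy as np

def scaled_mm_fuse_both(a_i8, b_i8, scale_a, scale_b):
    """int8_mm.py style: accumulate in INT32, cast accumulator and
    scales to FP32, and fuse both activation and weight scales."""
    acc_i32 = a_i8.astype(np.int32) @ b_i8.astype(np.int32)
    return (acc_i32.astype(np.float32)
            * scale_a.astype(np.float32)
            * scale_b.astype(np.float32))

def scaled_mm_fuse_act(a_i8, b_i8, scale_a):
    """intmm_triton.py style: only the activation scale is fused;
    it multiplies the INT32 accumulator directly, and the weight
    scale must be applied by the caller afterwards."""
    acc_i32 = a_i8.astype(np.int32) @ b_i8.astype(np.int32)
    return acc_i32 * scale_a
```

With per-row `scale_a` (shape `[M, 1]`) and per-column `scale_b` (shape `[1, N]`), `scaled_mm_fuse_both(a, b, sa, sb)` should match `scaled_mm_fuse_act(a, b, sa) * sb` up to FP32 rounding, which is exactly what a regression check between the two kernels needs to confirm.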

Here are the places that use intmm_triton.py

-> Basically: ensure the INT8 dynamic quantization Llama and SAM benchmarks don't regress

Here are the places that use int8_mm.py

-> Ensure INT8 mixed-precision training doesn't regress

One more question: is it OK to change the int_scaled_matmul() signature to accept scales for both A and B, instead of only for A?
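One possible shape for that unified signature (a hypothetical NumPy sketch, not the real kernel; making `scale_b` optional would keep existing call sites that only pass the activation scale working):

```python
import numpy as np
from typing import Optional

def int_scaled_matmul(a_i8: np.ndarray, b_i8: np.ndarray,
                      scale_a: np.ndarray,
                      scale_b: Optional[np.ndarray] = None) -> np.ndarray:
    """Sketch of the proposed signature: scale_b defaults to None, so
    old one-scale callers keep working while new callers fuse both."""
    acc = (a_i8.astype(np.int32) @ b_i8.astype(np.int32)).astype(np.float32)
    out = acc * scale_a.astype(np.float32)
    if scale_b is not None:
        out = out * scale_b.astype(np.float32)  # also fuse the weight scale
    return out
```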

@jerryzh168 (Contributor)

Btw, main/torchao/quantization/utils.py contains a lot of util q/dq ops that call the more versatile quant primitive ops (quantize_affine/dequantize_affine/choose_qparams_affine); many of these are convenience functions that hold the configurations for those primitive ops (e.g. dtype, block_size, symmetric/asymmetric, quant_min/quant_max, eps, etc.).
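That pattern, a convenience function that pins the configuration and delegates to a versatile primitive, looks roughly like this (a generic Python sketch; these are not the actual torchao signatures):

```python
import numpy as np

def quantize_affine(x: np.ndarray, scale: float, zero_point: int,
                    quant_min: int, quant_max: int) -> np.ndarray:
    """Versatile primitive: every affine-quant parameter is explicit."""
    q = np.round(x / scale) + zero_point
    return np.clip(q, quant_min, quant_max).astype(np.int8)

def int8_symmetric_quantize(x: np.ndarray):
    """Convenience wrapper: holds the INT8 symmetric config
    (zero_point=0, quant_min=-127, quant_max=127) and derives the
    scale, so callers never touch the primitive's full parameter list."""
    scale = float(np.abs(x).max()) / 127.0 or 1.0  # guard all-zero input
    return quantize_affine(x, scale, 0, -127, 127), scale
```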

yanbing-j pushed a commit to yanbing-j/ao that referenced this issue Dec 9, 2024
…h#862)

* Updating torch nightly to pick up aoti improvements in 128339

* Update the torch version to 2.5

* Updating lm_eval version (pytorch#865)

Fixing CI related to EleutherAI/wikitext_document_level change requirements from using HF Datasets

* Pinning numpy to under 2.0 (pytorch#867)
yanbing-j pushed a commit to yanbing-j/ao that referenced this issue Dec 9, 2024
* Initial Creation of a quantization directory

* Moving qops

* updating import

* Update Quant call using llama.cpp (pytorch#868)

llama.cpp did a BC breaking refactor: ggerganov/llama.cpp@1c641e6
resulting in some of our CI breaking

This updates our CI to match llama.cpp's schema
yanbing-j pushed a commit to yanbing-j/ao that referenced this issue Dec 9, 2024
* Removing all references to HQQ

* Creating an initial Quantization Directory (pytorch#863)
yanbing-j pushed a commit to yanbing-j/ao that referenced this issue Dec 9, 2024
* Removing GPTQ from all of torchchat

* Rebase + Add back accidental deletion

* Removing all references to HQQ (pytorch#869)