Deprecate top level quantization APIs #344
Seems OK, but I would check partners like torchchat and torchtune for these APIs, since they are what had been used previously. Also, is it possible to check for usage of these APIs and give a better error? For example, if someone tried to use change_linear_weights_to_int8_dqtensors, it would be nice if we caught that directly and said 'use this instead'.
torchtune has a version guard, so it should be fine, I think. executorch is not using the APIs touched by this PR, and torchchat is not using them yet either. Yeah, we could catch the usage and give a better error, although that would mean keeping these things in the code base a bit longer. I can add that, though.
Actually, I still want to remove these APIs from the list, so let's just break BC for now.
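For reference, the "catch the usage and say 'use this instead'" idea from this thread could look roughly like the sketch below. The API names and replacement calls are taken from the PR summary; the stub mechanism, `_REPLACEMENTS`, `_make_removed_stub`, and `removed_apis` are hypothetical names for illustration and are not torchao code.

```python
# Hypothetical sketch, not torchao code: keep a thin stub for each removed
# API so that callers get an actionable error instead of an ImportError.

# Old API name -> replacement call, as listed in the PR summary.
_REPLACEMENTS = {
    "apply_weight_only_int8_quant": "quantize(model, get_apply_int8wo_quant())",
    "change_linear_weights_to_int8_woqtensors": "quantize(model, get_apply_int8wo_quant())",
    "apply_dynamic_quant": "quantize(model, get_apply_int8dyn_quant())",
    "change_linear_weights_to_int8_dqtensors": "quantize(model, get_apply_int8dyn_quant())",
    "change_linear_weights_to_int4_wotensors": "quantize(model, get_apply_int4wo_quant())",
}


def _make_removed_stub(old_name, replacement):
    """Return a function that always raises and names the replacement."""

    def stub(*args, **kwargs):
        raise RuntimeError(
            f"{old_name} has been removed; use `{replacement}` instead."
        )

    stub.__name__ = old_name
    return stub


# One stub per removed API, keyed by the old name.
removed_apis = {
    name: _make_removed_stub(name, replacement)
    for name, replacement in _REPLACEMENTS.items()
}
```

The trade-off is the one raised in the reply: the stubs are small, but they still keep the old names in the code base until they are deleted.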
torchao/quantization/README.md
`torch.export.export` and `torch.aot_compile` with the following workaround:
```
from torchao.quantization.utils import unwrap_tensor_subclass
m_unwrapped = unwrap_tensor_subclass(m)
```
This comes out of nowhere and should either be eliminated as part of the quantize API or explained better.
This is temporary, I think. Also, users don't need to understand the details for this one? Can you clarify how to explain it better?
torchao/quantization/README.md
```
torch._export.aot_compile(m_unwrapped, example_inputs)
```

But we expect this will be integrated into the export path by default in the future.
Don't add TODOs in docs; add them in GitHub issues and assign them to yourself.
Added #345.
So the new quant API + unwrap_tensor_subclass workaround actually only works for 2.4+ (since we have a fix in pytorch/pytorch#124888). That means we can't really remove the old implementations at this point. I'm thinking of just keeping the old APIs for now as private APIs and removing them when we set our minimum supported version to PyTorch 2.4+.
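A minimal sketch of the kind of version guard this implies, assuming the only input is a PyTorch version string. The helper names `parse_release` and `supports_unified_quantize` are hypothetical, and treating pre-release builds such as `2.4.0.dev...` as 2.4 is an assumption that may not hold for nightlies built before the fix landed.

```python
import re


def parse_release(version):
    """Extract the leading numeric release from a version string.

    For example, '2.4.0.dev20240611+cu121' is parsed as (2, 4, 0).
    A missing patch component is treated as 0.
    """
    match = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    return tuple(int(part) for part in match.groups(default="0"))


def supports_unified_quantize(torch_version, minimum=(2, 4, 0)):
    """Return True if the version is at least the minimum release.

    Assumption: pre-release and local suffixes (.dev..., +cu121) are ignored.
    """
    return parse_release(torch_version) >= minimum
```

In real code the argument would typically be `torch.__version__`, with the old `change_linear_weights_...` path taken when the check returns False.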
addressed comments, please take a look again
Summary: This PR deprecates a few quantization APIs and here are the bc-breaking notes:

1. int8 weight only quantization

int8 weight only quant module swap API
```
apply_weight_only_int8_quant(model)
```
and int8 weight only tensor subclass API
```
change_linear_weights_to_int8_woqtensors(model)
```
-->
unified tensor subclass API
```
quantize(model, get_apply_int8wo_quant())
```

2. int8 dynamic quantization

```
apply_dynamic_quant(model)
```
or
```
change_linear_weights_to_int8_dqtensors(model)
```
-->
unified tensor subclass API
```
quantize(model, get_apply_int8dyn_quant())
```

3. int4 weight only quantization

```
change_linear_weights_to_int4_wotensors(model)
```
-->
unified tensor subclass API
```
quantize(model, get_apply_int4wo_quant())
```

Test Plan:
python test/quantization/test_quant_api.py
python test/integration/test_integration.py

Reviewers:
Subscribers:
Tasks:
Tags:
# for torch 2.4+
from torchao.quantization.quant_api import quantize
quantize(model, "int8_dynamic_quant")
@jerryzh168 this should be "int8_dynamic" right?
oh right
Hi @jerryzh168 this breaks torchtune when we run on ao nightlies. Ref |
Summary:
This PR deprecates a few quantization APIs
Deprecation summary:
deprecated for all pytorch versions (2.2.2, 2.3 and 2.4+): `apply_weight_only_int8_quant` and `apply_dynamic_quant`
also deprecated for 2.4+: `change_linear_weights_to_int8_woqtensors`, `change_linear_weights_to_int8_dqtensors` and `change_linear_weights_to_int4_wotensors`
BC-breaking notes
for torch version 2.3 and before, we are keeping the `change_linear_weights_...` APIs, since the new `quantize` API needs a parametrization fix (pytorch/pytorch#124888) to work
1. int8 weight only quantization
int8 weight only quant module swap API
torch 2.4+
-->
torch 2.2.2 and 2.3
-->
2. int8 dynamic quantization
torch 2.4+
-->
torch 2.2.2 and 2.3
-->
3. int4 weight only quantization
torch 2.4+
-->
torch 2.2.2 and 2.3
no change
Test Plan:
python test/quantization/test_quant_api.py
python test/integration/test_integration.py
Reviewers:
Subscribers:
Tasks:
Tags: