Composing autoquant with compile #175

Merged: HDCharles merged 17 commits from 085_no_input_autoquant into main on May 8, 2024
Conversation

HDCharles (Contributor) commented Apr 25, 2024

Composing autoquant with compile

Summary:

This PR rewrites how torchao.autoquant works so that it composes with torch.compile. Previously you had to do:

    torchao.autoquant(model, input)
    mod = torch.compile(model)
    mod(input)

Now you can do:

    model = torchao.autoquant(torch.compile(model))
    model(input)

The new method works both with and without compile, and it is backwards compatible, so the old path still works as well.
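
For example (a sketch, with `model` and `example_input` standing in for whatever module and input you are quantizing), the same call composes either way:

    import torch
    import torchao

    # with compile
    model = torchao.autoquant(torch.compile(model))
    model(example_input)

    # without compile: the same API works eagerly
    model = torchao.autoquant(model)
    model(example_input)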

We use a forward pre-hook to intercept the model call before torch.compile tracing occurs; at that point we perform the autoquantization and remove all remaining hooks before handing things off to the normal torch.compile tracing machinery.
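
As a rough illustration of the mechanism, here is a minimal sketch built on torch's register_forward_pre_hook; it is not the actual torchao implementation, just the self-removing-hook pattern described above:

    import torch
    import torch.nn as nn

    def attach_one_time_prehook(model: nn.Module, one_time_work):
        # The pre-hook fires on the first call, performs some one-time
        # work (a stand-in for autoquantization here), then removes
        # itself so later calls run the plain forward.
        handle = None

        def prehook(module, args):
            one_time_work(module, args)
            handle.remove()  # clean up before normal tracing proceeds

        handle = model.register_forward_pre_hook(prehook)
        return model

    m = attach_one_time_prehook(
        nn.Linear(8, 8),
        lambda mod, args: print("first call, input shape:", args[0].shape),
    )
    m(torch.randn(2, 8))  # hook runs once, then is removed
    m(torch.randn(2, 8))  # plain forward from here on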

Note: in the case of multiple inputs, you can also call

    model.forward_log_only(input)

to run the model forward with autoquant shape logging while preventing the torch.compile tracing and autoquant quantization from occurring.
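
A sketch of the resulting calibration flow, assuming `calibration_inputs` and `example_input` are inputs you supply:

    model = torchao.autoquant(torch.compile(model))
    for x in calibration_inputs:
        model.forward_log_only(x)  # log shapes only; no quantization yet
    model(example_input)           # first regular call triggers autoquant, then the compiled run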

Test Plan: python test/integration/test_integration.py -k "autoquant"


facebook-github-bot added the CLA Signed label (authors need to sign the CLA before a PR can be reviewed) on Apr 25, 2024
HDCharles force-pushed the 085_no_input_autoquant branch 2 times, most recently from c2697a2 to 144b03d, on April 25, 2024 19:30
cpuhrsch (Contributor) commented

@HDCharles - Could you also update the docs?

jeromeku and others added 8 commits May 6, 2024 17:37

* Fused DoRA kernels (pytorch#216): add dora kernels

* allowing error_on_unseen in autoquant func

* Unified AffineQuantizedTensor subclass (pytorch#214)

  Summary: Created an `AffineQuantizedTensor` subclass that works for both weight and input (for dynamic quantization) and for all granularities, leveraging the recently added choose_qparams_affine, quantize_affine and dequantize_affine ops. It is only verified for 8da4w right now; we can make it work for other types of quantization (mostly the operator dispatching part) later. A plain-PyTorch sketch of affine quantization follows this list.

  Test Plan: python test/quantization/test_quant_api.py -k test_quantized_tensor_subclass_8da4w

  Co-authored-by: Mark Saroufim <marksaroufim@meta.com>

* add expecttest to requirements.txt (pytorch#225)

* Install dev-requirements.txt in doc build (pytorch#224)

  Co-authored-by: Mark Saroufim <marksaroufim@meta.com>

* Fix an error in subclass impl (pytorch#226)

  Summary: Accidentally changed the device check code for the old subclass instead of the new one; forgot to fix before landing.

  Test Plan: CI

* update readme.md

* trying to fix the error in CI on cleanup hooks
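
Since the AffineQuantizedTensor commit above leans on affine quantization primitives, here is a plain-PyTorch sketch of the affine (scale and zero-point) round-trip that the choose_qparams_affine / quantize_affine / dequantize_affine ops generalize to arbitrary block sizes and dtypes; it is illustrative only, not the torchao implementation:

    import torch

    def quantize_affine_sketch(x: torch.Tensor, qmin: int = -128, qmax: int = 127):
        # Per-tensor asymmetric affine quantization to int8.
        scale = ((x.max() - x.min()) / (qmax - qmin)).clamp(min=1e-8)
        zero_point = qmin - torch.round(x.min() / scale)
        q = torch.clamp(torch.round(x / scale) + zero_point, qmin, qmax).to(torch.int8)
        return q, scale, zero_point

    def dequantize_affine_sketch(q, scale, zero_point):
        # Invert the affine map: x_hat = (q - zero_point) * scale.
        return (q.to(torch.float32) - zero_point) * scale

    x = torch.randn(4, 4)
    q, s, zp = quantize_affine_sketch(x)
    x_hat = dequantize_affine_sketch(q, s, zp)  # approximately reconstructs x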
pytorch-bot commented May 8, 2024

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/ao/175

Note: Links to docs will display an error until the docs builds have been completed.

✅ No Failures

As of commit e5d215f with merge base 63c5ac5: 💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

HDCharles and others added 8 commits May 7, 2024 20:35

* correct docs

* Some follow up fixes for quant primitives (pytorch#220)

  Summary: att (as titled)

  Test Plan: python test/quantization/test_quant_primitives.py -k test_raises

* Composing autoquant with compile (the change described in the PR summary above)

* allowing error_on_unseen in autoquant func

* update readme.md

* trying to fix the error in CI on cleanup hooks

* correct docs
HDCharles merged commit f6d56ca into main on May 8, 2024. 13 checks passed.
dbyoung18 pushed a commit to dbyoung18/ao that referenced this pull request Jul 31, 2024
Co-authored-by: jeromeku <jerome.ku@gmail.com>
Co-authored-by: Jerry Zhang <jerryzh168@gmail.com>
Co-authored-by: Mark Saroufim <marksaroufim@meta.com>
Co-authored-by: Svetlana Karslioglu <svekars@meta.com>