Gptq mt refactor #914
Conversation
🔗 Helpful Links
🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/ao/914
Note: Links to docs will display an error until the docs builds have been completed.
✅ No Failures as of commit 83d8cdd with merge base 1137f39.
This comment was automatically generated by Dr. CI and updates every 15 minutes.
nit: the MultiTensor tests probably need a bit more coverage, though it's not incredibly urgent.
I would check that:
A) for MultiTensor.pad, the tensor ids differ post padding
B) if you do an in-place op with multiple inputs, you get the expected (multiple) changes [this was the issue before: you'd only get the change from whichever ran last, so testing this would be important]
C) if you do a non-in-place op, like functional.linear(MultiTensor(a1, a2), MultiTensorWeight(w1)), the final state of the weight is MultiTensorWeight(w1) rather than MultiTensorWeight(w1, w2) like it would be for an in-place op.
A rough sketch of these three checks is shown below.
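A minimal sketch of these checks, assuming MultiTensor takes a list of tensors, exposes a pad method and a values attribute, and dispatches torch ops (including in-place ones like add_) across its entries; MultiTensorWeight stands for a weight-side MultiTensor as in the example above. These names and signatures are assumptions for illustration, not the final API in this PR.

```python
import torch
import torch.nn.functional as F

def check_pad_ids(MultiTensor):
    # A) padding should create new tensor entries, not aliases of the original
    mt = MultiTensor([torch.randn(4, 4)])
    padded = mt.pad(2)  # assumed: pad the group out to 2 entries
    ids = [id(t) for t in padded.values]
    assert len(set(ids)) == len(ids), "padded entries should have distinct ids"

def check_in_place_op(MultiTensor):
    # B) an in-place op with multiple inputs should mutate every entry,
    # not just whichever input happened to run last
    mt = MultiTensor([torch.zeros(2), torch.zeros(2)])
    mt.add_(1.0)
    assert all(torch.equal(t, torch.ones(2)) for t in mt.values)

def check_non_in_place_op(MultiTensor, MultiTensorWeight):
    # C) a functional op should not grow the weight MultiTensor: it should
    # stay MultiTensorWeight(w1), not become MultiTensorWeight(w1, w2)
    x = MultiTensor([torch.randn(2, 4), torch.randn(2, 4)])
    w = MultiTensorWeight([torch.randn(8, 4)])
    _ = F.linear(x, w)
    assert len(w.values) == 1
```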
test/quantization/test_quant_api.py
Outdated
     precision = torch.bfloat16
     device = "cuda"
-    checkpoint_path = Path("../checkpoints/meta-llama/Llama-2-7b-chat-hf/model.pth")
+    checkpoint_path = Path("checkpoints/meta-llama/Llama-2-7b-chat-hf/model.pth")
note: by default tests are run from the tests dir, and normally checkpoints are in the root of torchao after prepare.sh, so this should probably be the previous path.
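A tiny illustration of the path issue, assuming the test process's working directory is the repo's test dir while prepare.sh places checkpoints in the repo root (the absolute paths below are purely illustrative):

```python
from pathlib import Path

cwd = Path("/repo/test")  # assumed working directory when tests run
bad = (cwd / "checkpoints/meta-llama/Llama-2-7b-chat-hf/model.pth").resolve()
good = (cwd / "../checkpoints/meta-llama/Llama-2-7b-chat-hf/model.pth").resolve()
print(bad)   # /repo/test/checkpoints/... -> not where prepare.sh puts the file
print(good)  # /repo/checkpoints/...      -> matches the repo-root checkpoints dir
```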
Haven't followed up here, is this good to merge?
test/quantization/test_quant_api.py
Outdated
@@ -338,10 +338,10 @@ def test_8da4w_quantizer_eval(self):
             f"accuracy regressed from 8.23 to {result['results']['wikitext']['word_perplexity,none']}"
         )

-    @unittest.skip("skipping until we get checkpoints for gpt-fast")
+    # @unittest.skip("skipping until we get checkpoints for gpt-fast")
Yeah, I would add this back in. I removed it for my own manual testing, but CI doesn't have lm_eval or the Llama checkpoints, so we run these manually.
I've been talking to AIHD on Discord mostly. I think we're trying to pass CI and then it'll be ready to merge.
…uantizableModel class
Summary: this is now working
2024-09-12:00:11:50,735 INFO [test_gptq_mt.py:259] wikitext: {'word_perplexity,none': 7.993930171683578, 'word_perplexity_stderr,none': 'N/A', 'byte_perplexity,none': 1.4931942138977463, 'byte_perplexity_stderr,none': 'N/A', 'bits_per_byte,none': 0.578401823345637, 'bits_per_byte_stderr,none': 'N/A', 'alias': 'wikitext'}
…d unit tests for MT. Refactor in-place ops classification for readability.
Compare 6c532db to 826b830
* add initial [WIP] of MultiTensor rewrite of GPTQ
* Implement GPTQQuantizer and Int4WeightOnlyGPTQQuantizer
* added GPTQuantizer and Int4WeightOnlyGPTQQuantizer classes. Removed QuantizableModel class
* removed print statement
* fix control structure in torch_function. layer outputs not properly handled
* add testing script for gptq MultiTensor
* testing
* more fixes
* more fixes
* this is working now
* fixes for in place ops. Summary: this is now working. 2024-09-12:00:11:50,735 INFO [test_gptq_mt.py:259] wikitext: {'word_perplexity,none': 7.993930171683578, 'word_perplexity_stderr,none': 'N/A', 'byte_perplexity,none': 1.4931942138977463, 'byte_perplexity_stderr,none': 'N/A', 'bits_per_byte,none': 0.578401823345637, 'bits_per_byte_stderr,none': 'N/A', 'alias': 'wikitext'}
* removing some debug code
* AQT fixes
* small speed improvement, force unpad unless we designated op as in place
* testing
* final fixes
* refactor test_gptq_quantizer_int4_weight_only to use new MT class. add unit tests for MT. Refactor in-place ops classification for readability.
* change path to model.pth
* add correct imports to tests
* fix imports and duplicate lines of code
* fix import of model.pth
* add decorator to skip test due to lm_eval not in CI
* skip MT tests in test_quant_api

Co-authored-by: HDCharles <charlesdavidhernandez@gmail.com>
Summary: att, previously in pytorch#914 we added the narrow op for all layouts. The introduced narrow op breaks the pattern for int8 dynamic activation int4 weight quant for ExecuTorch, so this PR guards the narrow op for the tensor core tiled layout only. If similar things come up in the future we can factor this into a proper API for Layout or TensorImpl.
Test Plan: python test/test_integration.py -k test_export
* Call narrow for TensorCoreTiledLayout only. Summary: att, previously in #914 we added the narrow op for all layouts. The introduced narrow op breaks the pattern for int8 dynamic activation int4 weight quant for ExecuTorch, so this PR guards the narrow op for the tensor core tiled layout only. If similar things come up in the future we can factor this into a proper API for Layout or TensorImpl. Test Plan: python test/test_integration.py -k test_export
* enable test
* version
* skip aoti
* version update
* skip aoti
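A rough sketch of the guard described in that commit message: apply torch.narrow only when the weight uses the tensor-core-tiled layout, so other layouts (such as the plain layout used by the int8 dynamic activation int4 weight ExecuTorch path) keep their original export pattern. The class and attribute names below are placeholders, not the exact torchao internals.

```python
import torch

class PlainLayoutType:
    """Placeholder for a plain (unpacked) layout."""

class TensorCoreTiledLayoutType:
    """Placeholder for the tensor-core tiled layout."""

def maybe_narrow(data: torch.Tensor, layout, orig_out_features: int) -> torch.Tensor:
    # Tensor-core tiled packing pads out_features up to a tile multiple, so we
    # narrow back to the original size before use; every other layout is left
    # untouched so the ExecuTorch export pattern is not disturbed.
    if isinstance(layout, TensorCoreTiledLayoutType):
        return data.narrow(0, 0, orig_out_features)
    return data
```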
This is a PR for the ongoing GPTQ MultiTensor refactor.
I added the e2e test and some atomic unit tests for the MultiTensor class and the Int4WeightOnlyGPTQQuantizer class. I also refactored the non_in_place logic for readability and sped up the quantization. Finally, I added a new input recorder that uses MultiTensor rather than MultiInput. @HDCharles
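For context, a conceptual sketch of what an input recorder built on MultiTensor (rather than MultiInput) can look like: each calibration batch appends one more tensor to a per-argument MultiTensor. The MultiTensor stand-in, the class name MultiTensorInputRecorder, and the get_recorded_inputs method are illustrative only, not the API added in this PR.

```python
import torch

class MultiTensor:
    """Stand-in for the MultiTensor class introduced in this PR."""
    def __init__(self, values):
        self.values = list(values)

    def append(self, value):
        self.values.append(value)

class MultiTensorInputRecorder(torch.nn.Module):
    """Collects calibration inputs, one MultiTensor per positional argument."""
    def __init__(self):
        super().__init__()
        self.inputs = None

    def forward(self, *args):
        if self.inputs is None:
            # first calibration batch: create one MultiTensor per argument
            self.inputs = [MultiTensor([a]) for a in args]
        else:
            # later batches: grow each argument's MultiTensor by one entry
            for mt, a in zip(self.inputs, args):
                mt.append(a)
        return args

    def get_recorded_inputs(self):
        return self.inputs
```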