Gptq mt refactor #914
Conversation
🔗 Helpful Links
🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/ao/914
Note: Links to docs will display an error until the docs builds have been completed.
✅ No Failures as of commit 83d8cdd with merge base 1137f39.
This comment was automatically generated by Dr. CI and updates every 15 minutes.
nit: the MultiTensor tests probably need a bit more coverage, though it's not incredibly urgent.
I would check that:
A) for MultiTensor.pad, the tensor ids differ post padding
B) if you do an in-place op with multiple inputs, you get the expected (multiple) changes [this was the issue before: you'd only get the change from whichever ran last, so testing this would be important]
C) if you do a non-in-place op, like functional.linear(MultiTensor(a1, a2), MultiTensorWeight(w1)), the final state of the weight is MultiTensorWeight(w1) rather than MultiTensorWeight(w1, w2) like it would be for an in-place op.
A rough sketch of these three checks is shown below.
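A minimal sketch of these checks, assuming MultiTensor takes a list of tensors, exposes a pad method and a values attribute, and dispatches torch ops (including in-place ones like add_) across its entries; MultiTensorWeight stands for a weight-side MultiTensor as in the example above. These names and signatures are assumptions for illustration, not the final API in this PR.

```python
import torch
import torch.nn.functional as F

def check_pad_ids(MultiTensor):
    # A) padding should create new tensor entries, not aliases of the original
    mt = MultiTensor([torch.randn(4, 4)])
    padded = mt.pad(2)  # assumed: pad the group out to 2 entries
    ids = [id(t) for t in padded.values]
    assert len(set(ids)) == len(ids), "padded entries should have distinct ids"

def check_in_place_op(MultiTensor):
    # B) an in-place op with multiple inputs should mutate every entry,
    # not just whichever input happened to run last
    mt = MultiTensor([torch.zeros(2), torch.zeros(2)])
    mt.add_(1.0)
    assert all(torch.equal(t, torch.ones(2)) for t in mt.values)

def check_non_in_place_op(MultiTensor, MultiTensorWeight):
    # C) a functional op should not grow the weight MultiTensor: it should
    # stay MultiTensorWeight(w1), not become MultiTensorWeight(w1, w2)
    x = MultiTensor([torch.randn(2, 4), torch.randn(2, 4)])
    w = MultiTensorWeight([torch.randn(8, 4)])
    _ = F.linear(x, w)
    assert len(w.values) == 1
```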
test/quantization/test_quant_api.py
Outdated
     precision = torch.bfloat16
     device = "cuda"
-    checkpoint_path = Path("../checkpoints/meta-llama/Llama-2-7b-chat-hf/model.pth")
+    checkpoint_path = Path("checkpoints/meta-llama/Llama-2-7b-chat-hf/model.pth")
note: by default tests are run from the tests dir, and normally checkpoints are in the root of torchao after prepare.sh, so this should probably be the previous path.
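A tiny illustration of the path issue, assuming the test process's working directory is the repo's test dir while prepare.sh places checkpoints in the repo root (the absolute paths below are purely illustrative):

```python
from pathlib import Path

cwd = Path("/repo/test")  # assumed working directory when tests run
bad = (cwd / "checkpoints/meta-llama/Llama-2-7b-chat-hf/model.pth").resolve()
good = (cwd / "../checkpoints/meta-llama/Llama-2-7b-chat-hf/model.pth").resolve()
print(bad)   # /repo/test/checkpoints/... -> not where prepare.sh puts the file
print(good)  # /repo/checkpoints/...      -> matches the repo-root checkpoints dir
```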
Haven't followed up here, is this good to merge?
test/quantization/test_quant_api.py
Outdated
@@ -338,10 +338,10 @@ def test_8da4w_quantizer_eval(self):
             f"accuracy regressed from 8.23 to {result['results']['wikitext']['word_perplexity,none']}"
         )

-    @unittest.skip("skipping until we get checkpoints for gpt-fast")
+    # @unittest.skip("skipping until we get checkpoints for gpt-fast")
Yeah, I would add this back in. I removed it for my own manual testing, but CI doesn't have lm_eval or the Llama checkpoints, so we run these manually.
I've been talking to AIHD on Discord mostly. I think we're trying to pass CI and then it'll be ready to merge.
…uantizableModel class
Summary: this is now working
2024-09-12:00:11:50,735 INFO [test_gptq_mt.py:259] wikitext: {'word_perplexity,none': 7.993930171683578, 'word_perplexity_stderr,none': 'N/A', 'byte_perplexity,none': 1.4931942138977463, 'byte_perplexity_stderr,none': 'N/A', 'bits_per_byte,none': 0.578401823345637, 'bits_per_byte_stderr,none': 'N/A', 'alias': 'wikitext'}
…d unit tests for MT. Refactor in-place ops classification for readability.
Compare 6c532db to 826b830
* add initial [WIP] of MultiTensor rewrite of GPTQ
* Implement GPTQQuantizer and Int4WeightOnlyGPTQQuantizer
* added GPTQuantizer and Int4WeightOnlyGPTQQuantizer classes. Removed QuantizableModel class
* removed print statement
* fix control structure in torch_function. layer outputs not properly handled
* add testing script for gptq MultiTensor
* testing
* more fixes
* more fixes
* this is working now
* fixes for in place ops. Summary: this is now working. 2024-09-12:00:11:50,735 INFO [test_gptq_mt.py:259] wikitext: {'word_perplexity,none': 7.993930171683578, 'word_perplexity_stderr,none': 'N/A', 'byte_perplexity,none': 1.4931942138977463, 'byte_perplexity_stderr,none': 'N/A', 'bits_per_byte,none': 0.578401823345637, 'bits_per_byte_stderr,none': 'N/A', 'alias': 'wikitext'}
* removing some debug code
* AQT fixes
* small speed improvement, force unpad unless we designated op as in place
* testing
* final fixes
* refactor test_gptq_quantizer_int4_weight_only to use new MT class. add unit tests for MT. Refactor in-place ops classification for readability.
* change path to model.pth
* add correct imports to tests
* fix imports and duplicate lines of code
* fix import of model.pth
* add decorator to skip test due to lm_eval not in CI
* skip MT tests in test_quant_api

Co-authored-by: HDCharles <charlesdavidhernandez@gmail.com>
Summary: att, previously in pytorch#914 we added the narrow op for all layouts. The introduced narrow op breaks the pattern for int8 dynamic activation int4 weight quant for ExecuTorch, so this PR guards the narrow op for the tensor core tiled layout only. If similar things come up in the future we can factor this into a proper API for Layout or TensorImpl.
Test Plan: python test/test_integration.py -k test_export
* Call narrow for TensorCoreTiledLayout only. Summary: att, previously in #914 we added the narrow op for all layouts. The introduced narrow op breaks the pattern for int8 dynamic activation int4 weight quant for ExecuTorch, so this PR guards the narrow op for the tensor core tiled layout only. If similar things come up in the future we can factor this into a proper API for Layout or TensorImpl. Test Plan: python test/test_integration.py -k test_export
* enable test
* version
* skip aoti
* version update
* skip aoti
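A rough sketch of the guard described in that commit message: apply torch.narrow only when the weight uses the tensor-core-tiled layout, so other layouts (such as the plain layout used by the int8 dynamic activation int4 weight ExecuTorch path) keep their original export pattern. The class and attribute names below are placeholders, not the exact torchao internals.

```python
import torch

class PlainLayoutType:
    """Placeholder for a plain (unpacked) layout."""

class TensorCoreTiledLayoutType:
    """Placeholder for the tensor-core tiled layout."""

def maybe_narrow(data: torch.Tensor, layout, orig_out_features: int) -> torch.Tensor:
    # Tensor-core tiled packing pads out_features up to a tile multiple, so we
    # narrow back to the original size before use; every other layout is left
    # untouched so the ExecuTorch export pattern is not disturbed.
    if isinstance(layout, TensorCoreTiledLayoutType):
        return data.narrow(0, 0, orig_out_features)
    return data
```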
This is a PR for the ongoing GPTQ MultiTensor refactor.
I added the e2e test and some atomic unit tests for the MultiTensor class and the Int4WeightOnlyGPTQQuantizer class. I also refactored the non_in_place logic for readability and sped up the quantization. Finally, I added a new input recorder that uses MultiTensor rather than MultiInput. @HDCharles
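For context, a conceptual sketch of what an input recorder built on MultiTensor (rather than MultiInput) can look like: each calibration batch appends one more tensor to a per-argument MultiTensor. The MultiTensor stand-in, the class name MultiTensorInputRecorder, and the get_recorded_inputs method are illustrative only, not the API added in this PR.

```python
import torch

class MultiTensor:
    """Stand-in for the MultiTensor class introduced in this PR."""
    def __init__(self, values):
        self.values = list(values)

    def append(self, value):
        self.values.append(value)

class MultiTensorInputRecorder(torch.nn.Module):
    """Collects calibration inputs, one MultiTensor per positional argument."""
    def __init__(self):
        super().__init__()
        self.inputs = None

    def forward(self, *args):
        if self.inputs is None:
            # first calibration batch: create one MultiTensor per argument
            self.inputs = [MultiTensor([a]) for a in args]
        else:
            # later batches: grow each argument's MultiTensor by one entry
            for mt, a in zip(self.inputs, args):
                mt.append(a)
        return args

    def get_recorded_inputs(self):
        return self.inputs
```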