
Gptq mt refactor #914

Merged · merged 24 commits into pytorch:main on Oct 1, 2024
Conversation

danielpatrickhug (Contributor)

This is a PR for the ongoing GPTQ MultiTensor refactor.
I added the e2e test and some atomic unit tests for the MultiTensor class and the Int4WeightOnlyGPTQQuantizer class. I also refactored the non-in-place op logic for readability and sped up the quantization. Finally, I added a new input recorder that uses MultiTensor rather than MultiInput. @HDCharles
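For readers following along, here is a hypothetical sketch of the MultiTensor idea (names such as `MultiTensor`, `pad_to`, and `apply_op` are illustrative, not the PR's actual implementation): a wrapper that carries several calibration inputs and replays each op across all of them, padding shorter argument lists as needed.

```python
# Hypothetical sketch of the MultiTensor idea -- not the actual torchao class.
import torch

class MultiTensor:
    """Bundle several calibration tensors and replay ops across all of them."""

    def __init__(self, *values: torch.Tensor):
        self.values = list(values)

    def pad_to(self, length: int) -> "MultiTensor":
        # Clone (not alias) the last value so padded entries are distinct
        # objects; aliasing would make in-place ops hit the same storage twice.
        extra = [self.values[-1].clone() for _ in range(length - len(self.values))]
        return MultiTensor(*self.values, *extra)

def apply_op(op, *multi_args: MultiTensor) -> MultiTensor:
    # Pad all args to the longest count, then run the op once per group.
    n = max(len(m.values) for m in multi_args)
    grouped = zip(*(m.pad_to(n).values for m in multi_args))
    return MultiTensor(*(op(*tensors) for tensors in grouped))

# Two calibration batches through a single weight:
x = MultiTensor(torch.randn(2, 4), torch.randn(2, 4))
w = MultiTensor(torch.randn(3, 4))
y = apply_op(torch.nn.functional.linear, x, w)  # MultiTensor of 2 outputs
```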

pytorch-bot bot commented Sep 21, 2024

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/ao/914

Note: Links to docs will display an error until the docs builds have been completed.

✅ No Failures

As of commit 83d8cdd with merge base 1137f39:
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@facebook-github-bot added the CLA Signed label (this label is managed by the Facebook bot; authors need to sign the CLA before a PR can be reviewed) on Sep 21, 2024
HDCharles (Contributor) left a comment:

nit: the MultiTensor tests probably need a bit more, though it's not incredibly urgent.

I would check that:

A) MultiTensor.pad: the tensor IDs differ post-padding
B) if you do an in-place op with multiple inputs, you get the expected (multiple) changes [this was the issue before: you'd only get the change from whichever ran last, so testing this would be important]
C) if you do a not-in-place op, like functional.linear(MultiTensor(a1, a2), MultiTensorWeight(w1)), the final state of the weight is MultiTensorWeight(w1) rather than MultiTensorWeight(w1, w2), like it would be for an in-place op.
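Rough pytest-style sketches of those three checks, assuming the wrapper sketched earlier (the PR's real MultiTensor API may differ in names and construction):

```python
# Rough sketches only; constructor/pad signatures are assumed, not the PR's exact API.
import torch

def test_pad_gives_fresh_tensors():
    # (A) padded entries should be distinct objects, not aliases
    mt = MultiTensor(torch.randn(2, 2)).pad_to(3)
    ids = [id(t) for t in mt.values]
    assert len(set(ids)) == len(ids)

def test_inplace_op_changes_every_input():
    # (B) an in-place op over multiple inputs must change all of them, not
    # just whichever ran last (the real class dispatches this in one call)
    mt = MultiTensor(torch.zeros(2), torch.zeros(2))
    for t in mt.values:
        t.add_(1.0)
    assert all(torch.equal(t, torch.ones(2)) for t in mt.values)

def test_non_inplace_op_does_not_grow_weight():
    # (C) functional.linear over two activations must leave the weight
    # MultiTensor with its original single value
    x = MultiTensor(torch.randn(2, 4), torch.randn(2, 4))
    w = MultiTensor(torch.randn(3, 4))
    apply_op(torch.nn.functional.linear, x, w)
    assert len(w.values) == 1
```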

 precision = torch.bfloat16
 device = "cuda"
-checkpoint_path = Path("../checkpoints/meta-llama/Llama-2-7b-chat-hf/model.pth")
+checkpoint_path = Path("checkpoints/meta-llama/Llama-2-7b-chat-hf/model.pth")
Contributor

Note: by default tests are run from the tests dir, and normally checkpoints are in the root of torchao after prepare.sh, so this should probably be the previous path.
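If working-directory sensitivity keeps biting, one option (not what the PR does) is to resolve the checkpoint relative to the test file itself:

```python
# Resolve relative to the test file instead of the working directory
# (illustrative alternative; the PR keeps a plain relative path).
from pathlib import Path

checkpoint_path = (
    Path(__file__).resolve().parent.parent
    / "checkpoints/meta-llama/Llama-2-7b-chat-hf/model.pth"
)
```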

@msaroufim (Member)
Haven't followed up here, is this good to merge?

@@ -338,10 +338,10 @@ def test_8da4w_quantizer_eval(self):
f"accuracy regressed from 8.23 to {result['results']['wikitext']['word_perplexity,none']}"
)

-    @unittest.skip("skipping until we get checkpoints for gpt-fast")
+    # @unittest.skip("skipping until we get checkpoints for gpt-fast")
Contributor

Yeah, I would add this back in. I removed it for my own manual testing, but CI doesn't have lm_eval or the llama checkpoints, so we run these manually.
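One way to avoid toggling the decorator by hand would be a conditional skip, e.g. (a sketch, not code from this PR):

```python
# Sketch: skip automatically when lm_eval or the checkpoint is missing,
# instead of commenting @unittest.skip in and out.
import importlib.util
import unittest
from pathlib import Path

HAS_LM_EVAL = importlib.util.find_spec("lm_eval") is not None
HAS_CHECKPOINT = Path(
    "../checkpoints/meta-llama/Llama-2-7b-chat-hf/model.pth"
).exists()

@unittest.skipIf(
    not (HAS_LM_EVAL and HAS_CHECKPOINT),
    "requires lm_eval and local gpt-fast checkpoints",
)
def test_gptq_quantizer_int4_weight_only(self):
    ...
```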

@HDCharles (Contributor)

> Haven't followed up here, is this good to merge?

I've been talking to AIHD on discord mostly. I think we're trying to pass CI and then it'll be ready to merge.

danielpatrickhug and others added 22 commits October 1, 2024 09:12
Summary: this is now working

2024-09-12:00:11:50,735 INFO     [test_gptq_mt.py:259] wikitext: {'word_perplexity,none': 7.993930171683578, 'word_perplexity_stderr,none': 'N/A', 'byte_perplexity,none': 1.4931942138977463, 'byte_perplexity_stderr,none': 'N/A', 'bits_per_byte,none': 0.578401823345637, 'bits_per_byte_stderr,none': 'N/A', 'alias': 'wikitext'}
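As a sanity check on the logged metrics, bits_per_byte is just log2 of byte_perplexity, which the numbers above confirm:

```python
import math

# 0.578401823345637 == log2(1.4931942138977463)
assert math.isclose(math.log2(1.4931942138977463), 0.578401823345637, rel_tol=1e-6)
```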

…refactor test_gptq_quantizer_int4_weight_only to use new MT class, add unit tests for MT. Refactor inplace ops classification for readability.
@HDCharles HDCharles merged commit 83d5b63 into pytorch:main Oct 1, 2024
17 checks passed
melvinebenezer pushed a commit to melvinebenezer/ao that referenced this pull request Oct 7, 2024
* add initial [WIP] of MultiTensor rewrite of GPTQ
* Implement GPTQQuantizer and Int4WeightOnlyGPTQQuantizer
* added GPTQuantizer and Int4WeightOnlyGPTQQuantizer classes. Removed QuantizableModel class
* removed print statement
* fix control structure in torch_function; layer outputs not properly handled
* add testing script for gptq MultiTensor
* testing
* more fixes
* more fixes
* this is working now
* fixes for in place ops
* removing some debug code
* AQT fixes
* small speed improvement, force unpad unless we designated op as in place
* testing
* final fixes
* refactor test_gptq_quantizer_int4_weight_only to use new MT class. add unit tests for MT. Refactor inplace ops classification for readability.
* change path to model.pth
* add correct imports to tests
* fix imports and duplicate lines of code
* fix import of model.pth
* add decorator to skip test due to lm_eval not in CI
* skip MT tests in test_quant_api

---------

Co-authored-by: HDCharles <charlesdavidhernandez@gmail.com>
jerryzh168 added a commit to jerryzh168/ao that referenced this pull request Oct 31, 2024
Summary:
att, previously in pytorch#914 we added a narrow op for all layouts; the introduced narrow op breaks the pattern for int8 dynamic activation int4 weight quant for executorch, so this PR guards the narrow op for tensor core tiled layout only.

If similar things come up in the future we can factor this into a proper API for Layout or TensorImpl.

Test Plan:
python test/test_integration.py -k test_export
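A sketch of the guard described above (simplified; the import path and call site are assumptions, see the actual commit for the real change):

```python
# Simplified sketch: only the tensor-core-tiled layout takes the narrow path;
# other layouts (e.g. the plain layout used by the executorch
# int8-dynamic-activation/int4-weight flow) keep their op pattern intact.
import torch
from torchao.dtypes import TensorCoreTiledLayout  # assumed import path

def maybe_narrow(data: torch.Tensor, layout, dim: int, start: int, length: int) -> torch.Tensor:
    if isinstance(layout, TensorCoreTiledLayout):
        return data.narrow(dim, start, length)
    return data
```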
jerryzh168 added a commit to jerryzh168/ao that referenced this pull request Nov 6, 2024
jerryzh168 added a commit to jerryzh168/ao that referenced this pull request Nov 9, 2024
jerryzh168 added a commit that referenced this pull request Nov 12, 2024
* Call narrow only for TensorCoreTiledLayout only

Summary:
att, previously in #914 we added a narrow op for all layouts; the introduced narrow op breaks the pattern for int8 dynamic activation int4 weight quant for executorch, so this PR guards the narrow op for tensor core tiled layout only.

If similar things come up in the future we can factor this into a proper API for Layout or TensorImpl.

Test Plan:
python test/test_integration.py -k test_export

* enable test

* version

* skip aoti

* version update

* skip aoti
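The "version" / "skip aoti" bullets suggest the export test is gated on the torch version, with the AOTI variant skipped; a rough sketch under those assumptions (the exact version bound and flags in the commit may differ):

```python
# Rough sketch (assumed structure): gate on torch version, skip the AOTI path.
import unittest

import torch
from packaging import version

TORCH_AT_LEAST_2_5 = version.parse(torch.__version__).release >= (2, 5)

class TestExport(unittest.TestCase):
    @unittest.skipIf(not TORCH_AT_LEAST_2_5, "requires torch >= 2.5")
    def test_export(self):
        ...  # torch.export path only; the AOTI variant stays skipped
```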
sunjiweiswift pushed a commit to sunjiweiswift/ao that referenced this pull request Nov 25, 2024