README and benchmark improvements #867

HDCharles · 2024-09-10T23:17:37Z

Summary:

quantization README:

added fp6 to benchmarks
rewrote autoquant section to give a higher level explanation before
diving into the details
reordered affine quantization section to first show techniques then
dive into details
added fp6 section
moved kv cache stuff to new section
added sparse-marlin section and removed sparse-marlin benchmark from
top of README since we don't have a reasonable flow for users to use
to apply it to their model without a pre-sparsified checkpoint.
added uintx section

Benchmarks Changes:

added instructions for adding things to benchmarks so everything
stays consistent (in llama benchmark README)
organized/ran benchmarks for uintx, fp6, and sparse-marlin
added evaluations.sh to mirror benchmarks.sh
added sparse-marlin to eval.py
fixed some generate.py logging bugs
improved the quantization help text in generate.py
fixed some eval.py bugs with uintx
added marlin to eval
fixed eval help text

sparsity readme:

added some details to sparsity

Test Plan:

benchmarks.sh
evaluations.sh

Reviewers:

Subscribers:

Tasks:

Tags:


pytorch-bot · 2024-09-10T23:17:40Z

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/ao/867

📄 Preview Python docs built from this PR

Note: Links to docs will display an error until the docs builds have been completed.

✅ No Failures

As of commit e6db619 with merge base e283743:
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

torchao/sparsity/README.md

jerryzh168

LGTM, thanks for making all the benchmark/eval up to date!

jerryzh168 · 2024-09-10T23:33:45Z

torchao/_models/llama/benchmark_results.txt

@@ -1,28 +1,23 @@
+README BENCHMARKS


btw, is this file manually organized?

Kind of. I organized it manually based on where each result would fall if you ran the entire benchmark suite, but since I ran them in different batches, they didn't all come out in the right order. I also added a few comments to make it easier to parse.

jerryzh168 · 2024-09-10T23:35:26Z

torchao/_models/llama/benchmarks.sh

@@ -1,44 +1,28 @@
 export CHECKPOINT_PATH=../../../checkpoints # path to checkpoints folder


I think one improvement for this script might be to use a loop that only specifies the quantization (or whatever arg changes) instead of spelling out all the args. It might make it easier to see which benchmarks we are running.

That would be good if we were productionizing things, but most of the time I'm commenting things out and only running 1 or 2 examples, and having them all spelled out is really nice.
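For reference, the loop idea discussed above could look roughly like the following. This is a dry-run sketch, not the script from this PR: it only prints the commands. The `MODEL_REPO` value and the technique names in the loop are hypothetical placeholders, and the `--checkpoint_path` and `--quantization` flag spellings are assumptions that should be checked against the generate.py help text.

```shell
#!/bin/sh
# Dry-run sketch: prints one generate.py command per technique instead of running it.
# Assumptions (not taken from the PR): the MODEL_REPO value, the technique names in
# the loop, and the exact flag spellings. Check generate.py --help before relying on them.

CHECKPOINT_PATH=../../../checkpoints    # path to checkpoints folder (as in benchmarks.sh)
MODEL_REPO=example-org/example-model    # hypothetical placeholder model

# Build the command line for one technique.
# An empty argument means the unquantized baseline.
build_cmd() {
  cmd="python generate.py --checkpoint_path $CHECKPOINT_PATH/$MODEL_REPO/model.pth"
  if [ -n "$1" ]; then
    cmd="$cmd --quantization $1"
  fi
  printf '%s\n' "$cmd"
}

# Only the value that changes between runs is listed here.
for quant in "" int8wo fp6; do
  build_cmd "$quant"
done
```

The trade-off raised in the reply still applies: with a loop, running only one or two configurations means editing the list rather than commenting out individual lines.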

jcaip

lgtm



HDCharles requested review from msaroufim and jerryzh168 September 10, 2024 23:17

facebook-github-bot added the CLA Signed label Sep 10, 2024

HDCharles requested a review from jcaip September 10, 2024 23:17

jerryzh168 reviewed Sep 10, 2024


torchao/sparsity/README.md (resolved)

jerryzh168 approved these changes Sep 10, 2024


jerryzh168 reviewed Sep 10, 2024


jcaip approved these changes Sep 10, 2024


HDCharles added the benchmark label Sep 11, 2024

HDCharles merged commit b4d0768 into main Sep 11, 2024
17 checks passed

yanbing-j pushed a commit to yanbing-j/ao that referenced this pull request Dec 9, 2024

Pinning numpy to under 2.0 (pytorch#867)

b31c11c
