Some improvement to make autoquant v2 work with Mixtral-8x7B-v0.1 (#1328)

* Some improvement to make autoquant v2 work with Mixtral-8x7B-v0.1

Summary:
Tested locally by running autoquant v2 with Llama-2-7b and Mixtral-8x7B-v0.1 in
https://github.com/pytorch/pytorch/blob/main/benchmarks/gpt_fast/benchmark.py

Llama-2-7b-chat-hf:
  Compilation time: 81.71 seconds
  Average tokens/sec: 131.12 tokens/sec
  Average bandwidth achieved: 1732.77 GB/s
  Memory used: 27.71 GB

Mixtral-8x7B-v0.1:
  Compilation time: 108.89 seconds
  Average tokens/sec: 79.59 tokens/sec
  Average bandwidth achieved: 1025.14 GB/s
  Memory used: 61.62 GB

More results can be found in pytorch/pytorch#140627.

Test Plan: local test with pytorch/benchmarks/gpt_fast/benchmark.py

* remove print
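As a quick sanity check on the reported figures: gpt_fast's benchmark harness derives "bandwidth achieved" as (weight bytes read per generated token) × (tokens/sec), so dividing the two reported numbers back out recovers the bytes streamed per token. This is a sketch under that assumption; the per-token byte figures below are back-derived from the commit's numbers, not printed by the harness itself.

```python
def bytes_per_token(bandwidth_gbs: float, tokens_per_sec: float) -> float:
    """GB of weights streamed per generated token, assuming the harness
    computes bandwidth as weight-bytes-per-token * tokens/sec."""
    return bandwidth_gbs / tokens_per_sec

# Llama-2-7b-chat-hf: ~13.2 GB/token, consistent with ~7B params in 16-bit.
llama = bytes_per_token(1732.77, 131.12)

# Mixtral-8x7B-v0.1: ~12.9 GB/token. Although the model has far more total
# parameters, only 2 of 8 experts are active per token, so the active
# working set per decode step is close to a dense ~7B model -- which is
# why its bytes-per-token lands so close to Llama-2-7b's.
mixtral = bytes_per_token(1025.14, 79.59)

print(f"Llama-2-7b-chat-hf: {llama:.2f} GB/token")
print(f"Mixtral-8x7B-v0.1:  {mixtral:.2f} GB/token")
```

The near-identical per-token byte counts are a useful consistency check: the Mixtral run's lower tokens/sec comes from lower achieved bandwidth, not from moving more weight bytes per token.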
1 parent f3c1a00 · commit 7c3c51f
Showing 1 changed file with 35 additions and 28 deletions.