Enable gemv schedule for adreno #16932
Enabled a new gemv schedule for the opencl target, which improves decode performance of mlc-llm LLM models in the q4f16_0 format.

Decode performance of a few LLM models on Snapdragon Gen-3 (Android):

| Model | Baseline | Latest (improved) |
|---|---|---|
| Llama-2-7B | 10 tok/sec | 12.5 tok/sec |
| Qwen-7b | 8.5 tok/sec | 11 tok/sec |
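For reference, the relative gain implied by the figures reported above can be worked out with a short script. This is only an illustrative sketch: the `results` dictionary and the `speedup` helper are hypothetical names introduced here, not part of the PR, and the numbers are copied from the description as-is.

```python
# Decode throughput in tokens per second, copied from the PR description.
# Each entry maps a model name to (baseline, improved).
results = {
    "Llama-2-7B": (10.0, 12.5),
    "Qwen-7b": (8.5, 11.0),
}


def speedup(baseline, improved):
    """Return the throughput ratio improved / baseline (hypothetical helper)."""
    return improved / baseline


for model, (baseline, improved) in results.items():
    ratio = speedup(baseline, improved)
    print(f"{model}: {ratio:.2f}x ({(ratio - 1) * 100:.0f}% faster)")
```

Under these numbers the new schedule gives roughly a 1.25x decode speedup on Llama-2-7B and about 1.29x on Qwen-7b.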
@srkreddy1238 @tqchen: Could you please take a look at this PR?
Thanks @krishnaraj36 for the great PR and significant perf improvement. However, if it's a naming issue, we can replace the current rule of …
@Hzfengsy Thanks for your review.
Overall LGTM