Enable gemv schedule for adreno #16932

krishnaraj36 · 2024-04-26T05:56:58Z

Enabled a new GEMV schedule for the OpenCL target, which improves decode performance of mlc-llm LLM models in the q4f16_0 format.

Decode performance of a few LLM models on Snapdragon Gen-3 (Android):

| Model | Baseline | Latest (improved) |
| --- | --- | --- |
| Llama-2-7B | 10 tok/sec | 12.5 tok/sec |
| Qwen-7b | 8.5 tok/sec | 11 tok/sec |
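
For reference, the relative gains implied by these numbers can be worked out with a few lines of Python. This is only a sketch: the throughput values are copied from the table, while the `results` dictionary and the `speedup` helper are illustrative names, not part of the PR.

```python
# Decode throughput in tokens/sec, copied from the table above.
# (baseline, improved) per model; names here are illustrative only.
results = {
    "Llama-2-7B": (10.0, 12.5),
    "Qwen-7b": (8.5, 11.0),
}


def speedup(baseline, improved):
    """Return the throughput ratio improved / baseline."""
    return improved / baseline


for model, (baseline, improved) in results.items():
    ratio = speedup(baseline, improved)
    print(f"{model}: {ratio:.2f}x ({(ratio - 1) * 100:.1f}% faster)")
```

This works out to roughly 1.25x for Llama-2-7B and 1.29x for Qwen-7b.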


krishnaraj36 · 2024-04-26T05:58:28Z

@srkreddy1238 @tqchen: Could you please take a look at this PR?

Hzfengsy · 2024-04-27T07:36:03Z

Thanks @krishnaraj36 for the great PR and significant perf improvement.

However, q4f16_0 should be outer_reduction, as the layout is KN. I wonder why the rule is named sch_adreno_inner_reduction.

If it's a naming issue, we can replace the current sch_outer_reduction rule, as it is designed specifically for Android.
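
To illustrate the terminology in this comment, here is a minimal pure-Python sketch of GEMV over the two weight layouts. It assumes that "outer reduction" means the reduction axis K is the outer (slow-varying) dimension of the weight buffer, and "inner reduction" means K is the inner (contiguous) dimension. The function names `gemv_kn` and `gemv_nk` are hypothetical, and the sketch uses plain floats rather than the packed 4-bit weights of q4f16_0.

```python
def gemv_kn(x, w_kn):
    """y[n] = sum_k x[k] * W[k][n], with W stored as K rows of N (KN layout)."""
    k_dim, n_dim = len(w_kn), len(w_kn[0])
    y = [0.0] * n_dim
    for k in range(k_dim):      # reduction axis K is the OUTER dimension of W
        for n in range(n_dim):  # output axis N is contiguous in memory
            y[n] += x[k] * w_kn[k][n]
    return y


def gemv_nk(x, w_nk):
    """y[n] = sum_k x[k] * W[n][k], with W stored as N rows of K (NK layout)."""
    n_dim, k_dim = len(w_nk), len(w_nk[0])
    y = [0.0] * n_dim
    for n in range(n_dim):
        acc = 0.0
        for k in range(k_dim):  # reduction axis K is the INNER dimension of W
            acc += x[k] * w_nk[n][k]
        y[n] = acc
    return y


x = [1.0, 2.0]
w_kn = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]      # shape (K=2, N=3)
w_nk = [[1.0, 4.0], [2.0, 5.0], [3.0, 6.0]]    # same weights, transposed
print(gemv_kn(x, w_kn))
print(gemv_nk(x, w_nk))
```

Both functions compute the same result; only the memory access pattern over the weight differs, which is why the two cases are typically scheduled differently on a GPU.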

krishnaraj36 · 2024-04-29T04:23:49Z

> Thanks @krishnaraj36 for the great PR and significant perf improvement.
>
> However, q4f16_0 should be outer_reduction as the layout is KN. I wonder why the rule is named as sch_adreno_inner_reduction
>
> If it's a naming issue, we can replace the current rule of sch_outer_reduction as it is specially designed for android only

@Hzfengsy Thanks for your review.
Yes, it's a naming issue. I have renamed the function so that the name makes sense.

Hzfengsy

Overall LGTM

python/tvm/dlight/gpu/gemv.py

krishnaraj36 mentioned this pull request Apr 26, 2024

[DLIGHT][GEMV] Enable gemv schedule for adreno mlc-ai/relax#319

Closed

krishnaraj36 added 2 commits April 26, 2024 15:33

Modified test case according to dlight schedule update

5038596

Fix lint error

b8534a8

Hzfengsy self-assigned this Apr 27, 2024

Updated naming of schedule func

63d3a67

fixed lint error

bddc0dc

Hzfengsy approved these changes Apr 29, 2024

python/tvm/dlight/gpu/gemv.py (outdated, resolved)

python/tvm/dlight/gpu/gemv.py (outdated, resolved)

python/tvm/dlight/gpu/gemv.py (resolved)

krishnaraj36 added 2 commits April 29, 2024 11:08

Corrected comments

3392a16

Update gemv.py

a89de12

tqchen merged commit b4a69de into apache:main Apr 29, 2024
18 checks passed

tqchen mentioned this pull request May 4, 2024

AutoTVM optimization? mlc-ai/mlc-llm#2244

Closed

ysh329 mentioned this pull request Jul 20, 2024

[Release] v0.17.0 Release Candidate Notes #17178

Closed
