Replace repeat kv with proper GQA handling. #171

wang2yn84 · 2024-08-19T21:32:19Z

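`repeat_kv` in the original Llama model copies the key/value data in some cases. This change replaces it by reshaping the query instead: the extra query heads are folded from the number-of-heads dimension into the number-of-tokens dimension (-2), so the keys and values are used as they are.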
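To make the memory argument concrete, here is a minimal NumPy sketch. It is only an analogue of the idea, not the code from this PR: the sizes and the variable names (`hkv`, `rep`, `seq_len`, `d`) are assumptions chosen for illustration, and the actual kernels run on different array libraries whose copy behavior may differ.

```python
import numpy as np

# Hypothetical sizes, chosen only for illustration.
hkv, rep, seq_len, d = 2, 4, 16, 8
h = hkv * rep  # number of query heads

k = np.zeros((hkv, seq_len, d), dtype=np.float32)
q = np.zeros((h, seq_len, d), dtype=np.float32)

# Repeat-based approach: every kv head is materialized `rep` times.
k_repeated = np.repeat(k, rep, axis=0)           # [h, seq_len, d], new buffer

# Reshape-based approach: fold the `rep` query heads that share a kv head
# into the token dimension (-2). On a contiguous array this is a view.
q_folded = q.reshape(hkv, rep * seq_len, d)      # [hkv, rep * seq_len, d]

repeat_is_copy = not np.shares_memory(k, k_repeated)
fold_is_view = np.shares_memory(q, q_folded)

print("repeated k bytes:", k_repeated.nbytes, "original k bytes:", k.nbytes)
print("repeat copies:", repeat_is_copy, "| fold is a view:", fold_is_view)
```

In this sketch the repeated keys occupy `rep` times the original buffer, while the folded query reuses the query's existing memory.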

FanhaiLu1

Smart change! So q * k^T has shape [hkv, rep * seq_len, seq_len] and (q * k^T) * v has shape [hkv, rep * seq_len, d], and you reshape the output to [h, seq_len, d] at the end.

lsy323 · 2024-08-20T16:27:18Z

LGTM! Thank you for the change. Seems linter needs to be fixed

wang2yn84 · 2024-08-20T17:46:02Z

> Smart change! So q * k^T has shape [hkv, rep * seq_len, seq_len] and (q * k^T) * v has shape [hkv, rep * seq_len, d], and you reshape the output to [h, seq_len, d] at the end.

Correct. The reshape doesn't affect the result.
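The shape bookkeeping discussed above can be checked with a small NumPy sketch. This is an illustration under stated assumptions, not the kernel from this PR: the function names (`gqa_repeat_kv`, `gqa_reshape_q`), the sizes, and the causal mask are all hypothetical, and query head `i` is assumed to use kv head `i // rep`, which is the layout a repeat along the head axis produces.

```python
import numpy as np

def softmax(x):
    x = x - x.max(axis=-1, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=-1, keepdims=True)

def attention(q, k, v, mask):
    # q: [heads, n_q, d]; k, v: [heads, n_kv, d]; mask: [n_q, n_kv] (bool)
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(q.shape[-1])
    scores = np.where(mask, scores, -np.inf)
    return softmax(scores) @ v

def gqa_repeat_kv(q, k, v, mask):
    # Baseline: copy each kv head `rep` times so it lines up with q.
    rep = q.shape[0] // k.shape[0]
    return attention(q, np.repeat(k, rep, axis=0), np.repeat(v, rep, axis=0), mask)

def gqa_reshape_q(q, k, v, mask):
    # Fold the `rep` query heads per kv head into the token dimension (-2).
    h, seq_len, d = q.shape
    hkv = k.shape[0]
    rep = h // hkv
    q_folded = q.reshape(hkv, rep * seq_len, d)
    # Each folded block of seq_len rows needs the same mask, so tile it.
    mask_folded = np.tile(mask, (rep, 1))             # [rep * seq_len, seq_len]
    out = attention(q_folded, k, v, mask_folded)      # [hkv, rep * seq_len, d]
    assert out.shape == (hkv, rep * seq_len, d)
    return out.reshape(h, seq_len, d)                 # back to [h, seq_len, d]

rng = np.random.default_rng(0)
h, hkv, seq_len, d = 8, 2, 5, 4                       # hypothetical sizes
q = rng.standard_normal((h, seq_len, d))
k = rng.standard_normal((hkv, seq_len, d))
v = rng.standard_normal((hkv, seq_len, d))
mask = np.tril(np.ones((seq_len, seq_len), dtype=bool))  # causal mask

out_repeat = gqa_repeat_kv(q, k, v, mask)
out_reshape = gqa_reshape_q(q, k, v, mask)
print("outputs match:", np.allclose(out_repeat, out_reshape))
```

One detail worth noting in this sketch: once the query is folded, any mask over the query positions has to be tiled `rep` times along the query axis so that it still lines up with the folded rows.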

wang2yn84 · 2024-08-20T17:53:47Z

> LGTM! Thank you for the change. Seems linter needs to be fixed

Yup, fixed!

wang2yn84 added 3 commits August 19, 2024 19:48

Replace repeat kv with query reshaping for flash attention and dense …

7afcb01

…attention kernel.

Replaces the repeat kv for dense attention and flash attention kernel.

38579f8

Remove temp test files.

ea6caca

wang2yn84 requested review from qihqi, FanhaiLu1 and lsy323 August 19, 2024 21:32

Fix lint issue.

0495312

qihqi approved these changes Aug 20, 2024

FanhaiLu1 approved these changes Aug 20, 2024

lsy323 approved these changes Aug 20, 2024

Fix lint errors.

e1ada24

wang2yn84 merged commit 7092cc5 into main Aug 20, 2024
4 checks passed
