Optimize GeGLU layer in Gemma #2975

WoosukKwon · 2024-02-22T01:12:52Z

This PR optimizes the GeGLU layer in Gemma by merging its linear layers and introducing the `gelu_and_mul` kernel, similar to how we optimized SwiGLU in Llama.
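For readers unfamiliar with the pattern, here is a minimal pure-Python sketch of the idea. It is not the vLLM code or the CUDA kernel from this PR. It assumes the exact erf-based GELU and that the gate half comes first in the merged output; the helper names (`matvec`, `geglu_unmerged`, `geglu_merged`) and the toy weights are hypothetical.

```python
import math


def gelu(x: float) -> float:
    # Exact (erf-based) GELU; whether the real kernel uses this variant is an assumption.
    return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))


def matvec(weight, x):
    # Plain matrix-vector product: one output value per weight row.
    return [sum(w * v for w, v in zip(row, x)) for row in weight]


def geglu_unmerged(w_gate, w_up, x):
    # Baseline: two separate linear layers, then activation and elementwise multiply.
    gate = matvec(w_gate, x)
    up = matvec(w_up, x)
    return [gelu(g) * u for g, u in zip(gate, up)]


def gelu_and_mul(gate_up):
    # Fused step: split the merged output in half, apply GELU to the first half,
    # and multiply elementwise by the second half.
    d = len(gate_up) // 2
    return [gelu(g) * u for g, u in zip(gate_up[:d], gate_up[d:])]


def geglu_merged(w_gate_up, x):
    # Optimized shape: one linear layer with stacked weights, then the fused step.
    return gelu_and_mul(matvec(w_gate_up, x))


# Toy weights (hypothetical): hidden size 2, intermediate size 3.
w_gate = [[0.5, -1.0], [0.25, 0.75], [1.0, 0.0]]
w_up = [[1.0, 2.0], [-0.5, 0.5], [0.0, 1.0]]
x = [0.3, -0.7]

# Merging the layers amounts to stacking the gate rows on top of the up rows.
w_gate_up = w_gate + w_up

merged = geglu_merged(w_gate_up, x)
unmerged = geglu_unmerged(w_gate, w_up, x)
print(all(math.isclose(a, b) for a, b in zip(merged, unmerged)))
```

The merged form runs one larger matrix multiply instead of two, and the activation plus multiply happens in a single pass over the result, which is what a fused kernel can exploit.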

WoosukKwon · 2024-02-22T03:25:11Z

After this PR, maybe we can merge the Gemma model into Llama.

Yard1

LGTM, thanks!

WoosukKwon added 2 commits February 22, 2024 01:11

Merge Linear layers in GeGLU

bb1751a

Add gelu_and_mul kernel

7801623

WoosukKwon changed the title ~~Merge Linear layers in GeGLU~~ Optimize GeGLU layer in Gemma Feb 22, 2024

WoosukKwon added 4 commits February 22, 2024 01:57

Minor

c31f0f0

Fix kernel & add correctness test

0d27bce

Refactor test_activation

4aba8af

Merge branch 'main' into gemma-mlp

12a9e65

WoosukKwon requested a review from Yard1 February 22, 2024 02:31

Do exact match

11f8055

WoosukKwon requested a review from zhuohan123 February 22, 2024 03:18

Yard1 approved these changes Feb 22, 2024

WoosukKwon merged commit fd5dcc5 into main Feb 22, 2024
21 checks passed

WoosukKwon deleted the gemma-mlp branch February 22, 2024 04:17

xjpang pushed a commit to xjpang/vllm that referenced this pull request Mar 4, 2024

Optimize GeGLU layer in Gemma (vllm-project#2975)

18600ce
