[Feat] Add window attention for gemma-2 #1056
@Ying1123 We can temporarily change
7ac08ad to 0ad6781
I tried out the engine with the gemma-2-2b, 9b, and 27b models, and it looks good. I left some quick comments. Overall, LGTM!
06d3450 to f0f9941
224b293 to 749a8ff
self.disable_radix_cache = True
self.disable_regex_jump_forward = True
self.disable_flashinfer = False
self.disable_cuda_graph = True
CUDA graph should be turned on here, i.e. `disable_cuda_graph` should not be set to `True`.
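A rough sketch of what this suggestion could look like, for illustration only. `ServerArgsSketch` and `apply_gemma2_overrides` are hypothetical names made up here, not the project's real class or method; only the four flag names come from the diff above, and the default values are assumptions.

```python
from dataclasses import dataclass


@dataclass
class ServerArgsSketch:
    """Hypothetical stand-in for the server arguments object (not the real class)."""

    # Flag names are taken from the diff; the defaults here are assumptions.
    disable_radix_cache: bool = False
    disable_regex_jump_forward: bool = False
    disable_flashinfer: bool = False
    disable_cuda_graph: bool = False


def apply_gemma2_overrides(args: ServerArgsSketch) -> ServerArgsSketch:
    """Apply the overrides from the diff, but keep CUDA graph enabled."""
    args.disable_radix_cache = True
    args.disable_regex_jump_forward = True
    args.disable_flashinfer = False
    # Review suggestion: CUDA graph stays on, so the disable flag is False.
    args.disable_cuda_graph = False
    return args


args = apply_gemma2_overrides(ServerArgsSketch())
print(args)
```

The only difference from the diff is the last flag; the other three overrides are left exactly as proposed.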
Additional comments are addressed in #1090.
DO NOT turn on auto-merge.
I'll merge it manually.