Support trt_flash_multihead_matmul_fuse_pass on machines with SM below 80 #56492
Your PR was submitted successfully. Thank you for contributing to this open-source project!
@@ -32,10 +32,12 @@ def is_program_valid(self, program_config: ProgramConfig) -> bool:
It may be worth adding a unit test for the case where the q/k/v matmul takes two inputs (rather than one input and one weight).
Added.
Sorry to inform you that the CI runs for 76d8b4f passed more than 7 days ago. To prevent PR conflicts, you need to re-run all CIs manually.
LGTM
lgtm for const_cast
* jingdu duiqi need check
* code style
* code style
* add weight is input
* code style
* windows ci
* fix weight name
* rerun ci
* re run ci

---------

Co-authored-by: yangjianfengo1 <yangjianfeng01.baidu.com>
PR types
Performance optimization
PR changes
Others
Description
Pcard-71501

On machines with SM < 80, trt_flash_multihead_matmul_fuse_pass is now supported by using memory_efficient_attention. The test data is as follows: