[Codegen] llama 8b fp8 with attention vector distribute fail #19991
Comments
Arising from this dispatch: https://gist.github.com/pashu123/e21bc74fafbc4ce3ae23b0adf3ac75b5
I think the problem is the

Did this get triaged in the codegen sync? Is someone on it?
Ok, I looked at this, and it hit a pretty hairy unimplemented part of propagating reshapes across the mask operation. It is also true that a unit dimension is being added here. I haven't tracked down what that unit dimension is. It can't be the batch, since the batch is not part of the mask AFAIK. So some extra unit dimension is being added here that I haven't fully tracked down. I'll fix the core issue, but there might be an intermediate workaround (WAR) that would be faster to land and unblock things.
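For illustration, the kind of reshape involved looks like the MLIR below, where a leading unit dimension is prepended to a mask-shaped tensor; this is a minimal sketch with made-up shapes and names, not the IR from the failing dispatch:

```mlir
// Hypothetical reshape inserting a leading unit dimension on a
// mask-shaped operand; shapes and SSA names are invented for illustration.
%expanded = tensor.expand_shape %mask [[0, 1], [2]] output_shape [1, 8, 16]
    : tensor<8x16xf16> into tensor<1x8x16xf16>
```

Propagating a reshape like this across the attention mask is the unimplemented path described above.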
#20014 seems to fix the compilation, but I need to see what impact it will have on the test suite, etc. But at least the fix works.
What happened?
Follow-up of [ROCm][Codegen] llama 8b fp8 with attention segfault #19921
New codegen issue (log: llama_f8_attn_bug_log_0213.txt) after I rebased IREE to the commit listed under Version information below.
Steps to reproduce your issue
Run the following command:
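A minimal sketch of a plausible repro command, assuming the dispatch from the gist above is saved as dispatch.mlir and compiled for an MI300-class GPU; the file name, backend, and gfx942 target are assumptions, not the original flags:

```shell
# Hypothetical repro sketch; substitute the actual input file and
# GPU target from the original report.
iree-compile dispatch.mlir \
  --iree-hal-target-backends=rocm \
  --iree-hip-target=gfx942 \
  -o /tmp/dispatch.vmfb
```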
What component(s) does this issue relate to?
Compiler
Version information
commit 0ff26a7 (HEAD -> main, upstream/main)
Author: Prashant Kumar pk5561@gmail.com
Date: Thu Feb 13 23:26:59 2025 +0530
[Codegen] Add support to emulate unsupported float type (#19943)
Additional context
No response