-
Notifications
You must be signed in to change notification settings - Fork 608
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add F8_16x16x32_F32 support for MFMA #17792
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we also add some e2e tests?
Does the e2e test run on MI300 as well :o |
Not in CI AFAIK but we can run it locally |
Makes sense, added it :) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah one more thing: don't we have to enable this e2e test in cmake? Have you checked that it actually runs?
Oh LOL, I ran the script and CLI compile 😆 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wasn't sure about the intrinsic name just to make the e2e generator work, but I think it actually makes sense. Future hardware may implement mfma over other fp8 types so we might as well be explicit here.
Yeah that's what I was thinking as well, since even on IREE there is 2 different FP8 type implemented. :) |
tests/e2e/matmul/CMakeLists.txt
Outdated
elseif(IREE_HIP_TEST_TARGET_CHIP MATCHES "^gfx94") | ||
|
||
# I8 Intrinsics has different layout on CDNA3/gfx94x, | ||
# and only CDNA3/gfx94x has F8 intrinsics. | ||
|
||
iree_generated_e2e_runner_test( | ||
NAME | ||
e2e_matmul_rocm_f8_large_cdna3_mfma | ||
TEST_TYPE | ||
matmul | ||
GENERATOR | ||
"generate_e2e_matmul_tests.py" | ||
GENERATOR_ARGS | ||
"--lhs_rhs_type=f8E4M3FNUZ" | ||
"--acc_type=f32" | ||
"--shapes=gpu_large_aligned" | ||
"--compilation_info=LLVMGPUVectorDistributeMFMA" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can these intrinsics and the compilation info be inferred from the target chip? I'd like to avoid more branches in test files.
At the very least this should have an explanation for why a new branch is being added in the PR description.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can these intrinsics and the compilation info be inferred from the target chip? I'd like to avoid more branches in test files.
Yes, but I am not sure how to use that information to stop runner without those features to compile it. Since if it tries to compile with that intrinsic, it will have compilation failure.
At the very least this should have an explanation for why a new branch is being added in the PR description.
The main reasons are:
- Compilation with target chip that do not have the FP8 intrinsics would fail in iree-compile.
- I8 Intrinsics has different layout on CDNA3/gfx94x, and only CDNA3/gfx94x has F8 intrinsics.
I can edit the PR to have these if you think it suffice :)
Signed-off-by: Stanley Winata <stanley.winata@amd.com>
Signed-off-by: Stanley Winata <stanley.winata@amd.com>
Signed-off-by: Stanley Winata <stanley.winata@amd.com>
Signed-off-by: Stanley Winata <stanley.winata@amd.com>
Signed-off-by: Stanley Winata <stanley.winata@amd.com>
Signed-off-by: Stanley Winata <stanley.winata@amd.com>
Signed-off-by: Stanley Winata <stanley.winata@amd.com>
Signed-off-by: Stanley Winata <stanley.winata@amd.com>
Signed-off-by: Stanley Winata <stanley.winata@amd.com>
Signed-off-by: Stanley Winata <stanley.winata@amd.com>
Signed-off-by: Stanley Winata <stanley.winata@amd.com>
Signed-off-by: Stanley Winata <stanley.winata@amd.com>
@kuhar I followed your offline advice and refactored the test to do truncf f32 -> f8 then compute MMA and then do reference checks. It's working much better now, please review that bit again/LMK if you think this is more reasonable now, thanks! :) |
Added F8_16x16x32xF32 MFMA layout support and their e2e tests. Needed to adjust/branch in the e2e matmul test's cmake because only gfx94x GPUs have FP8 MFMA layouts and it has different I8 intrinsic shape/layout as opposed to what is present in gfx90x.
Added F8_16x16x32xF32 MFMA layout support and their e2e tests. Needed to adjust/branch in the e2e matmul test's cmake because only gfx94x GPUs have FP8 MFMA layouts and it has different I8 intrinsic shape/layout as opposed to what is present in gfx90x.
Added F8_16x16x32xF32 MFMA layout support and their e2e tests. Needed to adjust/branch in the e2e matmul test's cmake because only gfx94x GPUs have FP8 MFMA layouts and it has different I8 intrinsic shape/layout as opposed to what is present in gfx90x.
Added F8_16x16x32xF32 MFMA layout support and their e2e tests. Needed to adjust/branch in the e2e matmul test's cmake because only gfx94x GPUs have FP8 MFMA layouts and it has different I8 intrinsic shape/layout as opposed to what is present in gfx90x. Signed-off-by: Lubo Litchev <lubol@google.com>
No description provided.