
Add F8_16x16x32_F32 support for MFMA #17792

Merged (14 commits into iree-org:main, Jul 16, 2024)
Conversation

raikonenfnu (Collaborator)

No description provided.

@kuhar (Member) left a comment

Should we also add some e2e tests?

@raikonenfnu (Collaborator, Author)

> Should we also add some e2e tests?

Do the e2e tests run on MI300 as well? :o

@kuhar (Member) commented Jul 2, 2024

Not in CI AFAIK but we can run it locally

@raikonenfnu (Collaborator, Author)

> Not in CI AFAIK but we can run it locally

Makes sense, added it :)

Review thread on tests/e2e/matmul/generate_e2e_matmul_tests.py (outdated, resolved)
@kuhar (Member) left a comment

Ah one more thing: don't we have to enable this e2e test in cmake? Have you checked that it actually runs?

@raikonenfnu (Collaborator, Author)

> Ah one more thing: don't we have to enable this e2e test in cmake? Have you checked that it actually runs?

Oh LOL, I ran the script and CLI compile 😆

@kuhar (Member) left a comment

I wasn't sure about the new intrinsic name, added just to make the e2e generator work, but I think it actually makes sense: future hardware may implement MFMA over other fp8 types, so we might as well be explicit here.

@kuhar kuhar self-requested a review July 2, 2024 21:37
@raikonenfnu (Collaborator, Author)

> I wasn't sure about the intrinsic name just to make the e2e generator work, but I think it actually makes sense. Future hardware may implement mfma over other fp8 types so we might as well be explicit here.

Yeah, that's what I was thinking as well, since even in IREE there are two different FP8 types implemented. :)

Comment on lines 2225 to 2337
elseif(IREE_HIP_TEST_TARGET_CHIP MATCHES "^gfx94")

# I8 intrinsics have a different layout on CDNA3/gfx94x,
# and only CDNA3/gfx94x has F8 intrinsics.

iree_generated_e2e_runner_test(
NAME
e2e_matmul_rocm_f8_large_cdna3_mfma
TEST_TYPE
matmul
GENERATOR
"generate_e2e_matmul_tests.py"
GENERATOR_ARGS
"--lhs_rhs_type=f8E4M3FNUZ"
"--acc_type=f32"
"--shapes=gpu_large_aligned"
"--compilation_info=LLVMGPUVectorDistributeMFMA"
@kuhar (Member) left a comment

Can these intrinsics and the compilation info be inferred from the target chip? I'd like to avoid more branches in test files.

At the very least this should have an explanation for why a new branch is being added in the PR description.

@raikonenfnu (Collaborator, Author) left a comment

> Can these intrinsics and the compilation info be inferred from the target chip? I'd like to avoid more branches in test files.

Yes, but I am not sure how to use that information to stop runners without those features from compiling it. If iree-compile tries to compile with that intrinsic on an unsupported chip, compilation fails.

> At the very least this should have an explanation for why a new branch is being added in the PR description.

The main reasons are:

  1. Compilation with a target chip that does not have the FP8 intrinsics would fail in iree-compile.
  2. I8 intrinsics have a different layout on CDNA3/gfx94x, and only CDNA3/gfx94x has F8 intrinsics.

I can edit the PR description to include these if you think that suffices :)
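For context, the chip-to-intrinsic inference kuhar asks about could in principle look like the sketch below; the table contents and function are illustrative assumptions for this discussion, not IREE's actual generator or cmake logic. The catch raised above remains: a runner whose chip maps to no FP8 entry must skip the test entirely, or iree-compile fails.

```python
# Hypothetical sketch: derive the MFMA intrinsics to test from the target
# chip name. Intrinsic names and table entries are illustrative assumptions.
INTRINSICS_BY_ARCH_PREFIX = {
    "gfx90": ["MFMA_F16_16x16x16_F32", "MFMA_I8_32x32x8_I32"],
    "gfx94": [
        "MFMA_F16_16x16x16_F32",
        "MFMA_I8_16x16x32_I32",   # I8 layout differs on CDNA3/gfx94x
        "MFMA_F8_16x16x32_F32",   # FP8 MFMA exists only on CDNA3/gfx94x
    ],
}

def supported_intrinsics(chip: str) -> list[str]:
    """Return the intrinsics assumed available on `chip`, e.g. 'gfx942'."""
    for prefix, intrinsics in INTRINSICS_BY_ARCH_PREFIX.items():
        if chip.startswith(prefix):
            return intrinsics
    return []  # unknown chip: generate no MFMA tests rather than fail compile
```

A test generator could then emit the FP8 suite only when `"MFMA_F8_16x16x32_F32" in supported_intrinsics(chip)`, which is effectively what the `MATCHES "^gfx94"` branch in the cmake file encodes by hand.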

@jpienaar jpienaar added the benchmarks:android-cpu Run default Android CPU benchmarks label Jul 10, 2024
Signed-off-by: Stanley Winata <stanley.winata@amd.com>
@raikonenfnu (Collaborator, Author)

raikonenfnu commented Jul 16, 2024

@kuhar I followed your offline advice and refactored the test to truncf f32 -> f8, then compute the MMA, and then do the reference checks. It's working much better now; please review that bit again and LMK if you think this is more reasonable now, thanks! :)
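The truncate-then-check idea can be sketched in NumPy. This is an illustrative approximation, not the generator's actual code: it rounds f32 inputs to the 3-mantissa-bit grid of f8E4M3FNUZ but ignores exponent clamping, subnormals, and special values. Because both the device and the reference then consume the same f8-representable operands, the f32-accumulated results can be compared tightly.

```python
import numpy as np

def round_to_f8_grid(x: np.ndarray) -> np.ndarray:
    """Round to 4 significant binary digits (1 implicit + 3 mantissa bits).

    Approximates truncf f32 -> f8E4M3FNUZ -> f32; exponent range limits,
    subnormals, and NaN handling are deliberately ignored in this sketch.
    """
    m, e = np.frexp(x.astype(np.float32))  # x = m * 2**e, with m in [0.5, 1)
    return np.ldexp(np.round(m * 16.0) / 16.0, e).astype(np.float32)

rng = np.random.default_rng(0)
lhs = rng.standard_normal((4, 8)).astype(np.float32)
rhs = rng.standard_normal((8, 4)).astype(np.float32)

# Reference: matmul the f8-rounded operands in f32, so the device's f8 MFMA
# result (also accumulated in f32) can be checked against it.
reference = round_to_f8_grid(lhs) @ round_to_f8_grid(rhs)
```

For example, 1.1 is not representable with 3 mantissa bits and rounds to 1.125 on this grid, which is exactly the value the device-side f8 input would carry.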

@ScottTodd ScottTodd removed the benchmarks:android-cpu Run default Android CPU benchmarks label Jul 16, 2024
@raikonenfnu raikonenfnu merged commit 6a82eb5 into iree-org:main Jul 16, 2024
56 of 60 checks passed
raikonenfnu added a commit to raikonenfnu/iree that referenced this pull request Jul 23, 2024
Added F8_16x16x32xF32 MFMA layout support and its e2e tests.
Needed to adjust/branch the e2e matmul test's cmake because only gfx94x GPUs have FP8 MFMA layouts, and gfx94x has a different I8 intrinsic shape/layout than gfx90x.
raikonenfnu added a commit that referenced this pull request Jul 23, 2024
Groverkss pushed a commit that referenced this pull request Jul 24, 2024
LLITCHEV pushed a commit to LLITCHEV/iree that referenced this pull request Jul 30, 2024

Signed-off-by: Lubo Litchev <lubol@google.com>
4 participants