Add F8_16x16x32_F32 support for MFMA #17792

raikonenfnu · 2024-07-02T16:31:06Z

No description provided.

kuhar

Should we also add some e2e tests?

raikonenfnu · 2024-07-02T17:01:46Z

> Should we also add some e2e tests?

Do the e2e tests run on MI300 as well? :o

kuhar · 2024-07-02T17:05:43Z

Not in CI AFAIK but we can run it locally

raikonenfnu · 2024-07-02T17:33:28Z

> Not in CI AFAIK but we can run it locally

Makes sense, added it :)

tests/e2e/matmul/generate_e2e_matmul_tests.py

kuhar

Ah one more thing: don't we have to enable this e2e test in cmake? Have you checked that it actually runs?

raikonenfnu · 2024-07-02T18:18:31Z

> Ah one more thing: don't we have to enable this e2e test in cmake? Have you checked that it actually runs?

Oh LOL, I only ran the script and compiled with the CLI 😆

kuhar

I wasn't sure about the intrinsic name just to make the e2e generator work, but I think it actually makes sense. Future hardware may implement mfma over other fp8 types so we might as well be explicit here.

raikonenfnu · 2024-07-02T23:53:41Z

> I wasn't sure about the intrinsic name just to make the e2e generator work, but I think it actually makes sense. Future hardware may implement mfma over other fp8 types so we might as well be explicit here.

Yeah, that's what I was thinking as well, since even IREE already has 2 different FP8 types implemented. :)
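To make the naming argument concrete, here is a small Python sketch that decodes the same byte under two FP8 formats. It assumes the two types are f8E4M3FNUZ (the type the F8 e2e test uses) and f8E5M2FNUZ, with the exponent widths and biases that MLIR's FNUZ float8 types use; `decode_f8_fnuz` and `F8_FORMATS` are illustrative names, not IREE APIs.

```python
import math

# Assumed FNUZ float8 parameters: (exponent bits, mantissa bits, bias).
# "FNUZ" = finite only (no infinities), no negative zero, NaN encoded as 0x80.
F8_FORMATS = {
    "f8E4M3FNUZ": (4, 3, 8),
    "f8E5M2FNUZ": (5, 2, 16),
}


def decode_f8_fnuz(code: int, fmt: str) -> float:
    """Decode one 8-bit FNUZ float8 code into a Python float."""
    exp_bits, man_bits, bias = F8_FORMATS[fmt]
    if code == 0x80:  # the only NaN encoding; FNUZ has no -0.0
        return math.nan
    sign = -1.0 if code & 0x80 else 1.0
    exponent = (code >> man_bits) & ((1 << exp_bits) - 1)
    mantissa = code & ((1 << man_bits) - 1)
    if exponent == 0:  # subnormal: no implicit leading 1
        return sign * (mantissa / (1 << man_bits)) * 2.0 ** (1 - bias)
    return sign * (1.0 + mantissa / (1 << man_bits)) * 2.0 ** (exponent - bias)


# The same byte means very different things in the two formats.
for code in (0x38, 0x7F):
    print(hex(code), {fmt: decode_f8_fnuz(code, fmt) for fmt in F8_FORMATS})
```

Under these assumptions, 0x7F decodes to 240.0 as f8E4M3FNUZ but to 57344.0 as f8E5M2FNUZ, so an intrinsic name that only says "F8" leaves the element type ambiguous.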

ScottTodd · 2024-07-03T15:51:19Z

tests/e2e/matmul/CMakeLists.txt

+elseif(IREE_HIP_TEST_TARGET_CHIP MATCHES "^gfx94")
+
+# I8 Intrinsics has different layout on CDNA3/gfx94x,
+# and only CDNA3/gfx94x has F8 intrinsics.
+
+iree_generated_e2e_runner_test(
+  NAME
+    e2e_matmul_rocm_f8_large_cdna3_mfma
+  TEST_TYPE
+    matmul
+  GENERATOR
+    "generate_e2e_matmul_tests.py"
+  GENERATOR_ARGS
+    "--lhs_rhs_type=f8E4M3FNUZ"
+    "--acc_type=f32"
+    "--shapes=gpu_large_aligned"
+    "--compilation_info=LLVMGPUVectorDistributeMFMA"


Can these intrinsics and the compilation info be inferred from the target chip? I'd like to avoid more branches in test files.

At the very least this should have an explanation for why a new branch is being added in the PR description.

> Can these intrinsics and the compilation info be inferred from the target chip? I'd like to avoid more branches in test files.

Yes, but I am not sure how to use that information to stop runners without those features from compiling it, since compiling with that intrinsic on those targets fails.

> At the very least this should have an explanation for why a new branch is being added in the PR description.

The main reasons are:

Compiling for target chips that do not have the FP8 intrinsics fails in iree-compile.

The I8 intrinsics have a different layout on CDNA3/gfx94x, and only CDNA3/gfx94x has F8 intrinsics.

I can edit the PR description to include these if you think that suffices :)
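For illustration, the chip-based selection being discussed can be written as a hypothetical Python function. It mirrors the `MATCHES "^gfx94"` branch from the CMake diff and assumes the preceding branch matches gfx90x; the function name and the returned fields are made up for this sketch and are not part of the IREE build.

```python
import re
from typing import Optional


def hypothetical_mfma_matmul_config(chip: str) -> Optional[dict]:
    """Hypothetical sketch of the per-chip branching in the e2e CMake file.

    CMake's `MATCHES` is an unanchored regex search, so re.search() with the
    same "^gfx94" pattern behaves the same way.
    """
    if re.search(r"^gfx94", chip):
        # CDNA3/gfx94x: its own I8 intrinsic layout, and the only family
        # here with F8 MFMA intrinsics.
        return {"i8_layout": "gfx94x", "f8_types": ["f8E4M3FNUZ"]}
    if re.search(r"^gfx90", chip):
        # Assumed gfx90x branch: different I8 layout and no F8 intrinsics,
        # so no F8 test is generated (iree-compile would fail on it).
        return {"i8_layout": "gfx90x", "f8_types": []}
    # Any other chip: no MFMA matmul tests in this sketch.
    return None


for chip in ("gfx942", "gfx90a", "gfx1100"):
    print(chip, hypothetical_mfma_matmul_config(chip))
```

Even written this way, the selection still keys off the chip name, which is the kind of branching the review asks to avoid; it only makes the two stated constraints explicit.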

Signed-off-by: Stanley Winata <stanley.winata@amd.com>

raikonenfnu · 2024-07-16T08:50:21Z

@kuhar I followed your offline advice and refactored the test to truncf the f32 inputs to f8, then compute the MMA, and then run the reference checks. It's working much better now; please review that bit again and LMK if you think this is more reasonable, thanks! :)
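One way to picture that test flow is the following Python sketch. It rounds f32 inputs to the nearest f8E4M3FNUZ value, standing in for `truncf`, and then computes the matmul on the rounded values so the reference sees exactly the operands the MMA sees. Round-to-nearest-even is an assumption about truncf's default rounding, and `round_to_f8` and `reference_matmul_f8_inputs` are hypothetical helpers, not the code in this PR.

```python
# Assumed f8E4M3FNUZ parameters: 4 exponent bits, 3 mantissa bits, bias 8,
# no infinities or negative zero, NaN encoded as 0x80.
EXP_BITS, MAN_BITS, BIAS = 4, 3, 8


def _decode(code):
    sign = -1.0 if code & 0x80 else 1.0
    exponent = (code >> MAN_BITS) & ((1 << EXP_BITS) - 1)
    mantissa = code & ((1 << MAN_BITS) - 1)
    if exponent == 0:  # subnormal
        return sign * mantissa / (1 << MAN_BITS) * 2.0 ** (1 - BIAS)
    return sign * (1.0 + mantissa / (1 << MAN_BITS)) * 2.0 ** (exponent - BIAS)


# Every finite value paired with its code (0x80 is NaN, so it is skipped).
_TABLE = [(_decode(c), c) for c in range(256) if c != 0x80]


def round_to_f8(x):
    """Round to the nearest f8E4M3FNUZ value, breaking ties toward an even code.

    The code's low bit is the mantissa's low bit, so this is round-half-to-even.
    Out-of-range inputs saturate to +/-240 here; that is a property of this
    nearest-value search, not a claim about truncf's overflow behavior.
    """
    value, _ = min(_TABLE, key=lambda vc: (abs(vc[0] - x), vc[1] & 1))
    return value


def reference_matmul_f8_inputs(lhs, rhs):
    """Reference result: round both operands to f8 first, then accumulate.

    Python floats are f64, so this accumulates in higher precision than the
    f32 accumulator of the MFMA; a real check would compare with a tolerance.
    """
    a = [[round_to_f8(v) for v in row] for row in lhs]
    b = [[round_to_f8(v) for v in row] for row in rhs]
    k, n = len(b), len(b[0])
    return [[sum(a[i][p] * b[p][j] for p in range(k)) for j in range(n)]
            for i in range(len(a))]
```

The point of the design is that the reference no longer depends on HAL supporting FP8 buffers: inputs stay f32 on the host, and both sides agree on the same rounded values.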

Added F8_16x16x32_F32 MFMA layout support and its e2e tests. The e2e matmul test's CMake needed a new branch because only gfx94x GPUs have FP8 MFMA layouts, and gfx94x also has a different I8 intrinsic shape/layout than gfx90x.

Signed-off-by: Lubo Litchev <lubol@google.com>
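As a rough mental model of the F8_16x16x32_F32 intrinsic described above, the following plain-Python sketch tiles a matmul into intrinsic-sized pieces. It assumes the name encodes M=16, N=16, K=32 (one intrinsic call accumulates a 16x16 f32 tile of C from a 16x32 tile of A and a 32x16 tile of B), following the MxNxK convention of the other MFMA intrinsic names. `intrinsic_calls` and `tiled_matmul` are hypothetical helpers, and the loops only emulate the arithmetic, not the per-lane register layout.

```python
# Assumption: F8_16x16x32_F32 means M=16, N=16, K=32 with f8 inputs and an
# f32 accumulator.
INTR_M, INTR_N, INTR_K = 16, 16, 32


def intrinsic_calls(m, n, k):
    """Number of 16x16x32 intrinsic calls needed for an aligned MxNxK matmul."""
    if m % INTR_M or n % INTR_N or k % INTR_K:
        raise ValueError("shape is not aligned to the 16x16x32 intrinsic")
    return (m // INTR_M) * (n // INTR_N) * (k // INTR_K)


def tiled_matmul(a, b):
    """Plain-Python matmul decomposed into 16x16x32 intrinsic-sized tiles."""
    m, k, n = len(a), len(b), len(b[0])
    intrinsic_calls(m, n, k)  # raises ValueError on unaligned shapes
    c = [[0.0] * n for _ in range(m)]
    for i0 in range(0, m, INTR_M):
        for j0 in range(0, n, INTR_N):
            for p0 in range(0, k, INTR_K):
                # One emulated intrinsic call:
                # C[i0:i0+16, j0:j0+16] += A[i0:i0+16, p0:p0+32] @ B[p0:p0+32, j0:j0+16]
                for i in range(i0, i0 + INTR_M):
                    for j in range(j0, j0 + INTR_N):
                        c[i][j] += sum(a[i][p] * b[p][j]
                                       for p in range(p0, p0 + INTR_K))
    return c


print(intrinsic_calls(256, 256, 512))
```

Unaligned shapes cannot be covered by whole intrinsic tiles without padding, which is presumably why the CMake test uses the aligned shape set (`--shapes=gpu_large_aligned`).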

raikonenfnu requested review from kuhar, MaheshRavishankar, qedawkins, Groverkss, antiagainst and benvanik as code owners July 2, 2024 16:31

kuhar reviewed Jul 2, 2024


kuhar approved these changes Jul 2, 2024


tests/e2e/matmul/generate_e2e_matmul_tests.py (outdated)

kuhar reviewed Jul 2, 2024


kuhar approved these changes Jul 2, 2024


kuhar self-requested a review July 2, 2024 21:37

ScottTodd reviewed Jul 3, 2024


jpienaar added the benchmarks:android-cpu Run default Android CPU benchmarks label Jul 10, 2024

raikonenfnu added 12 commits July 15, 2024 17:11

Add F8_16x16x32_F32 support for MFMA

9575255

Signed-off-by: Stanley Winata <stanley.winata@amd.com>

Add e2e test

b2bf291

Signed-off-by: Stanley Winata <stanley.winata@amd.com>

add cmake test

0949d80

Signed-off-by: Stanley Winata <stanley.winata@amd.com>

Specify what type of F8

cee84f2

Signed-off-by: Stanley Winata <stanley.winata@amd.com>

NIT: Use 1 region for same intrinsic shape

3e0bdb1

Add fp8 as part of HAL buffer view element_Types

6c0cfa5

Signed-off-by: Stanley Winata <stanley.winata@amd.com>

Add fp8 hal iree_test_utils_write_element

2c217df

Signed-off-by: Stanley Winata <stanley.winata@amd.com>

FP8 reference matmul

fab3bdd

Signed-off-by: Stanley Winata <stanley.winata@amd.com>

FP8 reference matmul fix

a329ff8

Signed-off-by: Stanley Winata <stanley.winata@amd.com>

Handle read F8 elements

1a0d96a

Signed-off-by: Stanley Winata <stanley.winata@amd.com>

more fp8 utility

89c40ad

Signed-off-by: Stanley Winata <stanley.winata@amd.com>

minor fp8 util fix

36ca8ea

Signed-off-by: Stanley Winata <stanley.winata@amd.com>

Selective GPU test

b80417f

Signed-off-by: Stanley Winata <stanley.winata@amd.com>

raikonenfnu force-pushed the fp8_mfma branch from 759171e to b80417f July 15, 2024 22:15

Modify test to use truncf instead of plumbing in HAL FP8

43d4e29

ScottTodd removed the benchmarks:android-cpu Run default Android CPU benchmarks label Jul 16, 2024

raikonenfnu requested a review from ScottTodd July 16, 2024 16:43

kuhar approved these changes Jul 16, 2024


raikonenfnu merged commit 6a82eb5 into iree-org:main Jul 16, 2024
56 of 60 checks passed
