Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[LLVMGPU][ROCm] Add I8 MFMA layout support for CDNA2 #18433

Merged
merged 3 commits into from
Sep 5, 2024

Conversation

nirvedhmeshram
Copy link
Contributor

@nirvedhmeshram nirvedhmeshram commented Sep 4, 2024

These i8 intrinsics are supported by CDNA2 arch so adding them provides better functionality such as fixing #18406. It is worth noting that, this i8 support is emulated and the CDNA2 hardware does fp16 under the hood for these intrinsics.
With this PR we can also support: "sharktank/punet/int8" from the external test suite and hence the xfail on those tests are removed.
Fixes: #18406

@nirvedhmeshram nirvedhmeshram changed the title [WIP][HIP] Add i8 instrinsics for cdna2 [WIP][HIP] Add I8 MFMA layout support for CDNA2 Sep 4, 2024
@nirvedhmeshram
Copy link
Contributor Author

nirvedhmeshram commented Sep 4, 2024

I considered adding matmul e2e tests for this, but couldnt see a way short of writing a new generate script that will allow us to run the suite for cdna2 without having instrinic mismatch with cdna3. Given that we are testing some models e2e for cdna2, I am okay with this but just wanted to call it out in case reviewers have other idea.

@nirvedhmeshram nirvedhmeshram marked this pull request as ready for review September 4, 2024 20:52
@nirvedhmeshram nirvedhmeshram changed the title [WIP][HIP] Add I8 MFMA layout support for CDNA2 [HIP] Add I8 MFMA layout support for CDNA2 Sep 4, 2024
@nirvedhmeshram nirvedhmeshram changed the title [HIP] Add I8 MFMA layout support for CDNA2 [LLVMGPU][ROCm] Add I8 MFMA layout support for CDNA2 Sep 4, 2024
@ScottTodd
Copy link
Member

I considered adding matmul e2e tests for this, but couldnt see a way short of writing a new generate script that will allow us to run the suite for cdna2 without having instrinic mismatch with cdna3. Given that we are testing some models e2e for cdna2, I am okay with this but just wanted to call it out in case reviewers have other idea.

I'm in the process of migrating the matmul e2e tests to https://github.com/iree-org/iree-test-suites/tree/main/linalg_ops and refactoring to be decoupled from the core IREE CMake project (see iree-org/iree-test-suites#2). Next on my list is refactoring the massive list of iree_generated_e2e_runner_test test suites. Can add cdna2 support to the list.

@ScottTodd
Copy link
Member

Note that in the new location, the tests are not using iree_codegen.compilation_info at all. The compiler must determine which pipeline to use based on the target device and other user-facing flags, not explicitly added IR.

@nirvedhmeshram nirvedhmeshram merged commit f10dae2 into iree-org:main Sep 5, 2024
37 checks passed
IanWood1 pushed a commit to IanWood1/iree that referenced this pull request Sep 8, 2024
These i8 intrinsics are supported by CDNA2 arch so adding them provides
better functionality such as fixing
iree-org#18406. It is worth noting that,
this i8 support is emulated and the CDNA2 hardware does fp16 under the
hood for these intrinsics.
With this PR we can also support: "sharktank/punet/int8" from the
external test suite and hence the xfail on those tests are removed.
Fixes: iree-org#18406
josemonsalve2 pushed a commit to josemonsalve2/iree that referenced this pull request Sep 14, 2024
These i8 intrinsics are supported by CDNA2 arch so adding them provides
better functionality such as fixing
iree-org#18406. It is worth noting that,
this i8 support is emulated and the CDNA2 hardware does fp16 under the
hood for these intrinsics.
With this PR we can also support: "sharktank/punet/int8" from the
external test suite and hence the xfail on those tests are removed.
Fixes: iree-org#18406
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[LLVMGPU][ROCm] SDXL int8 fails to compile on gfx90a
4 participants