HIP build with >=rocm-6.3.0 fails (composable-kernel update request) #250

Open
AngryLoki opened this issue Jan 19, 2025 · 1 comment · May be fixed by #251

Comments

@AngryLoki

Hi, new versions of the ROCm components ship a newer Clang (approximately 19.1.0), and with it OIDN no longer compiles because Clang 19 reports a few errors:

  1. This error was fixed in ROCm/composable_kernel@c441378:
/src/oidn/devices/hip/../../external/composable_kernel/include/ck/tensor_operation/gpu/block/blockwise_gemm_xdlops.hpp:785:32: error: no member named 'a_origin' in 'BlockwiseGemmXdlops_v2<BlockSize, FloatAB, FloatAcc, ATileDesc, BTileDesc, AMmaTileDesc, BMmaTileDesc, MPerBlock, NPerBlock, KPerBlock, MPerXDL, NPerXDL, MRepeat, NRepeat, KPack, TransposeC, AMmaKStride, BMmaKStride>'
  785 |         : a_thread_copy_(other.a_origin), b_thread_copy_(other.b_origin)
      |                          ~~~~~ ^
  2. This group of errors was fixed in ROCm/composable_kernel@922e42a and other commits (e.g. the blockwise_gemm_xdlops.hpp code was refactored):
/src/oidn/devices/hip/../../external/composable_kernel/include/ck/tensor_operation/gpu/block/blockwise_gemm_xdlops.hpp:957:42: error: a template argument list is expected after a name prefixed by the template keyword [-Wmissing-template-arg-list-after-template-kw]
  957 |                     xdlops_gemm.template Run(
      |                                          ^
19 warnings and 5 errors generated when compiling for gfx1030

Both of these issues are fixed in recent releases (e.g. 6.3.0).

Since the API for DeviceGroupedConvFwdMultipleD_Wmma_CShuffle changed in rocm-6.3.x, I also tried a different configuration and found it improves oidnBenchmark performance by 15% (on a 7900XTX, gfx1100). I'll provide a pull request with my results, thanks!

AngryLoki added a commit to AngryLoki/oidn that referenced this issue Jan 19, 2025
This fixes compilation with recent versions of Clang (Clang 19 specifically).

Additionally, since the `DeviceGroupedConvFwdMultipleD_Wmma_CShuffle` API has changed, a new WMMA configuration is used, which provides 15% better performance on the 7900XTX GPU (gfx1100).

Closes RenderKit#250

Signed-off-by: Sv. Lockal <lockalsash@gmail.com>
@AngryLoki AngryLoki linked a pull request Jan 19, 2025 that will close this issue
@atafra
Collaborator

atafra commented Jan 20, 2025

Thanks a lot, I'll take a look!
