/src/oidn/devices/hip/../../external/composable_kernel/include/ck/tensor_operation/gpu/block/blockwise_gemm_xdlops.hpp:785:32: error: no member named 'a_origin' in 'BlockwiseGemmXdlops_v2<BlockSize, FloatAB, FloatAcc, ATileDesc, BTileDesc, AMmaTileDesc, BMmaTileDesc, MPerBlock, NPerBlock, KPerBlock, MPerXDL, NPerXDL, MRepeat, NRepeat, KPack, TransposeC, AMmaKStride, BMmaKStride>'
785 | : a_thread_copy_(other.a_origin), b_thread_copy_(other.b_origin)
| ~~~~~ ^
This group of errors was fixed in ROCm/composable_kernel@922e42a and other commits (i.e. the blockwise_gemm_xdlops.hpp code was refactored).
/src/oidn/devices/hip/../../external/composable_kernel/include/ck/tensor_operation/gpu/block/blockwise_gemm_xdlops.hpp:957:42: error: a template argument list is expected after a name prefixed by the template keyword [-Wmissing-template-arg-list-after-template-kw]
957 | xdlops_gemm.template Run(
| ^
19 warnings and 5 errors generated when compiling for gfx1030
Both of these issues were fixed in recent releases (e.g. 6.3.0).
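For reference, the second diagnostic (`-Wmissing-template-arg-list-after-template-kw`) can be reproduced in isolation. The snippet below is only a minimal sketch: `Gemm`, `call_deduced`, `call_explicit` and `self_check` are hypothetical names made up for illustration and are not taken from composable_kernel. It shows the rejected pattern in a comment and two spellings that compile.

```cpp
#include <cassert>

// Hypothetical stand-in for a class with a member function template,
// loosely mirroring the shape of `xdlops_gemm.template Run(...)`.
struct Gemm {
    template <typename T>
    T Run(T a, T b) const { return a * b; }
};

// Inside a template, `g` has a dependent type.
template <typename G>
int call_deduced(const G& g, int a, int b) {
    // Rejected by Clang 19 by default: `template` keyword with no
    // template argument list after the name.
    //   return g.template Run(a, b);

    // Spelling 1: drop the keyword and let the arguments be deduced.
    return g.Run(a, b);
}

template <typename G>
int call_explicit(const G& g, int a, int b) {
    // Spelling 2: keep the keyword and provide a template argument list.
    return g.template Run<int>(a, b);
}

// Small self-check; both spellings call the same member template.
inline void self_check() {
    assert(call_deduced(Gemm{}, 3, 4) == 12);
    assert(call_explicit(Gemm{}, 3, 4) == 12);
}
```

Which of the two spellings the upstream fix uses is not shown in this thread; the sketch only illustrates why the old line fails to compile.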
As the API for `DeviceGroupedConvFwdMultipleD_Wmma_CShuffle` changed in rocm-6.3.x, I also noticed that switching to a different configuration improves oidnBenchmark performance by 15% (running on a 7900XTX, gfx1100). I'll provide a pull request with my results, thanks!
This fixes compilation with recent versions of Clang (Clang 19 specifically).
Additionally, as the `DeviceGroupedConvFwdMultipleD_Wmma_CShuffle` API was changed, a new WMMA configuration provides 15% better performance on the 7900XTX GPU (gfx1100).
Closes RenderKit#250
Signed-off-by: Sv. Lockal <lockalsash@gmail.com>
Hi, as the new version of the ROCm components now uses a newer Clang (approximately 19.1.0), oidn cannot be compiled: clang-19 reports several errors.