batched/dense: Reduce register pressure #1588
Status Flag 'Pre-Test Inspection' - Auto Inspected - Inspection Is Not Necessary for this Pull Request.
Status Flag 'Pull Request AutoTester' - Testing Jenkins Projects: Pull Request Auto Testing STARTING
Test names:
KokkosKernels_PullRequest_GCC930_Light_Tpls_GCC930
KokkosKernels_PullRequest_CUDA11_CUDA11_LayoutRight
KokkosKernels_PullRequest_GCC1020
KokkosKernels_PullRequest_GCC1020_Light_LayoutRight
KokkosKernels_PullRequest_Tpls_GCC1020
KokkosKernels_PullRequest_Tpls_INTEL19
KokkosKernels_PullRequest_CLANG1001
KokkosKernels_PullRequest_CLANG13CUDA10
KokkosKernels_PullRequest_Tpls_ARMPL2110_Tpls_ARMPL2030_GCC1020
Pull Request Author: e10harvey
Status Flag 'Pull Request AutoTester' - Jenkins Testing: 1 or more Jobs FAILED
Note: Testing will normally be attempted again in approx. 2 Hrs 30 Mins. If a change to the PR source branch occurs, the testing will be attempted again on next available autotester run.
Pull Request Auto Testing has FAILED
Test names and build numbers:
KokkosKernels_PullRequest_GCC930_Light_Tpls_GCC930 # 154
KokkosKernels_PullRequest_CUDA11_CUDA11_LayoutRight # 162
KokkosKernels_PullRequest_GCC1020 # 115
KokkosKernels_PullRequest_GCC1020_Light_LayoutRight # 114
KokkosKernels_PullRequest_Tpls_GCC1020 # 77
KokkosKernels_PullRequest_Tpls_INTEL19 # 164
KokkosKernels_PullRequest_CLANG1001 # 213
KokkosKernels_PullRequest_CLANG13CUDA10 # 99
KokkosKernels_PullRequest_Tpls_ARMPL2110_Tpls_ARMPL2030_GCC1020 # 103
template <>
constexpr KOKKOS_INLINE_FUNCTION int
kk_gemm_dlb_buf_tile_k<Kokkos::Experimental::HIP>() {
  return 16;
@e10harvey I don't really follow. Please remind me: why does tile_k=16 for HIP reduce register allocation compared to tile_k=8? Doesn't tile_k=16 give a larger tile, which needs more registers? Am I missing something?
tile_k = 16 results in larger shared memory allocations but smaller REG_M and REG_N sizes:
kokkos-kernels/batched/dense/impl/KokkosBatched_Gemm_DblBuf_Impl.hpp
Lines 385 to 386 in f3556c5
view_type_2d_scratch svA_scr(member.team_scratch(0), TILE_M, TILE_K);
view_type_2d_scratch svB_scr(member.team_scratch(0), TILE_K, TILE_N);
kokkos-kernels/batched/dense/impl/KokkosBatched_Gemm_DblBuf_Impl.hpp
Lines 146 to 147 in f3556c5
constexpr int reg_m = TILE_M / TILE_K;
constexpr int reg_n = TILE_N / TILE_K + 2 * !!(TILE_N % TILE_K);
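To make the trade-off concrete, here is a small worked sketch of the two quoted expressions evaluated for tile_k = 8 and tile_k = 16. The helper names and the 32x32 tile size are illustrative assumptions, not values taken from the PR; only the formulas themselves come from the lines quoted above. All checks run at compile time.

```cpp
// Hypothetical helpers mirroring the quoted expressions. The function names
// and the 32x32 tile below are assumptions made for illustration only.
constexpr int reg_m(int tile_m, int tile_k) { return tile_m / tile_k; }

constexpr int reg_n(int tile_n, int tile_k) {
  // Adds 2 when TILE_N is not a multiple of TILE_K, as in the quoted line.
  return tile_n / tile_k + 2 * !!(tile_n % tile_k);
}

// Element count of the two scratch views: TILE_M x TILE_K plus TILE_K x TILE_N.
constexpr int scratch_elems(int tile_m, int tile_n, int tile_k) {
  return tile_m * tile_k + tile_k * tile_n;
}

constexpr int kTileM = 32;  // assumed tile size, not from the PR
constexpr int kTileN = 32;  // assumed tile size, not from the PR

// tile_k = 8: larger register extents, smaller scratch footprint.
static_assert(reg_m(kTileM, 8) == 4 && reg_n(kTileN, 8) == 4, "");
static_assert(scratch_elems(kTileM, kTileN, 8) == 512, "");

// tile_k = 16: register extents halve, scratch footprint doubles.
static_assert(reg_m(kTileM, 16) == 2 && reg_n(kTileN, 16) == 2, "");
static_assert(scratch_elems(kTileM, kTileN, 16) == 1024, "");
```

Under these assumed sizes, doubling tile_k halves both reg_m and reg_n while doubling the number of scratch elements, which is the direction of the trade-off described above. How many registers the compiler actually allocates depends on how the kernel uses these extents, so this shows the trend only, not a register count.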
Thanks @e10harvey. Now I remember. :-)
Thanks @e10harvey. LGTM.
Looks good to me and yeah register pressure on MI100 is a known issue... thanks for fixing this!
@lucbv: Ah, is there somewhere I can read up on this?
Status Flag 'Pull Request AutoTester' - Jenkins Testing: all Jobs PASSED
Pull Request Auto Testing has PASSED
Test names:
KokkosKernels_PullRequest_GCC930_Light_Tpls_GCC930
KokkosKernels_PullRequest_CUDA11_CUDA11_LayoutRight
KokkosKernels_PullRequest_GCC1020
KokkosKernels_PullRequest_GCC1020_Light_LayoutRight
KokkosKernels_PullRequest_Tpls_GCC1020
KokkosKernels_PullRequest_Tpls_INTEL19
KokkosKernels_PullRequest_CLANG1001
KokkosKernels_PullRequest_CLANG13CUDA10
KokkosKernels_PullRequest_Tpls_ARMPL2110_Tpls_ARMPL2030_GCC1020
Status Flag 'Pre-Merge Inspection' - SUCCESS: The last commit to this Pull Request has been INSPECTED AND APPROVED by [ vqd8a lucbv ]!
Status Flag 'Pull Request AutoTester' - Pull Request MUST BE MERGED MANUALLY BY Project Team - This Repo does not support Automerge
@e10harvey re-launching the github actions so we can have all the checks passing.
Candidate cherry-pick for release-candidate-3.7.01
batched/dense: Reduce register pressure (cherry picked from commit a131d8b)
Cherry-picked to release-candidate-3.7.01 SHA 9b3b045
Fixes #1547.