-
Notifications
You must be signed in to change notification settings - Fork 99
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Improve double buffering batched gemm perf for matrix sizes >64x64 #1239
Conversation
Status Flag 'Pre-Test Inspection' - Auto Inspected - Inspection Is Not Necessary for this Pull Request. |
Status Flag 'Pull Request AutoTester' - Failure: Timed out waiting for job KokkosKernels_PullRequest_Tpls_CUDA9_GCC720_Light_Tpls_GCC720_GCC740 to start: Total Wait = 3603
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you, @vqd8a !
I am seeing an improvement with these changes after 16384x64x64 but a performance regression before 16384x66x66. Can you please add some overload wrappers and tags to select this alpha change for matrices > 64x64?
This reverts commit 82b2728.
@vqd8a, using https://github.com/e10harvey/kokkos-kernels/tree/6a28191eeb729cea1c8320851bb5acc88b5c2098, I am seeing: access_view_bounds_check (using KOKKOSKERNELS_MACRO_MIN)
Using https://github.com/e10harvey/kokkos-kernels/tree/61eda67251b66dc6c10b96cbf107aee9569388e1 I am seeing: access_view_bounds_check (if-else)
I am also seeing a small speedup (of ~30 GFLOP/s for 60x60 in this case) with the Note that both of the above tables are using gcc/8.3.0 + cuda/11 on V100 with the alpha tag selection such that alpha is applied in the fma for matrices > 64x64, otherwise alpha is applied in |
Status Flag 'Pull Request AutoTester' - User Requested Retest - Label AT: RETEST will be reset after testing. |
Status Flag 'Pull Request AutoTester' - Testing Jenkins Projects: Pull Request Auto Testing STARTING (click to expand)Build InformationTest Name: KokkosKernels_PullRequest_Tpls_CUDA9_GCC720_Light_Tpls_GCC720_GCC740
Jenkins Parameters
Build InformationTest Name: KokkosKernels_PullRequest_Tpls_CUDA10_Tpls_CUDA10_LayoutRight
Jenkins Parameters
Build InformationTest Name: KokkosKernels_PullRequest_GCC720
Jenkins Parameters
Build InformationTest Name: KokkosKernels_PullRequest_GCC720_Light_LayoutRight
Jenkins Parameters
Build InformationTest Name: KokkosKernels_PullRequest_Tpls_GCC720
Jenkins Parameters
Build InformationTest Name: KokkosKernels_PullRequest_Tpls_INTEL18
Jenkins Parameters
Build InformationTest Name: KokkosKernels_PullRequest_CLANG1001
Jenkins Parameters
Using Repos:
Pull Request Author: vqd8a |
Status Flag 'Pull Request AutoTester' - Jenkins Testing: 1 or more Jobs FAILED Note: Testing will normally be attempted again in approx. 2 Hrs 30 Mins. If a change to the PR source branch occurs, the testing will be attempted again on next available autotester run. Pull Request Auto Testing has FAILED (click to expand)Build InformationTest Name: KokkosKernels_PullRequest_Tpls_CUDA9_GCC720_Light_Tpls_GCC720_GCC740
Jenkins Parameters
Build InformationTest Name: KokkosKernels_PullRequest_Tpls_CUDA10_Tpls_CUDA10_LayoutRight
Jenkins Parameters
Build InformationTest Name: KokkosKernels_PullRequest_GCC720
Jenkins Parameters
Build InformationTest Name: KokkosKernels_PullRequest_GCC720_Light_LayoutRight
Jenkins Parameters
Build InformationTest Name: KokkosKernels_PullRequest_Tpls_GCC720
Jenkins Parameters
Build InformationTest Name: KokkosKernels_PullRequest_Tpls_INTEL18
Jenkins Parameters
Build InformationTest Name: KokkosKernels_PullRequest_CLANG1001
Jenkins Parameters
Console Output (last 100 lines) : KokkosKernels_PullRequest_Tpls_CUDA9_GCC720_Light_Tpls_GCC720_GCC740 # 46 (click to expand)
Console Output (last 100 lines) : KokkosKernels_PullRequest_Tpls_CUDA10_Tpls_CUDA10_LayoutRight # 46 (click to expand)
Console Output (last 100 lines) : KokkosKernels_PullRequest_GCC720 # 704 (click to expand)
Console Output (last 100 lines) : KokkosKernels_PullRequest_GCC720_Light_LayoutRight # 351 (click to expand)
Console Output (last 100 lines) : KokkosKernels_PullRequest_Tpls_GCC720 # 695 (click to expand)
Console Output (last 100 lines) : KokkosKernels_PullRequest_Tpls_INTEL18 # 682 (click to expand)
Console Output (last 100 lines) : KokkosKernels_PullRequest_CLANG1001 # 88 (click to expand)
|
Status Flag 'Pull Request AutoTester' - Testing Jenkins Projects: Pull Request Auto Testing STARTING (click to expand)Build InformationTest Name: KokkosKernels_PullRequest_Tpls_CUDA9_GCC720_Light_Tpls_GCC720_GCC740
Jenkins Parameters
Build InformationTest Name: KokkosKernels_PullRequest_Tpls_CUDA10_Tpls_CUDA10_LayoutRight
Jenkins Parameters
Build InformationTest Name: KokkosKernels_PullRequest_GCC720
Jenkins Parameters
Build InformationTest Name: KokkosKernels_PullRequest_GCC720_Light_LayoutRight
Jenkins Parameters
Build InformationTest Name: KokkosKernels_PullRequest_Tpls_GCC720
Jenkins Parameters
Build InformationTest Name: KokkosKernels_PullRequest_Tpls_INTEL18
Jenkins Parameters
Build InformationTest Name: KokkosKernels_PullRequest_CLANG1001
Jenkins Parameters
Using Repos:
Pull Request Author: vqd8a |
Status Flag 'Pull Request AutoTester' - Jenkins Testing: 1 or more Jobs FAILED Note: Testing will normally be attempted again in approx. 2 Hrs 30 Mins. If a change to the PR source branch occurs, the testing will be attempted again on next available autotester run. Pull Request Auto Testing has FAILED (click to expand)Build InformationTest Name: KokkosKernels_PullRequest_Tpls_CUDA9_GCC720_Light_Tpls_GCC720_GCC740
Jenkins Parameters
Build InformationTest Name: KokkosKernels_PullRequest_Tpls_CUDA10_Tpls_CUDA10_LayoutRight
Jenkins Parameters
Build InformationTest Name: KokkosKernels_PullRequest_GCC720
Jenkins Parameters
Build InformationTest Name: KokkosKernels_PullRequest_GCC720_Light_LayoutRight
Jenkins Parameters
Build InformationTest Name: KokkosKernels_PullRequest_Tpls_GCC720
Jenkins Parameters
Build InformationTest Name: KokkosKernels_PullRequest_Tpls_INTEL18
Jenkins Parameters
Build InformationTest Name: KokkosKernels_PullRequest_CLANG1001
Jenkins Parameters
Console Output (last 100 lines) : KokkosKernels_PullRequest_Tpls_CUDA9_GCC720_Light_Tpls_GCC720_GCC740 # 47 (click to expand)
Console Output (last 100 lines) : KokkosKernels_PullRequest_Tpls_CUDA10_Tpls_CUDA10_LayoutRight # 47 (click to expand)
Console Output (last 100 lines) : KokkosKernels_PullRequest_GCC720 # 705 (click to expand)
Console Output (last 100 lines) : KokkosKernels_PullRequest_GCC720_Light_LayoutRight # 352 (click to expand)
Console Output (last 100 lines) : KokkosKernels_PullRequest_Tpls_GCC720 # 696 (click to expand)
Console Output (last 100 lines) : KokkosKernels_PullRequest_Tpls_INTEL18 # 683 (click to expand)
Console Output (last 100 lines) : KokkosKernels_PullRequest_CLANG1001 # 89 (click to expand)
|
Status Flag 'Pull Request AutoTester' - Testing Jenkins Projects: Pull Request Auto Testing STARTING (click to expand)Build InformationTest Name: KokkosKernels_PullRequest_Tpls_CUDA9_GCC720_Light_Tpls_GCC720_GCC740
Jenkins Parameters
Build InformationTest Name: KokkosKernels_PullRequest_Tpls_CUDA10_Tpls_CUDA10_LayoutRight
Jenkins Parameters
Build InformationTest Name: KokkosKernels_PullRequest_GCC720
Jenkins Parameters
Build InformationTest Name: KokkosKernels_PullRequest_GCC720_Light_LayoutRight
Jenkins Parameters
Build InformationTest Name: KokkosKernels_PullRequest_Tpls_GCC720
Jenkins Parameters
Build InformationTest Name: KokkosKernels_PullRequest_Tpls_INTEL18
Jenkins Parameters
Build InformationTest Name: KokkosKernels_PullRequest_CLANG1001
Jenkins Parameters
Using Repos:
Pull Request Author: vqd8a |
Status Flag 'Pull Request AutoTester' - Jenkins Testing: 1 or more Jobs FAILED Note: Testing will normally be attempted again in approx. 2 Hrs 30 Mins. If a change to the PR source branch occurs, the testing will be attempted again on next available autotester run. Pull Request Auto Testing has FAILED (click to expand)Build InformationTest Name: KokkosKernels_PullRequest_Tpls_CUDA9_GCC720_Light_Tpls_GCC720_GCC740
Jenkins Parameters
Build InformationTest Name: KokkosKernels_PullRequest_Tpls_CUDA10_Tpls_CUDA10_LayoutRight
Jenkins Parameters
Build InformationTest Name: KokkosKernels_PullRequest_GCC720
Jenkins Parameters
Build InformationTest Name: KokkosKernels_PullRequest_GCC720_Light_LayoutRight
Jenkins Parameters
Build InformationTest Name: KokkosKernels_PullRequest_Tpls_GCC720
Jenkins Parameters
Build InformationTest Name: KokkosKernels_PullRequest_Tpls_INTEL18
Jenkins Parameters
Build InformationTest Name: KokkosKernels_PullRequest_CLANG1001
Jenkins Parameters
Console Output (last 100 lines) : KokkosKernels_PullRequest_Tpls_CUDA9_GCC720_Light_Tpls_GCC720_GCC740 # 48 (click to expand)
Console Output (last 100 lines) : KokkosKernels_PullRequest_Tpls_CUDA10_Tpls_CUDA10_LayoutRight # 48 (click to expand)
Console Output (last 100 lines) : KokkosKernels_PullRequest_GCC720 # 706 (click to expand)
Console Output (last 100 lines) : KokkosKernels_PullRequest_GCC720_Light_LayoutRight # 353 (click to expand)
Console Output (last 100 lines) : KokkosKernels_PullRequest_Tpls_GCC720 # 697 (click to expand)
Console Output (last 100 lines) : KokkosKernels_PullRequest_Tpls_INTEL18 # 684 (click to expand)
Console Output (last 100 lines) : KokkosKernels_PullRequest_CLANG1001 # 90 (click to expand)
|
src/batched: Add AlphaTag compile time selection
Status Flag 'Pre-Test Inspection' - Auto Inspected - Inspection Is Not Necessary for this Pull Request. |
Status Flag 'Pull Request AutoTester' - Testing Jenkins Projects: Pull Request Auto Testing STARTING (click to expand)Build InformationTest Name: KokkosKernels_PullRequest_Tpls_CUDA9_GCC720_Light_Tpls_GCC720_GCC740
Jenkins Parameters
Build InformationTest Name: KokkosKernels_PullRequest_Tpls_CUDA10_Tpls_CUDA10_LayoutRight
Jenkins Parameters
Build InformationTest Name: KokkosKernels_PullRequest_GCC720
Jenkins Parameters
Build InformationTest Name: KokkosKernels_PullRequest_GCC720_Light_LayoutRight
Jenkins Parameters
Build InformationTest Name: KokkosKernels_PullRequest_Tpls_GCC720
Jenkins Parameters
Build InformationTest Name: KokkosKernels_PullRequest_Tpls_INTEL18
Jenkins Parameters
Build InformationTest Name: KokkosKernels_PullRequest_CLANG1001
Jenkins Parameters
Using Repos:
Pull Request Author: vqd8a |
Status Flag 'Pre-Test Inspection' - Auto Inspected - Inspection Is Not Necessary for this Pull Request. |
Status Flag 'Pull Request AutoTester' - Testing Jenkins Projects: Pull Request Auto Testing STARTING (click to expand)Build InformationTest Name: KokkosKernels_PullRequest_Tpls_CUDA9_GCC720_Light_Tpls_GCC720_GCC740
Jenkins Parameters
Build InformationTest Name: KokkosKernels_PullRequest_Tpls_CUDA10_Tpls_CUDA10_LayoutRight
Jenkins Parameters
Build InformationTest Name: KokkosKernels_PullRequest_GCC720
Jenkins Parameters
Build InformationTest Name: KokkosKernels_PullRequest_GCC720_Light_LayoutRight
Jenkins Parameters
Build InformationTest Name: KokkosKernels_PullRequest_Tpls_GCC720
Jenkins Parameters
Build InformationTest Name: KokkosKernels_PullRequest_Tpls_INTEL18
Jenkins Parameters
Build InformationTest Name: KokkosKernels_PullRequest_CLANG1001
Jenkins Parameters
Using Repos:
Pull Request Author: vqd8a |
Status Flag 'Pull Request AutoTester' - Jenkins Testing: 1 or more Jobs FAILED Note: Testing will normally be attempted again in approx. 2 Hrs 30 Mins. If a change to the PR source branch occurs, the testing will be attempted again on next available autotester run. Pull Request Auto Testing has FAILED (click to expand)Build InformationTest Name: KokkosKernels_PullRequest_Tpls_CUDA9_GCC720_Light_Tpls_GCC720_GCC740
Jenkins Parameters
Build InformationTest Name: KokkosKernels_PullRequest_Tpls_CUDA10_Tpls_CUDA10_LayoutRight
Jenkins Parameters
Build InformationTest Name: KokkosKernels_PullRequest_GCC720
Jenkins Parameters
Build InformationTest Name: KokkosKernels_PullRequest_GCC720_Light_LayoutRight
Jenkins Parameters
Build InformationTest Name: KokkosKernels_PullRequest_Tpls_GCC720
Jenkins Parameters
Build InformationTest Name: KokkosKernels_PullRequest_Tpls_INTEL18
Jenkins Parameters
Build InformationTest Name: KokkosKernels_PullRequest_CLANG1001
Jenkins Parameters
Console Output (last 100 lines) : KokkosKernels_PullRequest_Tpls_CUDA9_GCC720_Light_Tpls_GCC720_GCC740 # 78 (click to expand)
Console Output (last 100 lines) : KokkosKernels_PullRequest_Tpls_CUDA10_Tpls_CUDA10_LayoutRight # 77 (click to expand)
Console Output (last 100 lines) : KokkosKernels_PullRequest_GCC720 # 735 (click to expand)
Console Output (last 100 lines) : KokkosKernels_PullRequest_GCC720_Light_LayoutRight # 382 (click to expand)
Console Output (last 100 lines) : KokkosKernels_PullRequest_Tpls_GCC720 # 726 (click to expand)
Console Output (last 100 lines) : KokkosKernels_PullRequest_Tpls_INTEL18 # 714 (click to expand)
Console Output (last 100 lines) : KokkosKernels_PullRequest_CLANG1001 # 119 (click to expand)
|
Status Flag 'Pre-Test Inspection' - Auto Inspected - Inspection Is Not Necessary for this Pull Request. |
Status Flag 'Pull Request AutoTester' - Testing Jenkins Projects: Pull Request Auto Testing STARTING (click to expand)Build InformationTest Name: KokkosKernels_PullRequest_Tpls_CUDA9_GCC720_Light_Tpls_GCC720_GCC740
Jenkins Parameters
Build InformationTest Name: KokkosKernels_PullRequest_Tpls_CUDA10_Tpls_CUDA10_LayoutRight
Jenkins Parameters
Build InformationTest Name: KokkosKernels_PullRequest_GCC720
Jenkins Parameters
Build InformationTest Name: KokkosKernels_PullRequest_GCC720_Light_LayoutRight
Jenkins Parameters
Build InformationTest Name: KokkosKernels_PullRequest_Tpls_GCC720
Jenkins Parameters
Build InformationTest Name: KokkosKernels_PullRequest_Tpls_INTEL18
Jenkins Parameters
Build InformationTest Name: KokkosKernels_PullRequest_CLANG1001
Jenkins Parameters
Using Repos:
Pull Request Author: vqd8a |
Status Flag 'Pull Request AutoTester' - Jenkins Testing: all Jobs PASSED Pull Request Auto Testing has PASSED (click to expand)Build InformationTest Name: KokkosKernels_PullRequest_Tpls_CUDA9_GCC720_Light_Tpls_GCC720_GCC740
Jenkins Parameters
Build InformationTest Name: KokkosKernels_PullRequest_Tpls_CUDA10_Tpls_CUDA10_LayoutRight
Jenkins Parameters
Build InformationTest Name: KokkosKernels_PullRequest_GCC720
Jenkins Parameters
Build InformationTest Name: KokkosKernels_PullRequest_GCC720_Light_LayoutRight
Jenkins Parameters
Build InformationTest Name: KokkosKernels_PullRequest_Tpls_GCC720
Jenkins Parameters
Build InformationTest Name: KokkosKernels_PullRequest_Tpls_INTEL18
Jenkins Parameters
Build InformationTest Name: KokkosKernels_PullRequest_CLANG1001
Jenkins Parameters
|
Status Flag 'Pre-Merge Inspection' - - This Pull Request Requires Inspection... The code must be inspected by a member of the Team before Testing/Merging |
All Jobs Finished; status = PASSED, However Inspection must be performed before merge can occur... |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you, @vqd8a !
Status Flag 'Pre-Merge Inspection' - SUCCESS: The last commit to this Pull Request has been INSPECTED AND APPROVED by [ e10harvey ]! |
Status Flag 'Pull Request AutoTester' - Pull Request MUST BE MERGED MANUALLY BY Project Team - This Repo does not support Automerge |
Not multiplying with alpha in matrix multiplication, but when storing results back to global memory instead.
Some perf numbers with batch_size=32768 on V100 GPU