-
Notifications
You must be signed in to change notification settings - Fork 99
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
BSR block SpGEMM implementation #1099
Conversation
Default algorithm implementation details (FY21 block spgemm status)
|
Preliminary CPU performanceI connected MKL to benchmark block spgemm performance on CPU. For multiplication of random matrix by itself on i9-9900K (8 cores/16 threads)/32GB RAM/Ubuntu/gcc-8.4, I got some rough timings (no repetition, accuracy around 0,5~1s):
Notes:
|
Status Flag 'Pre-Test Inspection' - - This Pull Request Requires Inspection... The code must be inspected by a member of the Team before Testing/Merging |
Status Flag 'Pre-Test Inspection' - - This Pull Request Requires Inspection... The code must be inspected by a member of the Team before Testing/Merging |
eae9f61
to
85b19e3
Compare
Status Flag 'Pre-Test Inspection' - - This Pull Request Requires Inspection... The code must be inspected by a member of the Team before Testing/Merging |
13d919b
to
159d08c
Compare
Status Flag 'Pre-Test Inspection' - - This Pull Request Requires Inspection... The code must be inspected by a member of the Team before Testing/Merging |
993e92c
to
5bfef1c
Compare
5bfef1c
to
221fc89
Compare
Status Flag 'Pre-Test Inspection' - - This Pull Request Requires Inspection... The code must be inspected by a member of the Team before Testing/Merging |
@lucbv @srajama1 @brian-kelley @ppebay @fnrizzi @uhetmaniuk I've redone this PR from scratch, restructuring and cleaning its content to get BSR spgemm with minimal changes introduced to baseline CSR implementation. Those BSR changes over CRS implementation can be inspected here (also pinned this link in the PR description for convenience). To accommodate for this approach, I took some irregular implementation decisions like disabling large parts of unused code with My motivation behind proposing this minimal changes approach and the follow-up PR for refactoring and performance tuning was to secure a clear correct implementation checkpoint - with the following considerations taken into account:
|
221fc89
to
2549b48
Compare
Status Flag 'Pre-Test Inspection' - - This Pull Request Requires Inspection... The code must be inspected by a member of the Team before Testing/Merging |
090f472
to
b712a56
Compare
Status Flag 'Pre-Test Inspection' - - This Pull Request Requires Inspection... The code must be inspected by a member of the Team before Testing/Merging |
Hi @lucbv! Would you allow a Jenkins re-test ? I just rebased this PR on current |
Status Flag 'Pull Request AutoTester' - User Requested Retest - Label AT: RETEST will be reset after testing. |
Status Flag 'Pre-Test Inspection' - SUCCESS: The last commit to this Pull Request has been INSPECTED by label AT: PRE-TEST INSPECTED! Autotester is Removing Label; This inspection will remain valid until a new commit to source branch is performed. |
Status Flag 'Pull Request AutoTester' - Testing Jenkins Projects: Pull Request Auto Testing STARTING (click to expand)Build InformationTest Name: KokkosKernels_PullRequest_Tpls_CUDA9_GCC720_Light_Tpls_GCC720_GCC740
Jenkins Parameters
Build InformationTest Name: KokkosKernels_PullRequest_Tpls_CUDA10_Tpls_CUDA10_LayoutRight
Jenkins Parameters
Build InformationTest Name: KokkosKernels_PullRequest_GCC720
Jenkins Parameters
Build InformationTest Name: KokkosKernels_PullRequest_GCC720_Light_LayoutRight
Jenkins Parameters
Build InformationTest Name: KokkosKernels_PullRequest_Tpls_GCC720
Jenkins Parameters
Build InformationTest Name: KokkosKernels_PullRequest_Tpls_INTEL18
Jenkins Parameters
Build InformationTest Name: KokkosKernels_PullRequest_CLANG1001
Jenkins Parameters
Using Repos:
Pull Request Author: MikolajZuzek |
Status Flag 'Pull Request AutoTester' - User Requested Retest - Label AT: RETEST will be reset after testing. |
Status Flag 'Pull Request AutoTester' - Testing Jenkins Projects: Pull Request Auto Testing STARTING (click to expand)Build InformationTest Name: KokkosKernels_PullRequest_Tpls_CUDA9_GCC720_Light_Tpls_GCC720_GCC740
Jenkins Parameters
Build InformationTest Name: KokkosKernels_PullRequest_Tpls_CUDA10_Tpls_CUDA10_LayoutRight
Jenkins Parameters
Build InformationTest Name: KokkosKernels_PullRequest_GCC720
Jenkins Parameters
Build InformationTest Name: KokkosKernels_PullRequest_GCC720_Light_LayoutRight
Jenkins Parameters
Build InformationTest Name: KokkosKernels_PullRequest_Tpls_GCC720
Jenkins Parameters
Build InformationTest Name: KokkosKernels_PullRequest_Tpls_INTEL18
Jenkins Parameters
Build InformationTest Name: KokkosKernels_PullRequest_CLANG1001
Jenkins Parameters
Using Repos:
Pull Request Author: MikolajZuzek |
Status Flag 'Pull Request AutoTester' - Jenkins Testing: all Jobs PASSED Pull Request Auto Testing has PASSED (click to expand)Build InformationTest Name: KokkosKernels_PullRequest_Tpls_CUDA9_GCC720_Light_Tpls_GCC720_GCC740
Jenkins Parameters
Build InformationTest Name: KokkosKernels_PullRequest_Tpls_CUDA10_Tpls_CUDA10_LayoutRight
Jenkins Parameters
Build InformationTest Name: KokkosKernels_PullRequest_GCC720
Jenkins Parameters
Build InformationTest Name: KokkosKernels_PullRequest_GCC720_Light_LayoutRight
Jenkins Parameters
Build InformationTest Name: KokkosKernels_PullRequest_Tpls_GCC720
Jenkins Parameters
Build InformationTest Name: KokkosKernels_PullRequest_Tpls_INTEL18
Jenkins Parameters
Build InformationTest Name: KokkosKernels_PullRequest_CLANG1001
Jenkins Parameters
|
Status Flag 'Pre-Merge Inspection' - - This Pull Request Requires Inspection... The code must be inspected by a member of the Team before Testing/Merging |
All Jobs Finished; status = PASSED, However Inspection must be performed before merge can occur... |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am okay with the changes made
Status Flag 'Pre-Merge Inspection' - SUCCESS: The last commit to this Pull Request has been INSPECTED AND APPROVED by [ lucbv ]! |
Status Flag 'Pull Request AutoTester' - Pull Request MUST BE MERGED MANUALLY BY Project Team - This Repo does not support Automerge |
@lucbv @brian-kelley Thank you very much for your thorough reviews and great comments! |
This PR introduces implementation for block version of sparse BSR matrix-matrix multiplication (block spgemm).
Minimal changes to baseline CSR spgemm implementation are intended for clarity and correctness verifiability, so that this PR can provide correct implementation checkpoint. For the follow up with related refactoring and performance tuning see PR #1215.
📌 Files changed with respect to original sparse GEMM implementation
Status
Algorithm coverage:
SPGEMM_DEBUG
);SPGEMM_KK
,SPGEMM_KK_LP
, andSPGEMM_KK_MEMORY
aliasSPGEMM_KK_MEMSPEED
):SPGEMM_KK
→MultiCoreTag
;SPGEMM_KK_LP
→MultiCoreTag4
;SPGEMM_KK_MEMORY
→GPUTag
;SPGEMM_KK_MEMORY_BIGSPREADTEAM
→GPUTag6
;SPGEMM_KK_MEMORY_SPREADTEAM
→GPUTag4
;SPGEMM_KK_SPEED
aliasSPGEMM_KK_DENSE
):MultiCoreTag
;GPUTag
;Implementation
KokkosSparse::block_spgemm_symbolic()
andKokkosSparse::block_spgemm_numeric()
in KokkosSparse_spgemm.hpp (whole spgemm symbolic phase is re-used behind this dispatch);spgemm_numeric()
(KokkosSparse_spgemm_numeric.hpp) takes optionalblock_dim
and dispatches between CRS and BSR spgemm;KokkosSparse::Impl::BSPGEMM_NUMERIC<...>::bspgemm_numeric()
dispatches to (default, debug, TPL, etc.);BlockHashmapAccumulator
(KokkosKernels_BlockHashmapAccumulator.hpp):HashmapAccumulator
mirror that provides hashmap-based column indexing and dense block accumulation;atomic_add
in batched dgemm so it can replacekk_vector_block_mul_add()
spgemm
);