-
Notifications
You must be signed in to change notification settings - Fork 99
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Improve BSR matrix SpMV Performance #1740
Improve BSR matrix SpMV Performance #1740
Conversation
87bfbe2
to
3afb377
Compare
94df8d0
to
3afb377
Compare
2302095
to
e4706ed
Compare
03bd1f9
to
2f87d2b
Compare
2f87d2b
to
ae17e46
Compare
Status Flag 'Pull Request AutoTester' - User Requested Retest - Label AT: RETEST will be reset after testing. |
Status Flag 'Pre-Test Inspection' - Auto Inspected - Inspection is Not Necessary for this Pull Request. |
Status Flag 'Pull Request AutoTester' - Testing Jenkins Projects: Pull Request Auto Testing STARTING (click to expand)Build InformationTest Name: KokkosKernels_PullRequest_CUDA11_CUDA11_LayoutRight
Jenkins Parameters
Build InformationTest Name: KokkosKernels_PullRequest_GCC930_Light_Tpls_GCC930_Tpls_CLANG13CUDA10
Jenkins Parameters
Build InformationTest Name: KokkosKernels_PullRequest_GCC1020
Jenkins Parameters
Build InformationTest Name: KokkosKernels_PullRequest_GCC1020_Light_LayoutRight
Jenkins Parameters
Build InformationTest Name: KokkosKernels_PullRequest_Tpls_GCC1020
Jenkins Parameters
Build InformationTest Name: KokkosKernels_PullRequest_Tpls_INTEL19
Jenkins Parameters
Build InformationTest Name: KokkosKernels_PullRequest_CLANG1001
Jenkins Parameters
Build InformationTest Name: KokkosKernels_PullRequest_A64FX_Tpls_ARMPL2110
Jenkins Parameters
Build InformationTest Name: KokkosKernels_PullRequest_A64FX_GCC1020
Jenkins Parameters
Build InformationTest Name: KokkosKernels_PullRequest_VEGA908_ROCM520
Jenkins Parameters
Build InformationTest Name: KokkosKernels_PullRequest_VEGA908_Tpls_ROCM520
Jenkins Parameters
Using Repos:
Pull Request Author: cwpearson |
Status Flag 'Pull Request AutoTester' - Jenkins Testing: 1 or more Jobs FAILED Note: Testing will normally be attempted again in approx. 2 Hrs 30 Mins. If a change to the PR source branch occurs, the testing will be attempted again on next available autotester run. Pull Request Auto Testing has FAILED (click to expand)Build InformationTest Name: KokkosKernels_PullRequest_CUDA11_CUDA11_LayoutRight
Jenkins Parameters
Build InformationTest Name: KokkosKernels_PullRequest_GCC930_Light_Tpls_GCC930_Tpls_CLANG13CUDA10
Jenkins Parameters
Build InformationTest Name: KokkosKernels_PullRequest_GCC1020
Jenkins Parameters
Build InformationTest Name: KokkosKernels_PullRequest_GCC1020_Light_LayoutRight
Jenkins Parameters
Build InformationTest Name: KokkosKernels_PullRequest_Tpls_GCC1020
Jenkins Parameters
Build InformationTest Name: KokkosKernels_PullRequest_Tpls_INTEL19
Jenkins Parameters
Build InformationTest Name: KokkosKernels_PullRequest_CLANG1001
Jenkins Parameters
Build InformationTest Name: KokkosKernels_PullRequest_A64FX_Tpls_ARMPL2110
Jenkins Parameters
Build InformationTest Name: KokkosKernels_PullRequest_A64FX_GCC1020
Jenkins Parameters
Build InformationTest Name: KokkosKernels_PullRequest_VEGA908_ROCM520
Jenkins Parameters
Build InformationTest Name: KokkosKernels_PullRequest_VEGA908_Tpls_ROCM520
Jenkins Parameters
Console Output (last 100 lines) : KokkosKernels_PullRequest_CUDA11_CUDA11_LayoutRight # 731 (click to expand)
Console Output (last 100 lines) : KokkosKernels_PullRequest_GCC930_Light_Tpls_GCC930_Tpls_CLANG13CUDA10 # 322 (click to expand)
Console Output (last 100 lines) : KokkosKernels_PullRequest_GCC1020 # 345 (click to expand)
Console Output (last 100 lines) : KokkosKernels_PullRequest_GCC1020_Light_LayoutRight # 613 (click to expand)
Console Output (last 100 lines) : KokkosKernels_PullRequest_Tpls_GCC1020 # 573 (click to expand)
Console Output (last 100 lines) : KokkosKernels_PullRequest_Tpls_INTEL19 # 661 (click to expand)
Console Output (last 100 lines) : KokkosKernels_PullRequest_CLANG1001 # 714 (click to expand)
Console Output (last 100 lines) : KokkosKernels_PullRequest_A64FX_Tpls_ARMPL2110 # 515 (click to expand)
Console Output (last 100 lines) : KokkosKernels_PullRequest_A64FX_GCC1020 # 510 (click to expand)
Console Output (last 100 lines) : KokkosKernels_PullRequest_VEGA908_ROCM520 # 510 (click to expand)
Console Output (last 100 lines) : KokkosKernels_PullRequest_VEGA908_Tpls_ROCM520 # 32 (click to expand)
|
Status Flag 'Pre-Test Inspection' - Auto Inspected - Inspection is Not Necessary for this Pull Request. |
Status Flag 'Pull Request AutoTester' - Testing Jenkins Projects: Pull Request Auto Testing STARTING (click to expand)Build InformationTest Name: KokkosKernels_PullRequest_CUDA11_CUDA11_LayoutRight
Jenkins Parameters
Build InformationTest Name: KokkosKernels_PullRequest_GCC930_Light_Tpls_GCC930_Tpls_CLANG13CUDA10
Jenkins Parameters
Build InformationTest Name: KokkosKernels_PullRequest_GNU1021
Jenkins Parameters
Build InformationTest Name: KokkosKernels_PullRequest_GNU1021_Light_LayoutRight
Jenkins Parameters
Build InformationTest Name: KokkosKernels_PullRequest_Tpls_GNU1021
Jenkins Parameters
Build InformationTest Name: KokkosKernels_PullRequest_Tpls_INTEL19_solo
Jenkins Parameters
Build InformationTest Name: KokkosKernels_PullRequest_CLANG1001_solo
Jenkins Parameters
Build InformationTest Name: KokkosKernels_PullRequest_A64FX_Tpls_ARMPL2110
Jenkins Parameters
Build InformationTest Name: KokkosKernels_PullRequest_A64FX_GCC1020
Jenkins Parameters
Build InformationTest Name: KokkosKernels_PullRequest_VEGA908_ROCM520
Jenkins Parameters
Build InformationTest Name: KokkosKernels_PullRequest_VEGA908_Tpls_ROCM520
Jenkins Parameters
Using Repos:
Pull Request Author: cwpearson |
Status Flag 'Pull Request AutoTester' - Jenkins Testing: 1 or more Jobs FAILED Note: Testing will normally be attempted again in approx. 2 Hrs 30 Mins. If a change to the PR source branch occurs, the testing will be attempted again on next available autotester run. Pull Request Auto Testing has FAILED (click to expand)Build InformationTest Name: KokkosKernels_PullRequest_CUDA11_CUDA11_LayoutRight
Jenkins Parameters
Build InformationTest Name: KokkosKernels_PullRequest_GCC930_Light_Tpls_GCC930_Tpls_CLANG13CUDA10
Jenkins Parameters
Build InformationTest Name: KokkosKernels_PullRequest_GNU1021
Jenkins Parameters
Build InformationTest Name: KokkosKernels_PullRequest_GNU1021_Light_LayoutRight
Jenkins Parameters
Build InformationTest Name: KokkosKernels_PullRequest_Tpls_GNU1021
Jenkins Parameters
Build InformationTest Name: KokkosKernels_PullRequest_Tpls_INTEL19_solo
Jenkins Parameters
Build InformationTest Name: KokkosKernels_PullRequest_CLANG1001_solo
Jenkins Parameters
Build InformationTest Name: KokkosKernels_PullRequest_A64FX_Tpls_ARMPL2110
Jenkins Parameters
Build InformationTest Name: KokkosKernels_PullRequest_A64FX_GCC1020
Jenkins Parameters
Build InformationTest Name: KokkosKernels_PullRequest_VEGA908_ROCM520
Jenkins Parameters
Build InformationTest Name: KokkosKernels_PullRequest_VEGA908_Tpls_ROCM520
Jenkins Parameters
Console Output (last 100 lines) : KokkosKernels_PullRequest_CUDA11_CUDA11_LayoutRight # 734 (click to expand)
Console Output (last 100 lines) : KokkosKernels_PullRequest_GCC930_Light_Tpls_GCC930_Tpls_CLANG13CUDA10 # 325 (click to expand)
Console Output (last 100 lines) : KokkosKernels_PullRequest_GNU1021 # 4 (click to expand)
Console Output (last 100 lines) : KokkosKernels_PullRequest_GNU1021_Light_LayoutRight # 4 (click to expand)
Console Output (last 100 lines) : KokkosKernels_PullRequest_Tpls_GNU1021 # 4 (click to expand)
Console Output (last 100 lines) : KokkosKernels_PullRequest_Tpls_INTEL19_solo # 10 (click to expand)
Console Output (last 100 lines) : KokkosKernels_PullRequest_CLANG1001_solo # 5 (click to expand)
Console Output (last 100 lines) : KokkosKernels_PullRequest_A64FX_Tpls_ARMPL2110 # 517 (click to expand)
Console Output (last 100 lines) : KokkosKernels_PullRequest_A64FX_GCC1020 # 512 (click to expand)
Console Output (last 100 lines) : KokkosKernels_PullRequest_VEGA908_ROCM520 # 512 (click to expand)
Console Output (last 100 lines) : KokkosKernels_PullRequest_VEGA908_Tpls_ROCM520 # 34 (click to expand)
|
69a6882
to
33506f3
Compare
Status Flag 'Pre-Test Inspection' - Auto Inspected - Inspection is Not Necessary for this Pull Request. |
Status Flag 'Pull Request AutoTester' - Testing Jenkins Projects: Pull Request Auto Testing STARTING (click to expand)Build InformationTest Name: KokkosKernels_PullRequest_CUDA11_CUDA11_LayoutRight
Jenkins Parameters
Build InformationTest Name: KokkosKernels_PullRequest_GCC930_Light_Tpls_GCC930_Tpls_CLANG13CUDA10
Jenkins Parameters
Build InformationTest Name: KokkosKernels_PullRequest_GNU1021
Jenkins Parameters
Build InformationTest Name: KokkosKernels_PullRequest_GNU1021_Light_LayoutRight
Jenkins Parameters
Build InformationTest Name: KokkosKernels_PullRequest_Tpls_GNU1021
Jenkins Parameters
Build InformationTest Name: KokkosKernels_PullRequest_Tpls_INTEL19_solo
Jenkins Parameters
Build InformationTest Name: KokkosKernels_PullRequest_CLANG1001_solo
Jenkins Parameters
Build InformationTest Name: KokkosKernels_PullRequest_A64FX_Tpls_ARMPL2110
Jenkins Parameters
Build InformationTest Name: KokkosKernels_PullRequest_A64FX_GCC1020
Jenkins Parameters
Build InformationTest Name: KokkosKernels_PullRequest_VEGA908_ROCM520
Jenkins Parameters
Build InformationTest Name: KokkosKernels_PullRequest_VEGA908_Tpls_ROCM520
Jenkins Parameters
Using Repos:
Pull Request Author: cwpearson |
Status Flag 'Pull Request AutoTester' - Error: Jenkins Jobs - A user has pushed a change to the PR before testing completed. NEW EVENT 'committed', ID C_kwDOBK7s5toAKGNhYmRmNzQzNzYzNmY5YjI3NmM4YzZiMTk4YWU0NTllZTg3MWFiODQ... The Jenkins Jobs will be shutdown; Testing of this PR must occur again. |
Status Flag 'Pull Request AutoTester' - Jenkins Testing: 1 or more Jobs FAILED Note: Testing will normally be attempted again in approx. 2 Hrs 30 Mins. If a change to the PR source branch occurs, the testing will be attempted again on next available autotester run. Pull Request Auto Testing has FAILED (click to expand)Build InformationTest Name: KokkosKernels_PullRequest_CUDA11_CUDA11_LayoutRight
Jenkins Parameters
Build InformationTest Name: KokkosKernels_PullRequest_GCC930_Light_Tpls_GCC930_Tpls_CLANG13CUDA10
Jenkins Parameters
Build InformationTest Name: KokkosKernels_PullRequest_GNU1021
Jenkins Parameters
Build InformationTest Name: KokkosKernels_PullRequest_GNU1021_Light_LayoutRight
Jenkins Parameters
Build InformationTest Name: KokkosKernels_PullRequest_Tpls_GNU1021
Jenkins Parameters
Build InformationTest Name: KokkosKernels_PullRequest_Tpls_INTEL19_solo
Jenkins Parameters
Build InformationTest Name: KokkosKernels_PullRequest_CLANG1001_solo
Jenkins Parameters
Build InformationTest Name: KokkosKernels_PullRequest_A64FX_Tpls_ARMPL2110
Jenkins Parameters
Build InformationTest Name: KokkosKernels_PullRequest_A64FX_GCC1020
Jenkins Parameters
Build InformationTest Name: KokkosKernels_PullRequest_VEGA908_ROCM520
Jenkins Parameters
Build InformationTest Name: KokkosKernels_PullRequest_VEGA908_Tpls_ROCM520
Jenkins Parameters
Console Output (last 100 lines) : KokkosKernels_PullRequest_CUDA11_CUDA11_LayoutRight # 735 (click to expand)
Console Output (last 100 lines) : KokkosKernels_PullRequest_GCC930_Light_Tpls_GCC930_Tpls_CLANG13CUDA10 # 326 (click to expand)
Console Output (last 100 lines) : KokkosKernels_PullRequest_GNU1021 # 5 (click to expand)
Console Output (last 100 lines) : KokkosKernels_PullRequest_GNU1021_Light_LayoutRight # 5 (click to expand)
Console Output (last 100 lines) : KokkosKernels_PullRequest_Tpls_GNU1021 # 5 (click to expand)
Console Output (last 100 lines) : KokkosKernels_PullRequest_Tpls_INTEL19_solo # 11 (click to expand)
Console Output (last 100 lines) : KokkosKernels_PullRequest_CLANG1001_solo # 6 (click to expand)
Console Output (last 100 lines) : KokkosKernels_PullRequest_A64FX_Tpls_ARMPL2110 # 518 (click to expand)
Console Output (last 100 lines) : KokkosKernels_PullRequest_A64FX_GCC1020 # 513 (click to expand)
Console Output (last 100 lines) : KokkosKernels_PullRequest_VEGA908_ROCM520 # 513 (click to expand)
Console Output (last 100 lines) : KokkosKernels_PullRequest_VEGA908_Tpls_ROCM520 # 35 (click to expand)
|
Status Flag 'Pre-Test Inspection' - Auto Inspected - Inspection is Not Necessary for this Pull Request. |
Status Flag 'Pull Request AutoTester' - User Requested Retest - Label AT: RETEST will be reset after testing. |
Status Flag 'Pull Request AutoTester' - Testing Jenkins Projects: Pull Request Auto Testing STARTING (click to expand)Build InformationTest Name: KokkosKernels_PullRequest_CUDA11_CUDA11_LayoutRight
Jenkins Parameters
Build InformationTest Name: KokkosKernels_PullRequest_GCC930_Light_Tpls_GCC930_Tpls_CLANG13CUDA10
Jenkins Parameters
Build InformationTest Name: KokkosKernels_PullRequest_GNU1021
Jenkins Parameters
Build InformationTest Name: KokkosKernels_PullRequest_GNU1021_Light_LayoutRight
Jenkins Parameters
Build InformationTest Name: KokkosKernels_PullRequest_Tpls_GNU1021
Jenkins Parameters
Build InformationTest Name: KokkosKernels_PullRequest_Tpls_INTEL19_solo
Jenkins Parameters
Build InformationTest Name: KokkosKernels_PullRequest_CLANG1001_solo
Jenkins Parameters
Build InformationTest Name: KokkosKernels_PullRequest_A64FX_Tpls_ARMPL2110
Jenkins Parameters
Build InformationTest Name: KokkosKernels_PullRequest_A64FX_GCC1020
Jenkins Parameters
Build InformationTest Name: KokkosKernels_PullRequest_VEGA908_ROCM520
Jenkins Parameters
Build InformationTest Name: KokkosKernels_PullRequest_VEGA908_Tpls_ROCM520
Jenkins Parameters
Using Repos:
Pull Request Author: cwpearson |
Status Flag 'Pull Request AutoTester' - Jenkins Testing: 1 or more Jobs FAILED Note: Testing will normally be attempted again in approx. 2 Hrs 30 Mins. If a change to the PR source branch occurs, the testing will be attempted again on next available autotester run. Pull Request Auto Testing has FAILED (click to expand)Build InformationTest Name: KokkosKernels_PullRequest_CUDA11_CUDA11_LayoutRight
Jenkins Parameters
Build InformationTest Name: KokkosKernels_PullRequest_GCC930_Light_Tpls_GCC930_Tpls_CLANG13CUDA10
Jenkins Parameters
Build InformationTest Name: KokkosKernels_PullRequest_GNU1021
Jenkins Parameters
Build InformationTest Name: KokkosKernels_PullRequest_GNU1021_Light_LayoutRight
Jenkins Parameters
Build InformationTest Name: KokkosKernels_PullRequest_Tpls_GNU1021
Jenkins Parameters
Build InformationTest Name: KokkosKernels_PullRequest_Tpls_INTEL19_solo
Jenkins Parameters
Build InformationTest Name: KokkosKernels_PullRequest_CLANG1001_solo
Jenkins Parameters
Build InformationTest Name: KokkosKernels_PullRequest_A64FX_Tpls_ARMPL2110
Jenkins Parameters
Build InformationTest Name: KokkosKernels_PullRequest_A64FX_GCC1020
Jenkins Parameters
Build InformationTest Name: KokkosKernels_PullRequest_VEGA908_ROCM520
Jenkins Parameters
Build InformationTest Name: KokkosKernels_PullRequest_VEGA908_Tpls_ROCM520
Jenkins Parameters
Console Output (last 100 lines) : KokkosKernels_PullRequest_CUDA11_CUDA11_LayoutRight # 750 (click to expand)
Console Output (last 100 lines) : KokkosKernels_PullRequest_GCC930_Light_Tpls_GCC930_Tpls_CLANG13CUDA10 # 341 (click to expand)
Console Output (last 100 lines) : KokkosKernels_PullRequest_GNU1021 # 20 (click to expand)
Console Output (last 100 lines) : KokkosKernels_PullRequest_GNU1021_Light_LayoutRight # 20 (click to expand)
Console Output (last 100 lines) : KokkosKernels_PullRequest_Tpls_GNU1021 # 20 (click to expand)
Console Output (last 100 lines) : KokkosKernels_PullRequest_Tpls_INTEL19_solo # 26 (click to expand)
Console Output (last 100 lines) : KokkosKernels_PullRequest_CLANG1001_solo # 19 (click to expand)
Console Output (last 100 lines) : KokkosKernels_PullRequest_A64FX_Tpls_ARMPL2110 # 531 (click to expand)
Console Output (last 100 lines) : KokkosKernels_PullRequest_A64FX_GCC1020 # 526 (click to expand)
Console Output (last 100 lines) : KokkosKernels_PullRequest_VEGA908_ROCM520 # 525 (click to expand)
Console Output (last 100 lines) : KokkosKernels_PullRequest_VEGA908_Tpls_ROCM520 # 47 (click to expand)
|
…le-vector cases. * Adds a new `"v4.2"` BsrMatrix SpMV implementation for non-transpose mode. * It is the default (when TPLs are disabled or not supported) on the GPU for non-transpose mode * The old implementation is retained for all other modes * old implementation may be requested explicitly with `controls.setParameter("algorithm", "v4.1")` * Adds explicit invocation of old "4.1" impl to `KokkosKernels_sparse_spmv_bsr_benchmark` * When TPLs are enabled, the new implementation may be requested anyway with `controls.setParameter("algorithm", "v4.2")` * simplify `KokkosKernels::Impl::always_false_v` * Add `template <typename View> class with_unmanaged` which provides a `type` alias reproducing `View` with `Kokkos::Unmanaged` added to its memory traits * Add `KokkosKernels::Impl::with_unmanaged_t` as an alias for `typename with_unamanged::type` * Add `template <typename View> auto KokkosKernels::Impl:make_unmanaged(const View &v)` which constructs a `with_unmanaged_t<View>` from v * Add `<sstream>` to `KokkosKernels_Error.hpp` * Add `DieOnError` and `SkipOnError` wrapped `bool`s to give names to boolean function arguments * Link `KokkosKernels_sparse_spmv_bsr_benchmark` against `stdc++fs` for rocm 5.2 * More aggressive block size filtering in `KokkosSparse_csr_detect_block_size.hpp` * Removes a useless warning from `Controls::getParameter` since what happens when a parameter is unset was made explicit in kokkos@be87154 * `BsrMatrix` constructor throws when combination of nnz, rows, and columns don't make sense * Change `BsrMatrix::block_layout` to `BsrMatrix::block_layout_type` for consistency * Adds `BsrMatrix::unmanaged_block` to return an unmanged view to a 2D block of values * Adds `BsrMatrix::unmanaged_block_const` to return a const unmanged view to a 2D block of values
1ff9d9c
to
86d8371
Compare
Looks like the |
Status Flag 'Pull Request AutoTester' - User Requested Retest - Label AT: RETEST will be reset after testing. |
Status Flag 'Pre-Test Inspection' - Auto Inspected - Inspection is Not Necessary for this Pull Request. |
Status Flag 'Pull Request AutoTester' - Testing Jenkins Projects: Pull Request Auto Testing STARTING (click to expand)Build InformationTest Name: KokkosKernels_PullRequest_CUDA11_CUDA11_LayoutRight
Jenkins Parameters
Build InformationTest Name: KokkosKernels_PullRequest_GCC930_Light_Tpls_GCC930_Tpls_CLANG13CUDA10
Jenkins Parameters
Build InformationTest Name: KokkosKernels_PullRequest_GNU1021
Jenkins Parameters
Build InformationTest Name: KokkosKernels_PullRequest_GNU1021_Light_LayoutRight
Jenkins Parameters
Build InformationTest Name: KokkosKernels_PullRequest_Tpls_GNU1021
Jenkins Parameters
Build InformationTest Name: KokkosKernels_PullRequest_Tpls_INTEL19_solo
Jenkins Parameters
Build InformationTest Name: KokkosKernels_PullRequest_CLANG1001_solo
Jenkins Parameters
Build InformationTest Name: KokkosKernels_PullRequest_A64FX_Tpls_ARMPL2110
Jenkins Parameters
Build InformationTest Name: KokkosKernels_PullRequest_A64FX_GCC1020
Jenkins Parameters
Build InformationTest Name: KokkosKernels_PullRequest_VEGA908_ROCM520
Jenkins Parameters
Build InformationTest Name: KokkosKernels_PullRequest_VEGA908_Tpls_ROCM520
Jenkins Parameters
Using Repos:
Pull Request Author: cwpearson |
Status Flag 'Pull Request AutoTester' - Jenkins Testing: all Jobs PASSED Pull Request Auto Testing has PASSED (click to expand)Build InformationTest Name: KokkosKernels_PullRequest_CUDA11_CUDA11_LayoutRight
Jenkins Parameters
Build InformationTest Name: KokkosKernels_PullRequest_GCC930_Light_Tpls_GCC930_Tpls_CLANG13CUDA10
Jenkins Parameters
Build InformationTest Name: KokkosKernels_PullRequest_GNU1021
Jenkins Parameters
Build InformationTest Name: KokkosKernels_PullRequest_GNU1021_Light_LayoutRight
Jenkins Parameters
Build InformationTest Name: KokkosKernels_PullRequest_Tpls_GNU1021
Jenkins Parameters
Build InformationTest Name: KokkosKernels_PullRequest_Tpls_INTEL19_solo
Jenkins Parameters
Build InformationTest Name: KokkosKernels_PullRequest_CLANG1001_solo
Jenkins Parameters
Build InformationTest Name: KokkosKernels_PullRequest_A64FX_Tpls_ARMPL2110
Jenkins Parameters
Build InformationTest Name: KokkosKernels_PullRequest_A64FX_GCC1020
Jenkins Parameters
Build InformationTest Name: KokkosKernels_PullRequest_VEGA908_ROCM520
Jenkins Parameters
Build InformationTest Name: KokkosKernels_PullRequest_VEGA908_Tpls_ROCM520
Jenkins Parameters
|
Status Flag 'Pre-Merge Inspection' - - This Pull Request Requires Inspection... The code must be inspected by a member of the Team before Testing/Merging |
All Jobs Finished; status = PASSED, However Inspection must be performed before merge can occur... |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is fine with me but we should probably clean up that little bit of extra code...
"BsrMatrix:: annz should be a multiple of the number of entries in a " | ||
"block"); | ||
} | ||
if (annz % (blockDim_ * blockDim_)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Duplicate of above check?
Status Flag 'Pre-Merge Inspection' - SUCCESS: The last commit to this Pull Request has been INSPECTED AND APPROVED by [ lucbv ]! |
Status Flag 'Pull Request AutoTester' - Pull Request MUST BE MERGED MANUALLY BY Project Team - This Repo does not support Automerge |
Marking with TrilinosPatchMatch to track so the Trilinos changes are not clobbered in the event we issue a patch relesae (see PR trilinos/Trilinos#12096) |
Improve performance of the native
BsrMatrix
SpMV, especially for single-vector cases."v4.2"
BsrMatrix SpMV implementation for non-transpose mode.controls.setParameter("algorithm", "v4.1")
KokkosKernels_sparse_spmv_bsr_benchmark
controls.setParameter("algorithm", "v4.2")
Results
sp is speedup of 4.2 vs 4.1. TPLs only support specific integer combinations. The table below summarizes the speedup achieved across various operand types
V100, bs=7, k=1, rows=1.84m, nnz=88m
GB / second
V100, bs=7, k=3, rows=1.84m, nnz=88m
GB / second
A100, bs=7, k=1, rows=1.84m, nnz=88m
GB / second
A100, bs=7, k=3, rows=1.84m, nnz=88m
GB / second
MI250, bs=7, k=1, rows=1.84m, nnz=88m
GB / second
MI250, bs=7, k=3, rows=1.84m, nnz=88m
GB / second
Results (Tpetra BlockCrs Performance Test Matrix)
sp is speedup of 4.2 vs 4.1. TPLs only support specific integer combinations. The table below summarizes the speedup achieved across various operand types
V100, bs=7, k=1, rows=1.84m, nnz=88m
GB / second
V100, bs=7, k=3, rows=1.84m, nnz=88m
GB / second
MI250, bs=7, k=1, rows=1.84m, nnz=88m
GB / second
MI250, bs=7, k=3, rows=1.84m, nnz=88m
GB / second
Other changes:
KokkosKernels::Impl::always_false_v
template <typename View> class with_unmanaged
which provides atype
alias reproducingView
withKokkos::Unmanaged
added to its memory traitsKokkosKernels::Impl::with_unmanaged_t
as an alias fortypename with_unamanged::type
template <typename View> auto KokkosKernels::Impl:make_unmanaged(const View &v)
which constructs awith_unmanaged_t<View>
from v<sstream>
toKokkosKernels_Error.hpp
DieOnError
andSkipOnError
wrappedbool
s to give names to boolean function argumentsKokkosKernels_sparse_spmv_bsr_benchmark
againststdc++fs
for rocm 5.2KokkosSparse_csr_detect_block_size.hpp
Controls::getParameter
since what happens when a parameter is unset was made explicit in be87154BsrMatrix
constructor throws when combination of nnz, rows, and columns don't make senseBsrMatrix::block_layout
toBsrMatrix::block_layout_type
for consistencyBsrMatrix::unmanaged_block
to return an unmanged view to a 2D block of valuesBsrMatrix::unmanaged_block_const
to return a const unmanged view to a 2D block of values