-
Notifications
You must be signed in to change notification settings - Fork 99
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Experimental tensor-core SpMV for BsrMatrix #1090
Conversation
Status Flag 'Pre-Test Inspection' - - This Pull Request Requires Inspection... The code must be inspected by a member of the Team before Testing/Merging |
Status Flag 'Pre-Test Inspection' - - This Pull Request Requires Inspection... The code must be inspected by a member of the Team before Testing/Merging |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In general I think the code looks good but I have a few comments that needs to be addressed.
src/sparse/KokkosSparse_spmv.hpp
Outdated
@@ -57,6 +57,10 @@ | |||
#include "KokkosBlas1_scal.hpp" | |||
#include "KokkosKernels_Utils.hpp" | |||
|
|||
// tensor core dispatch | |||
#include "KokkosSparse_BlockCrsMatrix.hpp" | |||
#include "KokkosSparse_spmv_tensor_core_impl.hpp" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should not be included here, I would suggest looking at KokkosSparse_spmv_spec.hpp
and see if it makes sense to have the ETI/dispatch in that file, if not you will have to replicate the structure of that file in something like KokkosSparse_spmv_tensore_spec.hpp
typedef typename AMatrix::value_type AScalar; | ||
typedef typename YMatrix::value_type YScalar; | ||
typedef typename XMatrix::value_type XScalar; | ||
#if 0 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this code should be removed
*/ | ||
int team_size() const { | ||
if (0 != (y.extent(1) % WMMA_TILE_N)) { | ||
Kokkos::Impl::throw_runtime_exception("y.extent(0) and A.blockDim() mismatch"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is a mismatch between the test implemented and the explanation in the throw. Are we checking y.extent(1)
i.e. the number of vectors stored or y.extent(0)
i.e. the number of rows in the vector?
const YMatrix& y | ||
) { | ||
if (0 != y.extent(1) % A.blockDim()) { | ||
Kokkos::Impl::throw_runtime_exception("y.extent(1) not a multiple of A block dimension"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Couldn't we just pad the multivector with additional vectors full of zeros when loading the y fragments?
#include "Cuda/Kokkos_Cuda_Half.hpp" | ||
|
||
#include "KokkosSparse_BlockCrsMatrix.hpp" | ||
#include "KokkosSparse_spmv_tensor_core_impl.hpp" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should avoid loading the impl file if possible and rely on the public header to benefit from ETI
double(AT::abs(expected_y(i))), i, | ||
double(AT::abs(y(i))) | ||
); | ||
// std::cout << "expected_y(" << i << ")=" << AT::abs(expected_y(i)) << ", y(" << i << ")=" << AT::abs(y(i)) << std::endl; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Remove commented line please
typename CrsMat::ordinal_type &blockSize) | ||
{ | ||
|
||
#if 0 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Remove unused code branch please
auto it = std::unique(bjs.begin(), bjs.end()); | ||
bjs.resize(it - bjs.begin()); | ||
} | ||
// std::cerr << "bi=" << bi << ":"; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Remove commented line please
Kokkos::Random_XorShift64<Kokkos::HostSpace> rand(13718); | ||
|
||
// fill outputs with random values | ||
// Kokkos::Random_XorShift64_Pool<Kokkos::HostSpace> rand_pool(13718); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Remove commented line please
Kokkos::fill_random(hi_x, rand_pool, randomUpperBound<typename hi_scalar_view_t::value_type>(10)); | ||
Kokkos::fill_random(hi_y, rand_pool, randomUpperBound<typename hi_scalar_view_t::value_type>(10)); | ||
|
||
#if 0 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Remove unused code branch
Status Flag 'Pre-Test Inspection' - - This Pull Request Requires Inspection... The code must be inspected by a member of the Team before Testing/Merging |
Status Flag 'Pre-Test Inspection' - - This Pull Request Requires Inspection... The code must be inspected by a member of the Team before Testing/Merging |
6 similar comments
Status Flag 'Pre-Test Inspection' - - This Pull Request Requires Inspection... The code must be inspected by a member of the Team before Testing/Merging |
Status Flag 'Pre-Test Inspection' - - This Pull Request Requires Inspection... The code must be inspected by a member of the Team before Testing/Merging |
Status Flag 'Pre-Test Inspection' - - This Pull Request Requires Inspection... The code must be inspected by a member of the Team before Testing/Merging |
Status Flag 'Pre-Test Inspection' - - This Pull Request Requires Inspection... The code must be inspected by a member of the Team before Testing/Merging |
Status Flag 'Pre-Test Inspection' - - This Pull Request Requires Inspection... The code must be inspected by a member of the Team before Testing/Merging |
Status Flag 'Pre-Test Inspection' - - This Pull Request Requires Inspection... The code must be inspected by a member of the Team before Testing/Merging |
5e8e823
to
ee41a1d
Compare
953c92f
to
beb3a89
Compare
Extensible to any CUDA wmma fragment type and any BsrMatrix::blockDim. Each team is made up of four warps, which are in a 2x2 grid in the M and N direction of the sparse matrix. The input operands are all operated on at a granularity defined by the BsrMatrix blockDim. Each dense block in the product MV is covered by a 2D grid of teams as needed, where the footprint of each team is determined by the chosen fragment type. src/sparse/KokkosSparse_BlockCrsMatrix.hpp ==== Added an is_block_crs_matrix static test, similar to the is_view from Kokkos. This is not actually used in this PR, but is convenient for future KokkosSparse::spmv extensions. src/sparse/KokkosSparse_CrsMatrix.hpp ==== Added an is_crs_matrix static test, similar to the is_view from Kokkos. This is not actually used in this PR, but is convenient for future KokkosSparse::spmv extensions. src/sparse/KokkosSparse_spmv.hpp ==== Integration with the existing SpMV interface. The current KokkosSparse::spmv(controls, mode, ...) interface is extended with compile-time checks for whether the AMatrix type is a BsrMatrix or not. If so, the new implementation is called. If it's a CrsMatrix, the existing implementation is called. If neither, a stub implementation that provides a useful error is introduced. Prior to this PR, passing a non-CrsMatrix to Kokkos::spmv would be a compile-time error of varying transparency. The user must provide a control with "algorithm" = "experimental_bsr_tc", or the spmv will be a run-time error. Furthermore, on AMPERE, the internal fragment type may be controlled with "precision" = "mixed" or "double". src/sparse/impl/KokkosSparse_spmv_bsrmatrix_impl.hpp ==== The actual tensor core SpMV implementation. In the future, this file will hold additional impls for the BsrMatrix type. The functor has optional compile-time parameters that can be used to optimize out divmod instructions. If they are provided as 0, these parameters are determined at runtime instead. The entire implementation is guarded by the Kokkos VOLTA architecture with CUDA enabled. FP16,32, and 64 are supported on any inputs/outputs. The multiplicand fragments are staged through shared memory where they are converted to the internal fragment type. unit_test/sparse/Test_Sparse_spmv.hpp ==== Added unit test for the new SpMV. It is invoked through the same opt-in interface that is exposed to the user. It is compared against a Kokkos::spmv() called on the same operands src/CMakeLists.txt ==== Add in ETI files for the BsrMatrix spmv src/impl/generated_specializations... ==== ETI files for the BsrMatrix spmv
beb3a89
to
9832a87
Compare
946df3f
to
d4aed7f
Compare
auto it = std::unique(bjs.begin(), bjs.end()); | ||
bjs.resize(it - bjs.begin()); | ||
} | ||
// std::cerr << "bi=" << bi << ":"; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this probably needs to go too
rowmap.push_back(entries.size()); | ||
} | ||
|
||
// std::cerr << "rowmap size = " << rowmap.size() << "\n"; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this needs to go
|
||
BsrMatrixSpMVTensorCoreFunctorParams params; | ||
|
||
public: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Technically since you did not call private:
anywhere you don't need this.
Status Flag 'Pre-Test Inspection' - SUCCESS: The last commit to this Pull Request has been INSPECTED by label AT: PRE-TEST INSPECTED! Autotester is Removing Label; This inspection will remain valid until a new commit to source branch is performed. |
Status Flag 'Pull Request AutoTester' - Jenkins Testing: 1 or more Jobs FAILED Note: Testing will normally be attempted again in approx. 2 Hrs 30 Mins. If a change to the PR source branch occurs, the testing will be attempted again on next available autotester run. Pull Request Auto Testing has FAILED (click to expand)Build InformationTest Name: KokkosKernels_PullRequest_GCC720_Light
Jenkins Parameters
Build InformationTest Name: KokkosKernels_PullRequest_GCC720
Jenkins Parameters
Build InformationTest Name: KokkosKernels_PullRequest_GCC720_Light_LayoutRight
Jenkins Parameters
Build InformationTest Name: KokkosKernels_PullRequest_Tpls_GCC720
Jenkins Parameters
Build InformationTest Name: KokkosKernels_PullRequest_Tpls_CUDA10
Jenkins Parameters
Build InformationTest Name: KokkosKernels_PullRequest_Tpls_INTEL18
Jenkins Parameters
Build InformationTest Name: KokkosKernels_PullRequest_Tpls_CUDA10_LayoutRight
Jenkins Parameters
Build InformationTest Name: KokkosKernels_PullRequest_Tpls_CUDA9
Jenkins Parameters
Build InformationTest Name: KokkosKernels_PullRequest_Tpls_GCC720_GCC740
Jenkins Parameters
Console Output (last 100 lines) : KokkosKernels_PullRequest_GCC720_Light # 183 (click to expand)
Console Output (last 100 lines) : KokkosKernels_PullRequest_GCC720 # 561 (click to expand)
Console Output (last 100 lines) : KokkosKernels_PullRequest_GCC720_Light_LayoutRight # 208 (click to expand)
Console Output (last 100 lines) : KokkosKernels_PullRequest_Tpls_GCC720 # 552 (click to expand)
Console Output (last 100 lines) : KokkosKernels_PullRequest_Tpls_CUDA10 # 183 (click to expand)
Console Output (last 100 lines) : KokkosKernels_PullRequest_Tpls_INTEL18 # 541 (click to expand)
Console Output (last 100 lines) : KokkosKernels_PullRequest_Tpls_CUDA10_LayoutRight # 185 (click to expand)
Console Output (last 100 lines) : KokkosKernels_PullRequest_Tpls_CUDA9 # 177 (click to expand)
Console Output (last 100 lines) : KokkosKernels_PullRequest_Tpls_GCC720_GCC740 # 167 (click to expand)
|
removing an unused typedef that bothers the auto-tester.
Status Flag 'Pull Request AutoTester' - Testing Jenkins Projects: Pull Request Auto Testing STARTING (click to expand)Build InformationTest Name: KokkosKernels_PullRequest_GCC720_Light
Jenkins Parameters
Build InformationTest Name: KokkosKernels_PullRequest_GCC720
Jenkins Parameters
Build InformationTest Name: KokkosKernels_PullRequest_GCC720_Light_LayoutRight
Jenkins Parameters
Build InformationTest Name: KokkosKernels_PullRequest_Tpls_GCC720
Jenkins Parameters
Build InformationTest Name: KokkosKernels_PullRequest_Tpls_CUDA10
Jenkins Parameters
Build InformationTest Name: KokkosKernels_PullRequest_Tpls_INTEL18
Jenkins Parameters
Build InformationTest Name: KokkosKernels_PullRequest_Tpls_CUDA10_LayoutRight
Jenkins Parameters
Build InformationTest Name: KokkosKernels_PullRequest_Tpls_CUDA9
Jenkins Parameters
Build InformationTest Name: KokkosKernels_PullRequest_Tpls_GCC720_GCC740
Jenkins Parameters
Using Repos:
Pull Request Author: cwpearson |
Status Flag 'Pull Request AutoTester' - Jenkins Testing: 1 or more Jobs FAILED Note: Testing will normally be attempted again in approx. 2 Hrs 30 Mins. If a change to the PR source branch occurs, the testing will be attempted again on next available autotester run. Pull Request Auto Testing has FAILED (click to expand)Build InformationTest Name: KokkosKernels_PullRequest_GCC720_Light
Jenkins Parameters
Build InformationTest Name: KokkosKernels_PullRequest_GCC720
Jenkins Parameters
Build InformationTest Name: KokkosKernels_PullRequest_GCC720_Light_LayoutRight
Jenkins Parameters
Build InformationTest Name: KokkosKernels_PullRequest_Tpls_GCC720
Jenkins Parameters
Build InformationTest Name: KokkosKernels_PullRequest_Tpls_CUDA10
Jenkins Parameters
Build InformationTest Name: KokkosKernels_PullRequest_Tpls_INTEL18
Jenkins Parameters
Build InformationTest Name: KokkosKernels_PullRequest_Tpls_CUDA10_LayoutRight
Jenkins Parameters
Build InformationTest Name: KokkosKernels_PullRequest_Tpls_CUDA9
Jenkins Parameters
Build InformationTest Name: KokkosKernels_PullRequest_Tpls_GCC720_GCC740
Jenkins Parameters
Console Output (last 100 lines) : KokkosKernels_PullRequest_GCC720_Light # 184 (click to expand)
Console Output (last 100 lines) : KokkosKernels_PullRequest_GCC720 # 562 (click to expand)
Console Output (last 100 lines) : KokkosKernels_PullRequest_GCC720_Light_LayoutRight # 209 (click to expand)
Console Output (last 100 lines) : KokkosKernels_PullRequest_Tpls_GCC720 # 553 (click to expand)
Console Output (last 100 lines) : KokkosKernels_PullRequest_Tpls_CUDA10 # 184 (click to expand)
Console Output (last 100 lines) : KokkosKernels_PullRequest_Tpls_INTEL18 # 542 (click to expand)
Console Output (last 100 lines) : KokkosKernels_PullRequest_Tpls_CUDA10_LayoutRight # 186 (click to expand)
Console Output (last 100 lines) : KokkosKernels_PullRequest_Tpls_CUDA9 # 178 (click to expand)
Console Output (last 100 lines) : KokkosKernels_PullRequest_Tpls_GCC720_GCC740 # 168 (click to expand)
|
Status Flag 'Pre-Test Inspection' - - This Pull Request Requires Inspection... The code must be inspected by a member of the Team before Testing/Merging |
Status Flag 'Pre-Test Inspection' - SUCCESS: The last commit to this Pull Request has been INSPECTED by label AT: PRE-TEST INSPECTED! Autotester is Removing Label; This inspection will remain valid until a new commit to source branch is performed. |
Status Flag 'Pull Request AutoTester' - Failure: Timed out waiting for job KokkosKernels_PullRequest_Tpls_CUDA9 to start: Total Wait = 3603
|
Status Flag 'Pull Request AutoTester' - Testing Jenkins Projects: Pull Request Auto Testing STARTING (click to expand)Build InformationTest Name: KokkosKernels_PullRequest_GCC720_Light
Jenkins Parameters
Build InformationTest Name: KokkosKernels_PullRequest_GCC720
Jenkins Parameters
Build InformationTest Name: KokkosKernels_PullRequest_GCC720_Light_LayoutRight
Jenkins Parameters
Build InformationTest Name: KokkosKernels_PullRequest_Tpls_GCC720
Jenkins Parameters
Build InformationTest Name: KokkosKernels_PullRequest_Tpls_CUDA10
Jenkins Parameters
Build InformationTest Name: KokkosKernels_PullRequest_Tpls_INTEL18
Jenkins Parameters
Build InformationTest Name: KokkosKernels_PullRequest_Tpls_CUDA10_LayoutRight
Jenkins Parameters
Build InformationTest Name: KokkosKernels_PullRequest_Tpls_CUDA9
Jenkins Parameters
Build InformationTest Name: KokkosKernels_PullRequest_Tpls_GCC720_GCC740
Jenkins Parameters
Using Repos:
Pull Request Author: cwpearson |
Status Flag 'Pull Request AutoTester' - Jenkins Testing: 1 or more Jobs FAILED Note: Testing will normally be attempted again in approx. 2 Hrs 30 Mins. If a change to the PR source branch occurs, the testing will be attempted again on next available autotester run. Pull Request Auto Testing has FAILED (click to expand)Build InformationTest Name: KokkosKernels_PullRequest_GCC720_Light
Jenkins Parameters
Build InformationTest Name: KokkosKernels_PullRequest_GCC720
Jenkins Parameters
Build InformationTest Name: KokkosKernels_PullRequest_GCC720_Light_LayoutRight
Jenkins Parameters
Build InformationTest Name: KokkosKernels_PullRequest_Tpls_GCC720
Jenkins Parameters
Build InformationTest Name: KokkosKernels_PullRequest_Tpls_CUDA10
Jenkins Parameters
Build InformationTest Name: KokkosKernels_PullRequest_Tpls_INTEL18
Jenkins Parameters
Build InformationTest Name: KokkosKernels_PullRequest_Tpls_CUDA10_LayoutRight
Jenkins Parameters
Build InformationTest Name: KokkosKernels_PullRequest_Tpls_CUDA9
Jenkins Parameters
Build InformationTest Name: KokkosKernels_PullRequest_Tpls_GCC720_GCC740
Jenkins Parameters
Console Output (last 100 lines) : KokkosKernels_PullRequest_GCC720_Light # 186 (click to expand)
Console Output (last 100 lines) : KokkosKernels_PullRequest_GCC720 # 564 (click to expand)
Console Output (last 100 lines) : KokkosKernels_PullRequest_GCC720_Light_LayoutRight # 211 (click to expand)
Console Output (last 100 lines) : KokkosKernels_PullRequest_Tpls_GCC720 # 555 (click to expand)
Console Output (last 100 lines) : KokkosKernels_PullRequest_Tpls_CUDA10 # 186 (click to expand)
Console Output (last 100 lines) : KokkosKernels_PullRequest_Tpls_INTEL18 # 544 (click to expand)
Console Output (last 100 lines) : KokkosKernels_PullRequest_Tpls_CUDA10_LayoutRight # 188 (click to expand)
Console Output (last 100 lines) : KokkosKernels_PullRequest_Tpls_CUDA9 # 179 (click to expand)
Console Output (last 100 lines) : KokkosKernels_PullRequest_Tpls_GCC720_GCC740 # 169 (click to expand)
|
Did not realize that the previous typedef would also need to go.
Status Flag 'Pre-Test Inspection' - SUCCESS: The last commit to this Pull Request has been INSPECTED by label AT: PRE-TEST INSPECTED! Autotester is Removing Label; This inspection will remain valid until a new commit to source branch is performed. |
Status Flag 'Pull Request AutoTester' - Testing Jenkins Projects: Pull Request Auto Testing STARTING (click to expand)Build InformationTest Name: KokkosKernels_PullRequest_GCC720_Light
Jenkins Parameters
Build InformationTest Name: KokkosKernels_PullRequest_GCC720
Jenkins Parameters
Build InformationTest Name: KokkosKernels_PullRequest_GCC720_Light_LayoutRight
Jenkins Parameters
Build InformationTest Name: KokkosKernels_PullRequest_Tpls_GCC720
Jenkins Parameters
Build InformationTest Name: KokkosKernels_PullRequest_Tpls_CUDA10
Jenkins Parameters
Build InformationTest Name: KokkosKernels_PullRequest_Tpls_INTEL18
Jenkins Parameters
Build InformationTest Name: KokkosKernels_PullRequest_Tpls_CUDA10_LayoutRight
Jenkins Parameters
Build InformationTest Name: KokkosKernels_PullRequest_Tpls_CUDA9
Jenkins Parameters
Build InformationTest Name: KokkosKernels_PullRequest_Tpls_GCC720_GCC740
Jenkins Parameters
Using Repos:
Pull Request Author: cwpearson |
Status Flag 'Pull Request AutoTester' - Jenkins Testing: all Jobs PASSED Pull Request Auto Testing has PASSED (click to expand)Build InformationTest Name: KokkosKernels_PullRequest_GCC720_Light
Jenkins Parameters
Build InformationTest Name: KokkosKernels_PullRequest_GCC720
Jenkins Parameters
Build InformationTest Name: KokkosKernels_PullRequest_GCC720_Light_LayoutRight
Jenkins Parameters
Build InformationTest Name: KokkosKernels_PullRequest_Tpls_GCC720
Jenkins Parameters
Build InformationTest Name: KokkosKernels_PullRequest_Tpls_CUDA10
Jenkins Parameters
Build InformationTest Name: KokkosKernels_PullRequest_Tpls_INTEL18
Jenkins Parameters
Build InformationTest Name: KokkosKernels_PullRequest_Tpls_CUDA10_LayoutRight
Jenkins Parameters
Build InformationTest Name: KokkosKernels_PullRequest_Tpls_CUDA9
Jenkins Parameters
Build InformationTest Name: KokkosKernels_PullRequest_Tpls_GCC720_GCC740
Jenkins Parameters
|
Status Flag 'Pre-Merge Inspection' - - This Pull Request Requires Inspection... The code must be inspected by a member of the Team before Testing/Merging |
All Jobs Finished; status = PASSED, However Inspection must be performed before merge can occur... |
3 similar comments
All Jobs Finished; status = PASSED, However Inspection must be performed before merge can occur... |
All Jobs Finished; status = PASSED, However Inspection must be performed before merge can occur... |
All Jobs Finished; status = PASSED, However Inspection must be performed before merge can occur... |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some minor comments below, none blocking, so I will approve. We probably need a clean up pass though in a follow up PR.
|
||
namespace KokkosKernels { | ||
namespace Impl { | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks like we will get false for HIP/OMP Target etc?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I know we don't support these feature for now on other GPUs, but OMP Target will work on NVIDIA GPUs, so may be this has to be yes in the near future? For now at least it is better to throw for any other exec space.
namespace KokkosSparse { | ||
namespace Experimental { | ||
namespace Impl { | ||
// clang-format off |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
remove?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The // clang-format off
actually allows us to avoid clang formatting in the following region until // clang-format on
Here the problem is that clang-format like to add spaces between @
and the following text which leads to parsing errors from CMake so we need to guard the CMake variables against clang-formatting
} // namespace Impl | ||
} // namespace Experimental | ||
} // namespace KokkosSparse | ||
#endif |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think no new line here might be a problem in some places. Same reason why Github is flagging it.
namespace Impl { | ||
// clang-format off | ||
@SPARSE_SPMV_BSRMATRIX_ETI_DECL_BLOCK@ | ||
// clang-format on |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same as above.
teamIdx_j * WARPS_PER_TEAM_X * FRAG_N + warpIdx_x * FRAG_N + fj; | ||
|
||
// only store inside the block boundary | ||
// FIXME: what if Y is not wide enough? check y(_, j) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This worries me?
@@ -10,6 +10,7 @@ | |||
|
|||
#include "KokkosKernels_Controls.hpp" | |||
#include "KokkosKernels_default_types.hpp" | |||
#include "Cuda/Kokkos_Cuda_Half.hpp" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would have liked a separate file for bsr spmv. This compilation unit will soon be the bottleneck
Thanks @cwpearson ! |
Status Flag 'Pre-Merge Inspection' - SUCCESS: The last commit to this Pull Request has been INSPECTED AND APPROVED by [ srajama1 ]! |
Status Flag 'Pull Request AutoTester' - Pull Request MUST BE MERGED MANUALLY BY Project Team - This Repo does not support Automerge |
Extensible to any CUDA wmma fragment type and any BsrMatrix::blockDim.
Each team is made up of four warps, which are in a 2x2 grid in the M and
N direction of the sparse matrix.
The input operands are all operated on at a granularity defined by the
BsrMatrix blockDim.
Each dense block in the product MV is covered by a 2D grid of teams as
needed, where the footprint of each team is determined by the chosen
fragment type.
src/sparse/KokkosSparse_BlockCrsMatrix.hpp
Added an is_block_crs_matrix static test, similar to the is_view from Kokkos.
This is not actually used in this PR, but is convenient for future
KokkosSparse::spmv extensions.
src/sparse/KokkosSparse_CrsMatrix.hpp
Added an is_crs_matrix static test, similar to the is_view from Kokkos.
This is not actually used in this PR, but is convenient for future
KokkosSparse::spmv extensions.
src/sparse/KokkosSparse_spmv.hpp
Integration with the existing SpMV interface.
The current KokkosSparse::spmv(controls, mode, ...) interface is
extended with compile-time checks for whether the AMatrix type is a
BsrMatrix or not. If so, the new implementation is called.
If it's a CrsMatrix, the existing implementation is called.
If neither, a stub implementation that provides a useful error is
introduced.
Prior to this PR, passing a non-CrsMatrix to Kokkos::spmv would be a compile-time error of varying transparency.
The user must provide a control with "algorithm" =
"experimental_bsr_tc", or the spmv will be a run-time error.
Furthermore, on AMPERE, the internal fragment type may be controlled
with "precision" = "mixed" or "double".
src/sparse/impl/KokkosSparse_spmv_bsrmatrix_impl.hpp
The actual tensor core SpMV implementation.
In the future, this file will hold additional impls for the BsrMatrix
type.
The functor has optional compile-time parameters that can be used to
optimize out divmod instructions.
If they are provided as 0, these parameters are determined at runtime
instead.
The entire implementation is guarded by the Kokkos VOLTA architecture with CUDA enabled.
FP16,32, and 64 are supported on any inputs/outputs. The multiplicand fragments are staged through shared memory where they are converted to the internal fragment type.
unit_test/sparse/Test_Sparse_spmv.hpp
Added unit test for the new SpMV.
It is invoked through the same opt-in interface that is exposed to the user.
It is compared against a Kokkos::spmv() called on the same operands
src/CMakeLists.txt
Add in ETI files for the BsrMatrix spmv
src/impl/generated_specializations...
ETI files for the BsrMatrix spmv