-
Notifications
You must be signed in to change notification settings - Fork 99
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
First phase of SpGEMM TPL refactor #1582
Conversation
It reuses less than the interfaces of 11.4+ or 10.x (having to call the full compute in symbolic to get C_nnz) but is still an improvement over SPGEMM_KK.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good to me, at some point we will want to go over the documentation and make sure that we explain what the users are supposed to do with the handle/control object if they want reuse, etc... also we probably want to explain that they need to create a handle per matrix.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, @brian-kelley !
&h->bufferSize4, NULL)); | ||
KOKKOS_IMPL_CUDA_SAFE_CALL(cudaMalloc((void **)&h->buffer4, h->bufferSize4)); | ||
KOKKOS_CUSPARSE_SAFE_CALL(cusparseSpGEMM_compute( | ||
h->cusparseHandle, h->opA, h->opB, &alpha, h->descr_A, h->descr_B, &beta, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Swap with alpha
and beta
user passes in?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@e10harvey Our SpGEMM interface actually doesn't take alpha or beta, and I wasn't planning on adding them. beta != 0
would mean doing a fused SpGEMM+SpAdd (C := AB + beta*C
), which is possible but would be a lot of work and would mean a whole new symbolic and numeric implementation.
Alpha (C := alpha * AB
) would be a lot easier to support, but there's also not a KokkosKernels use case for it that I know of. If this was needed, the user could either apply the scaling themselves when C is used, or just do a KokkosBlas::scal
on C's values.
@lucbv That's the nice thing about not caching this stuff in the matrices, the user doesn't have to worry about it at all. All of that is encapsulated in the spgemm handle, so the user only needs to pass in the KokkosKernelsHandle (like with the native impl) and they get reuse. Edit: If they want to run cuSPARSE stuff on a different stream, that does happen through the cusparseHandle_t. But SpGEMM doesn't take a Controls object now. Ideally, when all our functions take exec space instances, we just manage the non-singleton cusparse handle internally. |
Why is it fast since I imagine creating a descriptor involves memory allocation (hopefully host memory only)? |
@jczhang07 Since the descriptor is a totally opaque structure, it's impossible to know exactly what happens during its creation (without talking to Nvidia, anyway). This is just a single data point but I just tried timing The one thing I am sure of is that the descriptor creation isn't doing anything on device with the rowptrs, entries, values you give it, because kernel launch overhead is already more than 1us. |
Status Flag 'Pre-Test Inspection' - Auto Inspected - Inspection Is Not Necessary for this Pull Request. |
Status Flag 'Pull Request AutoTester' - Testing Jenkins Projects: Pull Request Auto Testing STARTING (click to expand)Build InformationTest Name: KokkosKernels_PullRequest_GCC930_Light_Tpls_GCC930
Jenkins Parameters
Build InformationTest Name: KokkosKernels_PullRequest_CUDA11_CUDA11_LayoutRight
Jenkins Parameters
Build InformationTest Name: KokkosKernels_PullRequest_GCC1020
Jenkins Parameters
Build InformationTest Name: KokkosKernels_PullRequest_GCC1020_Light_LayoutRight
Jenkins Parameters
Build InformationTest Name: KokkosKernels_PullRequest_Tpls_GCC1020
Jenkins Parameters
Build InformationTest Name: KokkosKernels_PullRequest_Tpls_INTEL19
Jenkins Parameters
Build InformationTest Name: KokkosKernels_PullRequest_CLANG1001
Jenkins Parameters
Build InformationTest Name: KokkosKernels_PullRequest_CLANG13CUDA10
Jenkins Parameters
Build InformationTest Name: KokkosKernels_PullRequest_Tpls_ARMPL2110_Tpls_ARMPL2030_GCC1020
Jenkins Parameters
Using Repos:
Pull Request Author: brian-kelley |
Status Flag 'Pull Request AutoTester' - Jenkins Testing: all Jobs PASSED Pull Request Auto Testing has PASSED (click to expand)Build InformationTest Name: KokkosKernels_PullRequest_GCC930_Light_Tpls_GCC930
Jenkins Parameters
Build InformationTest Name: KokkosKernels_PullRequest_CUDA11_CUDA11_LayoutRight
Jenkins Parameters
Build InformationTest Name: KokkosKernels_PullRequest_GCC1020
Jenkins Parameters
Build InformationTest Name: KokkosKernels_PullRequest_GCC1020_Light_LayoutRight
Jenkins Parameters
Build InformationTest Name: KokkosKernels_PullRequest_Tpls_GCC1020
Jenkins Parameters
Build InformationTest Name: KokkosKernels_PullRequest_Tpls_INTEL19
Jenkins Parameters
Build InformationTest Name: KokkosKernels_PullRequest_CLANG1001
Jenkins Parameters
Build InformationTest Name: KokkosKernels_PullRequest_CLANG13CUDA10
Jenkins Parameters
Build InformationTest Name: KokkosKernels_PullRequest_Tpls_ARMPL2110_Tpls_ARMPL2030_GCC1020
Jenkins Parameters
|
Status Flag 'Pre-Merge Inspection' - SUCCESS: The last commit to this Pull Request has been INSPECTED AND APPROVED by [ e10harvey lucbv ]! |
Status Flag 'Pull Request AutoTester' - Pull Request MUST BE MERGED MANUALLY BY Project Team - This Repo does not support Automerge |
Note: we can always add the matrix descriptor back into CrsMatrix, but we would need to fix several things:
cusparseMatDescr_t
, but also thecusparseSpMatDescr_t
if on CUDA 11.x. The former just contains things like general vs. symmetric, unit diagonal vs not, etc, but the latter actually has pointers and extents of rowptrs, entries and values.cusparseDestroyMatDescr
exists we should use it).Because of this I would prefer to just not cache these things.