Use CUSPARSE_VERSION rather than CUDA_VERSION to choose how to use cuSPARSE TPL #2009
Comments
Here's a fuller description: #1967
In general, we should always first try to use the version of the library we are actually using rather than the version of the associated runtime. That is the more reasonable approach, and it would prevent issues if NVIDIA one day releases cuBLAS/cuSPARSE at a different cadence from CUDA.
The issue is that the versioning logic is not correct, since the cuSPARSE version doesn't necessarily match the CUDA version; this showed up with the CUDA versions used on some SNL machines. And yes, the logic in Stokhos was copied from Kokkos Kernels. I don't know what mechanism was used to choose that logic, but there seems to have been a transition period around CUDA 11 where it isn't right (it may also be the case that the cuSPARSE documentation was not correct on this point).
I'm going to close this in favor of #1967, which I should have found before opening this. I'll try to gather up which cuSPARSE versions actually came with which CUDA releases and perhaps we can take it from there.
That was done at least partially in this comment: trilinos/Trilinos#12238 (comment)
I was looking at something else in Trilinos and saw @etphipp referencing a problem with logic around cuSPARSE algorithm selection, CUDA version, and cuSPARSE version.
trilinos/Trilinos#12238 (comment)
I don't understand from that PR alone what the specific problem in Stokhos is (or what the referenced problem in Kokkos Kernels is). This may be our offending code, which the Stokhos PR copied with some modifications; similar snippets may appear in a couple of other places in Kokkos Kernels:
kokkos-kernels/sparse/tpls/KokkosSparse_spmv_tpl_spec_decl.hpp
Lines 108 to 126 in c3646ed
The cuSPARSE docs re: versioning[0] are contradictory to me, simultaneously saying "Using different versions of cuSPARSE and the CUDA runtime is not supported" but also not promising that any of the cuSPARSE version fields will match the CUDA runtime version (or anything else, for that matter; and is the CUDA runtime version the same as the CUDA toolkit version?).
In any case, we probably want to use CUSPARSE_VERSION everywhere to decide how to use cuSPARSE, rather than CUDA_VERSION.
[0] https://docs.nvidia.com/cuda/cusparse/index.html#compatibility-and-versioning
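To make the proposal concrete, here is a minimal sketch of gating a code path on the cuSPARSE version instead of the CUDA version. It assumes that `cusparse.h` defines `CUSPARSE_VERSION` as `CUSPARSE_VER_MAJOR * 1000 + CUSPARSE_VER_MINOR * 100 + CUSPARSE_VER_PATCH` (my understanding of the header, worth double-checking). The helper names and the 11.4.0 threshold are hypothetical, chosen only for illustration; they are not the values used in Kokkos Kernels or Stokhos.

```cpp
// Sketch only: helper names and the threshold below are hypothetical.
#if defined(__has_include)
#if __has_include(<cusparse.h>)
#include <cusparse.h>  // provides CUSPARSE_VERSION when the TPL is available
#endif
#endif

// Assumed encoding of CUSPARSE_VERSION: major * 1000 + minor * 100 + patch.
constexpr int encode_cusparse_version(int major, int minor, int patch) {
  return major * 1000 + minor * 100 + patch;
}

// Hypothetical cutoff at which a newer cuSPARSE code path becomes usable.
// The real cutoff has to come from the cuSPARSE release that introduced
// the feature, not from the CUDA toolkit version it happened to ship with.
constexpr int kHypotheticalCutoff = encode_cusparse_version(11, 4, 0);

constexpr bool use_newer_code_path(int cusparse_version) {
  return cusparse_version >= kHypotheticalCutoff;
}

#if defined(CUSPARSE_VERSION)
// Decide from the library's own version macro.
constexpr int kBuildCusparseVersion = CUSPARSE_VERSION;
#else
// cusparse.h was not found in this build; 0 selects the older path.
constexpr int kBuildCusparseVersion = 0;
#endif

constexpr bool kUseNewerPath = use_newer_code_path(kBuildCusparseVersion);

static_assert(encode_cusparse_version(11, 4, 0) == 11400,
              "encoding follows major*1000 + minor*100 + patch");
```

The point of the design is that the comparison never mentions `CUDA_VERSION`, so a toolkit that ships an unexpected cuSPARSE release still selects the branch that matches the library actually being compiled against.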