Use `CUSPARSE_VERSION` rather than `CUDA_VERSION` to choose how to use cuSPARSE TPL #2009

cwpearson · 2023-10-23T16:20:40Z

I was looking at something else in Trilinos and saw @etphipp referencing a problem with logic around cuSPARSE algorithm selection, CUDA version, and cuSPARSE version.

trilinos/Trilinos#12238 (comment)

I don't understand from that PR alone what the specific problem in Stokhos is (or what the referenced problem in Kokkos Kernels is). This may be our offending code, which the Stokhos PR copied with some modifications; similar snippets may appear in a couple of other places in Kokkos Kernels:

kokkos-kernels/sparse/tpls/KokkosSparse_spmv_tpl_spec_decl.hpp

Lines 108 to 126 in c3646ed

    
    #if CUSPARSE_VERSION >= 11301
      cusparseSpMVAlg_t alg = CUSPARSE_SPMV_ALG_DEFAULT;
    #else
      cusparseSpMVAlg_t alg = CUSPARSE_MV_ALG_DEFAULT;
    #endif
      if (controls.isParameter("algorithm")) {
        const std::string algName = controls.getParameter("algorithm");
        if (algName == "default")
    #if CUSPARSE_VERSION >= 11301
          alg = CUSPARSE_SPMV_ALG_DEFAULT;
    #else
          alg = CUSPARSE_MV_ALG_DEFAULT;
    #endif
        else if (algName == "merge")
    #if CUSPARSE_VERSION >= 11301
          alg = CUSPARSE_SPMV_CSR_ALG2;
    #else
          alg = CUSPARSE_CSRMV_ALG2;
    #endif

The cuSPARSE docs on versioning[0] seem contradictory to me: they say "Using different versions of cuSPARSE and the CUDA runtime is not supported", yet they do not promise that any of the cuSPARSE version fields will match the CUDA runtime version (or anything else, for that matter). It is also unclear to me whether the CUDA runtime version is the same as the CUDA toolkit version.
In any case, we probably want to use CUSPARSE_VERSION everywhere to decide how to use cuSPARSE, rather than CUDA_VERSION.

[0] https://docs.nvidia.com/cuda/cusparse/index.html#compatibility-and-versioning
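
One reason the two macros cannot be compared directly is that they appear to use different encodings. The sketch below assumes `CUSPARSE_VERSION` is `major * 1000 + minor * 100 + patch` and `CUDA_VERSION` is `major * 1000 + minor * 10`, as the respective headers define them to my knowledge; please verify against the headers you build with. `LibVersion` and the two `decode_*` helpers are hypothetical names used only for illustration and are not part of cuSPARSE or Kokkos Kernels.

```cpp
// Hypothetical helper type for illustration; not part of any library.
struct LibVersion {
  int ver_major;
  int ver_minor;
  int ver_patch;
};

// Assumed CUSPARSE_VERSION encoding: major * 1000 + minor * 100 + patch.
constexpr LibVersion decode_cusparse_version(int v) {
  return LibVersion{v / 1000, (v % 1000) / 100, v % 100};
}

// Assumed CUDA_VERSION encoding: major * 1000 + minor * 10 (no patch field).
constexpr LibVersion decode_cuda_version(int v) {
  return LibVersion{v / 1000, (v % 1000) / 10, 0};
}

// Under these assumptions, the 11301 threshold in the snippet above
// reads as cuSPARSE 11.3.1.
static_assert(decode_cusparse_version(11301).ver_major == 11, "major");
static_assert(decode_cusparse_version(11301).ver_minor == 3, "minor");
static_assert(decode_cusparse_version(11301).ver_patch == 1, "patch");

// A CUDA_VERSION of 11030 would read as CUDA 11.3.
static_assert(decode_cuda_version(11030).ver_major == 11, "major");
static_assert(decode_cuda_version(11030).ver_minor == 3, "minor");
```

If those encodings hold, the same integer means different things for the two macros, which is one more reason to gate cuSPARSE usage on `CUSPARSE_VERSION` alone.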


cwpearson · 2023-10-23T16:22:02Z

Here's a fuller description: #1967

lucbv · 2023-10-23T16:32:53Z

In general, we should first try to use the version of the library we are actually using rather than the version of the associated runtime. That's a more reasonable approach, and it would prevent issues if NVIDIA one day releases CUBLAS/CUSPARSE at a different cadence from CUDA.
On the other hand, not every vendor sets the versions of their libraries correctly, so it's hard to implement a consistent strategy.
In the case of CUBLAS/CUSPARSE, we should definitely prioritize using CUBLAS_VERSION / CUSPARSE_VERSION.
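
As a rough sketch of what a consistent strategy could look like, the library-version check can be made once and exposed as a feature macro, so individual call sites never mention `CUDA_VERSION`. Everything named `EXAMPLE_*` and both helper functions are hypothetical; the 11301 threshold is simply the one from the snippet quoted in this issue, not a verified cutoff. The fallback `#define` exists only so the sketch compiles without cuSPARSE installed, and the functions return enum names as strings for the same reason.

```cpp
#include <string>

// Stand-in for illustration only: real code gets CUSPARSE_VERSION
// from <cusparse.h>. This fallback lets the sketch build without it.
#ifndef CUSPARSE_VERSION
#define CUSPARSE_VERSION 11301
#endif

// Hypothetical feature macro, decided once from the library version.
#if CUSPARSE_VERSION >= 11301
#define EXAMPLE_CUSPARSE_HAS_SPMV_ALG_ENUMS 1
#else
#define EXAMPLE_CUSPARSE_HAS_SPMV_ALG_ENUMS 0
#endif

// Name of the enum value the quoted snippet would use as the default.
inline std::string example_default_spmv_alg_name() {
#if EXAMPLE_CUSPARSE_HAS_SPMV_ALG_ENUMS
  return "CUSPARSE_SPMV_ALG_DEFAULT";
#else
  return "CUSPARSE_MV_ALG_DEFAULT";
#endif
}

// Name of the enum value the quoted snippet would use for "merge".
inline std::string example_merge_spmv_alg_name() {
#if EXAMPLE_CUSPARSE_HAS_SPMV_ALG_ENUMS
  return "CUSPARSE_SPMV_CSR_ALG2";
#else
  return "CUSPARSE_CSRMV_ALG2";
#endif
}
```

In real code the helpers would return `cusparseSpMVAlg_t` values instead of strings; the point is only that the version test lives in one place.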

etphipp · 2023-10-23T16:35:37Z

> I don't understand from that PR alone what the specific problem in Stokhos is (or what the referenced problem in Kokkos Kernels is).

The issue is that the versioning logic is not correct: the cuSPARSE version doesn't necessarily match the CUDA version, as was seen with the CUDA versions used on some SNL machines. And yes, the logic in Stokhos was copied from Kokkos Kernels. I don't know how that logic was chosen, but there seems to have been a transition period around CUDA 11 where it isn't right (it may also be that the cuSPARSE documentation was not correct on this issue).

cwpearson · 2023-10-23T16:48:51Z

I'm going to close this in favor of #1967, which I should have found before opening this. I'll try to gather up which cuSPARSE versions actually came with which CUDA releases and perhaps we can take it from there.

etphipp · 2023-10-23T16:52:47Z

That was done at least partially in this comment: trilinos/Trilinos#12238 (comment)

cwpearson closed this as completed Oct 23, 2023
