Kokkos::Controls and cusparse merge algorithm, see issue #670 #725

lucbv · 2020-05-22T18:46:15Z

Brief content

This PR implements two new features:

the Controls class that contains an UnorderedMap and handles for various backends.
more logic is added in the cusparse SpMV backend to allow the merge algorithm to be used

Customers

The Controls feature is an enhancement for no particular customer.
The merge algorithm path is implemented to support Aria work by @vbrunini

Details on merge algorithm backend

The main work here is to add some logic to choose the right cusparseSpMVAlg_t argument for the SpMV routine. This is now decided based on the algorithm parameter passed to controls. Additional work has been performed to expand the number of type combinations in ETI, it is now possible to choose any combinations of:

Scalar={float, double}
OrdinalType={int, int64_t}
OffsetType={int, int64_t}
Layout={left, right}

Details on Controls

So far this is not very fleshed out, the new class contains a std::unordered_map<string, string> to store kernels parameters and it also stores backend handles for cusparse and cublas.
A new interface in KokkosSparse::spmv is implemented to accept a controls object that drives the algorithmic and performance parameters of the kernel. It also provides the backend handle, thus avoiding to create and destroy it within each call.

Something bugs out in the cmake logic. Could be a Lassen issue too.

This allows the cusparse specialization layer to use either the default or the merge algorithm depending on input parameters from the user.

This test needs to be guarded as the SpMV merge algorithm is only available in cuSPARSE, not natively in Kokkos-Kernels at this point.

Fixing a name clashing in sptrsv_cuSPARSE: error_t which is usually used as a type not a variable. Adding isParamter method to Controls. Listing more parameters as controled parameters in SpMV

lucbv · 2020-05-22T18:48:05Z

For some reason the issue in the title does not provide a link... so see issue #670

lucbv · 2020-05-22T21:23:46Z

kokkos-dev: spot-check

#######################################################
PASSED TESTS
#######################################################
clang-4.0.1-Pthread_Serial-hwloc-release build_time=352 run_time=177
clang-4.0.1-Pthread_Serial-release build_time=395 run_time=143
clang-7.0.1-Cuda_OpenMP-hwloc-release build_time=1080 run_time=241
clang-7.0.1-Cuda_OpenMP-release build_time=1080 run_time=244
cuda-9.2-Cuda_OpenMP-release build_time=1124 run_time=261
gcc-5.3.0-OpenMP-hwloc-release build_time=220 run_time=62
gcc-5.3.0-OpenMP-release build_time=216 run_time=61
gcc-7.3.0-Serial-hwloc-release build_time=186 run_time=65
gcc-7.3.0-Serial-release build_time=187 run_time=65
intel-16.0.3-Pthread-hwloc-release build_time=279 run_time=55
intel-16.0.3-Pthread-release build_time=289 run_time=66
intel-16.0.3-Serial-hwloc-release build_time=297 run_time=76
intel-16.0.3-Serial-release build_time=296 run_time=75
intel-17.0.1-OpenMP-hwloc-release build_time=474 run_time=58
intel-17.0.1-OpenMP-release build_time=412 run_time=54

kokkos-dev: spot-check-tpls

#######################################################
PASSED TESTS
#######################################################
clang-4.0.1-Pthread_Serial-hwloc-release build_time=316 run_time=128
clang-4.0.1-Pthread_Serial-release build_time=314 run_time=129
clang-7.0.1-Cuda_OpenMP-hwloc-release build_time=1082 run_time=240
clang-7.0.1-Cuda_OpenMP-release build_time=1087 run_time=244
gcc-5.3.0-OpenMP-hwloc-release build_time=216 run_time=63
gcc-5.3.0-OpenMP-release build_time=216 run_time=64
gcc-7.3.0-Serial-hwloc-release build_time=191 run_time=66
gcc-7.3.0-Serial-release build_time=186 run_time=67
intel-16.0.3-Pthread-hwloc-release build_time=285 run_time=55
intel-16.0.3-Pthread-release build_time=289 run_time=64
intel-16.0.3-Serial-hwloc-release build_time=300 run_time=74
intel-16.0.3-Serial-release build_time=299 run_time=74
intel-17.0.1-OpenMP-hwloc-release build_time=415 run_time=52
intel-17.0.1-OpenMP-release build_time=418 run_time=52
#######################################################
FAILED TESTS
#######################################################
cuda-9.2-Cuda_OpenMP-release (build failed)
#######################################################

so close but need to clean-up something in the tpl handles management... let's put a band-aid on it for now.

brian-kelley

@lucbv I just had a couple questions, and the duplicated copyright headers in one file. Other than that, everything looks good.

src/impl/tpls/KokkosKernels_tpl_handles.cpp

brian-kelley · 2020-05-22T22:03:37Z

src/impl/tpls/KokkosSparse_spmv_tpl_spec_decl.hpp

-    if(std::is_same<value_type, float>::value)  {myCudaDataType = CUDA_R_32F;}
-    if(std::is_same<value_type, double>::value) {myCudaDataType = CUDA_R_64F;}
+    if(std::is_same<value_type, float>::value)    {myCudaDataType = CUDA_R_32F;}
+    if(std::is_same<value_type, double>::value)   {myCudaDataType = CUDA_R_64F;}


Any reason why CUDA_C_32F and CUDA_C_64F aren't supported here? It's ok if those use the fallback SPMV for now, but if it will be done later, we should make a note/issue for it

I guess I answered that in your comment below.

brian-kelley · 2020-05-22T22:06:52Z

src/impl/tpls/KokkosSparse_spmv_tpl_spec_decl.hpp

+  KOKKOSSPARSE_SPMV_CUSPARSE(float,  int, int, Kokkos::LayoutRight, true)
+  // KOKKOSSPARSE_SPMV_CUSPARSE(float,  int, int, Kokkos::LayoutRight, false)
+#if defined(CUSPARSE_VERSION) && (10300 <= CUSPARSE_VERSION)
+  KOKKOSSPARSE_SPMV_CUSPARSE(double, int64_t, int, Kokkos::LayoutLeft,  true)


I guess if we added the complex cudaDataTypes above, we could instantiate all of these for complex?

Yes it should, I have not had time to look into complex yet though.
If in your experience it is easy to just instantiate and have it work OK through cusparse then I can do it. If it is an unknown I would suggest holding-off on it and do it in a later PR.

brian-kelley · 2020-05-22T22:11:22Z

unit_test/sparse/Test_Sparse_spmv.hpp

+  // y is the quantity being tested here,
+  // so let us use y_value_type to determine
+  // the appropriate tolerance precision.
+  const y_value_mag_type eps = std::is_same<y_value_mag_type, float>::value ? 2*1e-3 : 1e-7;


Does this change fix the SPMV unit tests for float (#715)?

No, this is a new test so it won't fix the old one, this should look pretty much the same as the old one though which means that we might have to do something about it in the future, although this test uses cuSPARSE so that algo might not have the same issues as native kokkos-kenrels

lucbv · 2020-05-24T02:17:01Z

spot check cleared on kokkos-dev:

spot-check

#######################################################
PASSED TESTS
#######################################################
clang-4.0.1-Pthread_Serial-hwloc-release build_time=349 run_time=228
clang-4.0.1-Pthread_Serial-release build_time=348 run_time=143
clang-7.0.1-Cuda_OpenMP-hwloc-release build_time=1259 run_time=248
clang-7.0.1-Cuda_OpenMP-release build_time=1437 run_time=244
cuda-9.2-Cuda_OpenMP-release build_time=1284 run_time=275
gcc-5.3.0-OpenMP-hwloc-release build_time=239 run_time=86
gcc-5.3.0-OpenMP-release build_time=241 run_time=98
gcc-7.3.0-Serial-hwloc-release build_time=202 run_time=73
gcc-7.3.0-Serial-release build_time=204 run_time=71
intel-16.0.3-Pthread-hwloc-release build_time=303 run_time=120
intel-16.0.3-Pthread-release build_time=306 run_time=75
intel-16.0.3-Serial-hwloc-release build_time=319 run_time=89
intel-16.0.3-Serial-release build_time=319 run_time=88
intel-17.0.1-OpenMP-hwloc-release build_time=461 run_time=58
intel-17.0.1-OpenMP-release build_time=507 run_time=58

spot-check-tpls

#######################################################
PASSED TESTS
#######################################################
clang-4.0.1-Pthread_Serial-hwloc-release build_time=430 run_time=308
clang-4.0.1-Pthread_Serial-release build_time=345 run_time=182
clang-7.0.1-Cuda_OpenMP-hwloc-release build_time=1253 run_time=251
clang-7.0.1-Cuda_OpenMP-release build_time=1248 run_time=247
cuda-9.2-Cuda_OpenMP-release build_time=1312 run_time=282
gcc-5.3.0-OpenMP-hwloc-release build_time=245 run_time=86
gcc-5.3.0-OpenMP-release build_time=244 run_time=99
gcc-7.3.0-Serial-hwloc-release build_time=202 run_time=67
gcc-7.3.0-Serial-release build_time=205 run_time=73
intel-16.0.3-Pthread-hwloc-release build_time=374 run_time=99
intel-16.0.3-Pthread-release build_time=306 run_time=74
intel-16.0.3-Serial-hwloc-release build_time=317 run_time=87
intel-16.0.3-Serial-release build_time=315 run_time=85
intel-17.0.1-OpenMP-hwloc-release build_time=464 run_time=55
intel-17.0.1-OpenMP-release build_time=476 run_time=54

lucbv · 2020-05-24T02:20:32Z

I do not have access to kokkos-dev2 yet so if someone feels like doing:
git fetch upstream pull/725/head spmv_merge
to get a local copy on kokkos-dev2 and running the spot-check there, that would be a great help moving this forward.

kyungjoo-kim

With runtime control parameters, do you have a rough idea of how much overhead is added for default algorihtm ? I do not mean the time increased by the map indexing(this should be trivial). I mean unexpected time increase which might be due to binery size increase and so on.

Is unordered map better than map ? When the directory size is small (like the control object), the indexing time is hardly different and the std::unordered_map could have a larger default hash map size (I do not know what is default hash map size).

brian-kelley · 2020-05-26T19:21:27Z

@lucbv I'm running spot check on kokkos-dev2 now, and I'll get on the Tpetra integration this week.

brian-kelley · 2020-05-26T20:47:52Z

@lucbv Everything's clean, except for one build with a broken configuration:

#######################################################
PASSED TESTS
#######################################################
clang-8.0-Cuda_OpenMP-release build_time=613 run_time=132
clang-8.0-Pthread_Serial-release build_time=200 run_time=110
clang-9.0.0-Pthread-release build_time=119 run_time=50
clang-9.0.0-Serial-release build_time=126 run_time=57
cuda-10.1-Cuda_OpenMP-release build_time=787 run_time=128
gcc-4.8.4-OpenMP-release build_time=118 run_time=59
gcc-7.3.0-OpenMP-release build_time=140 run_time=57
gcc-7.3.0-Pthread-release build_time=118 run_time=53
gcc-8.3.0-Serial-release build_time=137 run_time=58
gcc-9.1-OpenMP-release build_time=175 run_time=57
gcc-9.1-Serial-release build_time=153 run_time=58
intel-17.0.1-Serial-release build_time=245 run_time=57
intel-18.0.5-OpenMP-release build_time=354 run_time=54
intel-19.0.5-Pthread-release build_time=421 run_time=48
#######################################################
FAILED TESTS
#######################################################
cuda-9.2-Cuda_Serial-release (build failed)
#######################################################

Here's what the failing one printed:

-- Check for working CXX compiler: /ascldap/users/bmkelle/Testing725/kokkos/bin/nvcc_wrapper -- broken
CMake Error at /net/watson.sandia.gov/storage/fast/projects/sems/install/rhel7-x86_64/sems/utility/cmake/3.12.2/share/cmake-3.12/Modules/CMakeTestCXXCompiler.cmake:45 (message):
  The C++ compiler

    "/ascldap/users/bmkelle/Testing725/kokkos/bin/nvcc_wrapper"

  is not able to compile a simple test program.

  It fails with the following output:

    Change Dir: /ascldap/users/bmkelle/Testing725/Testing/TestAll_2020-05-26_12.20.58/cuda/9.2/Cuda_Serial-release/kokkos-install/CMakeFiles/CMakeTmp

    Run Build Command:"/usr/bin/gmake" "cmTC_5d19c/fast"
    /usr/bin/gmake -f CMakeFiles/cmTC_5d19c.dir/build.make CMakeFiles/cmTC_5d19c.dir/build
    gmake[1]: Entering directory `/home/bmkelle/Testing725/Testing/TestAll_2020-05-26_12.20.58/cuda/9.2/Cuda_Serial-release/kokkos-install/CMakeFiles/CMakeTmp'
    Building CXX object CMakeFiles/cmTC_5d19c.dir/testCXXCompiler.cxx.o
    /ascldap/users/bmkelle/Testing725/kokkos/bin/nvcc_wrapper    -O3 -Wall -Wshadow -pedantic -Werror -Wsign-compare -Wtype-limits -Wuninitialized    -o CMakeFiles/cmTC_5d19c.dir/testCXXCompiler.cxx.o -c /ascldap/users/bmkelle/Testing725/Testing/TestAll_2020-05-26_12.20.58/cuda/9.2/Cuda_Serial-release/kokkos-install/CMakeFiles/CMakeTmp/testCXXCompiler.cxx
    sh: cicc: command not found
    gmake[1]: *** [CMakeFiles/cmTC_5d19c.dir/testCXXCompiler.cxx.o] Error 127
    gmake[1]: Leaving directory `/home/bmkelle/Testing725/Testing/TestAll_2020-05-26_12.20.58/cuda/9.2/Cuda_Serial-release/kokkos-install/CMakeFiles/CMakeTmp'
    gmake: *** [cmTC_5d19c/fast] Error 2

So maybe something broke in the module loading for that build. Anyway, that's not a problem with these changes.

brian-kelley · 2020-05-26T20:50:06Z

@ndellingwood Do you know what cicc is? As far as I can tell, the modules are OK ("which nvcc" points to /projects/sems/install/rhel7-x86_64/sems/compiler/cuda/9.2/base/bin which is correct). But "which cicc" doesn't find anything.

ndellingwood · 2020-05-26T20:56:29Z

@brian-kelley No, I'm not sure what that is, something messed up with the module it seems. I just logged onto kokkos-dev-2 and tried to compile a simple cuda driver code with sems-cuda/9.2 module and had the same error:

cuda_device.cu

[ndellin@kokkos-dev-2 CudaDevice]$ cat cuda_device.cu 
#include <cstdio>

// Load modules
//   module load sems-cuda/9.2

// Compilation line:
//   nvcc cuda_device.cu

// Run code:
//   ./a.out

// Expected output:
// Device names and ids...

// See https://devblogs.nvidia.com/how-query-device-properties-and-handle-errors-cuda-cc/

int main() {
  int nDevices;

  cudaGetDeviceCount(&nDevices);
  for (int i = 0; i < nDevices; i++) {
    cudaDeviceProp prop;
    cudaGetDeviceProperties(&prop, i);
    printf("Device Number: %d\n", i);
    printf("  Device name: %s\n", prop.name);
  }
}

Load module

module load sems-cuda/9.2

Compilation attempt:

[ndellin@kokkos-dev-2 CudaDevice]$ nvcc cuda_device.cu 
sh: cicc: command not found

ndellingwood · 2020-05-26T20:59:30Z

@brian-kelley swapping 9.2 for 10.1 and I had no issues compiling as expected since the 10.1 tests you ran passed. I'll test my small driver code on another system with the sems-cuda/9.2 module to determine if it's the module or some issue on kokkos-dev-2 with the module and email to request a fix.

ndellingwood · 2020-05-26T21:01:44Z

@brian-kelley the driver compiled fine on kokkos-dev with sems-cuda/9.2. Let me email a sys admin for help.

lucbv · 2020-05-26T21:12:35Z

@brian-kelley @ndellingwood
unfortunate event but at least not impacting, I am also happy to see that cuda 10.1 worked, I was using a similar version on Lassen but not exactly the same so I was not 100% sure.

brian-kelley · 2020-06-04T16:06:47Z

@lucbv I reran spot checks on kokkos-dev2 since the CUDA 9 seems OK now:
Regular spot-check:

#######################################################
PASSED TESTS 
#######################################################
clang-8.0-Pthread_Serial-release build_time=214 run_time=105
clang-9.0.0-Pthread-release build_time=133 run_time=53
clang-9.0.0-Serial-release build_time=135 run_time=53
gcc-4.8.4-OpenMP-release build_time=111 run_time=49
gcc-7.3.0-OpenMP-release build_time=136 run_time=50
gcc-7.3.0-Pthread-release build_time=154 run_time=44
gcc-8.3.0-Serial-release build_time=134 run_time=47
gcc-9.1-OpenMP-release build_time=165 run_time=51
gcc-9.1-Serial-release build_time=152 run_time=48
#######################################################
FAILED TESTS 
#######################################################
clang-8.0-Cuda_OpenMP-release (test failed)
cuda-10.1-Cuda_OpenMP-release (test failed)
cuda-9.2-Cuda_Serial-release (test failed)
intel-17.0.1-Serial-release (test failed)
intel-18.0.5-OpenMP-release (test failed)
intel-19.0.5-Pthread-release (test failed)

Failures were all #715 .

TPLs spot check:

#######################################################
PASSED TESTS
#######################################################
clang-8.0-Pthread_Serial-release build_time=205 run_time=115
clang-9.0.0-Pthread-release build_time=126 run_time=58
clang-9.0.0-Serial-release build_time=123 run_time=59
cuda-10.1-Cuda_OpenMP-release build_time=1103 run_time=1801
gcc-4.8.4-OpenMP-release build_time=122 run_time=56
gcc-7.3.0-OpenMP-release build_time=144 run_time=58
gcc-7.3.0-Pthread-release build_time=117 run_time=52
gcc-8.3.0-Serial-release build_time=141 run_time=61
gcc-9.1-OpenMP-release build_time=182 run_time=57
gcc-9.1-Serial-release build_time=168 run_time=57
intel-17.0.1-Serial-release build_time=254 run_time=57
intel-18.0.5-OpenMP-release build_time=361 run_time=87
intel-19.0.5-Pthread-release build_time=478 run_time=55
#######################################################
FAILED TESTS
#######################################################
clang-8.0-Cuda_OpenMP-release (test failed)

I didn't run this one with float. The only failure was actually gauss_seidel timing out on OpenMP (that will be fixed soon).

Since this PR looks good in testing, is there anything else blocking it?

lucbv added 7 commits May 20, 2020 21:38

Controls: first cut that implements a path for cusparse spmv merge alg

e48d61b

SpMV merge: started adding infrastructure for test.

77cda8c

Something bugs out in the cmake logic. Could be a Lassen issue too.

SpMV merge: adding begining of unit-test

2372609

Tested implementation of Controls to allow spmv merge algorithm

3251464

This allows the cusparse specialization layer to use either the default or the merge algorithm depending on input parameters from the user.

Remove unfortunate debugging left-over...

049d943

Adding CMake guard for spmv merge performance test

3d87887

This test needs to be guarded as the SpMV merge algorithm is only available in cuSPARSE, not natively in Kokkos-Kernels at this point.

Fixing some small issues and expanding a bit capabilities in SpMV

d857850

Fixing a name clashing in sptrsv_cuSPARSE: error_t which is usually used as a type not a variable. Adding isParamter method to Controls. Listing more parameters as controled parameters in SpMV

lucbv requested review from brian-kelley, ndellingwood, kyungjoo-kim and srajama1 May 22, 2020 18:46

lucbv self-assigned this May 22, 2020

lucbv added the enhancement label May 22, 2020

lucbv changed the title ~~Kokkos::Controls and cusparse merge algorithm, see #670~~ Kokkos::Controls and cusparse merge algorithm, see issue #670 May 22, 2020

brian-kelley requested changes May 22, 2020

View reviewed changes

Fixing a header and some tpl management.

bca2dda

brian-kelley self-requested a review May 22, 2020 22:57

For some reasons the header was not fixed in the previous commit...

f7d9bf0

brian-kelley approved these changes May 22, 2020

View reviewed changes

brian-kelley mentioned this pull request May 22, 2020

Support complex_float, complex_double in cuSPARSE SPMV wrapper #726

Closed

kyungjoo-kim approved these changes May 26, 2020

View reviewed changes

ndellingwood mentioned this pull request May 27, 2020

cusparse SpMV merge algorithm #670

Closed

brian-kelley merged commit cb30ac0 into kokkos:develop Jun 9, 2020

brian-kelley mentioned this pull request Jun 22, 2020

Tpetra: integrate cuSPARSE merge path spmv trilinos/Trilinos#7569

Merged

lucbv deleted the Controls branch August 31, 2022 21:18

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Kokkos::Controls and cusparse merge algorithm, see issue #670 #725

Kokkos::Controls and cusparse merge algorithm, see issue #670 #725

lucbv commented May 22, 2020

lucbv commented May 22, 2020

lucbv commented May 22, 2020

brian-kelley left a comment

brian-kelley May 22, 2020

lucbv May 22, 2020

brian-kelley May 22, 2020

lucbv May 22, 2020

brian-kelley May 22, 2020

lucbv May 22, 2020

lucbv commented May 24, 2020

lucbv commented May 24, 2020

kyungjoo-kim left a comment

brian-kelley commented May 26, 2020

brian-kelley commented May 26, 2020

brian-kelley commented May 26, 2020

ndellingwood commented May 26, 2020 •

edited

Loading

ndellingwood commented May 26, 2020

ndellingwood commented May 26, 2020

lucbv commented May 26, 2020

brian-kelley commented Jun 4, 2020

Kokkos::Controls and cusparse merge algorithm, see issue #670 #725

Kokkos::Controls and cusparse merge algorithm, see issue #670 #725

Conversation

lucbv commented May 22, 2020

Brief content

Customers

Details on merge algorithm backend

Details on Controls

lucbv commented May 22, 2020

lucbv commented May 22, 2020

kokkos-dev: spot-check

kokkos-dev: spot-check-tpls

brian-kelley left a comment

Choose a reason for hiding this comment

brian-kelley May 22, 2020

Choose a reason for hiding this comment

lucbv May 22, 2020

Choose a reason for hiding this comment

brian-kelley May 22, 2020

Choose a reason for hiding this comment

lucbv May 22, 2020

Choose a reason for hiding this comment

brian-kelley May 22, 2020

Choose a reason for hiding this comment

lucbv May 22, 2020

Choose a reason for hiding this comment

lucbv commented May 24, 2020

spot-check

spot-check-tpls

lucbv commented May 24, 2020

kyungjoo-kim left a comment

Choose a reason for hiding this comment

brian-kelley commented May 26, 2020

brian-kelley commented May 26, 2020

brian-kelley commented May 26, 2020

ndellingwood commented May 26, 2020 • edited Loading

ndellingwood commented May 26, 2020

ndellingwood commented May 26, 2020

lucbv commented May 26, 2020

brian-kelley commented Jun 4, 2020

ndellingwood commented May 26, 2020 •

edited

Loading