two-level parallel version of transpose GEMV #514

iyamazaki · 2019-12-10T23:47:39Z

No description provided.

mhoemmen

Why do we need to write an optimized GEMV? Why can't we just call cuBLAS for CUDA and the BLAS for non-CUDA, and have a slow fall-back for unsupported types?

mhoemmen · 2019-12-11T17:41:21Z

src/blas/impl/KokkosBlas2_gemv_impl.hpp

+         const bool conj,
+         class IndexType = typename AViewType::size_type>
+struct TwoLevelTransposeGEMV {
+  typedef typename YViewType::non_const_value_type y_value_type;


Prefer the C++11 type alias syntax, `using new_type = old_type;`, to `typedef old_type new_type;`.
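For illustration, here is a minimal sketch of the two spellings side by side. The alias names (`old_style_value_type`, `new_style_value_type`, `vector_value_t`) are hypothetical and are not taken from the PR; `std::vector` merely stands in for a View type.

```cpp
#include <type_traits>
#include <vector>

// Old style: the new name comes last and is easy to miss in long declarations.
typedef std::vector<double>::value_type old_style_value_type;

// C++11 alias: the new name comes first, reading like an assignment.
using new_style_value_type = std::vector<double>::value_type;

// Both spellings name exactly the same type.
static_assert(
    std::is_same<old_style_value_type, new_style_value_type>::value,
    "typedef and using declare the same type");

// Only 'using' can declare an alias template; typedef has no equivalent.
template <class T>
using vector_value_t = typename std::vector<T>::value_type;
```

Beyond readability, the alias-template form is the practical reason to standardize on `using`.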

ndellingwood · 2019-12-11T18:42:17Z

Cross-referencing #443
@mhoemmen I believe the intention of the PR is to resolve issues with the fallback impl in transpose mode, the 2-level parallelism approach seemed a natural way to implement.
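As a rough sketch of why two levels fit the transpose case: each entry of y = A^T x is an independent reduction over the rows of A, so the outer level can range over output entries and the inner level over the reduction. The serial code below only shows that loop structure. It is not the PR's implementation; the function name `transpose_gemv` and the row-major storage are assumptions. In Kokkos, the outer loop would presumably map to a `Kokkos::TeamPolicy` league and the inner loop to a `Kokkos::parallel_reduce` over a `Kokkos::TeamThreadRange`.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper: computes y = A^T * x for a row-major M x N matrix A,
// so that y[j] = sum over i of A(i, j) * x[i].
std::vector<double> transpose_gemv(const std::vector<double>& A,
                                   std::size_t M, std::size_t N,
                                   const std::vector<double>& x) {
  std::vector<double> y(N, 0.0);
  // Outer level: one independent work item per output entry y[j].
  for (std::size_t j = 0; j < N; ++j) {
    // Inner level: a reduction over the M rows of column j.
    double sum = 0.0;
    for (std::size_t i = 0; i < M; ++i) {
      sum += A[i * N + j] * x[i];
    }
    y[j] = sum;
  }
  return y;
}
```

With M == 0 the inner reduction is empty, and every entry of y is simply zero.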

mhoemmen · 2019-12-11T20:38:42Z

@ndellingwood wrote:

> I believe the intention of the PR is to resolve issues with the fallback impl in transpose mode, the 2-level parallelism approach seemed a natural way to implement.

Cool, I'm OK with it as long as it's fixing a bug.

mhoemmen

Looks OK. The usual kokkos-kernels idiom that Christian has been promoting is to reduce the number of instantiations of the functor by having the function that calls the functor assign the input Views to "canonical" View types. For example, the (outer) function might be templated on ViewType, but if the functor really only needs View<const T*, AnonymousSpace>, it would be best to have the functor be templated only on T.
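A standard-library-only analogue of that idiom might look like the sketch below. Everything in it is hypothetical: `CanonicalView` stands in for something like View<const T*, AnonymousSpace>, and `SumFunctor` and `sum` are illustrative names, not kokkos-kernels code. The point is that the outer function is a thin wrapper templated on the container type, while the functor that does the work is templated only on T.

```cpp
#include <array>
#include <cstddef>
#include <vector>

// Hypothetical "canonical" view: a non-owning pointer plus a length.
template <class T>
struct CanonicalView {
  const T* data;
  std::size_t size;
};

// The functor depends only on T, so it is instantiated once per scalar
// type, no matter how many container types call into it.
template <class T>
struct SumFunctor {
  CanonicalView<T> x;
  T operator()() const {
    T total{};
    for (std::size_t i = 0; i < x.size; ++i) total += x.data[i];
    return total;
  }
};

// The outer function is templated on the container type, but it only
// converts its input to the canonical view before calling the functor.
template <class ContainerType>
typename ContainerType::value_type sum(const ContainerType& c) {
  using T = typename ContainerType::value_type;
  CanonicalView<T> canonical{c.data(), c.size()};
  return SumFunctor<T>{canonical}();
}
```

Calling `sum` on a `std::vector<double>` and on a `std::array<double, 2>` instantiates the wrapper twice but `SumFunctor<double>` only once.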

iyamazaki · 2019-12-13T20:44:48Z

Thank you for the comments @mhoemmen. I am running the spot-check on white, but it is taking a long time @ndellingwood.

ndellingwood · 2019-12-13T23:30:37Z

@iyamazaki spot-check on kokkos-dev-2 passed, merging in. Thanks!

Running on machine: kokkos-dev-2
Going to test compilers:  gcc/7.3.0 gcc/9.1 intel/18.0.5 clang/8.0 cuda/10.1
Testing compiler gcc/7.3.0
Testing compiler gcc/9.1
  Starting job gcc-7.3.0-OpenMP-release
  Starting job gcc-7.3.0-Pthread-release
  PASSED gcc-7.3.0-OpenMP-release
  Starting job gcc-9.1-OpenMP-release
  PASSED gcc-7.3.0-Pthread-release
Testing compiler intel/18.0.5
  Starting job gcc-9.1-Serial-release
  PASSED gcc-9.1-OpenMP-release
Testing compiler clang/8.0
  Starting job intel-18.0.5-OpenMP-release
  PASSED gcc-9.1-Serial-release
  Starting job clang-8.0-Cuda_OpenMP-release
  PASSED intel-18.0.5-OpenMP-release
Testing compiler cuda/10.1
  Starting job clang-8.0-Pthread_Serial-release
  PASSED clang-8.0-Cuda_OpenMP-release
  PASSED clang-8.0-Pthread_Serial-release
  Starting job cuda-10.1-Cuda_OpenMP-release
  PASSED cuda-10.1-Cuda_OpenMP-release
#######################################################
PASSED TESTS
#######################################################
clang-8.0-Cuda_OpenMP-release build_time=567 run_time=712
clang-8.0-Pthread_Serial-release build_time=356 run_time=883
cuda-10.1-Cuda_OpenMP-release build_time=626 run_time=660
gcc-7.3.0-OpenMP-release build_time=218 run_time=264
gcc-7.3.0-Pthread-release build_time=203 run_time=501
gcc-9.1-OpenMP-release build_time=186 run_time=235
gcc-9.1-Serial-release build_time=157 run_time=486
intel-18.0.5-OpenMP-release build_time=301 run_time=292
#######################################################

two-level parallel version of transpose GEMV

c80d453

mhoemmen reviewed Dec 11, 2019

mhoemmen approved these changes Dec 11, 2019

iyamazaki added 2 commits December 13, 2019 10:53

comment out the largest unit-test

f23708d

switch from 'typedef' to 'using'

3f7443c

ndellingwood merged commit 8849ecb into kokkos:develop Dec 13, 2019

ndellingwood mentioned this pull request Dec 16, 2019

gemv transpose mode issues with cublas and M == 0 #539

Closed

ndellingwood mentioned this pull request Jan 28, 2020

Kokkos + KokkosKernels Promotion To 2.9.99 trilinos/Trilinos#6671

Merged

ndellingwood mentioned this pull request Mar 10, 2020

Kokkos Blas: gemv segfaults #443

Closed
