Serial code path for spmv #893

kyungjoo-kim · 2021-02-10T00:21:00Z

The current spmv with transpose uses atomic updates and follows the generic interface with parallel for. This significantly lowers the performance as seen in trilinos/Trilinos#8658.

This PR addresses the performance issue of the serial implementation of spmv.
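For readers unfamiliar with the issue, here is a minimal sketch of why a dedicated serial path can drop the atomics. It is not the code in this PR: `CsrMatrix`, `spmv_transpose_serial`, and `transpose_demo` are hypothetical names, and the matrix type is a plain-`std::vector` stand-in rather than the KokkosSparse matrix class. In a transpose product each row of A scatters into many entries of y, so a parallel loop over rows needs atomic adds, while a single thread can use a plain `+=`.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Minimal CSR matrix; a stand-in for illustration, not the KokkosSparse type.
struct CsrMatrix {
  std::size_t nrow, ncol;
  std::vector<std::size_t> row_ptr;  // size nrow + 1
  std::vector<int> col_idx;          // size nnz
  std::vector<double> values;        // size nnz
};

// y = beta * y + alpha * A^T * x, single-threaded.
// Row i of A scatters into y[col]; with one thread the scatter cannot race,
// so a plain += replaces the atomic add a parallel loop over rows would need.
void spmv_transpose_serial(double alpha, const CsrMatrix& A,
                           const std::vector<double>& x, double beta,
                           std::vector<double>& y) {
  assert(x.size() == A.nrow && y.size() == A.ncol);
  // Scale (or clear) y first; beta == 0 overwrites whatever y held before.
  for (double& yj : y) yj = (beta == 0.0) ? 0.0 : beta * yj;
  for (std::size_t i = 0; i < A.nrow; ++i) {
    const double axi = alpha * x[i];
    for (std::size_t k = A.row_ptr[i]; k < A.row_ptr[i + 1]; ++k)
      y[A.col_idx[k]] += A.values[k] * axi;
  }
}

// Usage: A = [[1, 2], [0, 3], [4, 0]] (3 x 2), x = [1, 2, 3].
std::vector<double> transpose_demo() {
  CsrMatrix A{3, 2, {0, 2, 3, 4}, {0, 1, 1, 0}, {1.0, 2.0, 3.0, 4.0}};
  std::vector<double> y(2, 100.0);  // stale contents, cleared because beta == 0
  spmv_transpose_serial(1.0, A, {1.0, 2.0, 3.0}, 0.0, y);
  return y;  // A^T x = [1*1 + 4*3, 2*1 + 3*2]
}
```

The sketch keeps the row-wise traversal of A, so it needs no explicit transpose of the matrix.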

kyungjoo-kim · 2021-02-10T00:26:14Z

I heard that we now have an auto tester. The following spot-check result is just for reference.

/// blake
#######################################################
PASSED TESTS
#######################################################
gcc-7.2.0-OpenMP-release build_time=200 run_time=63
gcc-7.2.0-Pthread_Serial-release build_time=304 run_time=164
intel-19.1.144-OpenMP_Serial-release build_time=1224 run_time=132

/// weaver
#######################################################
PASSED TESTS
#######################################################
cuda-10.1.243-Cuda_OpenMP-release build_time=1243 run_time=154
cuda-9.2.88-Cuda_Serial-release build_time=1207 run_time=254
gcc-6.4.0-OpenMP_Serial-release build_time=388 run_time=197
gcc-7.2.0-OpenMP-release build_time=247 run_time=68
gcc-7.2.0-OpenMP_Serial-release build_time=391 run_time=195
gcc-7.2.0-Serial-release build_time=225 run_time=66

kokkos-devops-admin · 2021-02-10T00:35:58Z

Status Flag 'Pre-Test Inspection' - Auto Inspected - Inspection Is Not Necessary for this Pull Request.

kokkos-devops-admin · 2021-02-10T00:37:14Z

Status Flag 'Pull Request AutoTester' - Testing Jenkins Projects:

Pull Request Auto Testing STARTING

Build Information

Test Name: KokkosKernels_PullRequest_GCC720

Build Num: 32
Status: STARTED

Jenkins Parameters

Parameter Name	Value
KOKKOSKERNELS_SOURCE_BRANCH	kyukim-develop
KOKKOSKERNELS_SOURCE_REPO	https://github.com/kyungjoo-kim/kokkos-kernels
KOKKOSKERNELS_SOURCE_SHA	`a474e62`
KOKKOSKERNELS_TARGET_BRANCH	develop
KOKKOSKERNELS_TARGET_REPO	https://github.com/kokkos/kokkos-kernels
KOKKOSKERNELS_TARGET_SHA	`717e67f`
PULLREQUESTNUM	893
TEST_REPO_ALIAS	KOKKOSKERNELS

Build Information

Test Name: KokkosKernels_PullRequest_Tpls_GCC720

Build Num: 25
Status: STARTED

Jenkins Parameters

Parameter Name	Value
KOKKOSKERNELS_SOURCE_BRANCH	kyukim-develop
KOKKOSKERNELS_SOURCE_REPO	https://github.com/kyungjoo-kim/kokkos-kernels
KOKKOSKERNELS_SOURCE_SHA	`a474e62`
KOKKOSKERNELS_TARGET_BRANCH	develop
KOKKOSKERNELS_TARGET_REPO	https://github.com/kokkos/kokkos-kernels
KOKKOSKERNELS_TARGET_SHA	`717e67f`
PULLREQUESTNUM	893
TEST_REPO_ALIAS	KOKKOSKERNELS

Build Information

Test Name: KokkosKernels_PullRequest_Tpls_INTEL18

Build Num: 11
Status: STARTED

Jenkins Parameters

Parameter Name	Value
KOKKOSKERNELS_SOURCE_BRANCH	kyukim-develop
KOKKOSKERNELS_SOURCE_REPO	https://github.com/kyungjoo-kim/kokkos-kernels
KOKKOSKERNELS_SOURCE_SHA	`a474e62`
KOKKOSKERNELS_TARGET_BRANCH	develop
KOKKOSKERNELS_TARGET_REPO	https://github.com/kokkos/kokkos-kernels
KOKKOSKERNELS_TARGET_SHA	`717e67f`
PULLREQUESTNUM	893
TEST_REPO_ALIAS	KOKKOSKERNELS

Build Information

Test Name: KokkosKernels_PullRequest_GCC720_Light

Build Num: 53
Status: STARTED

Jenkins Parameters

Parameter Name	Value
KOKKOSKERNELS_SOURCE_BRANCH	kyukim-develop
KOKKOSKERNELS_SOURCE_REPO	https://github.com/kyungjoo-kim/kokkos-kernels
KOKKOSKERNELS_SOURCE_SHA	`a474e62`
KOKKOSKERNELS_TARGET_BRANCH	develop
KOKKOSKERNELS_TARGET_REPO	https://github.com/kokkos/kokkos-kernels
KOKKOSKERNELS_TARGET_SHA	`717e67f`
PULLREQUESTNUM	893
TEST_REPO_ALIAS	KOKKOSKERNELS

Build Information

Test Name: KokkosKernels_PullRequest_Tpls_CUDA10

Build Num: 17
Status: STARTED

Jenkins Parameters

Parameter Name	Value
KOKKOSKERNELS_SOURCE_BRANCH	kyukim-develop
KOKKOSKERNELS_SOURCE_REPO	https://github.com/kyungjoo-kim/kokkos-kernels
KOKKOSKERNELS_SOURCE_SHA	`a474e62`
KOKKOSKERNELS_TARGET_BRANCH	develop
KOKKOSKERNELS_TARGET_REPO	https://github.com/kokkos/kokkos-kernels
KOKKOSKERNELS_TARGET_SHA	`717e67f`
PULLREQUESTNUM	893
TEST_REPO_ALIAS	KOKKOSKERNELS

Build Information

Test Name: KokkosKernels_PullRequest_Tpls_CUDA9

Build Num: 11
Status: STARTED

Jenkins Parameters

Parameter Name	Value
KOKKOSKERNELS_SOURCE_BRANCH	kyukim-develop
KOKKOSKERNELS_SOURCE_REPO	https://github.com/kyungjoo-kim/kokkos-kernels
KOKKOSKERNELS_SOURCE_SHA	`a474e62`
KOKKOSKERNELS_TARGET_BRANCH	develop
KOKKOSKERNELS_TARGET_REPO	https://github.com/kokkos/kokkos-kernels
KOKKOSKERNELS_TARGET_SHA	`717e67f`
PULLREQUESTNUM	893
TEST_REPO_ALIAS	KOKKOSKERNELS

Build Information

Test Name: KokkosKernels_PullRequest_Tpls_GCC720_GCC740

Build Num: 11
Status: STARTED

Jenkins Parameters

Parameter Name	Value
KOKKOSKERNELS_SOURCE_BRANCH	kyukim-develop
KOKKOSKERNELS_SOURCE_REPO	https://github.com/kyungjoo-kim/kokkos-kernels
KOKKOSKERNELS_SOURCE_SHA	`a474e62`
KOKKOSKERNELS_TARGET_BRANCH	develop
KOKKOSKERNELS_TARGET_REPO	https://github.com/kokkos/kokkos-kernels
KOKKOSKERNELS_TARGET_SHA	`717e67f`
PULLREQUESTNUM	893
TEST_REPO_ALIAS	KOKKOSKERNELS

Using Repos:

Repo: KOKKOSKERNELS (kyungjoo-kim/kokkos-kernels)

Pull Request Author: kyungjoo-kim

kokkos-devops-admin · 2021-02-10T01:13:06Z

Status Flag 'Pull Request AutoTester' - Jenkins Testing: all Jobs PASSED

Pull Request Auto Testing has PASSED

Build Information

Test Name: KokkosKernels_PullRequest_GCC720

Build Num: 32
Status: PASSED

Jenkins Parameters

Parameter Name	Value
KOKKOSKERNELS_SOURCE_BRANCH	kyukim-develop
KOKKOSKERNELS_SOURCE_REPO	https://github.com/kyungjoo-kim/kokkos-kernels
KOKKOSKERNELS_SOURCE_SHA	`a474e62`
KOKKOSKERNELS_TARGET_BRANCH	develop
KOKKOSKERNELS_TARGET_REPO	https://github.com/kokkos/kokkos-kernels
KOKKOSKERNELS_TARGET_SHA	`717e67f`
PULLREQUESTNUM	893
TEST_REPO_ALIAS	KOKKOSKERNELS

Build Information

Test Name: KokkosKernels_PullRequest_Tpls_GCC720

Build Num: 25
Status: PASSED

Jenkins Parameters

Parameter Name	Value
KOKKOSKERNELS_SOURCE_BRANCH	kyukim-develop
KOKKOSKERNELS_SOURCE_REPO	https://github.com/kyungjoo-kim/kokkos-kernels
KOKKOSKERNELS_SOURCE_SHA	`a474e62`
KOKKOSKERNELS_TARGET_BRANCH	develop
KOKKOSKERNELS_TARGET_REPO	https://github.com/kokkos/kokkos-kernels
KOKKOSKERNELS_TARGET_SHA	`717e67f`
PULLREQUESTNUM	893
TEST_REPO_ALIAS	KOKKOSKERNELS

Build Information

Test Name: KokkosKernels_PullRequest_Tpls_INTEL18

Build Num: 11
Status: PASSED

Jenkins Parameters

Parameter Name	Value
KOKKOSKERNELS_SOURCE_BRANCH	kyukim-develop
KOKKOSKERNELS_SOURCE_REPO	https://github.com/kyungjoo-kim/kokkos-kernels
KOKKOSKERNELS_SOURCE_SHA	`a474e62`
KOKKOSKERNELS_TARGET_BRANCH	develop
KOKKOSKERNELS_TARGET_REPO	https://github.com/kokkos/kokkos-kernels
KOKKOSKERNELS_TARGET_SHA	`717e67f`
PULLREQUESTNUM	893
TEST_REPO_ALIAS	KOKKOSKERNELS

Build Information

Test Name: KokkosKernels_PullRequest_GCC720_Light

Build Num: 53
Status: PASSED

Jenkins Parameters

Parameter Name	Value
KOKKOSKERNELS_SOURCE_BRANCH	kyukim-develop
KOKKOSKERNELS_SOURCE_REPO	https://github.com/kyungjoo-kim/kokkos-kernels
KOKKOSKERNELS_SOURCE_SHA	`a474e62`
KOKKOSKERNELS_TARGET_BRANCH	develop
KOKKOSKERNELS_TARGET_REPO	https://github.com/kokkos/kokkos-kernels
KOKKOSKERNELS_TARGET_SHA	`717e67f`
PULLREQUESTNUM	893
TEST_REPO_ALIAS	KOKKOSKERNELS

Build Information

Test Name: KokkosKernels_PullRequest_Tpls_CUDA10

Build Num: 17
Status: PASSED

Jenkins Parameters

Parameter Name	Value
KOKKOSKERNELS_SOURCE_BRANCH	kyukim-develop
KOKKOSKERNELS_SOURCE_REPO	https://github.com/kyungjoo-kim/kokkos-kernels
KOKKOSKERNELS_SOURCE_SHA	`a474e62`
KOKKOSKERNELS_TARGET_BRANCH	develop
KOKKOSKERNELS_TARGET_REPO	https://github.com/kokkos/kokkos-kernels
KOKKOSKERNELS_TARGET_SHA	`717e67f`
PULLREQUESTNUM	893
TEST_REPO_ALIAS	KOKKOSKERNELS

Build Information

Test Name: KokkosKernels_PullRequest_Tpls_CUDA9

Build Num: 11
Status: PASSED

Jenkins Parameters

Parameter Name	Value
KOKKOSKERNELS_SOURCE_BRANCH	kyukim-develop
KOKKOSKERNELS_SOURCE_REPO	https://github.com/kyungjoo-kim/kokkos-kernels
KOKKOSKERNELS_SOURCE_SHA	`a474e62`
KOKKOSKERNELS_TARGET_BRANCH	develop
KOKKOSKERNELS_TARGET_REPO	https://github.com/kokkos/kokkos-kernels
KOKKOSKERNELS_TARGET_SHA	`717e67f`
PULLREQUESTNUM	893
TEST_REPO_ALIAS	KOKKOSKERNELS

Build Information

Test Name: KokkosKernels_PullRequest_Tpls_GCC720_GCC740

Build Num: 11
Status: PASSED

Jenkins Parameters

Parameter Name	Value
KOKKOSKERNELS_SOURCE_BRANCH	kyukim-develop
KOKKOSKERNELS_SOURCE_REPO	https://github.com/kyungjoo-kim/kokkos-kernels
KOKKOSKERNELS_SOURCE_SHA	`a474e62`
KOKKOSKERNELS_TARGET_BRANCH	develop
KOKKOSKERNELS_TARGET_REPO	https://github.com/kokkos/kokkos-kernels
KOKKOSKERNELS_TARGET_SHA	`717e67f`
PULLREQUESTNUM	893
TEST_REPO_ALIAS	KOKKOSKERNELS

kokkos-devops-admin · 2021-02-10T01:13:23Z

Status Flag 'Pre-Merge Inspection' - - This Pull Request Requires Inspection... The code must be inspected by a member of the Team before Testing/Merging
NO REVIEWS HAVE BEEN PERFORMED ON THIS PULL REQUEST!

kokkos-devops-admin · 2021-02-10T01:13:29Z

All Jobs Finished; status = PASSED, However Inspection must be performed before merge can occur...

lucbv

This looks fine to me. I just have a few minor comments about style and performance.
Is there a reason for using 4 tmp variables rather than some other number? I am guessing this gave the best performance.

lucbv · 2021-02-10T15:13:41Z

src/sparse/impl/KokkosSparse_spmv_impl.hpp

+    const ordinal_type *__restrict__ col_idx_ptr = A.graph.entries.data();
+    const value_type *__restrict__ values_ptr = A.values.data();
+
+    typename YVector::non_const_value_type *__restrict__  y_ptr = y.data();


Since I have already spotted it a few times, maybe add something like:
using output_value_type = typename YVector::non_const_value_type;

lucbv · 2021-02-10T15:16:32Z

src/sparse/impl/KokkosSparse_spmv_impl.hpp

+    typename YVector::non_const_value_type *__restrict__  y_ptr = y.data();
+    typename XVector::const_value_type *__restrict__  x_ptr = x.data();
+
+    const typename YVector::non_const_value_type zero(0);


I do not think there is anything wrong with this, especially since it is on the host side, but in general using:
const output_value_type zero = Kokkos::ArithTraits<output_value_type>::zero();
can be more robust. That's no big deal, though.

I think this is just a matter of style. ;)

lucbv · 2021-02-10T15:19:55Z

src/sparse/impl/KokkosSparse_spmv_impl.hpp

+    if (alpha == zero) {
+      if (dobeta == 0) {
+        memset(y_ptr, 0, sizeof(typename YVector::value_type)*nrow);
+      } else if (dobeta == 1) {


Why not check for dobeta == -1? It would probably be a bit faster to do it with the -= operator than with the *= operator.
Again, not a big deal.

This is the case with alpha = zero. For any value of beta, the operation is a scaling of y, so -= cannot be used here.

Yes, I guess I should have said y = -y. I am actually hoping that the compiler can detect that alpha is equal to -1 and can optimize this, but again I do not think it is critical here. I think alpha = -1 is a more common case because of residual calculations.

See the comment below.
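To make the exchange above concrete, here is a small sketch of the alpha == 0 branch, where the product term vanishes and only y = beta * y remains. It is an illustration, not the PR's code: the function names are hypothetical, the diff uses memset for the zero case, the dobeta == -1 branch is the reviewer's suggested negation, and the value 2 standing for "general beta" is an assumption.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// With alpha == 0, y = beta * y + alpha * A * x reduces to y = beta * y.
// dobeta flags the special beta values 0, 1 and -1 as in the diff excerpts;
// any other value (2 in the usage below, an assumption) means "general beta".
void scale_y_when_alpha_is_zero(int dobeta, double beta,
                                std::vector<double>& y) {
  if (dobeta == 0) {
    for (double& v : y) v = 0.0;    // y = 0 (the diff uses memset here)
  } else if (dobeta == 1) {
    // y = y: nothing to do
  } else if (dobeta == -1) {
    for (double& v : y) v = -v;     // y = -y: the negation suggested in review
  } else {
    for (double& v : y) v *= beta;  // y = beta * y: general scaling
  }
}

// Usage helper: applies the scaling to y = [1, -2] and returns the result.
std::vector<double> scaled(int dobeta, double beta) {
  std::vector<double> y{1.0, -2.0};
  scale_y_when_alpha_is_zero(dobeta, beta, y);
  return y;
}
```

Whether the negation branch is measurably faster than the general multiply is not shown in this thread; the sketch only shows what each branch computes.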

kokkos-devops-admin · 2021-02-10T15:29:36Z

Status Flag 'Pre-Merge Inspection' - - This Pull Request Requires Inspection... The code must be inspected by a member of the Team before Testing/Merging
THE LAST COMMIT TO THIS PULL REQUEST HAS BEEN REVIEWED, BUT NOT ACCEPTED OR REQUIRES CHANGES

kokkos-devops-admin · 2021-02-10T15:29:42Z

All Jobs Finished; status = PASSED, However Inspection must be performed before merge can occur...

kyungjoo-kim · 2021-02-10T17:24:29Z

@lucbv The performance is in particular slower on the Power architecture compared with Intel. Intel already performs reasonably well without this trick. Vectorization on the Power architecture has a 128-bit length, and I am just giving the compiler the possibility of vectorizing with latency hiding by feeding it 2x the data. This kind of optimization should be done by a compiler. With advances in compiler optimization, the performance can be improved without hand optimization like this (which would be the best outcome in the end).

The performance depends on the sparsity of the matrix. I do not really expect a performance gain from this trick in general, but some problems on a coarse domain are quite dense in the application, and there it just performs better. If there is no critical concern, would you approve and include this in the code? Earlier integration with Trilinos would also help with application-level testing.
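As an illustration of the trick discussed above, here is a minimal sketch of a row dot product with four independent partial sums. It follows the `tmp1` to `tmp4` naming visible in the diff excerpts, but the function names and signature are hypothetical and the PR's actual loop may differ.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Dot product of one CSR row with x, using four independent partial sums.
// Separate accumulators break the single chain of dependent additions, which
// gives the compiler room to vectorize and to overlap floating-point latency.
double row_dot_unrolled4(const double* vals, const int* cols, std::size_t n,
                         const double* x) {
  double tmp1 = 0.0, tmp2 = 0.0, tmp3 = 0.0, tmp4 = 0.0;
  std::size_t k = 0;
  for (; k + 4 <= n; k += 4) {
    tmp1 += vals[k] * x[cols[k]];
    tmp2 += vals[k + 1] * x[cols[k + 1]];
    tmp3 += vals[k + 2] * x[cols[k + 2]];
    tmp4 += vals[k + 3] * x[cols[k + 3]];
  }
  // Remainder loop for rows whose length is not a multiple of four.
  for (; k < n; ++k) tmp1 += vals[k] * x[cols[k]];
  return tmp1 + tmp2 + tmp3 + tmp4;
}

// Usage: a row with six entries, so one unrolled pass plus two remainder steps.
double unrolled_demo() {
  const std::vector<double> vals{1.0, 2.0, 3.0, 4.0, 5.0, 6.0};
  const std::vector<int> cols{0, 1, 2, 3, 4, 5};
  const std::vector<double> x{1.0, 2.0, 3.0, 4.0, 5.0, 6.0};
  return row_dot_unrolled4(vals.data(), cols.data(), vals.size(), x.data());
}
```

Note that splitting the sum changes the order of the floating-point additions, so results can differ in the last bits from a single-accumulator loop.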

lucbv

lgtm

kokkos-devops-admin · 2021-02-10T17:35:11Z

Status Flag 'Pre-Merge Inspection' - SUCCESS: The last commit to this Pull Request has been INSPECTED AND APPROVED by [ lucbv ]!

kokkos-devops-admin · 2021-02-10T17:35:18Z

Status Flag 'Pull Request AutoTester' - Pull Request MUST BE MERGED MANUALLY BY Project Team - This Repo does not support Automerge

kyungjoo-kim · 2021-02-10T17:35:56Z

src/sparse/impl/KokkosSparse_spmv_impl.hpp

+          if (dobeta == 0) {
+            y_ptr[i] = alpha*(tmp1 + tmp2 + tmp3 + tmp4);
+          } else if (dobeta == -1) {
+            y_ptr[i] -= alpha*(tmp1 + tmp2 + tmp3 + tmp4);


@lucbv This is when alpha = -1.

kokkos-devops-admin · 2021-02-11T17:36:11Z

Status Flag 'Pull Request AutoTester' - Pull Request MUST BE MERGED MANUALLY BY Project Team - This Repo does not support Automerge

kokkos-devops-admin · 2021-02-12T17:38:33Z

Status Flag 'Pull Request AutoTester' - Pull Request MUST BE MERGED MANUALLY BY Project Team - This Repo does not support Automerge

kyungjoo-kim · 2021-02-12T17:47:39Z

Could we merge this?

lucbv · 2021-02-12T18:17:33Z

Sure

kyungjoo-kim added 8 commits February 9, 2021 11:36

KokkosSparse - implement serial code path for sierra

28ca2a9

KokkosSparse - remove warning unused variables

15bb370

KokkosSparse - missing return

2c175fe

KokkosSparse - non const ordianl

b99d242

KokkosSparse - alpha is not accounted

6ef3662

KokkosSparse - coefficient alpha beta should be declared as yVector type

695e5bc

KokkosSparse - somehow I need to give a special code path for dobeta

28cd041

KokkosSparse - spot check pass

a474e62

kyungjoo-kim requested review from brian-kelley, lucbv and ndellingwood February 10, 2021 00:21

kyungjoo-kim self-assigned this Feb 10, 2021

kyungjoo-kim added the enhancement label Feb 10, 2021

kyungjoo-kim mentioned this pull request Feb 10, 2021

KokkosKernels: poor performance for transpose SpMV with tall skinny matrices on host trilinos/Trilinos#8658

Closed

lucbv reviewed Feb 10, 2021

lucbv approved these changes Feb 10, 2021

kyungjoo-kim commented Feb 10, 2021

lucbv merged commit 3634bfc into kokkos:develop Feb 12, 2021

brian-kelley mentioned this pull request Feb 16, 2021

Improve performance of serial transpose SpMV #887

Closed

lucbv mentioned this pull request Feb 17, 2021

auto-tester did not catch error on serial build #897

Closed

ndellingwood mentioned this pull request Apr 28, 2021

Kokkos: Nalu diffs due to recent commit trilinos/Trilinos#9070

Closed

kokkos-devops-admin mentioned this pull request Sep 14, 2023

Remove old half-precision arith traits #1774

Closed
