Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Serial code path for spmv #893

Merged
merged 8 commits into from
Feb 12, 2021
Merged

Conversation

kyungjoo-kim
Copy link
Contributor

The current spmv with transpose uses atomic updates and follows the generic interface with parallel for. This significantly lowers the performance as seen in trilinos/Trilinos#8658.

This PR address the performance issue of serial implementation of spmv.

@kyungjoo-kim
Copy link
Contributor Author

kyungjoo-kim commented Feb 10, 2021

I heard that we now have auto tester. The following spotcheck result is just for reference.

/// blake
#######################################################
PASSED TESTS
#######################################################
gcc-7.2.0-OpenMP-release build_time=200 run_time=63
gcc-7.2.0-Pthread_Serial-release build_time=304 run_time=164
intel-19.1.144-OpenMP_Serial-release build_time=1224 run_time=132

/// weaver
#######################################################
PASSED TESTS
#######################################################
cuda-10.1.243-Cuda_OpenMP-release build_time=1243 run_time=154
cuda-9.2.88-Cuda_Serial-release build_time=1207 run_time=254
gcc-6.4.0-OpenMP_Serial-release build_time=388 run_time=197
gcc-7.2.0-OpenMP-release build_time=247 run_time=68
gcc-7.2.0-OpenMP_Serial-release build_time=391 run_time=195
gcc-7.2.0-Serial-release build_time=225 run_time=66

@kokkos-devops-admin
Copy link

Status Flag 'Pre-Test Inspection' - Auto Inspected - Inspection Is Not Necessary for this Pull Request.

@kokkos-devops-admin
Copy link

Status Flag 'Pull Request AutoTester' - Testing Jenkins Projects:

Pull Request Auto Testing STARTING (click to expand)

Build Information

Test Name: KokkosKernels_PullRequest_GCC720

  • Build Num: 32
  • Status: STARTED

Jenkins Parameters

Parameter Name Value
KOKKOSKERNELS_SOURCE_BRANCH kyukim-develop
KOKKOSKERNELS_SOURCE_REPO https://github.com/kyungjoo-kim/kokkos-kernels
KOKKOSKERNELS_SOURCE_SHA a474e62
KOKKOSKERNELS_TARGET_BRANCH develop
KOKKOSKERNELS_TARGET_REPO https://github.com/kokkos/kokkos-kernels
KOKKOSKERNELS_TARGET_SHA 717e67f
PULLREQUESTNUM 893
TEST_REPO_ALIAS KOKKOSKERNELS

Build Information

Test Name: KokkosKernels_PullRequest_Tpls_GCC720

  • Build Num: 25
  • Status: STARTED

Jenkins Parameters

Parameter Name Value
KOKKOSKERNELS_SOURCE_BRANCH kyukim-develop
KOKKOSKERNELS_SOURCE_REPO https://github.com/kyungjoo-kim/kokkos-kernels
KOKKOSKERNELS_SOURCE_SHA a474e62
KOKKOSKERNELS_TARGET_BRANCH develop
KOKKOSKERNELS_TARGET_REPO https://github.com/kokkos/kokkos-kernels
KOKKOSKERNELS_TARGET_SHA 717e67f
PULLREQUESTNUM 893
TEST_REPO_ALIAS KOKKOSKERNELS

Build Information

Test Name: KokkosKernels_PullRequest_Tpls_INTEL18

  • Build Num: 11
  • Status: STARTED

Jenkins Parameters

Parameter Name Value
KOKKOSKERNELS_SOURCE_BRANCH kyukim-develop
KOKKOSKERNELS_SOURCE_REPO https://github.com/kyungjoo-kim/kokkos-kernels
KOKKOSKERNELS_SOURCE_SHA a474e62
KOKKOSKERNELS_TARGET_BRANCH develop
KOKKOSKERNELS_TARGET_REPO https://github.com/kokkos/kokkos-kernels
KOKKOSKERNELS_TARGET_SHA 717e67f
PULLREQUESTNUM 893
TEST_REPO_ALIAS KOKKOSKERNELS

Build Information

Test Name: KokkosKernels_PullRequest_GCC720_Light

  • Build Num: 53
  • Status: STARTED

Jenkins Parameters

Parameter Name Value
KOKKOSKERNELS_SOURCE_BRANCH kyukim-develop
KOKKOSKERNELS_SOURCE_REPO https://github.com/kyungjoo-kim/kokkos-kernels
KOKKOSKERNELS_SOURCE_SHA a474e62
KOKKOSKERNELS_TARGET_BRANCH develop
KOKKOSKERNELS_TARGET_REPO https://github.com/kokkos/kokkos-kernels
KOKKOSKERNELS_TARGET_SHA 717e67f
PULLREQUESTNUM 893
TEST_REPO_ALIAS KOKKOSKERNELS

Build Information

Test Name: KokkosKernels_PullRequest_Tpls_CUDA10

  • Build Num: 17
  • Status: STARTED

Jenkins Parameters

Parameter Name Value
KOKKOSKERNELS_SOURCE_BRANCH kyukim-develop
KOKKOSKERNELS_SOURCE_REPO https://github.com/kyungjoo-kim/kokkos-kernels
KOKKOSKERNELS_SOURCE_SHA a474e62
KOKKOSKERNELS_TARGET_BRANCH develop
KOKKOSKERNELS_TARGET_REPO https://github.com/kokkos/kokkos-kernels
KOKKOSKERNELS_TARGET_SHA 717e67f
PULLREQUESTNUM 893
TEST_REPO_ALIAS KOKKOSKERNELS

Build Information

Test Name: KokkosKernels_PullRequest_Tpls_CUDA9

  • Build Num: 11
  • Status: STARTED

Jenkins Parameters

Parameter Name Value
KOKKOSKERNELS_SOURCE_BRANCH kyukim-develop
KOKKOSKERNELS_SOURCE_REPO https://github.com/kyungjoo-kim/kokkos-kernels
KOKKOSKERNELS_SOURCE_SHA a474e62
KOKKOSKERNELS_TARGET_BRANCH develop
KOKKOSKERNELS_TARGET_REPO https://github.com/kokkos/kokkos-kernels
KOKKOSKERNELS_TARGET_SHA 717e67f
PULLREQUESTNUM 893
TEST_REPO_ALIAS KOKKOSKERNELS

Build Information

Test Name: KokkosKernels_PullRequest_Tpls_GCC720_GCC740

  • Build Num: 11
  • Status: STARTED

Jenkins Parameters

Parameter Name Value
KOKKOSKERNELS_SOURCE_BRANCH kyukim-develop
KOKKOSKERNELS_SOURCE_REPO https://github.com/kyungjoo-kim/kokkos-kernels
KOKKOSKERNELS_SOURCE_SHA a474e62
KOKKOSKERNELS_TARGET_BRANCH develop
KOKKOSKERNELS_TARGET_REPO https://github.com/kokkos/kokkos-kernels
KOKKOSKERNELS_TARGET_SHA 717e67f
PULLREQUESTNUM 893
TEST_REPO_ALIAS KOKKOSKERNELS

Using Repos:

Repo: KOKKOSKERNELS (kyungjoo-kim/kokkos-kernels)
  • Branch: kyukim-develop
  • SHA: a474e62
  • Mode: TEST_REPO

Pull Request Author: kyungjoo-kim

@kokkos-devops-admin
Copy link

Status Flag 'Pull Request AutoTester' - Jenkins Testing: all Jobs PASSED

Pull Request Auto Testing has PASSED (click to expand)

Build Information

Test Name: KokkosKernels_PullRequest_GCC720

  • Build Num: 32
  • Status: PASSED

Jenkins Parameters

Parameter Name Value
KOKKOSKERNELS_SOURCE_BRANCH kyukim-develop
KOKKOSKERNELS_SOURCE_REPO https://github.com/kyungjoo-kim/kokkos-kernels
KOKKOSKERNELS_SOURCE_SHA a474e62
KOKKOSKERNELS_TARGET_BRANCH develop
KOKKOSKERNELS_TARGET_REPO https://github.com/kokkos/kokkos-kernels
KOKKOSKERNELS_TARGET_SHA 717e67f
PULLREQUESTNUM 893
TEST_REPO_ALIAS KOKKOSKERNELS

Build Information

Test Name: KokkosKernels_PullRequest_Tpls_GCC720

  • Build Num: 25
  • Status: PASSED

Jenkins Parameters

Parameter Name Value
KOKKOSKERNELS_SOURCE_BRANCH kyukim-develop
KOKKOSKERNELS_SOURCE_REPO https://github.com/kyungjoo-kim/kokkos-kernels
KOKKOSKERNELS_SOURCE_SHA a474e62
KOKKOSKERNELS_TARGET_BRANCH develop
KOKKOSKERNELS_TARGET_REPO https://github.com/kokkos/kokkos-kernels
KOKKOSKERNELS_TARGET_SHA 717e67f
PULLREQUESTNUM 893
TEST_REPO_ALIAS KOKKOSKERNELS

Build Information

Test Name: KokkosKernels_PullRequest_Tpls_INTEL18

  • Build Num: 11
  • Status: PASSED

Jenkins Parameters

Parameter Name Value
KOKKOSKERNELS_SOURCE_BRANCH kyukim-develop
KOKKOSKERNELS_SOURCE_REPO https://github.com/kyungjoo-kim/kokkos-kernels
KOKKOSKERNELS_SOURCE_SHA a474e62
KOKKOSKERNELS_TARGET_BRANCH develop
KOKKOSKERNELS_TARGET_REPO https://github.com/kokkos/kokkos-kernels
KOKKOSKERNELS_TARGET_SHA 717e67f
PULLREQUESTNUM 893
TEST_REPO_ALIAS KOKKOSKERNELS

Build Information

Test Name: KokkosKernels_PullRequest_GCC720_Light

  • Build Num: 53
  • Status: PASSED

Jenkins Parameters

Parameter Name Value
KOKKOSKERNELS_SOURCE_BRANCH kyukim-develop
KOKKOSKERNELS_SOURCE_REPO https://github.com/kyungjoo-kim/kokkos-kernels
KOKKOSKERNELS_SOURCE_SHA a474e62
KOKKOSKERNELS_TARGET_BRANCH develop
KOKKOSKERNELS_TARGET_REPO https://github.com/kokkos/kokkos-kernels
KOKKOSKERNELS_TARGET_SHA 717e67f
PULLREQUESTNUM 893
TEST_REPO_ALIAS KOKKOSKERNELS

Build Information

Test Name: KokkosKernels_PullRequest_Tpls_CUDA10

  • Build Num: 17
  • Status: PASSED

Jenkins Parameters

Parameter Name Value
KOKKOSKERNELS_SOURCE_BRANCH kyukim-develop
KOKKOSKERNELS_SOURCE_REPO https://github.com/kyungjoo-kim/kokkos-kernels
KOKKOSKERNELS_SOURCE_SHA a474e62
KOKKOSKERNELS_TARGET_BRANCH develop
KOKKOSKERNELS_TARGET_REPO https://github.com/kokkos/kokkos-kernels
KOKKOSKERNELS_TARGET_SHA 717e67f
PULLREQUESTNUM 893
TEST_REPO_ALIAS KOKKOSKERNELS

Build Information

Test Name: KokkosKernels_PullRequest_Tpls_CUDA9

  • Build Num: 11
  • Status: PASSED

Jenkins Parameters

Parameter Name Value
KOKKOSKERNELS_SOURCE_BRANCH kyukim-develop
KOKKOSKERNELS_SOURCE_REPO https://github.com/kyungjoo-kim/kokkos-kernels
KOKKOSKERNELS_SOURCE_SHA a474e62
KOKKOSKERNELS_TARGET_BRANCH develop
KOKKOSKERNELS_TARGET_REPO https://github.com/kokkos/kokkos-kernels
KOKKOSKERNELS_TARGET_SHA 717e67f
PULLREQUESTNUM 893
TEST_REPO_ALIAS KOKKOSKERNELS

Build Information

Test Name: KokkosKernels_PullRequest_Tpls_GCC720_GCC740

  • Build Num: 11
  • Status: PASSED

Jenkins Parameters

Parameter Name Value
KOKKOSKERNELS_SOURCE_BRANCH kyukim-develop
KOKKOSKERNELS_SOURCE_REPO https://github.com/kyungjoo-kim/kokkos-kernels
KOKKOSKERNELS_SOURCE_SHA a474e62
KOKKOSKERNELS_TARGET_BRANCH develop
KOKKOSKERNELS_TARGET_REPO https://github.com/kokkos/kokkos-kernels
KOKKOSKERNELS_TARGET_SHA 717e67f
PULLREQUESTNUM 893
TEST_REPO_ALIAS KOKKOSKERNELS

@kokkos-devops-admin
Copy link

Status Flag 'Pre-Merge Inspection' - - This Pull Request Requires Inspection... The code must be inspected by a member of the Team before Testing/Merging
NO REVIEWS HAVE BEEN PERFORMED ON THIS PULL REQUEST!

@kokkos-devops-admin
Copy link

All Jobs Finished; status = PASSED, However Inspection must be performed before merge can occur...

Copy link
Contributor

@lucbv lucbv left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks fine to me, I just have a few minor comments about the style I guess and performance.
Any reason for using 4 tmp rather than something else, I am guessing this gave the best performance?

const ordinal_type *__restrict__ col_idx_ptr = A.graph.entries.data();
const value_type *__restrict__ values_ptr = A.values.data();

typename YVector::non_const_value_type *__restrict__ y_ptr = y.data();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

since I already spotted it a few times, maybe add something like:
using output_value_type = typename YVector::non_const_value_type

typename YVector::non_const_value_type *__restrict__ y_ptr = y.data();
typename XVector::const_value_type *__restrict__ x_ptr = x.data();

const typename YVector::non_const_value_type zero(0);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I do not think there is anything wrong with this especially since it is on the host side, but in general using:
const output_value_type zero = Kokkos::ArithTraits<output_value_type>::zero()
can be more robust but that's no big deal.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i think that this is just a style. ; )

if (alpha == zero) {
if (dobeta == 0) {
memset(y_ptr, 0, sizeof(typename YVector::value_type)*nrow);
} else if (dobeta == 1) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not check for dobeta == -1, it would probably be a bit faster to do it with -= operator than with *= operator?
Again not a big deal

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a case with alpha = zero. When beta is any number, it is scaling. -= cannot be used here.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I guess I should have said, y = -y, I am actually hoping that the compiler can detect that alpha is equal to -1 and can optimize this but again I do not think it is critical here. I think alpha=-1 is a more common case because of residual calculations.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See the comment in the below.

@kokkos-devops-admin
Copy link

Status Flag 'Pre-Merge Inspection' - - This Pull Request Requires Inspection... The code must be inspected by a member of the Team before Testing/Merging
THE LAST COMMIT TO THIS PULL REQUEST HAS BEEN REVIEWED, BUT NOT ACCEPTED OR REQUIRES CHANGES

@kokkos-devops-admin
Copy link

All Jobs Finished; status = PASSED, However Inspection must be performed before merge can occur...

@kyungjoo-kim
Copy link
Contributor Author

kyungjoo-kim commented Feb 10, 2021

@lucbv The performance is in paricular slower on power architecture comparing with intel. Intel already perform reasonably well without this trick. The power architecture vectorization has 128 bit length and I just give possibility of vectorization with latency hiding by feeding 2x food. This kind of optimization should be done by a compiler. With advance in compiler optimization, the performance can be improved without hand optimization like this (that will be the best at the end).

The performance depends on the sparsity of the matrix. I do not really expect performance with this trick but some problems on a coarse domain is quite dense in the application. It just performs better. If there is no critical concern, would you approve and include this in the code ? Earlier integration with Trilinos would also help for the application level testing.

Copy link
Contributor

@lucbv lucbv left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm

@kokkos-devops-admin
Copy link

Status Flag 'Pre-Merge Inspection' - SUCCESS: The last commit to this Pull Request has been INSPECTED AND APPROVED by [ lucbv ]!

@kokkos-devops-admin
Copy link

Status Flag 'Pull Request AutoTester' - Pull Request MUST BE MERGED MANUALLY BY Project Team - This Repo does not support Automerge

if (dobeta == 0) {
y_ptr[i] = alpha*(tmp1 + tmp2 + tmp3 + tmp4);
} else if (dobeta == -1) {
y_ptr[i] -= alpha*(tmp1 + tmp2 + tmp3 + tmp4);
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@lucbv This is when alpha = -1.

@kokkos-devops-admin
Copy link

Status Flag 'Pull Request AutoTester' - Pull Request MUST BE MERGED MANUALLY BY Project Team - This Repo does not support Automerge

1 similar comment
@kokkos-devops-admin
Copy link

Status Flag 'Pull Request AutoTester' - Pull Request MUST BE MERGED MANUALLY BY Project Team - This Repo does not support Automerge

@kyungjoo-kim
Copy link
Contributor Author

could we merge this ?

@lucbv
Copy link
Contributor

lucbv commented Feb 12, 2021

Sure

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants