Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

sptrsv: Add tplchain and cusparse #555

Merged
merged 3 commits into from
Dec 20, 2019

Conversation

ndellingwood
Copy link
Contributor

No description provided.

@ndellingwood
Copy link
Contributor Author

Spot-check on kokkos-dev-2: Failure with Cuda_OpenMP, currently testing fix

Running on machine: kokkos-dev-2
Going to test compilers:  gcc/7.3.0 gcc/9.1 intel/18.0.5 clang/8.0 cuda/10.1
...
#######################################################
PASSED TESTS
#######################################################
clang-8.0-Cuda_OpenMP-release build_time=595 run_time=515
clang-8.0-Pthread_Serial-release build_time=242 run_time=336
gcc-7.3.0-OpenMP-release build_time=224 run_time=174
gcc-7.3.0-Pthread-release build_time=209 run_time=176
gcc-9.1-OpenMP-release build_time=257 run_time=173
gcc-9.1-Serial-release build_time=235 run_time=174
intel-18.0.5-OpenMP-release build_time=435 run_time=158
#######################################################
FAILED TESTS
#######################################################
cuda-10.1-Cuda_OpenMP-release (test failed)

@mhoemmen
Copy link
Contributor

How does this relate to #552 ? Are they meant to happen together or should they be sequenced?

@ndellingwood
Copy link
Contributor Author

@mhoemmen they can be sequenced, I'll have to fix merge conflicts when they're ready for merge but the functionalities are independent of one another.

@ndellingwood
Copy link
Contributor Author

I'm going to have to push an extra change shortly here, getting the cudagraph stuff to work with Kokkos is more fragile than I anticipated, see kokkos/kokkos#2606

@ndellingwood
Copy link
Contributor Author

I'll push the cudagraph fix in a moment when the test finishes.

kh.get_sptrsv_handle()->print_algorithm();
break;
*/
#ifdef KOKKOSKERNELS_ENABLE_TPL_CUSPARSE
case CUSPARSE:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ndellingwood could you please remind me what different between CUSPARSE and CUSPARSE_K?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good question - CUSPARSE_K is the kokkos-kernels supported version I added in the PR, CUSPARSE is the hand-rolled implementation within the perf test.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How do CUSPARSE_K and CUSPARSE compare in terms of perf? Should we still need to keep CUSPARSE case in the perf test when we already have CUSPARSE_K?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I haven't compared carefully, they should be nearly the same. It may be interesting to keep for comparison, but I can also remove to clean up the perf test.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Understand. It is up to you to remove it or not.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @vqd8a for the catch, I'll leave it in for now to comparison test but will clean up in a PR shortly after.

cudagraph support paired with Kokkos usage seems to be somewhat fragile
keeping the code so it can be re-enabled as default once resolved
@ndellingwood
Copy link
Contributor Author

Rerun cuda tests on kokkos-dev-2:

Running on machine: kokkos-dev-2
Going to test compilers:  cuda/10.1
Testing compiler cuda/10.1
  Starting job cuda-10.1-Cuda_OpenMP-release
  PASSED cuda-10.1-Cuda_OpenMP-release
#######################################################
PASSED TESTS
#######################################################
cuda-10.1-Cuda_OpenMP-release build_time=663 run_time=478
#######################################################

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants