Cleanup spadd (#685, #694) #773
- Remove vector/team size stuff from the handle, because it only uses range policy.
- Remove "get_max_result_nnz"; only "get_c_nnz" should be used now. "Max" is a bad name for this, since C always has exactly this many nonzeros; it's not an upper bound.
- Address kokkos#694 (don't require values to be initialized).
- Address kokkos#685 (produce a fully sorted and merged C even if A and/or B aren't merged).
- Improve testing: test a matrix with zero entries per row, and test a matrix with duplicate entries.
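To make the "sorted and merged C" behavior concrete, here is a minimal serial sketch in plain C++. It is not the KokkosKernels implementation and uses none of its API; the `Csr` struct and `sortedMergedAdd` function are hypothetical names made up for illustration. It only shows the intended result: each row of C has strictly increasing column indices with duplicates summed, even when the rows of A and B are unsorted or contain repeated columns.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical minimal CSR container (illustration only, not a KokkosKernels type).
struct Csr {
  std::size_t numRows;
  std::vector<std::size_t> rowptr;  // size numRows + 1
  std::vector<int> colind;          // may be unsorted and may contain duplicates
  std::vector<double> values;
};

// Computes C = alpha*A + beta*B. Every row of C is sorted by column index
// and has duplicate columns merged (summed), regardless of the state of A and B.
Csr sortedMergedAdd(double alpha, const Csr& A, double beta, const Csr& B) {
  assert(A.numRows == B.numRows);
  assert(A.rowptr.size() == A.numRows + 1);
  assert(B.rowptr.size() == B.numRows + 1);

  Csr C;
  C.numRows = A.numRows;
  C.rowptr.assign(C.numRows + 1, 0);

  std::vector<std::pair<int, double> > row;  // scratch space for one row
  for (std::size_t i = 0; i < A.numRows; ++i) {
    // Gather the scaled entries of row i from both inputs.
    row.clear();
    for (std::size_t k = A.rowptr[i]; k < A.rowptr[i + 1]; ++k)
      row.push_back(std::make_pair(A.colind[k], alpha * A.values[k]));
    for (std::size_t k = B.rowptr[i]; k < B.rowptr[i + 1]; ++k)
      row.push_back(std::make_pair(B.colind[k], beta * B.values[k]));

    // Sort by column index so equal columns become adjacent.
    std::sort(row.begin(), row.end(),
              [](const std::pair<int, double>& x, const std::pair<int, double>& y) {
                return x.first < y.first;
              });

    // Merge adjacent duplicates while appending to C.
    for (std::size_t k = 0; k < row.size(); ++k) {
      const bool rowHasEntry = C.colind.size() > C.rowptr[i];
      if (rowHasEntry && C.colind.back() == row[k].first) {
        C.values.back() += row[k].second;
      } else {
        C.colind.push_back(row[k].first);
        C.values.push_back(row[k].second);
      }
    }
    C.rowptr[i + 1] = C.colind.size();  // exact nonzero count so far, not an upper bound
  }
  return C;
}
```

In this sketch the last entry of `C.rowptr` is the exact number of nonzeros in C, which mirrors the point above about why "max" was a misleading name. An empty row simply leaves `rowptr[i + 1]` equal to `rowptr[i]`.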
This looks good to me, I just have one comment regarding value initialization.
Just in case filling with 0 doesn't work for UQ/PCE types (it probably does, but I didn't try it)
Still good after making the 1-line change.
Remove some deprecated stuff that no longer exists/works after kokkos#773
Make compatible with kokkos-kernels changes in PR kokkos/kokkos-kernels#773
Note that these are all just enhancements, not bugfixes. I noticed these while trying to track down an EMPIRE issue, where add's MergeEntriesFunctor is crashing after my recent cuSPARSE TPL changes. I think the underlying cause of the bug is somewhere else since cuSPARSE SpMV is completely unrelated to add.
Testing:
#######################################################
PASSED TESTS
#######################################################
clang-8.0-Pthread_Serial-release build_time=336 run_time=173
clang-9.0.0-Pthread-release build_time=136 run_time=77
clang-9.0.0-Serial-release build_time=174 run_time=64
cuda-10.1-Cuda_OpenMP-release build_time=848 run_time=167
cuda-9.2-Cuda_Serial-release build_time=794 run_time=210
gcc-4.8.4-OpenMP-release build_time=117 run_time=64
gcc-7.3.0-OpenMP-release build_time=179 run_time=2532
gcc-7.3.0-Pthread-release build_time=132 run_time=80
gcc-8.3.0-Serial-release build_time=206 run_time=66
gcc-9.1-OpenMP-release build_time=190 run_time=392
gcc-9.1-Serial-release build_time=175 run_time=67
intel-17.0.1-Serial-release build_time=263 run_time=59
intel-18.0.5-OpenMP-release build_time=425 run_time=67
intel-19.0.5-Pthread-release build_time=467 run_time=72
clang-8.0-Cuda_OpenMP-release (test failed) <== sparse_openmp timed out due to machine congestion, but I re-ran it today from the same build and it passed
#######################################################