Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

perf tests cleanup #613

Merged
merged 14 commits into from
Feb 28, 2020
Merged

Conversation

brian-kelley
Copy link
Contributor

Lots of cleanup/improvement to perf tests

  • Remove MyCrsMatrix (instead, using KokkosSparse::CrsMatrix)
  • Remove spmv/matrix_market.hpp (instead, using the IO utils)
  • Make spgemm perf test not crash if --xyz is the last command line arg, but a value is expected after it (e.g. --openmp $numthreads). Instead, print error message and exit
  • Make all perf tests use whatever scalar/ordinal/offset/layout types are enabled by ETI, rather than hardcoding double/int/int/left.
  • Cleaned up lots of unused parameters in the various raw OpenMP spmv versions
  • Added Ordinal/Offset (aka lno_t vs. size_type) distinction to the raw spmv versions

Also fixed an issue with Blas1 dot ETI: when scalar=float, the rank-1 version that returns a scalar should use double to accumulate the sum. This case was not being ETI'd, which broke the perf test build. This is now ETI'd correctly.

This PR resolves #583, #493, and #492.

I was able to build and run the unit and perf tests cleanly with serial, openmp and cuda enabled, and with double, float, complex_double and complex_float all enabled. I was also able to build without scalar=double, meaning float was used in all the tests.

RIDE spot check (with double and complex_double):
#######################################################
PASSED TESTS
#######################################################
cuda-10.1.105-Cuda_OpenMP-release build_time=578 run_time=418
cuda-10.1.105-Cuda_Serial-release build_time=515 run_time=521
cuda-9.2.88-Cuda_OpenMP-release build_time=504 run_time=425
cuda-9.2.88-Cuda_Serial-release build_time=553 run_time=526
gcc-6.4.0-OpenMP_Serial-release build_time=244 run_time=394
gcc-7.2.0-OpenMP-release build_time=127 run_time=128
gcc-7.2.0-OpenMP_Serial-release build_time=210 run_time=362
gcc-7.2.0-Serial-release build_time=119 run_time=227
ibm-16.1.0-Serial-release build_time=480 run_time=398

Bowman spot check (with double and complex_double):
#######################################################
PASSED TESTS
#######################################################
intel-16.4.258-Pthread-release build_time=733 run_time=1029
intel-16.4.258-Pthread_Serial-release build_time=1065 run_time=1993
intel-16.4.258-Serial-release build_time=715 run_time=949
intel-17.2.174-OpenMP-release build_time=881 run_time=569
intel-17.2.174-OpenMP_Serial-release build_time=1248 run_time=1468
intel-17.2.174-Pthread-release build_time=813 run_time=893
intel-17.2.174-Pthread_Serial-release build_time=1151 run_time=1784
intel-17.2.174-Serial-release build_time=791 run_time=876

brian-kelley and others added 14 commits February 19, 2020 13:50
- Cleaning up duplicated MatrixMarket code from
  perf_test/spmv that exists in IOUtils (kokkos#493)
- Changing the scalar/lno_t/size_type/layout to tolerate any ETI
  combination (previously, only double/int/int/left was really
  supported)
When doing dot(View<float*>, View<float*>) -> float (or with
complex<float>*), dot uses double (or complex<double>) to actually
sum the values. This wasn't getting ETI'd correctly, but now
perf_test and unit_test both build correctly with float enabled as a
scalar.
in InnerProductSpaceTraits, when casting complex<double> to complex<float>
instead of the test_common versions of those which are now gone
Last resort if nothing else is enabled, or nothing else was selected
at runtime with argc/argv
If "--flag" expects another argument after, check that there is actually
another arg before trying to read it.
@brian-kelley
Copy link
Contributor Author

I also made the SPGEMM perf test run with serial device, if it's enabled and neither CUDA nor OpenMP ran. Before, if you didn't give --openmp or --cuda, it would just print out a bunch of machine info and exit without doing a multiply, but it also didn't give a warning that nothing happened.

@brian-kelley
Copy link
Contributor Author

2 TODOs for this PR:

  • decide how to handle the fact that the default for each ETI type is unconditionally enabled. This means that it's currently impossible for the user to get a perf test to use scalar=float, without changing source code (see Can I run KokkosKernels spgemm with float or int32 type? #583)
  • re run spot checks to make sure PRs since last week (and ideally, other pending PRs) didn't break this

@ndellingwood
Copy link
Contributor

@brian-kelley are any modifications related to your first TODO needed before merging this PR, or can that be handled in a separate PR (in which case I'll review for merge so you don't have to wait longer for this to go in)?

@brian-kelley
Copy link
Contributor Author

@ndellingwood No, the first TODO is not blocking this PR. It just means that users can't use float in the perf tests (yet) without changing source code themselves. This is fine as @jjwilke is already working on a solution in #620.

Copy link
Contributor

@ndellingwood ndellingwood left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @brian-kelley !

@brian-kelley
Copy link
Contributor Author

@ndellingwood Thanks for reviewing, I'll re-run the checks to make sure nothing here broke.

@brian-kelley
Copy link
Contributor Author

OK, rebasing this on develop built cleanly with Serial,Cuda,OpenMP on my machine so it's probably fine, but I'm still going to let the spot check frun.

@brian-kelley
Copy link
Contributor Author

Spot checks still passed, so I'm merging.

@brian-kelley brian-kelley merged commit a5a7337 into kokkos:develop Feb 28, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants