-
Notifications
You must be signed in to change notification settings - Fork 99
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
perf tests cleanup #613
perf tests cleanup #613
Conversation
- Cleaning up duplicated MatrixMarket code from perf_test/spmv that exists in IOUtils (kokkos#493) - Changing the scalar/lno_t/size_type/layout to tolerate any ETI combination (previously, only double/int/int/left was really supported)
When doing dot(View<float*>, View<float*>) -> float (or with complex<float>*), dot uses double (or complex<double>) to actually sum the values. This wasn't getting ETI'd correctly, but now perf_test and unit_test both build correctly with float enabled as a scalar.
in InnerProductSpaceTraits, when casting complex<double> to complex<float>
instead of the test_common versions of those which are now gone
Last resort if nothing else is enabled, or nothing else was selected at runtime with argc/argv
If "--flag" expects another argument after, check that there is actually another arg before trying to read it.
I also made the SPGEMM perf test run with serial device, if it's enabled and neither CUDA nor OpenMP ran. Before, if you didn't give --openmp or --cuda, it would just print out a bunch of machine info and exit without doing a multiply, but it also didn't give a warning that nothing happened. |
2 TODOs for this PR:
|
@brian-kelley are any modifications related to your first TODO needed before merging this PR, or can that be handled in a separate PR (in which case I'll review for merge so you don't have to wait longer for this to go in)? |
@ndellingwood No, the first TODO is not blocking this PR. It just means that users can't use float in the perf tests (yet) without changing source code themselves. This is fine as @jjwilke is already working on a solution in #620. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @brian-kelley !
@ndellingwood Thanks for reviewing, I'll re-run the checks to make sure nothing here broke. |
OK, rebasing this on develop built cleanly with Serial,Cuda,OpenMP on my machine so it's probably fine, but I'm still going to let the spot check frun. |
Spot checks still passed, so I'm merging. |
Lots of cleanup/improvement to perf tests
--xyz
is the last command line arg, but a value is expected after it (e.g. --openmp $numthreads). Instead, print error message and exitAlso fixed an issue with Blas1 dot ETI: when scalar=float, the rank-1 version that returns a scalar should use double to accumulate the sum. This case was not being ETI'd, which broke the perf test build. This is now ETI'd correctly.
This PR resolves #583, #493, and #492.
I was able to build and run the unit and perf tests cleanly with serial, openmp and cuda enabled, and with double, float, complex_double and complex_float all enabled. I was also able to build without scalar=double, meaning float was used in all the tests.
RIDE spot check (with double and complex_double):
#######################################################
PASSED TESTS
#######################################################
cuda-10.1.105-Cuda_OpenMP-release build_time=578 run_time=418
cuda-10.1.105-Cuda_Serial-release build_time=515 run_time=521
cuda-9.2.88-Cuda_OpenMP-release build_time=504 run_time=425
cuda-9.2.88-Cuda_Serial-release build_time=553 run_time=526
gcc-6.4.0-OpenMP_Serial-release build_time=244 run_time=394
gcc-7.2.0-OpenMP-release build_time=127 run_time=128
gcc-7.2.0-OpenMP_Serial-release build_time=210 run_time=362
gcc-7.2.0-Serial-release build_time=119 run_time=227
ibm-16.1.0-Serial-release build_time=480 run_time=398
Bowman spot check (with double and complex_double):
#######################################################
PASSED TESTS
#######################################################
intel-16.4.258-Pthread-release build_time=733 run_time=1029
intel-16.4.258-Pthread_Serial-release build_time=1065 run_time=1993
intel-16.4.258-Serial-release build_time=715 run_time=949
intel-17.2.174-OpenMP-release build_time=881 run_time=569
intel-17.2.174-OpenMP_Serial-release build_time=1248 run_time=1468
intel-17.2.174-Pthread-release build_time=813 run_time=893
intel-17.2.174-Pthread_Serial-release build_time=1151 run_time=1784
intel-17.2.174-Serial-release build_time=791 run_time=876