
WIP: HashmapAccumulator Example #299

Merged
merged 20 commits into from
Oct 11, 2018

Conversation

william76
Contributor

@william76 william76 commented Sep 20, 2018

Initial commit of my stab at a HashmapAccumulator example.

This example implements a simplified version of how I'm using the HashmapAccumulator class in my Distance-2 Graph Coloring work-in-progress branch.

Initial commit of my stab at a hashmap accumulator example.

This code won't compile because the compiler wants something else
from my functor call, but I'm not quite sure what it is right
now... do Kokkos parallel functors only work when called inside
a class?
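On the functor question in the commit message above: Kokkos parallel functors do not have to live inside a class. Any type with a suitable `operator()(index)` (marked `KOKKOS_INLINE_FUNCTION` for device builds) can be handed to `parallel_for`, including a free-standing struct. A plain C++ sketch of the pattern (the names `CountEvens` and `count_evens` are illustrative, and a serial loop stands in for `parallel_for`):

```cpp
#include <vector>

// A free-standing functor: in Kokkos, any type whose operator()(index) is
// callable on the device can be passed to parallel_for -- it does not have
// to be nested inside a class.
struct CountEvens {
    const std::vector<int>& data;
    int& count;
    void operator()(int i) const { if (data[i] % 2 == 0) ++count; }
};

// Stand-in for parallel_for: invoke the functor over the index range.
int count_evens(const std::vector<int>& v) {
    int evens = 0;
    CountEvens f{v, evens};
    for (int i = 0; i < static_cast<int>(v.size()); ++i) f(i);
    return evens;
}
```

In practice, compile errors in this area often trace to a missing `const` on `operator()` or a mismatched index type rather than to where the functor is declared.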
@william76 william76 changed the title "HashmapAccumulator Example Initial" to "WIP: HashmapAccumulator Example" Sep 20, 2018
@william76
Contributor Author

Currently this won't compile... just putting in the PR so it's accessible.

Fixed the compile error and have the example running.
It doesn't do anything useful but it runs.
@william76 william76 requested a review from srajama1 September 21, 2018 00:55
@william76
Contributor Author

Fixed the compile error and I have the example using the hashmap accumulator. The example doesn't do anything useful, but it runs. I've only tried running this on OpenMP so far.

@william76
Contributor Author

@srajama1 Here's the first cut of a simple example of the Hashmap_Accumulator

@srajama1
Contributor

What do you mean it won't do anything useful so far? Will it compute the distance-2 degree of a graph correctly?

@william76
Contributor Author

@srajama1 No, not in this example... My understanding is that the basic HashmapAccumulator example you wanted was just going to demonstrate its use -- how to set up the external arrays and work with it -- not use it to compute the tight bound on the distance-2 degree. This code just calls the function on a list of random numbers to exercise HashmapAccumulator, so nothing terribly exciting. Since we had been discussing putting this example someplace other than perf_test/graph, I didn't think you were asking for a graph-based example.

If you'd like, I can make an example that computes the D2 degree only as well.

@srajama1
Contributor

Yep, let's do that.

William McLendon added 3 commits September 21, 2018 14:29
The HashmapAccumulator file was a bit hard to read because of
inconsistent indentation and mixed tabs and spaces in leading
whitespace.

This applies style-only fixes to remove tabs from leading whitespace
and make indentation and spacing consistent across all of the
methods.
I forgot to reset the Begins array in the example, which kept
later users of the memory chunk from working properly after the
first iteration.

The example is fixed and doing what I'd expect it to do now.
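The reset bug described in the commit message above can be sketched in plain C++. This is not the real HashmapAccumulator API; the `begins`/`nexts`/`keys` names only mirror the array-based layout the example appears to use, where `begins[h]` heads a chain of slots for hash bucket `h`:

```cpp
#include <algorithm>
#include <vector>

// Minimal sketch (plain C++, not the KokkosKernels API) of a chained
// hash accumulator: begins[h] heads a linked chain through nexts[],
// and keys[] stores the inserted values.
struct SimpleAccumulator {
    std::vector<int> begins, nexts, keys;
    int used = 0;
    SimpleAccumulator(int table, int cap)
        : begins(table, -1), nexts(cap, -1), keys(cap, 0) {}

    // Insert key (assumed non-negative) if absent; true when it was new.
    bool insert_unique(int key) {
        int h = key % static_cast<int>(begins.size());
        for (int s = begins[h]; s != -1; s = nexts[s])
            if (keys[s] == key) return false;  // already present
        keys[used] = key;
        nexts[used] = begins[h];  // push onto the bucket's chain
        begins[h] = used++;
        return true;
    }

    // Without this reset between iterations, stale chain heads in begins[]
    // make later lookups see keys left over from the previous round.
    void reset() { std::fill(begins.begin(), begins.end(), -1); used = 0; }
};
```

Skipping `reset()` leaves stale chain heads in `begins`, so the next round of insertions sees keys from the previous iteration -- exactly the symptom the commit fixes.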
@william76
Contributor Author

Ok.

It wasn't compiling on CUDA because I was sending the host view
of the data to the functor in the parallel_for rather than the
device copy on the mirror view.

Added some new command line parameters:
- verbose mode
- number of values to put in the number list
- etc.

Also, I added a little cleanup and a couple of comments.
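The CUDA fix in the commit message above follows the standard mirror-view discipline: populate the host mirror, deep-copy it to the device view, and hand the device view (not the host one) to the functor. A plain C++ analogy of that data flow (the `h_`/`d_` prefixes and function names are illustrative; `std::copy` stands in for `Kokkos::deep_copy`):

```cpp
#include <algorithm>
#include <vector>

// "Kernel" stand-in: only ever reads the device-side buffer.
int sum_on_device(const std::vector<int>& d_data) {
    int s = 0;
    for (int v : d_data) s += v;
    return s;
}

// Fill on the host, copy host -> device, then run on the device copy.
// Passing h_data straight to the kernel is the bug described above: it
// happens to work when host and device share memory (OpenMP) but not on CUDA.
int mirror_and_sum(const std::vector<int>& h_data) {
    std::vector<int> d_data(h_data.size());                   // device allocation
    std::copy(h_data.begin(), h_data.end(), d_data.begin());  // "deep_copy"
    return sum_on_device(d_data);
}
```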
@william76
Contributor Author

@ndellingwood Thanks for helping me find that static assert error I was getting on CUDA builds. It would have taken me a while to draw the connection between that error message and what was going on ;)

William McLendon added 5 commits September 25, 2018 12:30
The `example/` dir had not been set up to actually build any examples,
so I also added a Makefile in there and updated the generate_makefile
script to add a new build target, `build-example`, which builds the
example directory contents.

I also added a `build-all` option to the generated makefile, which
launches both `build-test` and `build-example`, so we can choose whether
to build just the examples, just the tests, or both.
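A hypothetical shape for those targets (the recipe bodies here are guesses for illustration, not the actual output of the generate_makefile script):

```make
# Illustrative sketch only -- the generated makefile's real recipes differ.
build-example:
	$(MAKE) -C example

build-all: build-test build-example
```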
@william76
Contributor Author

@srajama1
It would be convenient to merge this in if you think it looks good as a minimalistic HashmapAccumulator example. Basically, this example takes a list of random numbers and computes the number of unique values in it.
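As a serial cross-check of what the example computes, counting distinct values can be done trivially with a set; the HashmapAccumulator-based answer should agree with this reference (a sketch -- `count_unique` is an illustrative name, not part of the example):

```cpp
#include <unordered_set>
#include <vector>

// Reference answer: the number of distinct values in the input list.
int count_unique(const std::vector<int>& values) {
    std::unordered_set<int> seen(values.begin(), values.end());
    return static_cast<int>(seen.size());
}
```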

Contributor

@srajama1 srajama1 left a comment


One small comment, otherwise looks OK. Thanks for the example!

Contributor

@srajama1 srajama1 left a comment


One small comment. Looks good. Thanks for the example!

@ndellingwood
Contributor

@william76 have you run a spot-check on this on White? Please post the results before we merge this in, thanks!

@william76 william76 self-assigned this Oct 3, 2018
@william76
Contributor Author

@srajama1 I updated the contact information.

@william76
Contributor Author

william76 commented Oct 3, 2018

@ndellingwood

I'm not sure if there's a specific tool for the spot check, but I ran `make test` in my testing directory on White and got this:

[==========] Running 77 tests from 1 test case.
[----------] Global test environment set-up.
[----------] 77 tests from cuda
[ RUN      ] cuda.abs_double
[       OK ] cuda.abs_double (32 ms)
[ RUN      ] cuda.abs_mv_double
[       OK ] cuda.abs_mv_double (34 ms)
[ RUN      ] cuda.asum_double
[       OK ] cuda.asum_double (25 ms)
[ RUN      ] cuda.axpby_double
[       OK ] cuda.axpby_double (28 ms)
[ RUN      ] cuda.axpby_mv_double
[       OK ] cuda.axpby_mv_double (34 ms)
[ RUN      ] cuda.axpy_double
[       OK ] cuda.axpy_double (28 ms)
[ RUN      ] cuda.axpy_mv_double
[       OK ] cuda.axpy_mv_double (33 ms)
[ RUN      ] cuda.dot_double
[       OK ] cuda.dot_double (27 ms)
[ RUN      ] cuda.dot_mv_double
[       OK ] cuda.dot_mv_double (32 ms)
[ RUN      ] cuda.mult_double
[       OK ] cuda.mult_double (31 ms)
[ RUN      ] cuda.mult_mv_double
[       OK ] cuda.mult_mv_double (37 ms)
[ RUN      ] cuda.nrm1_double
[       OK ] cuda.nrm1_double (24 ms)
[ RUN      ] cuda.nrm1_mv_double
[       OK ] cuda.nrm1_mv_double (27 ms)
[ RUN      ] cuda.nrm2_double
[       OK ] cuda.nrm2_double (24 ms)
[ RUN      ] cuda.nrm2_mv_double
[       OK ] cuda.nrm2_mv_double (28 ms)
[ RUN      ] cuda.nrm2_squared_double
[       OK ] cuda.nrm2_squared_double (24 ms)
[ RUN      ] cuda.nrm2_squared_mv_double
[       OK ] cuda.nrm2_squared_mv_double (28 ms)
[ RUN      ] cuda.nrminf_double
[       OK ] cuda.nrminf_double (24 ms)
[ RUN      ] cuda.nrminf_mv_double
[       OK ] cuda.nrminf_mv_double (28 ms)
[ RUN      ] cuda.reciprocal_double
[       OK ] cuda.reciprocal_double (29 ms)
[ RUN      ] cuda.reciprocal_mv_double
[       OK ] cuda.reciprocal_mv_double (36 ms)
[ RUN      ] cuda.scal_double
[       OK ] cuda.scal_double (28 ms)
[ RUN      ] cuda.scal_mv_double
[       OK ] cuda.scal_mv_double (39 ms)
[ RUN      ] cuda.sum_double
[       OK ] cuda.sum_double (25 ms)
[ RUN      ] cuda.sum_mv_double
[       OK ] cuda.sum_mv_double (28 ms)
[ RUN      ] cuda.update_double
[       OK ] cuda.update_double (31 ms)
[ RUN      ] cuda.update_mv_double
[       OK ] cuda.update_mv_double (38 ms)
[ RUN      ] cuda.gemv_double
[       OK ] cuda.gemv_double (1310 ms)
[ RUN      ] cuda.gemm_double
[       OK ] cuda.gemm_double (2480 ms)
[ RUN      ] cuda.sparse_spgemm_double_int_int_TestExecSpace
[       OK ] cuda.sparse_spgemm_double_int_int_TestExecSpace (2396 ms)
[ RUN      ] cuda.sparse_spadd_double_int_int_TestExecSpace
[       OK ] cuda.sparse_spadd_double_int_int_TestExecSpace (38 ms)
[ RUN      ] cuda.sparse_gauss_seidel_double_int_int_TestExecSpace
[       OK ] cuda.sparse_gauss_seidel_double_int_int_TestExecSpace (1064 ms)
[ RUN      ] cuda.sparse_block_gauss_seidel_double_int_int_TestExecSpace
[       OK ] cuda.sparse_block_gauss_seidel_double_int_int_TestExecSpace (2695 ms)
[ RUN      ] cuda.sparse_crsmatrix_double_int_int_TestExecSpace
[       OK ] cuda.sparse_crsmatrix_double_int_int_TestExecSpace (1 ms)
[ RUN      ] cuda.sparse_blkcrsmatrix_double_int_int_TestExecSpace
[       OK ] cuda.sparse_blkcrsmatrix_double_int_int_TestExecSpace (1 ms)
[ RUN      ] cuda.sparse_replaceSumIntoLonger_double_int_int_TestExecSpace
[       OK ] cuda.sparse_replaceSumIntoLonger_double_int_int_TestExecSpace (290 ms)
[ RUN      ] cuda.sparse_replaceSumInto_double_int_int_TestExecSpace
[       OK ] cuda.sparse_replaceSumInto_double_int_int_TestExecSpace (1 ms)
[ RUN      ] cuda.graph_graph_color_double_int_int_TestExecSpace
[       OK ] cuda.graph_graph_color_double_int_int_TestExecSpace (1094 ms)
[ RUN      ] cuda.graph_graph_color_deterministic_double_int_int_TestExecSpace
[       OK ] cuda.graph_graph_color_deterministic_double_int_int_TestExecSpace (6 ms)
[ RUN      ] cuda.graph_graph_color_d2_double_int_int_TestExecSpace
[       OK ] cuda.graph_graph_color_d2_double_int_int_TestExecSpace (1386 ms)
[ RUN      ] cuda.common_ArithTraits
[       OK ] cuda.common_ArithTraits (0 ms)
[ RUN      ] cuda.common_set_bit_count
[       OK ] cuda.common_set_bit_count (947 ms)
[ RUN      ] cuda.common_ffs
[       OK ] cuda.common_ffs (945 ms)
[ RUN      ] cuda.batched_scalar_serial_set_double_double
[       OK ] cuda.batched_scalar_serial_set_double_double (63 ms)
[ RUN      ] cuda.batched_scalar_serial_scale_double_double
[       OK ] cuda.batched_scalar_serial_scale_double_double (65 ms)
[ RUN      ] cuda.batched_scalar_serial_gemm_nt_nt_double_double
[       OK ] cuda.batched_scalar_serial_gemm_nt_nt_double_double (276 ms)
[ RUN      ] cuda.batched_scalar_serial_gemm_t_nt_double_double
[       OK ] cuda.batched_scalar_serial_gemm_t_nt_double_double (279 ms)
[ RUN      ] cuda.batched_scalar_serial_gemm_nt_t_double_double
[       OK ] cuda.batched_scalar_serial_gemm_nt_t_double_double (276 ms)
[ RUN      ] cuda.batched_scalar_serial_gemm_t_t_double_double
[       OK ] cuda.batched_scalar_serial_gemm_t_t_double_double (278 ms)
[ RUN      ] cuda.batched_scalar_serial_trsm_l_l_nt_u_double_double
[       OK ] cuda.batched_scalar_serial_trsm_l_l_nt_u_double_double (158 ms)
[ RUN      ] cuda.batched_scalar_serial_trsm_l_l_nt_n_double_double
[       OK ] cuda.batched_scalar_serial_trsm_l_l_nt_n_double_double (157 ms)
[ RUN      ] cuda.batched_scalar_serial_trsm_l_u_nt_u_double_double
[       OK ] cuda.batched_scalar_serial_trsm_l_u_nt_u_double_double (158 ms)
[ RUN      ] cuda.batched_scalar_serial_trsm_l_u_nt_n_double_double
[       OK ] cuda.batched_scalar_serial_trsm_l_u_nt_n_double_double (157 ms)
[ RUN      ] cuda.batched_scalar_serial_trsm_r_u_nt_u_double_double
[       OK ] cuda.batched_scalar_serial_trsm_r_u_nt_u_double_double (156 ms)
[ RUN      ] cuda.batched_scalar_serial_trsm_r_u_nt_n_double_double
[       OK ] cuda.batched_scalar_serial_trsm_r_u_nt_n_double_double (156 ms)
[ RUN      ] cuda.batched_scalar_serial_lu_double
[       OK ] cuda.batched_scalar_serial_lu_double (76 ms)
[ RUN      ] cuda.batched_scalar_serial_gemv_nt_double_double
[       OK ] cuda.batched_scalar_serial_gemv_nt_double_double (87 ms)
[ RUN      ] cuda.batched_scalar_serial_gemv_t_double_double
[       OK ] cuda.batched_scalar_serial_gemv_t_double_double (88 ms)
[ RUN      ] cuda.batched_scalar_serial_trsv_l_nt_u_double_double
[       OK ] cuda.batched_scalar_serial_trsv_l_nt_u_double_double (73 ms)
[ RUN      ] cuda.batched_scalar_serial_trsv_l_nt_n_double_double
[       OK ] cuda.batched_scalar_serial_trsv_l_nt_n_double_double (73 ms)
[ RUN      ] cuda.batched_scalar_serial_trsv_u_nt_u_double_double
[       OK ] cuda.batched_scalar_serial_trsv_u_nt_u_double_double (73 ms)
[ RUN      ] cuda.batched_scalar_serial_trsv_u_nt_n_double_double
[       OK ] cuda.batched_scalar_serial_trsv_u_nt_n_double_double (73 ms)
[ RUN      ] cuda.batched_scalar_team_set_double_double
[       OK ] cuda.batched_scalar_team_set_double_double (64 ms)
[ RUN      ] cuda.batched_scalar_team_scale_double_double
[       OK ] cuda.batched_scalar_team_scale_double_double (65 ms)
[ RUN      ] cuda.batched_scalar_team_gemm_nt_nt_double_double
[       OK ] cuda.batched_scalar_team_gemm_nt_nt_double_double (275 ms)
[ RUN      ] cuda.batched_scalar_team_gemm_t_nt_double_double
[       OK ] cuda.batched_scalar_team_gemm_t_nt_double_double (279 ms)
[ RUN      ] cuda.batched_scalar_team_gemm_nt_t_double_double
[       OK ] cuda.batched_scalar_team_gemm_nt_t_double_double (275 ms)
[ RUN      ] cuda.batched_scalar_team_gemm_t_t_double_double
[       OK ] cuda.batched_scalar_team_gemm_t_t_double_double (278 ms)
[ RUN      ] cuda.batched_scalar_team_trsm_l_l_nt_u_double_double
[       OK ] cuda.batched_scalar_team_trsm_l_l_nt_u_double_double (157 ms)
[ RUN      ] cuda.batched_scalar_team_trsm_l_l_nt_n_double_double
[       OK ] cuda.batched_scalar_team_trsm_l_l_nt_n_double_double (157 ms)
[ RUN      ] cuda.batched_scalar_team_trsm_l_u_nt_u_double_double
[       OK ] cuda.batched_scalar_team_trsm_l_u_nt_u_double_double (158 ms)
[ RUN      ] cuda.batched_scalar_team_trsm_l_u_nt_n_double_double
[       OK ] cuda.batched_scalar_team_trsm_l_u_nt_n_double_double (158 ms)
[ RUN      ] cuda.batched_scalar_team_trsm_r_u_nt_u_double_double
[       OK ] cuda.batched_scalar_team_trsm_r_u_nt_u_double_double (156 ms)
[ RUN      ] cuda.batched_scalar_team_trsm_r_u_nt_n_double_double
[       OK ] cuda.batched_scalar_team_trsm_r_u_nt_n_double_double (156 ms)
[ RUN      ] cuda.batched_scalar_team_lu_double
[       OK ] cuda.batched_scalar_team_lu_double (74 ms)
[ RUN      ] cuda.batched_scalar_team_gemv_nt_double_double
[       OK ] cuda.batched_scalar_team_gemv_nt_double_double (87 ms)
[ RUN      ] cuda.batched_scalar_team_gemv_t_double_double
[       OK ] cuda.batched_scalar_team_gemv_t_double_double (86 ms)
[----------] 77 tests from cuda (20609 ms total)

[----------] Global test environment tear-down
[==========] 77 tests from 1 test case ran. (20609 ms total)
[  PASSED  ] 77 tests.
./KokkosKernels_UnitTest_Serial
[==========] Running 112 tests from 1 test case.
[----------] Global test environment set-up.
[----------] 112 tests from serial
[ RUN      ] serial.abs_double
[       OK ] serial.abs_double (8 ms)
[ RUN      ] serial.abs_mv_double
[       OK ] serial.abs_mv_double (21 ms)
[ RUN      ] serial.team_abs_double
[       OK ] serial.team_abs_double (7 ms)
[ RUN      ] serial.team_abs_mv_double
[       OK ] serial.team_abs_mv_double (19 ms)
[ RUN      ] serial.asum_double
[       OK ] serial.asum_double (4 ms)
[ RUN      ] serial.axpby_double
[       OK ] serial.axpby_double (6 ms)
[ RUN      ] serial.axpby_mv_double
[       OK ] serial.axpby_mv_double (21 ms)
[ RUN      ] serial.team_axpby_double
[       OK ] serial.team_axpby_double (6 ms)
[ RUN      ] serial.team_axpby_mv_double
[       OK ] serial.team_axpby_mv_double (24 ms)
[ RUN      ] serial.axpy_double
[       OK ] serial.axpy_double (6 ms)
[ RUN      ] serial.axpy_mv_double
[       OK ] serial.axpy_mv_double (21 ms)
[ RUN      ] serial.team_axpy_double
[       OK ] serial.team_axpy_double (6 ms)
[ RUN      ] serial.team_axpy_mv_double
[       OK ] serial.team_axpy_mv_double (23 ms)
[ RUN      ] serial.dot_double
[       OK ] serial.dot_double (6 ms)
[ RUN      ] serial.dot_mv_double
[       OK ] serial.dot_mv_double (21 ms)
[ RUN      ] serial.team_dot_double
[       OK ] serial.team_dot_double (6 ms)
[ RUN      ] serial.team_dot_mv_double
[       OK ] serial.team_dot_mv_double (28 ms)
[ RUN      ] serial.mult_double
[       OK ] serial.mult_double (10 ms)
[ RUN      ] serial.mult_mv_double
[       OK ] serial.mult_mv_double (24 ms)
[ RUN      ] serial.team_mult_double
[       OK ] serial.team_mult_double (10 ms)
[ RUN      ] serial.team_mult_mv_double
[       OK ] serial.team_mult_mv_double (27 ms)
[ RUN      ] serial.nrm1_double
[       OK ] serial.nrm1_double (3 ms)
[ RUN      ] serial.nrm1_mv_double
[       OK ] serial.nrm1_mv_double (11 ms)
[ RUN      ] serial.nrm2_double
[       OK ] serial.nrm2_double (4 ms)
[ RUN      ] serial.nrm2_mv_double
[       OK ] serial.nrm2_mv_double (11 ms)
[ RUN      ] serial.team_nrm2_double
[       OK ] serial.team_nrm2_double (11 ms)
[ RUN      ] serial.nrm2_squared_double
[       OK ] serial.nrm2_squared_double (3 ms)
[ RUN      ] serial.nrm2_squared_mv_double
[       OK ] serial.nrm2_squared_mv_double (11 ms)
[ RUN      ] serial.nrminf_double
[       OK ] serial.nrminf_double (3 ms)
[ RUN      ] serial.nrminf_mv_double
[       OK ] serial.nrminf_mv_double (9 ms)
[ RUN      ] serial.reciprocal_double
[       OK ] serial.reciprocal_double (8 ms)
[ RUN      ] serial.reciprocal_mv_double
[       OK ] serial.reciprocal_mv_double (29 ms)
[ RUN      ] serial.scal_double
[       OK ] serial.scal_double (9 ms)
[ RUN      ] serial.scal_mv_double
[       OK ] serial.scal_mv_double (23 ms)
[ RUN      ] serial.team_scal_double
[       OK ] serial.team_scal_double (7 ms)
[ RUN      ] serial.team_scal_mv_double
[       OK ] serial.team_scal_mv_double (28 ms)
[ RUN      ] serial.sum_double
[       OK ] serial.sum_double (3 ms)
[ RUN      ] serial.sum_mv_double
[       OK ] serial.sum_mv_double (12 ms)
[ RUN      ] serial.update_double
[       OK ] serial.update_double (10 ms)
[ RUN      ] serial.update_mv_double
[       OK ] serial.update_mv_double (29 ms)
[ RUN      ] serial.team_update_double
[       OK ] serial.team_update_double (10 ms)
[ RUN      ] serial.team_update_mv_double
[       OK ] serial.team_update_mv_double (33 ms)
[ RUN      ] serial.gemv_double
[       OK ] serial.gemv_double (4251 ms)
[ RUN      ] serial.team_gemv_double
[       OK ] serial.team_gemv_double (4264 ms)
[ RUN      ] serial.gemm_double
[       OK ] serial.gemm_double (193899 ms)
[ RUN      ] serial.sparse_spmv_double_int_int_TestExecSpace
[       OK ] serial.sparse_spmv_double_int_int_TestExecSpace (356 ms)
[ RUN      ] serial.sparse_spmv_mv_double_int_int_LayoutLeft_TestExecSpace
[       OK ] serial.sparse_spmv_mv_double_int_int_LayoutLeft_TestExecSpace (424 ms)
[ RUN      ] serial.sparse_spmv_mv_double_int_int_LayoutRight_TestExecSpace
[       OK ] serial.sparse_spmv_mv_double_int_int_LayoutRight_TestExecSpace (434 ms)
[ RUN      ] serial.sparse_trsv_mv_double_int_int_LayoutLeft_TestExecSpace
[       OK ] serial.sparse_trsv_mv_double_int_int_LayoutLeft_TestExecSpace (724 ms)
[ RUN      ] serial.sparse_trsv_mv_double_int_int_LayoutRight_TestExecSpace
[       OK ] serial.sparse_trsv_mv_double_int_int_LayoutRight_TestExecSpace (691 ms)
[ RUN      ] serial.sparse_spgemm_double_int_int_TestExecSpace
[       OK ] serial.sparse_spgemm_double_int_int_TestExecSpace (2539 ms)
[ RUN      ] serial.sparse_spadd_double_int_int_TestExecSpace
[       OK ] serial.sparse_spadd_double_int_int_TestExecSpace (4 ms)
[ RUN      ] serial.sparse_gauss_seidel_double_int_int_TestExecSpace
[       OK ] serial.sparse_gauss_seidel_double_int_int_TestExecSpace (1121 ms)
[ RUN      ] serial.sparse_block_gauss_seidel_double_int_int_TestExecSpace
[       OK ] serial.sparse_block_gauss_seidel_double_int_int_TestExecSpace (6977 ms)
[ RUN      ] serial.sparse_crsmatrix_double_int_int_TestExecSpace
[       OK ] serial.sparse_crsmatrix_double_int_int_TestExecSpace (0 ms)
[ RUN      ] serial.sparse_blkcrsmatrix_double_int_int_TestExecSpace
[       OK ] serial.sparse_blkcrsmatrix_double_int_int_TestExecSpace (0 ms)
[ RUN      ] serial.sparse_findRelOffset_double_int_int_TestExecSpace
[       OK ] serial.sparse_findRelOffset_double_int_int_TestExecSpace (0 ms)
[ RUN      ] serial.sparse_replaceSumIntoLonger_double_int_int_TestExecSpace
[       OK ] serial.sparse_replaceSumIntoLonger_double_int_int_TestExecSpace (46 ms)
[ RUN      ] serial.sparse_replaceSumInto_double_int_int_TestExecSpace
[       OK ] serial.sparse_replaceSumInto_double_int_int_TestExecSpace (0 ms)
[ RUN      ] serial.graph_graph_color_double_int_int_TestExecSpace
[       OK ] serial.graph_graph_color_double_int_int_TestExecSpace (2350 ms)
[ RUN      ] serial.graph_graph_color_deterministic_double_int_int_TestExecSpace
[       OK ] serial.graph_graph_color_deterministic_double_int_int_TestExecSpace (1 ms)
[ RUN      ] serial.graph_graph_color_d2_double_int_int_TestExecSpace
[       OK ] serial.graph_graph_color_d2_double_int_int_TestExecSpace (6249 ms)
[ RUN      ] serial.common_ArithTraits
[       OK ] serial.common_ArithTraits (0 ms)
[ RUN      ] serial.common_set_bit_count
[       OK ] serial.common_set_bit_count (1379 ms)
[ RUN      ] serial.common_ffs
[       OK ] serial.common_ffs (1078 ms)
[ RUN      ] serial.batched_scalar_serial_set_double_double
[       OK ] serial.batched_scalar_serial_set_double_double (49 ms)
[ RUN      ] serial.batched_scalar_serial_scale_double_double
[       OK ] serial.batched_scalar_serial_scale_double_double (51 ms)
[ RUN      ] serial.batched_scalar_serial_gemm_nt_nt_double_double
[       OK ] serial.batched_scalar_serial_gemm_nt_nt_double_double (227 ms)
[ RUN      ] serial.batched_scalar_serial_gemm_t_nt_double_double
[       OK ] serial.batched_scalar_serial_gemm_t_nt_double_double (222 ms)
[ RUN      ] serial.batched_scalar_serial_gemm_nt_t_double_double
[       OK ] serial.batched_scalar_serial_gemm_nt_t_double_double (215 ms)
[ RUN      ] serial.batched_scalar_serial_gemm_t_t_double_double
[       OK ] serial.batched_scalar_serial_gemm_t_t_double_double (215 ms)
[ RUN      ] serial.batched_scalar_serial_trsm_l_l_nt_u_double_double
[       OK ] serial.batched_scalar_serial_trsm_l_l_nt_u_double_double (17 ms)
[ RUN      ] serial.batched_scalar_serial_trsm_l_l_nt_n_double_double
[       OK ] serial.batched_scalar_serial_trsm_l_l_nt_n_double_double (20 ms)
[ RUN      ] serial.batched_scalar_serial_trsm_l_u_nt_u_double_double
[       OK ] serial.batched_scalar_serial_trsm_l_u_nt_u_double_double (16 ms)
[ RUN      ] serial.batched_scalar_serial_trsm_l_u_nt_n_double_double
[       OK ] serial.batched_scalar_serial_trsm_l_u_nt_n_double_double (19 ms)
[ RUN      ] serial.batched_scalar_serial_trsm_r_u_nt_u_double_double
[       OK ] serial.batched_scalar_serial_trsm_r_u_nt_u_double_double (17 ms)
[ RUN      ] serial.batched_scalar_serial_trsm_r_u_nt_n_double_double
[       OK ] serial.batched_scalar_serial_trsm_r_u_nt_n_double_double (18 ms)
[ RUN      ] serial.batched_scalar_serial_lu_double
[       OK ] serial.batched_scalar_serial_lu_double (12 ms)
[ RUN      ] serial.batched_scalar_serial_gemv_nt_double_double
[       OK ] serial.batched_scalar_serial_gemv_nt_double_double (7 ms)
[ RUN      ] serial.batched_scalar_serial_gemv_t_double_double
[       OK ] serial.batched_scalar_serial_gemv_t_double_double (7 ms)
[ RUN      ] serial.batched_scalar_serial_trsv_l_nt_u_double_double
[       OK ] serial.batched_scalar_serial_trsv_l_nt_u_double_double (1 ms)
[ RUN      ] serial.batched_scalar_serial_trsv_l_nt_n_double_double
[       OK ] serial.batched_scalar_serial_trsv_l_nt_n_double_double (2 ms)
[ RUN      ] serial.batched_scalar_serial_trsv_u_nt_u_double_double
[       OK ] serial.batched_scalar_serial_trsv_u_nt_u_double_double (1 ms)
[ RUN      ] serial.batched_scalar_serial_trsv_u_nt_n_double_double
[       OK ] serial.batched_scalar_serial_trsv_u_nt_n_double_double (1 ms)
[ RUN      ] serial.batched_scalar_team_set_double_double
[       OK ] serial.batched_scalar_team_set_double_double (48 ms)
[ RUN      ] serial.batched_scalar_team_scale_double_double
[       OK ] serial.batched_scalar_team_scale_double_double (51 ms)
[ RUN      ] serial.batched_scalar_team_gemm_nt_nt_double_double
[       OK ] serial.batched_scalar_team_gemm_nt_nt_double_double (226 ms)
[ RUN      ] serial.batched_scalar_team_gemm_t_nt_double_double
[       OK ] serial.batched_scalar_team_gemm_t_nt_double_double (223 ms)
[ RUN      ] serial.batched_scalar_team_gemm_nt_t_double_double
[       OK ] serial.batched_scalar_team_gemm_nt_t_double_double (214 ms)
[ RUN      ] serial.batched_scalar_team_gemm_t_t_double_double
[       OK ] serial.batched_scalar_team_gemm_t_t_double_double (215 ms)
[ RUN      ] serial.batched_scalar_team_trsm_l_l_nt_u_double_double
[       OK ] serial.batched_scalar_team_trsm_l_l_nt_u_double_double (17 ms)
[ RUN      ] serial.batched_scalar_team_trsm_l_l_nt_n_double_double
[       OK ] serial.batched_scalar_team_trsm_l_l_nt_n_double_double (25 ms)
[ RUN      ] serial.batched_scalar_team_trsm_l_u_nt_u_double_double
[       OK ] serial.batched_scalar_team_trsm_l_u_nt_u_double_double (21 ms)
[ RUN      ] serial.batched_scalar_team_trsm_l_u_nt_n_double_double
[       OK ] serial.batched_scalar_team_trsm_l_u_nt_n_double_double (26 ms)
[ RUN      ] serial.batched_scalar_team_trsm_r_u_nt_u_double_double
[       OK ] serial.batched_scalar_team_trsm_r_u_nt_u_double_double (22 ms)
[ RUN      ] serial.batched_scalar_team_trsm_r_u_nt_n_double_double
[       OK ] serial.batched_scalar_team_trsm_r_u_nt_n_double_double (30 ms)
[ RUN      ] serial.batched_scalar_team_lu_double
[       OK ] serial.batched_scalar_team_lu_double (16 ms)
[ RUN      ] serial.batched_scalar_team_gemv_nt_double_double
[       OK ] serial.batched_scalar_team_gemv_nt_double_double (8 ms)
[ RUN      ] serial.batched_scalar_team_gemv_t_double_double
[       OK ] serial.batched_scalar_team_gemv_t_double_double (8 ms)
[ RUN      ] serial.batched_vector_arithmatic_simd_double3
[       OK ] serial.batched_vector_arithmatic_simd_double3 (0 ms)
[ RUN      ] serial.batched_vector_arithmatic_simd_double4
[       OK ] serial.batched_vector_arithmatic_simd_double4 (1 ms)
[ RUN      ] serial.batched_vector_arithmatic_simd_double8
[       OK ] serial.batched_vector_arithmatic_simd_double8 (0 ms)
[ RUN      ] serial.batched_vector_math_simd_double3
[       OK ] serial.batched_vector_math_simd_double3 (0 ms)
[ RUN      ] serial.batched_vector_math_simd_double4
[       OK ] serial.batched_vector_math_simd_double4 (1 ms)
[ RUN      ] serial.batched_vector_relation_simd_double3
[       OK ] serial.batched_vector_relation_simd_double3 (0 ms)
[ RUN      ] serial.batched_vector_relation_simd_double4
[       OK ] serial.batched_vector_relation_simd_double4 (0 ms)
[ RUN      ] serial.batched_vector_logical_simd_double3
[       OK ] serial.batched_vector_logical_simd_double3 (0 ms)
[ RUN      ] serial.batched_vector_logical_simd_double4
[       OK ] serial.batched_vector_logical_simd_double4 (0 ms)
[ RUN      ] serial.batched_vector_misc_simd_double3
[       OK ] serial.batched_vector_misc_simd_double3 (0 ms)
[ RUN      ] serial.batched_vector_misc_simd_double4
[       OK ] serial.batched_vector_misc_simd_double4 (0 ms)
[ RUN      ] serial.batched_vector_view_simd_double4
[       OK ] serial.batched_vector_view_simd_double4 (43 ms)
[ RUN      ] serial.batched_vector_view_simd_double8
[       OK ] serial.batched_vector_view_simd_double8 (85 ms)
[----------] 112 tests from serial (229759 ms total)

[----------] Global test environment tear-down
[==========] 112 tests from 1 test case ran. (229759 ms total)
[  PASSED  ] 112 tests.

Is this what you're looking for?

@srajama1
Contributor

srajama1 commented Oct 3, 2018

Not exactly. Please use the --spot-check option of the script we use, which tests for all warnings etc. This is to avoid any nightly failures before the merge (sort of like manual PR testing).

@william76
Contributor Author

@srajama1 --spot-check fails, even for the current develop branch on White. @ndellingwood and I have been discussing this today, and he got me set up with the test_all_sandia script for the spot-check test.

Here's how I'm running it on White (per @ndellingwood's instructions):

${KOKKOSKERNELS_DIR}/scripts/test_all_sandia \
    --spot-check \
    --kokkoskernels-path=${KOKKOSKERNELS_DIR} \
    --kokkos-path=${KOKKOS_DIR} \
    --skip-hwloc \
    cuda \
    gcc \
    --num=2

Here's the first set of error messages I see in the output for the CUDA-OpenMP build (note: this is for develop, not the branch on my fork, but it has the same errors). These errors are repeated many times throughout the build before it finally ends.

/home/wcmclen/dev/kk-dev/source/KokkosKernels-Fork/src/blas/impl/KokkosBlas3_gemm_impl.hpp:456:21:   required from ‘void KokkosBlas::Impl::GEMMImpl<ExecSpace, ViewTypeA, ViewTypeB, ViewTypeC, blockA0, blockA1, blockB1, TransposeA, TransposeB>::run(int, int, int) [with
 ExecSpace = Kokkos::Cuda; ViewTypeA = Kokkos::View<const double**, Kokkos::LayoutLeft, Kokkos::Device<Kokkos::Cuda, Kokkos::CudaSpace>, Kokkos::MemoryTraits<1u> >; ViewTypeB = Kokkos::View<const double**, Kokkos::LayoutLeft, Kokkos::Device<Kokkos::Cuda, Kokkos::CudaS
pace>, Kokkos::MemoryTraits<1u> >; ViewTypeC = Kokkos::View<double**, Kokkos::LayoutRight, Kokkos::Device<Kokkos::Cuda, Kokkos::CudaSpace>, Kokkos::MemoryTraits<1u> >; int blockA0 = 24; int blockA1 = 16; int blockB1 = 64; int TransposeA = 0; int TransposeB = 0]’
/home/wcmclen/dev/kk-dev/source/KokkosKernels-Fork/src/blas/impl/KokkosBlas3_gemm_spec.hpp:167:1:   required from ‘static void KokkosBlas::Impl::GEMM<AViewType, BViewType, CViewType, tpl_spec_avail, eti_spec_avail>::gemm(const char*, const char*, typename AViewType::c
onst_value_type&, const AViewType&, const BViewType&, typename CViewType::const_value_type&, const CViewType&) [with AViewType = Kokkos::View<const double**, Kokkos::LayoutLeft, Kokkos::Device<Kokkos::Cuda, Kokkos::CudaSpace>, Kokkos::MemoryTraits<1u> >; BViewType = K
okkos::View<const double**, Kokkos::LayoutLeft, Kokkos::Device<Kokkos::Cuda, Kokkos::CudaSpace>, Kokkos::MemoryTraits<1u> >; CViewType = Kokkos::View<double**, Kokkos::LayoutRight, Kokkos::Device<Kokkos::Cuda, Kokkos::CudaSpace>, Kokkos::MemoryTraits<1u> >; bool tpl_s
pec_avail = false; bool eti_spec_avail = true; typename AViewType::const_value_type = const double; typename CViewType::const_value_type = const double]’
/home/wcmclen/dev/kk-dev/source/KokkosKernels-Fork/src/impl/generated_specializations_cpp/gemm/KokkosBlas3_gemm_eti_spec_inst_double_LayoutRight_Cuda_CudaSpace.cpp:55:17:   required from here
/home/wcmclen/dev/kk-dev/kk-test/white-ride-Serial-Cuda/TestAll_2018-10-03_15.58.25/cuda/9.0.103/Cuda_OpenMP-release/kokkos/install/include/Cuda/Kokkos_Cuda_Internal.hpp:207:15: error: comparison between signed and unsigned integer expressions [-Werror=sign-compare]
     if(numBlocks>=MinBlocksPerSM) return blockSize;
     ~~~~~~~~~~^~~~~~~~~~~~~~~~~
/home/wcmclen/dev/kk-dev/kk-test/white-ride-Serial-Cuda/TestAll_2018-10-03_15.58.25/cuda/9.0.103/Cuda_OpenMP-release/kokkos/install/include/Cuda/Kokkos_Cuda_Internal.hpp:209:39: error: comparison between signed and unsigned integer expressions [-Werror=sign-compare]
     while (blockSize>32 && numBlocks<MinBlocksPerSM) {
                            ~~~~~~~~~~~^~~~~~~~~~~~~~~~
/home/wcmclen/dev/kk-dev/kk-test/white-ride-Serial-Cuda/TestAll_2018-10-03_15.58.25/cuda/9.0.103/Cuda_OpenMP-release/kokkos/install/include/Cuda/Kokkos_Cuda_Internal.hpp:220:44: error: comparison between signed and unsigned integer expressions [-Werror=sign-compare]
     int blockSizeUpperBound = (blockSize*2<MaxThreadsPerBlock?blockSize*2:MaxThreadsPerBlock);
                           ~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~
/home/wcmclen/dev/kk-dev/kk-test/white-ride-Serial-Cuda/TestAll_2018-10-03_15.58.25/cuda/9.0.103/Cuda_OpenMP-release/kokkos/install/include/Cuda/Kokkos_Cuda_Internal.hpp:221:56: error: comparison between signed and unsigned integer expressions [-Werror=sign-compare]
     while (blockSize<blockSizeUpperBound && numBlocks>=MinBlocksPerSM) {
                                             ~~~~~~~~~~~^~~~~~~~~~~~~~~~~
/home/wcmclen/dev/kk-dev/kk-test/white-ride-Serial-Cuda/TestAll_2018-10-03_15.58.25/cuda/9.0.103/Cuda_OpenMP-release/kokkos/install/include/Cuda/Kokkos_Cuda_Internal.hpp:232:18: error: comparison between signed and unsigned integer expressions [-Werror=sign-compare]
     if(oldNumBlocks>=MinBlocksPerSM) return blockSize - 32;
     ~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~

@ndellingwood
Contributor

@william76 Yeah, those error messages are due to the -Werror flag, which needs to be removed for Cuda testing.

@william76
Contributor Author

@ndellingwood is there a way to remove that from the spot-check test? Will removing it affect nightly testing?

@ndellingwood
Contributor

@william76 I just merged PR #300, which updated some modules in the test_all_sandia script and removed the -Werror flag for the Cuda builds. Sorry for the problems this caused. Unfortunately, this won't affect the BLAS issues you saw.

@william76
Contributor Author

@ndellingwood Thanks! I just updated my develop branch and will run the spot-check on it; if it's good, I'll merge it into my HashmapAccumulator example branch and test.

@crtrott
Member

crtrott commented Oct 3, 2018 via email

@william76
Contributor Author

william76 commented Oct 4, 2018

@ndellingwood

Job <38032> is submitted to queue <rhel7G>.
<<Waiting for dispatch ...>>
<<Starting on white30>>
Running on machine: white
Going to test compilers:  cuda/9.2.88 gcc/7.2.0
Testing compiler cuda/9.2.88
  Starting job cuda-9.2.88-Cuda_OpenMP-release
  PASSED cuda-9.2.88-Cuda_OpenMP-release
Testing compiler gcc/7.2.0
  Starting job cuda-9.2.88-Cuda_Serial-release
  PASSED cuda-9.2.88-Cuda_Serial-release
  Starting job gcc-7.2.0-OpenMP-release
  PASSED gcc-7.2.0-OpenMP-release
  Starting job gcc-7.2.0-Serial-release
  PASSED gcc-7.2.0-Serial-release
  Starting job gcc-7.2.0-OpenMP_Serial-release
  PASSED gcc-7.2.0-OpenMP_Serial-release

@william76 william76 removed their assignment Oct 4, 2018
@srajama1
Contributor

srajama1 commented Oct 5, 2018

@ndellingwood : Any other concerns before merging this ?

@ndellingwood
Contributor

Running an extra spot-check on kokkos-dev to hit more compilers since this touches the build system.

@ndellingwood
Contributor

Rerunning the kokkos-dev check after fixing test_all_sandia. The tests I am running are on this PR with the develop branch merged in, plus manual removal of the coloring VBD and VBDBIT tests, which are being addressed for Cuda in a separate PR.

@william76
Contributor Author

@ndellingwood
I merged the updates from Develop into my fork to pick up the updates to the scripts.

Here's the spot-check output from my run on White:

--------------------------------------------------------------------------------
-
-           K O K K O S   K E R N E L S   S P O T   C H E C K
-
--------------------------------------------------------------------------------

Kokkos
- URL...: https://github.com/kokkos/kokkos.git
- Branch: develop
- SHA1..: 4ec058af153884b49304c111831a9d582c19c762

KokkosKernels
- URL...: https://github.com/william76/kokkos-kernels.git
- Branch: HashmapAccumulator-Test-v001
- SHA1..: e5d0dd25d8edd409b8a471309e4fd1e787badcb2

Spot Check Command
bsub -J wcmclen-KK-Test \
    -W 6:00 \
    -Is \
    -q rhel7F \
    /home/wcmclen/dev/kk-dev/source/KokkosKernels-Fork/scripts/test_all_sandia \
        --spot-check \
        --kokkoskernels-path=/home/wcmclen/dev/kk-dev/source/KokkosKernels-Fork \
        --kokkos-path=/home/wcmclen/dev/kk-dev/source/Kokkos \
        --arch="Power8,Kepler37" \
        --skip-hwloc \
        cuda \
        gcc \
        --num=2

***Forced exclusive execution
Job <38247> is submitted to queue <rhel7F>.
<<Waiting for dispatch ...>>
<<Starting on white23>>
Running on machine: white
Going to test compilers:  gcc/6.4.0 gcc/7.2.0 cuda/9.2.88
Testing compiler gcc/6.4.0
  Starting job gcc-6.4.0-Serial-release
  Starting job gcc-6.4.0-OpenMP-release
  PASSED gcc-6.4.0-OpenMP-release
Testing compiler gcc/7.2.0
  Starting job gcc-6.4.0-OpenMP_Serial-release
  PASSED gcc-6.4.0-Serial-release
  Starting job gcc-7.2.0-OpenMP-release
  PASSED gcc-7.2.0-OpenMP-release
  Starting job gcc-7.2.0-Serial-release
  PASSED gcc-6.4.0-OpenMP_Serial-release
Testing compiler cuda/9.2.88
  Starting job gcc-7.2.0-OpenMP_Serial-release
  PASSED gcc-7.2.0-Serial-release
  PASSED gcc-7.2.0-OpenMP_Serial-release
  Starting job cuda-9.2.88-Cuda_OpenMP-release
  PASSED cuda-9.2.88-Cuda_OpenMP-release
  Starting job cuda-9.2.88-Cuda_Serial-release
  PASSED cuda-9.2.88-Cuda_Serial-release
#######################################################
PASSED TESTS
#######################################################
cuda-9.2.88-Cuda_OpenMP-release build_time=865 run_time=598
cuda-9.2.88-Cuda_Serial-release build_time=814 run_time=988
gcc-6.4.0-OpenMP-release build_time=279 run_time=228
gcc-6.4.0-OpenMP_Serial-release build_time=318 run_time=986
gcc-6.4.0-Serial-release build_time=248 run_time=663
gcc-7.2.0-OpenMP-release build_time=228 run_time=219
gcc-7.2.0-OpenMP_Serial-release build_time=367 run_time=859
gcc-7.2.0-Serial-release build_time=194 run_time=688
#######################################################
FAILED TESTS
#######################################################

--------------------------------------------------------------------------------
- Elapsed time: 6309 seconds.
--------------------------------------------------------------------------------

@ndellingwood
Contributor

Thanks Will! Merging in.

@ndellingwood ndellingwood merged commit 63183b5 into kokkos:develop Oct 11, 2018
@william76 william76 deleted the HashmapAccumulator-Test-v001 branch October 12, 2018 17:23