Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix624: lots of distance-2 changes, fix Utils transpose #661

Merged
merged 6 commits into from
Mar 17, 2020

Conversation

brian-kelley
Copy link
Contributor

@brian-kelley brian-kelley commented Mar 16, 2020

Fix #624.

Refactor the D2 interface: deprecate old function graph_compute_distance2_color(), and add 3 new functions graph_color_distance2(), bipartite_color_rows(), and bipartite_color_columns(). The first new function does distance-2 coloring on an undirected/symmetric graph (so no d-1 or d-2 conflicts). The other two do one-sided coloring of a bipartite graph, so d-1 conflicts are not considered, and the input graph doesn't have to be symmetric (but it can be). If it isn't symmetric, the transpose is computed internally so the user doesn't have to.

All the new functions and the old one (if deprecated enabled) are tested.

Delete the matrix-squared version of D2 (it was temporary anyway).

Fix a bug in kk_generate_sparse_matrix where a row could have a negative number of entries, e.g. rowmap(k+1) < rowmap(k).

@brian-kelley
Copy link
Contributor Author

Testing:
kokkos-dev2:
#######################################################
PASSED TESTS
#######################################################
clang-8.0-Pthread_Serial-release build_time=87 run_time=368
cuda-10.1-Cuda_OpenMP-release build_time=274 run_time=305
cuda-9.2-Cuda_Serial-release build_time=263 run_time=462
gcc-7.3.0-OpenMP-release build_time=58 run_time=120
gcc-7.3.0-Pthread-release build_time=58 run_time=189
gcc-8.3.0-Serial-release build_time=60 run_time=204
gcc-9.1-OpenMP-release build_time=80 run_time=285
gcc-9.1-Serial-release build_time=74 run_time=206
intel-18.0.5-OpenMP-release build_time=172 run_time=150
#######################################################
FAILED TESTS
#######################################################
clang-8.0-Cuda_OpenMP-release (test failed)
#######################################################
(the only failure is due to #645)

RIDE:
#######################################################
PASSED TESTS
#######################################################
cuda-10.1.105-Cuda_OpenMP-release build_time=530 run_time=404
cuda-10.1.105-Cuda_Serial-release build_time=558 run_time=513
cuda-9.2.88-Cuda_OpenMP-release build_time=542 run_time=552
cuda-9.2.88-Cuda_Serial-release build_time=546 run_time=665
gcc-6.4.0-OpenMP_Serial-release build_time=189 run_time=388
gcc-7.2.0-OpenMP-release build_time=132 run_time=129
gcc-7.2.0-OpenMP_Serial-release build_time=211 run_time=359
gcc-7.2.0-Serial-release build_time=115 run_time=226
ibm-16.1.0-Serial-release build_time=509 run_time=396

@brian-kelley
Copy link
Contributor Author

Also, I ran kokkos-dev2 spot check with -DKokkos_ENABLE_DEPRECATED_CODE=ON. Most builds tested cleanly (including the deprecated D2 test), although a couple of builds failed, e.g. an error in Kokkos with GCC+pthreads. None of these errors were related to this PR, though.

Copy link
Contributor

@lucbv lucbv left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since the transpose did not have an issue after all we might not need to change the code there?
On the other hand adding a test and maybe a simpler interface that takes in a Kokkos::CrsMatrix could make life easier, I would also argue that separating the transpose capability into its own header file would make it more obvious (it took me a bit of time to find it).

perf_test/graph/KokkosGraph_color_d2.cpp Outdated Show resolved Hide resolved
perf_test/graph/KokkosGraph_color_d2.cpp Outdated Show resolved Hide resolved
Now distinguishing between D-2 (which doesn't allow dist-1 conflicts
either) and one-sided bipartite coloring (which DOES allow dist-1
conflicts, because rows and columns are considered distinct sets of vertices).
@brian-kelley
Copy link
Contributor Author

@lucbv I did the things you suggested and the transpose changes are completely reverted. Re-running tests.

Copy link
Contributor

@srajama1 srajama1 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @brian-kelley ! See comments below

perf_test/graph/KokkosGraph_color_d2.cpp Outdated Show resolved Hide resolved
src/common/KokkosKernels_SparseUtils.hpp Outdated Show resolved Hide resolved
src/common/KokkosKernels_SparseUtils.hpp Outdated Show resolved Hide resolved
src/common/KokkosKernels_SparseUtils.hpp Outdated Show resolved Hide resolved
src/graph/KokkosGraph_Distance2Color.hpp Outdated Show resolved Hide resolved
src/graph/KokkosGraph_Distance2Color.hpp Show resolved Hide resolved
@srajama1
Copy link
Contributor

Man .. GitHub ate some of my comments .. will do them again ..

Copy link
Contributor

@srajama1 srajama1 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

More comments now ...

WIP: debugging the transpose issue. Using the old transpose,
bipartite column test gets a wrong rowmap
@brian-kelley
Copy link
Contributor Author

Wow, just figured out the bug in kk_transpose_graph():

 449 inline void kk_transpose_graph(
 450     typename in_nnz_view_t::non_const_value_type num_rows,
 451     typename in_nnz_view_t::non_const_value_type num_cols,
 452     in_row_view_t xadj,
 453     in_nnz_view_t adj,
 454     out_row_view_t t_xadj, //pre-allocated -- initialized with 0
 455     out_nnz_view_t t_adj,  //pre-allocated -- no need for initialize
 456     int vector_size = -1,
 457     int suggested_team_size = -1,
 458     typename in_nnz_view_t::non_const_value_type team_work_chunk_size = -1,
 459     bool use_dynamic_scheduling = true
 460     ){
 461   std::cout << "Hello from kk_transpose_graph().\n";
 462
 463   //allocate some memory for work for row pointers
 464   tempwork_row_view_t tmp_row_view(Kokkos::ViewAllocateWithoutInitializing("tmp_row_view"), num_co     ls + 1);
 465
 466   in_nnz_view_t tmp1;
 467   out_nnz_view_t tmp2;
 468
 469   //create the functor for tranpose.
 470   typedef TransposeMatrix <
 471       in_row_view_t, in_nnz_view_t, in_nnz_view_t,
 472       out_row_view_t, out_nnz_view_t, out_nnz_view_t,
 473       tempwork_row_view_t, MyExecSpace>  TransposeFunctor_t;
 474
 475   TransposeFunctor_t tm ( num_rows, num_cols, xadj, adj, tmp1,
 476                           t_xadj, t_adj, tmp2,
 477                           tmp_row_view,
 478                           false,
 479                           team_work_chunk_size);

The functor gets constructed with team_work_chunk_size=-1 by default, before team_work_chunk_size gets set to the team size if it's -1. And a grep says that I'm the first person to actually use the graph transpose (matrix didn't have this problem).

by not constructing the functor with -1 as the chunk size by default.
Fixed by moving functor construction after the logic to choose the
default team/vector/chunk sizes.
Also cleaned up debugging code, now that transpose is fixed
@srajama1
Copy link
Contributor

@brian-kelley : Can you add performance numbers for reproducibility and documentation ?

We spoke about few other things, you can take care of it and resolve the conversations

Looks like this is ready to go ..

@brian-kelley
Copy link
Contributor Author

Re-running the tests. I added the symmetric bipartite test, but I will put the transpose unit test in a different PR.

@brian-kelley
Copy link
Contributor Author

Testing is good:
@srajama1 or @lucbv can you give final approval?
#######################################################
PASSED TESTS
#######################################################
clang-8.0-Pthread_Serial-release build_time=86 run_time=371
cuda-10.1-Cuda_OpenMP-release build_time=272 run_time=323
cuda-9.2-Cuda_Serial-release build_time=262 run_time=431
gcc-7.3.0-OpenMP-release build_time=58 run_time=119
gcc-7.3.0-Pthread-release build_time=57 run_time=184
gcc-8.3.0-Serial-release build_time=60 run_time=176
gcc-9.1-OpenMP-release build_time=71 run_time=116
gcc-9.1-Serial-release build_time=66 run_time=173
intel-18.0.5-OpenMP-release build_time=164 run_time=121
#######################################################
FAILED TESTS
#######################################################
clang-8.0-Cuda_OpenMP-release (test failed) (because of #645)
#######################################################

#######################################################
PASSED TESTS
#######################################################
cuda-10.1.105-Cuda_OpenMP-release build_time=519 run_time=408
cuda-10.1.105-Cuda_Serial-release build_time=563 run_time=515
cuda-9.2.88-Cuda_OpenMP-release build_time=553 run_time=535
cuda-9.2.88-Cuda_Serial-release build_time=518 run_time=660
gcc-6.4.0-OpenMP_Serial-release build_time=206 run_time=388
gcc-7.2.0-OpenMP-release build_time=128 run_time=130
gcc-7.2.0-OpenMP_Serial-release build_time=199 run_time=361
gcc-7.2.0-Serial-release build_time=116 run_time=225
ibm-16.1.0-Serial-release build_time=491 run_time=395

@lucbv lucbv self-requested a review March 17, 2020 19:30
Copy link
Contributor

@srajama1 srajama1 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @brian-kelley !

@brian-kelley brian-kelley merged commit b009b6c into kokkos:develop Mar 17, 2020
Copy link
Contributor

@lucbv lucbv left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants