Cluster Gauss-Seidel and SGS #455
Spot-check on kokkos-dev, Aug. 19: Failing tests are not related:
Now the spot checks are clean: <<< kokkos-dev >>>
@brian-kelley: Can you please add a Bowman spot-check and a wiki update for the new feature? A benchmark page update would also be useful. Thanks for these! This will be useful for our apps. @lucbv, can we add a TODO to evaluate this SGS as an option for the momentum solves?
@srajama1 Bowman spot checks:
@brian-kelley Sorry for my delay in reviewing this. Please see comments below.
namespace Impl{

template <typename HandleType, typename lno_row_view_t, typename lno_nnz_view_t>
struct RCM
It would be useful to expose RCM to users rather than keeping it in Impl. It doesn't have to be part of this PR though. We could add an issue.
//radix sort keys according to their corresponding values ascending.
//keys are NOT preserved since the use of this in RCM doesn't care about degree after sorting
template<typename size_type, typename KeyType, typename ValueType, typename IndexType, typename member_t>
KOKKOS_INLINE_FUNCTION static void
Will this go away with PR #461 ?
@srajama1 Yes it will.
}
}

//Functor that does breadth-first search on a sparse graph.
It would be nice to expose BFS to the users as well. We could file an issue and come back later. No need to modify this PR.
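For reference, the kind of traversal being discussed can be sketched serially as below. This is only an illustration of breadth-first search on a CSR graph, not the Kokkos functor from this PR; the function name `bfsLevels` and the CSR argument layout are assumptions made for the example.

```cpp
#include <cassert>
#include <queue>
#include <vector>

// Serial BFS over a graph stored in CSR form (hypothetical helper, not the
// PR's functor). rowptr has numVertices + 1 entries; col holds neighbor ids.
// Returns the BFS level of every vertex, or -1 if it is unreachable.
std::vector<int> bfsLevels(const std::vector<int>& rowptr,
                           const std::vector<int>& col, int start) {
  const int n = static_cast<int>(rowptr.size()) - 1;
  std::vector<int> level(n, -1);
  std::queue<int> frontier;
  level[start] = 0;
  frontier.push(start);
  while (!frontier.empty()) {
    const int v = frontier.front();
    frontier.pop();
    for (int k = rowptr[v]; k < rowptr[v + 1]; ++k) {
      const int w = col[k];
      if (level[w] < 0) {  // first visit fixes the level
        level[w] = level[v] + 1;
        frontier.push(w);
      }
    }
  }
  return level;
}
```

Calling `bfsLevels(rowptr, col, 0)` on a path graph 0-1-2-3 yields levels 0, 1, 2, 3; vertices in other connected components keep the value -1.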
One more comment: What is the default cluster size? Does the user have to set it before calling?
Algorithm idea:
-Use RCM to reduce matrix envelope
-Build cluster graph using contiguous groups of rows in RCM order.
-Dist-1 color the cluster graph.
-Run Gauss-Seidel, running within each cluster color in parallel.
RCM-based cluster finding is very slow. Working on two different clustering algorithms that should be both faster (esp. on GPU) and produce higher quality clusters (sparser cluster graph).
Fast partitioning (clustering) works!
742aa7c to 985e50d
If > 50 entries per row, use Cuthill-McKee clustering. Otherwise, use SSSP clustering.
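The selection rule in this commit message could look roughly like the sketch below. It assumes "entries per row" means the average over all rows, which the message does not state; the enum and function names are hypothetical and are not the identifiers used in the PR.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical names for the two clustering strategies mentioned above.
enum class ClusterAlgo { CuthillMcKee, SSSP };

// Pick a clustering algorithm from CSR row offsets (size numRows + 1).
// Assumption: the threshold applies to the AVERAGE number of entries per row.
ClusterAlgo chooseClusterAlgo(const std::vector<std::size_t>& rowptr,
                              double threshold = 50.0) {
  if (rowptr.size() < 2) return ClusterAlgo::SSSP;  // empty matrix
  const std::size_t numRows = rowptr.size() - 1;
  const double avg =
      static_cast<double>(rowptr.back() - rowptr.front()) / numRows;
  return avg > threshold ? ClusterAlgo::CuthillMcKee : ClusterAlgo::SSSP;
}
```

With this reading, a matrix averaging exactly 50 entries per row still takes the SSSP path, since the rule is a strict "greater than".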
Apparent bug in cluster color -> vertex color mapping, since bodyy5.mtx triggers a crash sometimes during create_reverse_map
Not using fixed iteration count; instead, run until variance of cluster size fails to improve
Both range and team policy versions.
Much more robust and also produces better quality for large clusters.
These are: balloon (default), RCM, and do-nothing. Also checks that the scaled residual is no higher than 1. The matrix is randomly generated to be diagonally dominant, so if the residual blows up it is a bug in Gauss-Seidel.
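The idea behind that check can be sketched with a small dense example. This is not the unit test from the PR: it uses plain point Gauss-Seidel, a dense matrix, and assumes the scaled residual is ||b - Ax||_2 / ||b||_2 with a zero initial guess (so it starts at exactly 1). All names here are hypothetical.

```cpp
#include <cassert>
#include <cmath>
#include <random>
#include <vector>

using Matrix = std::vector<std::vector<double>>;

// Random dense matrix made strictly diagonally dominant by construction.
Matrix makeDiagDominant(int n, unsigned seed) {
  std::mt19937 gen(seed);
  std::uniform_real_distribution<double> dist(-1.0, 1.0);
  Matrix A(n, std::vector<double>(n, 0.0));
  for (int i = 0; i < n; ++i) {
    double offSum = 0.0;
    for (int j = 0; j < n; ++j) {
      if (j == i) continue;
      A[i][j] = dist(gen);
      offSum += std::abs(A[i][j]);
    }
    A[i][i] = 2.0 * offSum + 1.0;  // diagonal exceeds the off-diagonal row sum
  }
  return A;
}

// One forward point Gauss-Seidel sweep, updating x in place.
void gaussSeidelSweep(const Matrix& A, const std::vector<double>& b,
                      std::vector<double>& x) {
  const int n = static_cast<int>(b.size());
  for (int i = 0; i < n; ++i) {
    double sum = b[i];
    for (int j = 0; j < n; ++j)
      if (j != i) sum -= A[i][j] * x[j];
    x[i] = sum / A[i][i];
  }
}

// Assumed definition of the scaled residual: ||b - A x||_2 / ||b||_2.
double scaledResidual(const Matrix& A, const std::vector<double>& b,
                      const std::vector<double>& x) {
  double rr = 0.0, bb = 0.0;
  for (std::size_t i = 0; i < b.size(); ++i) {
    double r = b[i];
    for (std::size_t j = 0; j < x.size(); ++j) r -= A[i][j] * x[j];
    rr += r * r;
    bb += b[i] * b[i];
  }
  return std::sqrt(rr) / std::sqrt(bb);
}
```

Because Gauss-Seidel is guaranteed to converge on a strictly diagonally dominant matrix, a scaled residual above 1 after some sweeps from a zero initial guess points at the solver rather than at the problem.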
Needs cleanup + more testing before pushing
@srajama1 The cluster size needs to be set when the user creates the GS handle. There are now two overloads of create_gs_handle, this one for point coloring:
and this one for cluster:
No default is set for the cluster size. In practice it seems like anywhere between 8 and 64 is reasonable. That will go in the wiki entry.
I'm running spot checks now. The numerical results are looking good. On af_shell7 (504k rows, 17.5M entries, and SPD) the preconditioned CG iteration counts were:
These iteration numbers look really good. Looking forward to getting this into develop.
The block PCG perf test gets built in the Makefile-based build, so it's built by test_all_sandia.
I saw a comment about an error in email, but I don't see it on the website. Maybe you have resolved it?
@srajama1 Yeah, I deleted that comment. I forgot to check out the right branch :)
@srajama1 Actually, that error is still happening. On kokkos-dev, the
I think the OpenMP backend is getting initialized but not the Cuda one... The output for KokkosKernels_UnitTest_OpenMP is exactly the same (it shouldn't ever be calling any CUDA runtime functions, but it is calling cudaGetDeviceCount). I have the right modules loaded:
@srajama1 Nathan found out it is a system driver issue and it should be fixed pretty soon.
I am glad you found the reason. Thanks @ndellingwood @brian-kelley
@srajama1 CUDA on kokkos-dev is still broken, but below are successful test outputs from bowman and white. The only build that fails on kokkos-dev is GCC 5.3.0, CUDA 8.0, running on Kepler. Should I try to run test_all_sandia with a similar configuration on another machine, wait for kokkos-dev drivers to be fixed, or is bowman+white enough?
I am ok with pushing this with testing on white.
@brian-kelley: Thanks for the persistence on getting this in!
@srajama1 Cool, I'm ready to merge it then.
Algorithm idea:
-Use RCM to reduce matrix envelope
-Build cluster graph using equal-size contiguous groups of rows in RCM order. The edges are the union of edges between vertices in different clusters.
-Color the cluster graph:
-Run Gauss-Seidel: rows within each cluster are processed serially, but clusters of the same color run in parallel.
-In practice, this converges faster than traditional coloring GS, but preserves parallelism.
-On cfd1 from the SuiteSparse collection, traditional GS fails to converge entirely but this technique converges (slowly...)
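The steps above can be sketched serially as follows. This is an illustration only, not the Kokkos implementation in this PR: it skips the RCM step and assumes the rows are already in a suitable order, uses a simple greedy distance-1 coloring, and all type and function names (`Csr`, `tridiag`, `colorClusters`, `clusterGsSweep`) are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <vector>

// Minimal CSR matrix; a hypothetical type for this sketch only.
struct Csr {
  int n;                    // number of rows
  std::vector<int> rowptr;  // size n + 1
  std::vector<int> col;     // column index of each entry
  std::vector<double> val;  // value of each entry
};

// Helper for the usage example: 1D Laplacian (2 on the diagonal, -1 off it).
Csr tridiag(int n) {
  Csr A{n, {0}, {}, {}};
  for (int i = 0; i < n; ++i) {
    if (i > 0) { A.col.push_back(i - 1); A.val.push_back(-1.0); }
    A.col.push_back(i); A.val.push_back(2.0);
    if (i + 1 < n) { A.col.push_back(i + 1); A.val.push_back(-1.0); }
    A.rowptr.push_back(static_cast<int>(A.col.size()));
  }
  return A;
}

// Clusters are equal-size contiguous row blocks; two clusters are adjacent if
// any matrix entry links their rows. Greedy distance-1 coloring of that graph.
std::vector<int> colorClusters(const Csr& A, int clusterSize) {
  const int nc = (A.n + clusterSize - 1) / clusterSize;
  std::vector<std::vector<int>> adj(nc);
  for (int i = 0; i < A.n; ++i)
    for (int k = A.rowptr[i]; k < A.rowptr[i + 1]; ++k) {
      const int ci = i / clusterSize, cj = A.col[k] / clusterSize;
      if (ci != cj) { adj[ci].push_back(cj); adj[cj].push_back(ci); }
    }
  std::vector<int> color(nc, -1);
  for (int c = 0; c < nc; ++c) {
    std::vector<char> used(adj[c].size() + 1, 0);
    for (int nb : adj[c])
      if (color[nb] >= 0 && color[nb] < static_cast<int>(used.size()))
        used[color[nb]] = 1;
    int k = 0;
    while (used[k]) ++k;  // smallest color not taken by a neighbor
    color[c] = k;
  }
  return color;
}

// One forward sweep. Clusters sharing a color have no coupling, so the loop
// over them could run in parallel; rows inside a cluster are updated serially.
void clusterGsSweep(const Csr& A, const std::vector<double>& b,
                    std::vector<double>& x, int clusterSize,
                    const std::vector<int>& color) {
  const int nc = static_cast<int>(color.size());
  int numColors = 0;
  for (int c : color) numColors = std::max(numColors, c + 1);
  for (int k = 0; k < numColors; ++k)
    for (int c = 0; c < nc; ++c) {  // parallelizable over clusters of color k
      if (color[c] != k) continue;
      const int begin = c * clusterSize;
      const int end = std::min(A.n, begin + clusterSize);
      for (int i = begin; i < end; ++i) {
        double sum = b[i], diag = 1.0;
        for (int p = A.rowptr[i]; p < A.rowptr[i + 1]; ++p) {
          if (A.col[p] == i) diag = A.val[p];
          else sum -= A.val[p] * x[A.col[p]];
        }
        x[i] = sum / diag;
      }
    }
}
```

A caller would build the coloring once with `colorClusters(A, clusterSize)` and then apply `clusterGsSweep` repeatedly, either as a standalone solver or as a preconditioner step. Larger clusters keep more of the sequential Gauss-Seidel ordering, at the cost of fewer independent clusters per color.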