-
Notifications
You must be signed in to change notification settings - Fork 311
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Restructure Louvain to be more like other algorithms #2594
Restructure Louvain to be more like other algorithms #2594
Conversation
Codecov Report
@@ Coverage Diff @@
## branch-22.10 #2594 +/- ##
===============================================
Coverage ? 61.11%
===============================================
Files ? 106
Lines ? 5634
Branches ? 0
===============================================
Hits ? 3443
Misses ? 2191
Partials ? 0 Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here. |
@@ -0,0 +1,370 @@ | |||
/* | |||
* Copyright (c) 2020-2022, NVIDIA CORPORATION. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This file contains implementation that was formerly in louvain.cuh
, so keeping the original copyright range.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Look much more consistent with the rest of the codebase.
I have few minor suggestions to improve code readability.
} | ||
}; | ||
|
||
template <typename graph_view_t> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Any reason to use template <typename graph_view_t>
here and template <typename vertex_t, typename edge_t, typename weight_t, bool multi_gpu>
in the graph_contraction
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Verbosity more than anything.
I tend to prefer template <typename vertex_t, typename edge_t, typename weight_t, bool multi_gpu>
. But the edge_src_property_t was the graph_view_type. So that would require a couple of extra lines of parameter definition, using graph_view_t
is a bit clearer in that case.
I can switch to graph_view_t
for detail methods for consistency if that's what we think is better.
graph_view_t const& graph_view, | ||
typename graph_view_t::weight_type total_edge_weight, | ||
typename graph_view_t::weight_type resolution, | ||
rmm::device_uvector<typename graph_view_t::weight_type>& vertex_weights_v, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can't this be const
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, this one could be const.
typename graph_view_t::weight_type resolution, | ||
rmm::device_uvector<typename graph_view_t::weight_type>& vertex_weights_v, | ||
rmm::device_uvector<typename graph_view_t::vertex_type>& cluster_keys_v, | ||
rmm::device_uvector<typename graph_view_t::weight_type>& cluster_weights_v, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So, this function updates clustering by delta modularity AND based on the updated clustering, updates per cluster weight sums as well. May better carve out the per cluster weight sum update part from this function?
If we remove
std::tie(cluster_keys_v, cluster_weights_v) = cugraph::transform_reduce_e_by_src_key(
handle,
graph_view,
edge_src_dummy_property_t{}.view(),
edge_dst_dummy_property_t{}.view(),
graph_view_t::is_multi_gpu
? src_clusters_cache.view()
: detail::edge_major_property_view_t<vertex_t, vertex_t const*>(next_clusters_v.data()),
detail::return_edge_weight_t<vertex_t, weight_t>{},
weight_t{0});
cluster_keys_v
and cluster_weights_v
can be const as well, and it might be much easier to understand the input and output.
And if we further carve out the cache update parts, this function can take next_clusters_v
as an R-value, and return the updated next_clusters_v
. Then, all the input parameters will become const reference or scalars. This might be more intuitive.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree with @seunghwak. It would make it easier to follow then. Also in that case it would be nicer to change the function name as well.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I think I can refactor that as well. That's probably an artifact of the implementation prior to using the primitives. IIRC, I had an optimization in the original SG implementation that allowed me to skip a few of these steps if I did it all together. Clearly now they can be separated.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Refactored in next push
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The changes look to me and I will pull this branch once merged and base my work on top of it.
::testing::Values(cugraph::test::File_Usecase( | ||
"test/datasets/karate.mtx") //, | ||
// cugraph::test::File_Usecase("test/datasets/dolphins.mtx") | ||
))); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there any particular reason for not testing with dolphin.mtx?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nope. I had commented it out to do faster debugging on something that only failed in karate. I'll add that back in.
return transform_reduce_e( | ||
handle, | ||
*this, | ||
edge_src_dummy_property_t{}.view(), | ||
edge_dst_dummy_property_t{}.view(), | ||
[] __device__(auto, auto, weight_t wt, auto, auto) { return wt; }, | ||
weight_t{0}); | ||
} | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we use common functor for both mg sg compute_total_edge_weight?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll try making a shared detail method.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done in next push
vertex_cluster_weights_v.resize(0, handle.get_stream()); | ||
vertex_cluster_weights_v.shrink_to_fit(handle.get_stream()); | ||
} else { | ||
thrust::sort_by_key(handle.get_thrust_policy(), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we add a comment here ? Something like -
// sort cluster_keys_v_ and cluster_weights_v_ to use them for binary search (lower_bound)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done in next push
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe one thing we may consider is whether should we better sort here or in compute_cluster_keys_and_values
. I was at the beginning a bit confused why cluster_keys_v
and cluster_weights_v
are R-values.
cluster_weights_v.begin()); | ||
|
||
vertex_cluster_weights_v.resize(next_clusters_v.size(), handle.get_stream()); | ||
thrust::transform(handle.get_thrust_policy(), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we add a comment here ? Something like -
// for each cluster found in next_clusters_v_, lookup its weight in cluster_weights_v_
// and store them in local variable vertex_cluster_weights_v
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done in next push
typename graph_view_t::weight_type resolution, | ||
rmm::device_uvector<typename graph_view_t::weight_type>& vertex_weights_v, | ||
rmm::device_uvector<typename graph_view_t::vertex_type>& cluster_keys_v, | ||
rmm::device_uvector<typename graph_view_t::weight_type>& cluster_weights_v, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree with @seunghwak. It would make it easier to follow then. Also in that case it would be nicer to change the function name as well.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
@gpucibot merge |
Eliminate the Louvain class and restructure the louvain algorithm to be like the other algorithms.
This will involve either inlining functionality from the Louvain member functions, or creating stand-alone detail methods that encapsulate the logic.