[QST]: Leiden clustering before and after 23.02 #4529

wdnlotm · 2024-07-09T19:52:46Z

What is your question?

Hi,
I use Leiden clustering a lot, and sometimes Louvain as well. I am seeing big differences in the results between 23.02 and newer versions. My dataset has 80 million edges and 6 million nodes. The most notable difference: with 23.02 and older, Leiden produces about 20 clusters, but versions newer than 23.02 sometimes produce 200K clusters for the same dataset and settings.
I now need to use Dask cuGraph, but this big difference is keeping me from moving to the newer versions.
Any ideas?
Thank you.

Code of Conduct

I agree to follow cuGraph's Code of Conduct
I have searched the open issues and have found no duplicates for this question

alexbarghi-nv · 2024-07-09T20:53:19Z

23.02 is a fairly old version of cuGraph. I would start by moving to a newer version of cuGraph (the latest is 24.06) and seeing if that resolves your issues.

Also @ChuckHastings I think we've had a similar issue reported in the past? What was the resolution there?

ChuckHastings · 2024-07-10T00:16:10Z

Our original implementation of Leiden had a number of functionality issues (it was not much more than Louvain in actuality) and had significant scaling issues. Our implementation of Leiden was completely rewritten as of version 23.04, so you have probably observed that major shift. This version was much better, although there continued to be some minor issues with it through earlier this year. As of 24.06 we have no known outstanding issues with Leiden. So I would suggest using 24.06 or later.

If you have examples where our Leiden implementation in 24.06 generates answers that you are concerned about, please provide some details (ideally a sample data set, parameters and what you might expect the result to be) and we will be happy to investigate.

wdnlotm · 2024-07-10T01:15:53Z

I am testing a few versions.
I have an edge list from a kNN graph (82 million edges, 5.5 million vertices).
I load it into G = cugraph.MultiGraph(directed=False) and run Louvain or Leiden with
parts, modularity_score = cugraph.louvain(G, max_level=500, resolution=0.65) or
parts, modularity_score = cugraph.leiden(G, max_iter=500, resolution=0.65, random_state=123)
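Both calls take a resolution parameter (0.65 here) and return a modularity score. As background, here is a minimal pure-Python sketch of the standard resolution-weighted modularity, Q = Σ_c [L_c/m − γ·(d_c/2m)²], for a non-empty, undirected, weighted graph without self-loops. That cuGraph's modularity_score uses exactly this normalization is an assumption, and modularity() is a hypothetical helper, not a cuGraph function. A resolution below 1 weakens the penalty term, which tends to favour fewer, larger communities.

```python
from collections import defaultdict
from typing import Dict, Hashable, Iterable, Tuple


def modularity(
    edges: Iterable[Tuple[Hashable, Hashable, float]],
    partition: Dict[Hashable, Hashable],
    resolution: float = 1.0,
) -> float:
    """Q = sum over communities c of L_c/m - resolution * (d_c / (2m))**2.

    edges: undirected (u, v, weight) triples, each edge listed once, no self-loops.
    L_c: total weight of edges inside c; d_c: total degree of the vertices in c;
    m: total edge weight (must be > 0).
    """
    m = 0.0
    internal = defaultdict(float)  # L_c per community
    degree = defaultdict(float)    # d_c per community
    for u, v, w in edges:
        m += w
        cu, cv = partition[u], partition[v]
        degree[cu] += w
        degree[cv] += w
        if cu == cv:
            internal[cu] += w
    return sum(internal[c] / m - resolution * (degree[c] / (2 * m)) ** 2 for c in degree)


# Two triangles joined by a single edge (all weights 1).
edges = [(0, 1, 1.0), (1, 2, 1.0), (0, 2, 1.0),
         (3, 4, 1.0), (4, 5, 1.0), (3, 5, 1.0),
         (2, 3, 1.0)]
two = {0: 0, 1: 0, 2: 0, 3: 1, 4: 1, 5: 1}
print(round(modularity(edges, two), 4))        # 0.3571 (= 5/14)
print(round(modularity(edges, two, 0.65), 4))  # 0.5321
```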

By the way, I run everything in Apptainer.
24.06 Leiden produced 1741523 clusters
24.06 Louvain produced 11 (eleven) clusters

24.02 Leiden produced 10 clusters.
Great!! But when I reran the same code, I got 11 clusters, which is also good but not consistent. After one more run, I got 141 clusters.
24.02 Louvain produced 11 (eleven) clusters

23.02 Leiden produced 19 clusters.
23.02 Louvain produced 20 clusters

In fact, I like the results from 23.02. They are comparable to what I get from the leidenalg library, which is s l o w.
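When comparing runs like these, the bare cluster count hides whether a large number comes from many tiny clusters. A minimal pure-Python sketch, assuming the partition is available as a mapping from vertex to cluster id; summarize_partition is a hypothetical helper, not part of cuGraph:

```python
from collections import Counter
from typing import Hashable, Mapping, Tuple


def summarize_partition(partition: Mapping[Hashable, Hashable]) -> Tuple[int, int, int]:
    """Return (number of clusters, number of singleton clusters, largest cluster size)."""
    sizes = Counter(partition.values())  # cluster id -> number of vertices
    n_clusters = len(sizes)
    singletons = sum(1 for s in sizes.values() if s == 1)
    largest = max(sizes.values(), default=0)
    return n_clusters, singletons, largest


# Toy partition: one cluster of three vertices plus three singletons.
example = {0: 0, 1: 0, 2: 0, 3: 1, 4: 2, 5: 3}
print(summarize_partition(example))  # (4, 3, 3)
```

On the GPU, the same numbers can be read directly from the DataFrame that cugraph.leiden returns, assuming its documented "partition" column: parts["partition"].nunique() gives the cluster count and parts["partition"].value_counts() the cluster sizes.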

Thanks.

ChuckHastings · 2024-07-10T16:09:28Z

Leiden circa 23.02 was really Louvain with just a tiny bit of extra logic. It didn't actually implement much of the Leiden algorithm at all.

The big issue with 23.06 through 24.02 was inconsistencies. We identified and corrected several bugs, which made the results more consistent. Because we are operating in parallel, and Louvain/Leiden are greedy approximation algorithms, we can still be affected by numerical instability. Sums of values occur in parallel in a non-deterministic order, so we can see minor variations in the results on larger graphs. We have seen several cases where the best choice in the greedy algorithm is really a tie, and because of this numerical variability a different vertex in the tie can come out as the clear winner from run to run. This is a known source of variability in the results that we haven't tried to address. But we have addressed many of the inconsistencies.
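The tie effect described above can be illustrated without a GPU. This toy sketch is not cuGraph code; ordered_sum, pick_best and choose are hypothetical helpers, and the per-edge contributions are made up. It sums the same three contributions in two different orders, as a parallel reduction might, and shows the greedy choice between two mathematically tied candidates (A and B) flipping with the order.

```python
def ordered_sum(values, order):
    """Add up `values` one at a time in the given index order.

    Floating-point addition is not associative, so different orders can
    round differently, like a parallel reduction whose order varies.
    """
    total = 0.0
    for i in order:
        total += values[i]
    return total


def pick_best(gains):
    """Greedy choice: the candidate with the largest gain."""
    return max(gains, key=gains.get)


# Hypothetical contributions to the gain of moving one vertex into
# community A or B. Mathematically the two gains are equal (a tie).
contributions = {"A": [0.1, 0.2, 0.3], "B": [0.1, 0.2, 0.3]}
FORWARD, BACKWARD = [0, 1, 2], [2, 1, 0]


def choose(order_a, order_b):
    gains = {
        "A": ordered_sum(contributions["A"], order_a),
        "B": ordered_sum(contributions["B"], order_b),
    }
    return pick_best(gains)


print(ordered_sum([0.1, 0.2, 0.3], FORWARD))   # 0.6000000000000001
print(ordered_sum([0.1, 0.2, 0.3], BACKWARD))  # 0.6
print(choose(FORWARD, BACKWARD), choose(BACKWARD, FORWARD))  # A B
```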

Producing 1.7 million clusters seems like a bug of some kind. Is the graph you are using something you can share? I can try to find comparable size graphs and see if I can get this type of behavior, but starting from your graph will be the easiest path.

wdnlotm · 2024-07-10T16:20:59Z

I can share my graph and code. Give me an hour or two.
Thanks.

wdnlotm · 2024-07-10T16:42:31Z

I put files here. Thank you for your help!!
https://drive.google.com/drive/folders/10lziXlXut-ZGcXDlwbk6SxqPB9OIunSS?usp=sharing

ChuckHastings · 2024-07-10T17:18:33Z

It may take some time to investigate, but we will keep you informed on progress.

ChuckHastings · 2024-08-06T00:37:23Z

Just added this to our development plan for the 24.10 release. We should at least get some analysis done soon.

wdnlotm · 2024-08-19T23:53:11Z

If it helps, I have added another, much smaller edge list that shows similar results. It is in the smaller_knn folder here:
https://drive.google.com/drive/folders/10lziXlXut-ZGcXDlwbk6SxqPB9OIunSS?usp=sharing

Results using 24.06:
Leiden: 35387 clusters
Louvain: 15 clusters

Thanks.

ChuckHastings · 2024-08-20T14:33:07Z

Smaller graphs are always easier to debug. Thanks!

ChuckHastings · 2024-09-24T21:05:46Z

Just wanted to give you a quick update. I have been able to reproduce this locally. I ran your knn test data against our Leiden implementation and got a large number of clusters. I then ran the same data against a reference serial implementation and got a reasonable number of clusters. I have whittled your graph down to a size that I can actually debug against (a graph with about 2500 edges that still exhibits the same phenomena).
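The update does not say how the graph was reduced. One common approach to this kind of reduction is greedy chunk removal in the spirit of delta debugging. The sketch below assumes a hypothetical predicate still_fails(edges) that would rerun the clustering and report whether the symptom (for example, a cluster count far above a reference implementation's) is still present; shrink_edge_list is likewise a hypothetical helper.

```python
from typing import Callable, List, Tuple

Edge = Tuple[int, int]


def shrink_edge_list(
    edges: List[Edge], still_fails: Callable[[List[Edge]], bool]
) -> List[Edge]:
    """Greedily drop chunks of edges while `still_fails` keeps returning True.

    The chunk size starts at half the list and halves down to single edges.
    Assumes `still_fails` is deterministic; a noisy check (such as a
    non-deterministic clustering run) would probably need repeated runs.
    """
    if not still_fails(edges):
        raise ValueError("the full edge list must reproduce the problem")
    chunk = max(1, len(edges) // 2)
    while True:
        i = 0
        while i < len(edges):
            candidate = edges[:i] + edges[i + chunk:]
            if still_fails(candidate):
                edges = candidate  # this chunk was not needed; drop it
            else:
                i += chunk  # this chunk matters; keep it and move on
        if chunk == 1:
            return edges
        chunk = max(1, chunk // 2)


# Toy stand-in for "Leiden still produces far too many clusters":
# here the "bug" needs both edge (3, 4) and edge (10, 11) to be present.
edges = [(i, i + 1) for i in range(20)]
print(shrink_edge_list(edges, lambda e: (3, 4) in e and (10, 11) in e))
# [(3, 4), (10, 11)]
```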

mbruhns · 2024-10-17T13:13:11Z

Is there any news on this? I am currently running into the same problem.

ChuckHastings · 2024-10-18T18:40:13Z

Finally tracked this down. There's a PR linked above that should correct this problem. The Leiden loop was terminating too early. It is only noticeable on larger graphs, and we weren't closely checking results on any large graphs in our testing.

This should be fixed in the cuGraph nightlies once the PR is merged, and the fix will be part of the 24.12 release.

wdnlotm · 2024-10-18T19:20:46Z

Sounds great!!
Thank you!

Myles

mbruhns · 2024-10-18T19:57:21Z

Awesome, thank you for solving this @ChuckHastings! I guess the folks at RAPIDS singlecell should be made aware of this and the other problems with Leiden clustering, since Leiden is more or less the default clustering approach in the field.

Address Leiden clustering generating too many clusters (#4730)

Our implementation of Leiden was generating too many clusters. This was not obvious in smaller graphs, but the problem became more noticeable as the graphs got larger. The Leiden loop was terminating if the modularity stopped improving. But the Leiden algorithm as defined in the paper allows the refinement phase to reduce modularity in order to improve the quality of the clusters. The convergence criterion defined in the paper was based on making no changes in an iteration rather than strictly monitoring modularity change. Updating this criterion results in the Leiden algorithm running more iterations and converging on better answers.

Closes #4529

Authors:
- Chuck Hastings (https://github.com/ChuckHastings)

Approvers:
- Naim (https://github.com/naimnv)
- Joseph Nke (https://github.com/jnke2016)
- Seunghwa Kang (https://github.com/seunghwak)

URL: #4730
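The fix above is about the stopping rule: stopping as soon as modularity stops improving versus stopping only when an iteration changes nothing. The toy sketch below contrasts the two rules; it is not cuGraph's implementation. The scripted step function, the three partitions and their quality values are all made up for illustration, with a deliberate quality dip standing in for the kind of temporary decrease the description says the refinement phase may cause.

```python
from typing import Callable, Dict, List, Tuple

Partition = Dict[int, int]  # vertex -> community id


def iterate(
    step: Callable[[Partition], Partition],
    quality: Callable[[Partition], float],
    partition: Partition,
    stop_on: str,
    max_iter: int = 100,
) -> Tuple[Partition, int]:
    """Apply `step` until the chosen stopping rule fires.

    stop_on="no_improvement": stop as soon as an iteration fails to raise
        quality (in this sketch the previous partition is kept).
    stop_on="no_change": stop only when an iteration leaves the partition
        unchanged.
    Returns the final partition and the number of iterations run.
    """
    for i in range(1, max_iter + 1):
        new_partition = step(partition)
        if stop_on == "no_improvement" and quality(new_partition) <= quality(partition):
            return partition, i
        if stop_on == "no_change" and new_partition == partition:
            return partition, i
        partition = new_partition
    return partition, max_iter


# Hypothetical scripted run on 4 vertices: the second state has lower
# quality than the first, then a better state follows.
SCRIPT: List[Partition] = [
    {0: 0, 1: 1, 2: 2, 3: 3},  # 4 clusters
    {0: 0, 1: 0, 2: 2, 3: 3},  # 3 clusters, quality dips
    {0: 0, 1: 0, 2: 2, 3: 2},  # 2 clusters, best quality
]
QUALITY = [0.10, 0.08, 0.30]  # made-up values for illustration


def scripted_step(p: Partition) -> Partition:
    # Advance to the next scripted state; the last state is a fixed point.
    i = SCRIPT.index(p)
    return SCRIPT[min(i + 1, len(SCRIPT) - 1)]


def scripted_quality(p: Partition) -> float:
    return QUALITY[SCRIPT.index(p)]


def n_clusters(p: Partition) -> int:
    return len(set(p.values()))


early, _ = iterate(scripted_step, scripted_quality, SCRIPT[0], "no_improvement")
full, _ = iterate(scripted_step, scripted_quality, SCRIPT[0], "no_change")
print(n_clusters(early), n_clusters(full))  # 4 2
```

With the early-exit rule the loop stops at the first dip and keeps the over-split partition; with the no-change rule it runs through the dip and reaches the coarser, higher-quality one.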

wdnlotm added the question (Further information is requested) label Jul 9, 2024

ChuckHastings self-assigned this Jul 10, 2024

ChuckHastings mentioned this issue Oct 18, 2024 in Address Leiden clustering generating too many clusters #4730 (Merged)

rapids-bot closed this as completed in #4730 Oct 23, 2024
