-
Notifications
You must be signed in to change notification settings - Fork 302
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[QST]: Leiden clustering before and after 23.02 #4529
Comments
23.02 is a fairly old version of cuGraph. I would start by moving to a newer version of cuGraph (the latest is 24.06) and seeing if that resolves your issues. Also @ChuckHastings I think we've had a similar issue reported in the past? What was the resolution there? |
Our original implementation of Leiden had a number of functionality issues (it was not much more than Louvain in actuality) and had significant scaling issues. Our implementation of Leiden was completely rewritten as of version 23.04, so you have probably observed that major shift. This version was much better, although there continued to be some minor issues with it through earlier this year. As of 24.06 we have no known outstanding issues with Leiden. So I would suggest using 24.06 or later. If you have examples where our Leiden implementation in 24.06 generates answers that you are concerned about, please provide some details (ideally a sample data set, parameters and what you might expect the result to be) and we will be happy to investigate. |
I am testing a few versions. I use the Apptainer by the way. 24.02 Leiden produced 10 clusters. 23.02 Leiden produced 19 clusters. In fact, I like the results from 23.02. Those results are comparable to what I could get from the leidenalg library which is s l o w. Thanks. |
Leiden circa 23.02 was really Louvain with just a tiny bit of extra logic. It didn't actually implement much of the Leiden algorithm at all. The big issue with 23.06 through 24.02 was inconsistencies. We identified and corrected several bugs that made things more consistent. Because we are operating in parallel, and Louvain/Leiden are greedy approximation algorithms, we can still be affected by numerical instability. Sums of values occur in parallel in a non-deterministic order and we can see minor variations in the results on larger graphs. We have seen several cases where the best choice in the greedy algorithm is really a tie, and due to the variability in the numerical stability we might see different vertices in the tie actually be a clear winner. This is a known source of variability in the results that we haven't tried to address. But we have addressed many of the inconsistencies. Producing 1.7 million clusters seems like a bug of some kind. Is the graph you are using something you can share? I can try to find comparable size graphs and see if I can get this type of behavior, but starting from your graph will be the easiest path. |
I can share my graph and code. Give me an hour or two. |
I put files here. Thank you for your help!! |
It may take some time to investigate, but we will keep you informed on progress. |
Just added this to our development plan for the 24.10 release. We should at least get some analysis done soon. |
If you don't mind, I can add another edge list, much smaller, showing similar results. Results using 24.06 were like Thanks. |
Smaller graphs are always easier to debug. Thanks! |
Just wanted to give you a quick update. I have been able to reproduce this locally. I ran your knn test data against our Leiden implementation and got a large number of clusters. I then ran the same data against a reference serial implementation and got a reasonable number of clusters. I have whittled your graph down to a size that I can actually debug against (a graph with about 2500 edges that still exhibits the same phenomena). |
Are there any news on this? I am currently running into the same problem. |
Finally tracked this down. There's a PR linked above that should correct this problem. The Leiden loop was terminating too early. It is only noticeable on larger graphs, and we weren't closely checking results on any large graphs in our testing. This should be corrected in the cugraph nightlies once the PR is merged and be part of the 24.12 release. |
Sounds Great!! Myles |
Awesome, thank you for solving this @ChuckHastings ! I guess the folks at RAPIDS singlecell should be made aware of this and the other problems with Leiden clustering since it is somewhat the default clustering approach in the field. |
Our implementation of Leiden was generating too many clusters. This was not obvious in smaller graphs, but as the graphs get larger the problem became more noticeable. The Leiden loop was terminating if the modularity stopped improving. But the Leiden algorithm as defined in the paper allows the refinement phase to reduce modularity in order to improve the quality of the clusters. The convergence criteria defined in the paper was based on making no changes on the iteration rather than strictly monitoring modularity change. Updating this criteria results in the Leiden algorithm running more iterations and converging on better answers. Closes #4529 Authors: - Chuck Hastings (https://github.com/ChuckHastings) Approvers: - Naim (https://github.com/naimnv) - Joseph Nke (https://github.com/jnke2016) - Seunghwa Kang (https://github.com/seunghwak) URL: #4730
What is your question?
Hi,
I use the Leiden clustering a lot. Louvain as well, sometimes. I am experiencing big differences in the results when I use 23.02 and newer versions. My dataset has 80 mil edges and 6 mil nodes. The biggest notable difference is older than 23.02, the Leiden clustering produces about 20 clusters, but newer than 23.02 produces, sometimes, 200K clusters for the same dataset and settings.
Now I need to use Dask cugraph but this big difference prevents me from moving on to the new versions.
Any idea?
Thank you.
Code of Conduct
The text was updated successfully, but these errors were encountered: