#Parallel Correlation Clustering
This repository contains the code for the parallel algorithms presented in our NIPS 2015 paper for parallel correlation clustering. We take a database-transactional view of the serial KwikCluster algorithm which provides an expected objective value of 3 OPT. Each set of operations on a vertex is treated as a transaction to be executed in parallel with other transactions.
Our first algorithm, C4 (correlation clustering using concurrency control), resolves conflicts between transactions, but either forcing later transactions to wait, or correcting the clustering assignment of the later conflicting transaction. As a result, we guarantee serial equivalence with KwikCluster, and specifically maintains the 3 OPT approximation.
The second algorithm, ClusterWild!, ignores potential conflicts between transactions and simply executes them in parallel. Despite the conflicts, we are able to analytically show that ClusterWild! (BSP) obtains a reasonable error. In practice, the objective value of ClusterWild! (BSP) is almost the same as that of KwikCluster.
Our code implements the BSP and asynchronous versions of C4 and ClusterWild!, in addition to the serial KwikCluster algorithm and a distributed algorithm (CDK) described in this paper.
##Running examples without compilation As a demonstration, Example.scala runs the various correlation clustering algorithms. Data for the examples are provided in the data directory, sourced from Laboratory for Web Algorithms.
To run Example,
java -cp bin/parrCorrClus.jar Example inputfile=<filename> maxNThreads=<maximum number of threads>
For example,
java -cp bin/parrCorrClus.jar Examples inputfile=data/eswiki-2013_rand_symm maxNThreads=4
The default options runs serial KwikCluster (using 1 thread), C4 and ClusterWild! (BSP and asynchronous versions, using 1 to maxNThreads threads), and the CDK algorithm (using maxNThreads). For more options, see the documentation on Example.scala
Graph data downloaded from Laboratory for Web Algorithms can be pre-processed (randomizing order and symmetrizing graph) for use by running Preprocess.scala with the command line argumentjava -cp bin/parrCorrClus.jar PreprocessGraph inputfile=<filename>For example,
java -cp bin/parrCorrClus.jar PreprocessGraph inputfile=data/eswiki-2013
##Running examples using SBT
If you want to modify the code and run the examples again you can either recompile using bin/update_bin or using the sbt launch scripts as below:
To run Example (this will recompile the code if necessary),
sbt/sbt "run-main Examples inputfile=<filename> maxNThreads=<maximum number of threads>"
For example,
sbt/sbt "run-main Examples inputfile=data/eswiki-2013_rand_symm maxNThreads=4"
For more options, see the documentation on Example.scala
Graph data downloaded from Laboratory for Web Algorithms can be pre-processed (randomizing order and symmetrizing graph) for use by running Preprocess.scala with the command line argumentsbt/sbt "run-main PreprocessGraph inputfile=<filename>"For example,
sbt/sbt "run-main PreprocessGraph inputfile=data/eswiki-2013"