GitHub

#Parallel Correlation Clustering

This repository contains the code for the parallel algorithms presented in our NIPS 2015 paper for parallel correlation clustering. We take a database-transactional view of the serial KwikCluster algorithm which provides an expected objective value of 3 OPT. Each set of operations on a vertex is treated as a transaction to be executed in parallel with other transactions.

Our first algorithm, C4 (correlation clustering using concurrency control), resolves conflicts between transactions, but either forcing later transactions to wait, or correcting the clustering assignment of the later conflicting transaction. As a result, we guarantee serial equivalence with KwikCluster, and specifically maintains the 3 OPT approximation.

The second algorithm, ClusterWild!, ignores potential conflicts between transactions and simply executes them in parallel. Despite the conflicts, we are able to analytically show that ClusterWild! (BSP) obtains a reasonable error. In practice, the objective value of ClusterWild! (BSP) is almost the same as that of KwikCluster.

Our code implements the BSP and asynchronous versions of C4 and ClusterWild!, in addition to the serial KwikCluster algorithm and a distributed algorithm (CDK) described in this paper.

##Running examples without compilation As a demonstration, Example.scala runs the various correlation clustering algorithms. Data for the examples are provided in the data directory, sourced from Laboratory for Web Algorithms.

To run Example,

java -cp bin/parrCorrClus.jar Example inputfile=<filename> maxNThreads=<maximum number of threads>

For example,

java -cp bin/parrCorrClus.jar Examples inputfile=data/eswiki-2013_rand_symm maxNThreads=4

The default options runs serial KwikCluster (using 1 thread), C4 and ClusterWild! (BSP and asynchronous versions, using 1 to maxNThreads threads), and the CDK algorithm (using maxNThreads). For more options, see the documentation on Example.scala

Preprocessing graph data for examples

Graph data downloaded from Laboratory for Web Algorithms can be pre-processed (randomizing order and symmetrizing graph) for use by running Preprocess.scala with the command line argument

java -cp bin/parrCorrClus.jar PreprocessGraph inputfile=<filename>

For example,

java -cp bin/parrCorrClus.jar PreprocessGraph inputfile=data/eswiki-2013

##Running examples using SBT

If you want to modify the code and run the examples again you can either recompile using bin/update_bin or using the sbt launch scripts as below:

To run Example (this will recompile the code if necessary),

sbt/sbt "run-main Examples inputfile=<filename> maxNThreads=<maximum number of threads>"

For example,

sbt/sbt "run-main Examples inputfile=data/eswiki-2013_rand_symm maxNThreads=4"

For more options, see the documentation on Example.scala

Preprocessing graph data for examples

Graph data downloaded from Laboratory for Web Algorithms can be pre-processed (randomizing order and symmetrizing graph) for use by running Preprocess.scala with the command line argument

sbt/sbt "run-main PreprocessGraph inputfile=<filename>"

For example,

sbt/sbt "run-main PreprocessGraph inputfile=data/eswiki-2013"

Name		Name	Last commit message	Last commit date
Latest commit History 5 Commits
api_doc		api_doc
bin		bin
data		data
paper		paper
project		project
sbt		sbt
src/main/scala		src/main/scala
.gitignore		.gitignore
AUTHORS		AUTHORS
LICENSE		LICENSE
README.md		README.md
build.sbt		build.sbt

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Preprocessing graph data for examples

Preprocessing graph data for examples

About

Releases

Packages

Languages

License

pxinghao/ParallelCorrelationClustering

Folders and files

Latest commit

History

Repository files navigation

Preprocessing graph data for examples

Preprocessing graph data for examples

About

Resources

License

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages