Fast and Accurate Triangle Counting in Graph Streams Using Predictions (ICDM 2024)

Cristian Boldrin and Fabio Vandin, "Fast and Accurate Triangle Counting in Graph Streams Using Predictions", appeared at ICDM 2024.

Link for Arxiv Extended Version of the paper

Here are the instructions for running Tonic algorithm: Tiangles cOuntiNg with predICtions, for both insertion-only and fully-dynamic graph streams. Code is deployed in C++ 17 under gcc 9.4.0 compiler. Additionally, CMake 3.16+ is required.

We provide some examples of datasets and oracles inside the respective folders, so that you can skip directly to step (4) after having compiled the sources.

Compile the code

bash compile.sh

The binaries will be generated inside the build folder.
Preprocess the raw dataset

./build/DataPreprocessing <dataset_path> <delimiter> <skip> <output_path>

where dataset_path is the filepath to the dataset to be preprocessed, delimiter is the character used to separate the rows in the dataset, skip is the number of lines to skip before starting to read the dataset, and output_path is the path where the preprocessed dataset will be saved.
Build the Oracle

./build/BuildOracle <preprocessed_dataset_path> <type of the oracle = {Exact, noWR, Node}> <percentage_retain> <output_path> [wr_size]

where preprocessed_dataset_path is the path to the preprocessed dataset at point (2), type of the oracle is the type of oracle to be built (Exact, noWR, Node), percentage_retain is the fraction of top heavies edges/nodes to be retained in the oracle, output_path is the path where the oracle will be saved, and wr_size is the size of the waiting room for excluding the counts (only for the noWR oracle).
Run Tonic Algorithm:

./build/Tonic Tonic <flag: 0: insertion-only stream, 1: fully-dynamic stream> <random_seed> <memory_budget> <alpha> <beta> <dataset_path> <oracle_path> <oracle_type = [nodes, edges]> <output_path>

where flag is the type of the input stream (0 for insertion-only, 1 for fully-dynamic), random_seed is the seed for the random number generator, memory_budget is the memory budget for the algorithm, alpha and beta are the parameters for fraction of size of WR and H, preprocessed_dataset_path is the path to the preprocessed dataset at point (2), oracle_path is the path to the oracle at point (3), oracle_type is the type of oracle used (nodes or edges), and output_path is the path where the output will be saved.

Datasets

Here are the links to the datasets we used to perform the experiments.

Single Graphs:

Dataset	Nodes	Edges	Triangles
Edit EN Wikibooks	133k	386k	178k
SOC YouTube Growth	3.2M	9.3M	12.3M
Cit US Patents	3.7M	16.5M	7.5M
Actors Collaborations	382k	15M	346M
Stackoverflow	2.5M	28M	114M
SOC Live Journals	4.8M	42.8M	285.7M
Twitter-merged	41M	1.5B	34.6B

Graph Sequences:

This kind of datasets consists in a collection of consecutive graphs registered each in a given time of interval, meaning that the topology betweeen each graph is only slightly varying in terms of nodes and edges (and, consequently triangles).

Dataset	Number of Graphs	Description	Max Nodes	Max Edges	Max Triangles
Autonomous systems - Oregon-1	9	9 graphs of Autonomous Systems (AS) peering information inferred from Oregon route-views between March 31 2001 and May 26 2001	11k	23k	19k
CAIDA AS Relationships Datasets	122	CAIDA AS graphs, derived from a set of RouteViews BGP table snapshots, from January 2004 to November 2007	26k	106k	36k
Autonomous systems AS-733	733	733 Daily instances which span an interval of 785 days from November 8 1997 to January 2 2000, from the BGP logs	6k	13k	6k
Twitter snapshots	4	4 graphs of the Twitter following/followers network	29.9M	373M	4.4B

Experiments and Results

To reproduce experiments and results of the paper, please refer to the scripts folder.

Name		Name	Last commit message	Last commit date
Latest commit History 7 Commits
datasets		datasets
include		include
oracles		oracles
scripts		scripts
src		src
CMakeLists.txt		CMakeLists.txt
LICENSE		LICENSE
README.md		README.md
compile.sh		compile.sh

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Fast and Accurate Triangle Counting in Graph Streams Using Predictions (ICDM 2024)

Datasets

Single Graphs:

Graph Sequences:

Experiments and Results

About

Releases

Packages

Contributors 2

Languages

License

VandinLab/Tonic

Folders and files

Latest commit

History

Repository files navigation

Fast and Accurate Triangle Counting in Graph Streams Using Predictions (ICDM 2024)

Datasets

Single Graphs:

Graph Sequences:

Experiments and Results

About

Resources

License

Stars

Watchers

Forks

Releases

Packages 0

Contributors 2

Languages

Packages