Skip to content

Fast and Accurate Triangle Counting in Graph Streams Using Predictions (ICDM 2024)

License

Notifications You must be signed in to change notification settings

VandinLab/Tonic

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

7 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Fast and Accurate Triangle Counting in Graph Streams Using Predictions (ICDM 2024)

Cristian Boldrin and Fabio Vandin, "Fast and Accurate Triangle Counting in Graph Streams Using Predictions", appeared at ICDM 2024.

Link for Arxiv Extended Version of the paper


Here are the instructions for running Tonic algorithm: Tiangles cOuntiNg with predICtions, for both insertion-only and fully-dynamic graph streams. Code is deployed in C++ 17 under gcc 9.4.0 compiler. Additionally, CMake 3.16+ is required.

We provide some examples of datasets and oracles inside the respective folders, so that you can skip directly to step (4) after having compiled the sources.

  1. Compile the code

    bash compile.sh

    The binaries will be generated inside the build folder.

  2. Preprocess the raw dataset

    ./build/DataPreprocessing <dataset_path> <delimiter> <skip> <output_path>

    where dataset_path is the filepath to the dataset to be preprocessed, delimiter is the character used to separate the rows in the dataset, skip is the number of lines to skip before starting to read the dataset, and output_path is the path where the preprocessed dataset will be saved.

  3. Build the Oracle

    ./build/BuildOracle <preprocessed_dataset_path> <type of the oracle = {Exact, noWR, Node}> <percentage_retain> <output_path> [wr_size]

    where preprocessed_dataset_path is the path to the preprocessed dataset at point (2), type of the oracle is the type of oracle to be built (Exact, noWR, Node), percentage_retain is the fraction of top heavies edges/nodes to be retained in the oracle, output_path is the path where the oracle will be saved, and wr_size is the size of the waiting room for excluding the counts (only for the noWR oracle).

  4. Run Tonic Algorithm:

    ./build/Tonic Tonic <flag: 0: insertion-only stream, 1: fully-dynamic stream> <random_seed> <memory_budget> <alpha> <beta> <dataset_path> <oracle_path> <oracle_type = [nodes, edges]> <output_path>

    where flag is the type of the input stream (0 for insertion-only, 1 for fully-dynamic), random_seed is the seed for the random number generator, memory_budget is the memory budget for the algorithm, alpha and beta are the parameters for fraction of size of WR and H, preprocessed_dataset_path is the path to the preprocessed dataset at point (2), oracle_path is the path to the oracle at point (3), oracle_type is the type of oracle used (nodes or edges), and output_path is the path where the output will be saved.

Datasets

Here are the links to the datasets we used to perform the experiments.

Single Graphs:

Dataset Nodes Edges Triangles
Edit EN Wikibooks 133k 386k 178k
SOC YouTube Growth 3.2M 9.3M 12.3M
Cit US Patents 3.7M 16.5M 7.5M
Actors Collaborations 382k 15M 346M
Stackoverflow 2.5M 28M 114M
SOC Live Journals 4.8M 42.8M 285.7M
Twitter-merged 41M 1.5B 34.6B

Graph Sequences:

This kind of datasets consists in a collection of consecutive graphs registered each in a given time of interval, meaning that the topology betweeen each graph is only slightly varying in terms of nodes and edges (and, consequently triangles).

Dataset Number of Graphs Description Max Nodes Max Edges Max Triangles
Autonomous systems - Oregon-1 9 9 graphs of Autonomous Systems (AS) peering information inferred from Oregon route-views between March 31 2001 and May 26 2001 11k 23k 19k
CAIDA AS Relationships Datasets 122 CAIDA AS graphs, derived from a set of RouteViews BGP table snapshots, from January 2004 to November 2007 26k 106k 36k
Autonomous systems AS-733 733 733 Daily instances which span an interval of 785 days from November 8 1997 to January 2 2000, from the BGP logs 6k 13k 6k
Twitter snapshots 4 4 graphs of the Twitter following/followers network 29.9M 373M 4.4B

Experiments and Results

To reproduce experiments and results of the paper, please refer to the scripts folder.

About

Fast and Accurate Triangle Counting in Graph Streams Using Predictions (ICDM 2024)

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published