This repository provides a reference implementation of Temporal SIR-GN as described in the paper:
Temporal SIR-GN: Efficient and Effective Structural Representation Learning for Temporal Graphs
Janet Layne, Justin Carpenter, Edoardo Serra, and Francesco Gullo.
The Temporal SIR-GN algorithm generates temporal structural node representations for a dynamic graph. These representations capture the evolution of each node's structure over time. Temporal SIR-GN has demonstrated superior scalability and performance across a diverse set of datasets compared to existing state-of-the-art algorithms.
If you find Temporal SIR-GN useful for your research, please cite the following paper:
@article{DBLP:journals/pvldb/LayneCSG23,
author = {Layne, Janet and Carpenter, Justin and Serra, Edoardo and Gullo, Francesco},
title = {Temporal SIR-GN: Efficient and Effective Structural Representation Learning for Temporal Graphs},
journal = {Proc. {VLDB} Endow.},
volume = {16},
number = {9},
pages = {2075--2089},
year = {2023}
}
Required packages: pandas, numpy, scikit-learn, scipy
pip install pandas numpy scipy scikit-learn
Alternatively, create an environment with these requirements:
conda create -n my-env
conda activate my-env
# If you want to install from conda-forge
conda config --env --add channels conda-forge
# The actual install command
conda install pandas numpy scipy scikit-learn
To run on an undirected graph, from the command line:
python temporalSirgn.py --input --output --stop --depth --alpha --clusters
For example:
python temporalSirgn.py --input filename --output filename --stop --depth 5 --alpha 10 --clusters 10
To run on a directed graph, from the command line:
python directed_temporalSirgn.py --input --output --stop --depth --alpha --clusters
For example:
python directed_temporalSirgn.py --input filename --output filename --stop --depth 5 --alpha 10 --clusters 10
Temporal SIR-GN takes in a comma separated edgelist (with header) in the form of
nodeID1, nodeID2, timestamp
The output is a comma-separated text file of shape n x k, where k = c^2 + c for an undirected graph with n vertices (the dimensionality doubles for directed graphs) and c is the number of clusters chosen, formatted as follows:
nodeID, dim0, dim1, dim2,...,dimk
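As a quick sanity check of the output shape described above, the representations can be loaded with NumPy. This is a minimal sketch: the filename `embeddings.txt` and the toy zero-valued rows are hypothetical, used only to illustrate the documented n x k layout.

```python
import numpy as np

c = 10                # number of clusters chosen
k = c**2 + c          # embedding dimensionality for an undirected graph

# Write a toy output file with 3 nodes in the documented format:
# nodeID, dim0, dim1, ..., dim(k-1)
rows = np.hstack([np.arange(3).reshape(-1, 1), np.zeros((3, k))])
np.savetxt("embeddings.txt", rows, delimiter=",")

# Load it back: the first column is the node ID, the rest is the embedding.
data = np.loadtxt("embeddings.txt", delimiter=",")
node_ids = data[:, 0].astype(int)
embeddings = data[:, 1:]
print(embeddings.shape)  # (3, 110)
```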
Undirected datasets do not include a reverse edge; however, the preprocessing in loader.py generates an adjacency list that adds the reverse edge for each input edge. Datasets are of the form:
nodeID1, nodeID2, timestamp
with header:
src, trg, time
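The reverse-edge expansion performed by the preprocessing can be sketched as follows. This is a simplified illustration of the idea, not the repository's actual loader.py code:

```python
from collections import defaultdict

def build_adjacency(edges):
    """Store each undirected edge (u, v, t) under both endpoints,
    mirroring the reverse edge added during preprocessing."""
    adj = defaultdict(list)
    for u, v, t in edges:
        adj[u].append((v, t))  # forward edge as given in the edge list
        adj[v].append((u, t))  # reverse edge added by preprocessing
    return adj

adj = build_adjacency([(0, 1, 5), (1, 2, 7)])
print(dict(adj))  # {0: [(1, 5)], 1: [(0, 5), (2, 7)], 2: [(1, 7)]}
```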
The table below gives the recommended hyperparameters for each dataset used in the node classification tasks.
| Dataset | Alpha | Clusters | Depth |
|---|---|---|---|
| synth_0.0 | 10 | 10 | convergence (stop = True) |
| synth_0.1 | 10 | 10 | convergence (stop = True) |
| synth_0.2 | 10 | 10 | convergence (stop = True) |
| synth_0.3 | 10 | 10 | convergence (stop = True) |
| BrazilAir | 1 | 10 | convergence (stop = True) |
| EUAir | 10 | 10 | convergence (stop = True) |
| USAir | 10 | 10 | convergence (stop = True) |
| DPPIN Tarrasov | 1E4 | 10 | convergence (stop = True) |
| High School | 1E-8 | 10 | convergence (stop = True) |
| Hospital | 1E5 | 10 | convergence (stop = True) |
| Bitcoin OTC (directed) | 1E6 | 11 | convergence (stop = True) |
| GDELT | 1 | 10 | convergence (stop = True) |
Note that the BrazilAir, EUAir, USAir, and AS datasets are too large for GitHub; they can be found in the following public Google Drive folder:
https://drive.google.com/drive/folders/1a5uI6lIEBR3oUUU586ZQM-nJCb8pZkpg?usp=sharing
For the extremely large GDELT dataset, we refer to the AWS S3 bucket download instructions available here: https://github.com/amazon-science/tgl