CogDL is a graph representation learning toolkit that allows researchers and developers to easily train and compare baseline or custom models for node classification, link prediction, and other tasks on graphs. It provides implementations of many popular models, including non-GNN baselines such as DeepWalk, LINE, and NetMF, and GNN baselines such as GCN, GAT, and GraphSAGE.
Note that CogDL is still actively under development, so feedback and contributions are welcome. Feel free to submit your contributions as a pull request.
CogDL features:
- Task-Oriented: CogDL focuses on tasks on graphs and provides corresponding models, datasets, and leaderboards.
- Easy-Running: CogDL supports running multiple experiments simultaneously on multiple models and datasets under a specific task using multiple GPUs.
- Multiple Tasks: CogDL supports node classification and link prediction tasks on homogeneous/heterogeneous networks, as well as graph classification.
- Extensibility: You can easily add new datasets, models and tasks and conduct experiments for them!
- Supported tasks:
  - Node classification
  - Link prediction
  - Graph classification
  - Graph pre-training
  - Graph clustering
  - Graph similarity search
- The new v0.1.2 release includes a pre-training task, many examples, OGB datasets, some knowledge graph embedding methods, and some graph neural network models. The test coverage of CogDL has increased to 80%. Some new APIs, such as Trainer and Sampler, have been developed and are being tested.
- The new v0.1.1 release includes the knowledge link prediction task, many state-of-the-art models, and optuna support. We also have a Chinese WeChat post about the CogDL release.
- Python version >= 3.6
- PyTorch version >= 1.0.0
- PyTorch Geometric (recommended)
- Deep Graph Library (optional)
Please follow the official instructions to install PyTorch (https://github.com/pytorch/pytorch#installation), PyTorch Geometric (https://github.com/rusty1s/pytorch_geometric/#installation), and Deep Graph Library (https://docs.dgl.ai/install/index.html).
Install CogDL and its other dependencies:
pip install cogdl
If you want to experiment with the latest CogDL features that have not been released yet, you can install CogDL via:
git clone git@github.com:THUDM/cogdl.git
cd cogdl
pip install -e .
You can also use a Docker container. This repository provides a Dockerfile that you can build with the PyTorch and CUDA versions available on your system. To build the Docker image, run:
docker build --build-arg CUDA=YOUR_CUDA_VERSION --build-arg TORCH=YOUR_TORCH_VERSION --tag cogdl .
Here, YOUR_CUDA_VERSION should be cuxxx, representing your CUDA version (or just cpu), and YOUR_TORCH_VERSION should be the version of PyTorch you want to use. For example, to run with CUDA 10.1 and PyTorch 1.7.0, run:
docker build --build-arg CUDA=cu101 --build-arg TORCH=1.7.0 --tag cogdl .
Then you can start the container by running:
docker run -it -v cogdl:/cogdl cogdl /bin/bash
Then clone your fork or this repository into the /cogdl folder:
git clone https://github.com/THUDM/cogdl /cogdl
Note: if you install a PyTorch version other than 1.7.0, you may run into problems with the torchvision and torchaudio libraries and might have to reinstall them manually.
You can run all kinds of experiments through the CogDL APIs, including build_dataset, build_model, and build_task. You can also use your own datasets and models for experiments. Some examples are provided in examples/, including gcn.py.
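from cogdl.datasets import build_dataset
from cogdl.models import build_model
from cogdl.tasks import build_task
# Set hyper-parameters for experiments.
# get_default_args() is assumed to be the helper from examples/gcn.py that returns
# the default hyper-parameters; define or copy it before running this snippet.
args = get_default_args()
args.task = 'node_classification'
args.dataset = 'cora'
args.model = 'gcn'
# Build the dataset and record its input/output dimensions
dataset = build_dataset(args)
args.num_features = dataset.num_features
args.num_classes = dataset.num_classes
args.num_layers = 2
# Build the model
model = build_model(args)
# Train and evaluate the model; ret is a dict of metrics such as ret["Acc"]
task = build_task(args, dataset=dataset, model=model)
ret = task.train()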
You can use python scripts/train.py --task example_task --dataset example_dataset --model example_method to run example_method on example_dataset and evaluate it via example_task.
- --task: the downstream task used to evaluate the representations, such as node_classification, unsupervised_node_classification, or link_prediction. More tasks can be found in cogdl/tasks.
- --dataset: the dataset(s) to run on; this can be a space-separated list such as cora citeseer ppi. Supported datasets include 'cora', 'citeseer', 'pubmed', 'PPI', 'wikipedia', 'blogcatalog', and 'flickr'. More datasets can be found in cogdl/datasets.
- --model: the model(s) to run; this can be a space-separated list such as deepwalk line prone. Supported models include 'gcn', 'gat', 'graphsage', 'deepwalk', 'node2vec', 'hope', 'grarep', 'netmf', 'netsmf', and 'prone'. More models can be found in cogdl/models.
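The list-valued flags can be illustrated with a small argparse sketch. This is a hypothetical re-creation for illustration, not the parser in scripts/train.py; the helper names build_parser and experiment_grid and the default seed of 0 are assumptions. It shows how space-separated values for --dataset, --model, and --seed expand into one run per (dataset, model, seed) combination.

```python
import argparse
import itertools


def build_parser():
    # Hypothetical re-creation of the documented flags; not CogDL's own parser.
    parser = argparse.ArgumentParser(description="Sketch of the scripts/train.py flags")
    parser.add_argument("--task", type=str, required=True)
    # nargs="+" accepts one or more space-separated values, e.g. "cora citeseer ppi".
    parser.add_argument("--dataset", type=str, nargs="+", required=True)
    parser.add_argument("--model", type=str, nargs="+", required=True)
    parser.add_argument("--seed", type=int, nargs="+", default=[0])  # assumed default
    return parser


def experiment_grid(args):
    # One run per (dataset, model, seed) combination.
    return list(itertools.product(args.dataset, args.model, args.seed))


if __name__ == "__main__":
    args = build_parser().parse_args(
        ["--task", "unsupervised_node_classification",
         "--dataset", "cora", "citeseer",
         "--model", "deepwalk", "line", "prone",
         "--seed", "0", "1"]
    )
    for dataset, model, seed in experiment_grid(args):
        print(args.task, dataset, model, seed)
```

With two datasets, three models, and two seeds, this sketch yields 12 runs.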
For example, to run LINE and NetMF on Wikipedia for the unsupervised node classification task with 5 different seeds:
$ python scripts/train.py --task unsupervised_node_classification --dataset wikipedia --model line netmf --seed 0 1 2 3 4
Expected output:
Variant | Micro-F1 0.1 | Micro-F1 0.3 | Micro-F1 0.5 | Micro-F1 0.7 | Micro-F1 0.9 |
---|---|---|---|---|---|
('wikipedia', 'line') | 0.4069±0.0011 | 0.4071±0.0010 | 0.4055±0.0013 | 0.4054±0.0020 | 0.4080±0.0042 |
('wikipedia', 'netmf') | 0.4551±0.0024 | 0.4932±0.0022 | 0.5046±0.0017 | 0.5084±0.0057 | 0.5125±0.0035 |
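Each cell in the output above aggregates the runs over the seeds into a mean±std pair. The sketch below shows one way to produce such a row. Whether CogDL reports the population or the sample standard deviation is not stated here, so statistics.pstdev (population) is an assumption, as are the helper names summarize and table_row.

```python
import statistics


def summarize(scores):
    # Mean and population standard deviation over seeds (pstdev is an assumption).
    mean = statistics.mean(scores)
    std = statistics.pstdev(scores)
    return "{:.4f}±{:.4f}".format(mean, std)


def table_row(variant, scores_by_column):
    # variant is a (dataset, model) tuple; scores_by_column maps a column key
    # (e.g. a training ratio such as 0.1) to the list of per-seed scores.
    cells = [summarize(scores_by_column[key]) for key in sorted(scores_by_column)]
    return "{} | {} |".format(variant, " | ".join(cells))
```

Columns are emitted in sorted key order, so ratio keys 0.1 to 0.9 come out in the same left-to-right order as the table header.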
To run parallel experiments on a server with multiple GPUs, for example with the models gcn and gat on the Cora dataset for the node classification task, using GPUs 0 and 1:
$ python scripts/parallel_train.py --task node_classification --dataset cora --model gcn gat --device-id 0 1 --seed 0 1 2 3 4
Expected output:
Variant | Acc |
---|---|
('cora', 'gcn') | 0.8236±0.0033 |
('cora', 'gat') | 0.8262±0.0032 |
You can use the following command to create the necessary files for your model via our CLI.
$ python scripts/model_maker.py
The following tables summarize the characteristics of all methods for different tasks; Reproducibility indicates whether the model has currently been reproduced in our experimental setting.
Algorithm | Directed | Weight | Shallow network | Matrix factorization | Sampling | Reproducibility | GPU support |
---|---|---|---|---|---|---|---|
DeepWalk | ✔️ | ✔️ | |||||
LINE | ✔️ | ✔️ | ✔️ | ✔️ | ✔️ | ||
Node2vec | ✔️ | ✔️ | ✔️ | ✔️ | ✔️ | ||
SDNE | ✔️ | ✔️ | ✔️ | ✔️ | ✔️ | ||
DNGR | ✔️ | ✔️ | ✔️ | ✔️ | |||
HOPE | ✔️ | ✔️ | ✔️ | ✔️ | |||
GraRep | ✔️ | ✔️ | ✔️ | ||||
NetMF | ✔️ | ✔️ | ✔️ | ✔️ | |||
NetSMF | ✔️ | ✔️ | ✔️ | ✔️ | |||
ProNE | ✔️ | ✔️ | ✔️ | ✔️ |
Algorithm | Weight | Sampling | Attention | Inductive | Reproducibility | GPU support |
---|---|---|---|---|---|---|
Graph U-Net | ✔️ | ✔️ | ✔️ | ✔️ | ||
MixHop | ✔️ | ✔️ | ✔️ | |||
Dr-GAT | ✔️ | ✔️ | ✔️ | ✔️ | ||
GAT | ✔️ | ✔️ | ✔️ | ✔️ | ||
DGI | ✔️ | ✔️ | ✔️ | ✔️ | ✔️ | |
GCN | ✔️ | ✔️ | ✔️ | ✔️ | ||
GraphSAGE | ✔️ | ✔️ | ✔️ | ✔️ | ✔️ | |
Chebyshev | ✔️ | ✔️ | ✔️ | ✔️ | ||
GRAND | ✔️ | ✔️ | ✔️ | |||
GCNII | ✔️ | ✔️ | ✔️ | ✔️ | ||
DeeperGCN | ✔️ | ✔️ | ✔️ | ✔️ | ✔️ | |
MVGRL | ✔️ | ✔️ | ✔️ | ✔️ | ||
GraphMix | ✔️ | ✔️ | ✔️ | |||
DisenGCN | ✔️ | ✔️ | ✔️ | |||
PPNP/APPNP | ✔️ | ✔️ | ✔️ | ✔️ |
Algorithm | Multi-Node | Multi-Edge | Attribute | Supervised | MetaPath | Reproducibility | GPU support |
---|---|---|---|---|---|---|---|
GATNE | ✔️ | ✔️ | ✔️ | ✔️ | ✔️ | ✔️ | |
Metapath2vec | ✔️ | ✔️ | ✔️ | ||||
PTE | ✔️ | ✔️ | |||||
Hin2vec | ✔️ | ✔️ | ✔️ | ✔️ | |||
GTN | ✔️ | ✔️ | ✔️ | ✔️ | ✔️ | ✔️ | |
HAN | ✔️ | ✔️ | ✔️ | ✔️ | ✔️ | ✔️ |
Algorithm | Node feature | Unsupervised | Graph kernel | Shallow network | Reproducibility | GPU support |
---|---|---|---|---|---|---|
Infograph | ✔️ | ✔️ | ✔️ | ✔️ | ||
Diffpool | ✔️ | ✔️ | ✔️ | |||
Graph2Vec | ✔️ | ✔️ | ✔️ | ✔️ | ||
Sortpool | ✔️ | ✔️ | ✔️ | |||
GIN | ✔️ | ✔️ | ✔️ | |||
PATCHY_SAN | ✔️ | ✔️ | ✔️ | ✔️ | ||
DGCNN | ✔️ | ✔️ | ✔️ | |||
DGK | ✔️ | ✔️ | ✔️ | |||
HGP-SL | ✔️ | ✔️ | ✔️ | ✔️ | ||
SAGPool | ✔️ | ✔️ | ✔️ |
CogDL provides several downstream tasks, including node classification (with or without node attributes) and link prediction (with or without attributes, heterogeneous or not). The following leaderboards track state-of-the-art results and benchmarks on these tasks.
This leaderboard reports results in the unsupervised multi-label node classification setting. We run all algorithms on several real-world datasets and report the sorted experimental results (Micro-F1 score, with 90% of the labels used as training data for logistic regression with L2 normalization).
Rank | Method | PPI | Blogcatalog | Wikipedia |
---|---|---|---|---|
1 | ProNE (Zhang et al, IJCAI'19) | 26.32 | 43.63 | 57.64 |
2 | NetMF (Qiu et al, WSDM'18) | 24.86 | 43.49 | 58.46 |
3 | Node2vec (Grover et al, KDD'16) | 22.97 | 42.29 | 56.00 |
4 | NetSMF (Qiu et al, WWW'19) | 24.39 | 43.21 | 51.42 |
5 | LINE (Tang et al, WWW'15) | 23.20 | 39.21 | 52.99 |
6 | DeepWalk (Perozzi et al, KDD'14) | 22.59 | 42.69 | 51.38 |
7 | Spectral (Tang et al, Data Min Knowl Disc (2011)) | 23.33 | 42.40 | 50.33 |
8 | Hope (Ou et al, KDD'16) | 22.94 | 34.82 | 55.43 |
9 | SDNE (Wang et al, KDD'16) | 20.14 | 40.32 | 48.24 |
10 | GraRep (Cao et al, CIKM'15) | 22.03 | 33.99 | 55.59 |
11 | DNGR (Cao et al, AAAI'16) | 16.45 | 28.54 | 48.57 |
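The Micro-F1 score in this leaderboard pools true positives, false positives, and false negatives over all labels before computing F1, so frequent labels carry more weight. Below is a minimal pure-Python sketch for multi-label predictions represented as one set of label ids per node; that representation and the name micro_f1 are assumptions, and the logistic regression step that produces the predictions is not shown. With binary indicator matrices, scikit-learn's f1_score(..., average="micro") computes the same quantity.

```python
def micro_f1(y_true, y_pred):
    """Micro-averaged F1 for multi-label data given as one set of labels per node."""
    tp = fp = fn = 0
    for true_labels, pred_labels in zip(y_true, y_pred):
        tp += len(true_labels & pred_labels)  # labels correctly predicted
        fp += len(pred_labels - true_labels)  # predicted but not true
        fn += len(true_labels - pred_labels)  # true but missed
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0
```

For example, micro_f1([{0, 1}, {2}], [{0}, {1, 2}]) counts 2 true positives, 1 false positive, and 1 false negative, giving 4/6 = 2/3.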
This leaderboard reports semi-supervised node classification results under the transductive setting for several popular graph neural network methods.
Rank | Method | Cora | Citeseer | Pubmed |
---|---|---|---|---|
1 | GRAND (Feng et al., NeurIPS'20) | 84.8 ± 0.3 | 75.1 ± 0.3 | 82.4 ± 0.4 |
2 | GCNII (Chen et al., ICML'20) | 85.1 ± 0.3 | 71.3 ± 0.4 | 80.2 ± 0.3 |
3 | DR-GAT (Zou et al., 2019) | 83.6 ± 0.5 | 72.8 ± 0.8 | 79.1 ± 0.3 |
4 | MVGRL (Hassani et al., KDD'20) | 83.6 ± 0.2 | 73.0 ± 0.3 | 80.1 ± 0.7 |
5 | APPNP (Klicpera et al., ICLR'19) | 82.5 ± 0.8 | 71.2 ± 0.2 | 80.2 ± 0.2 |
6 | GAT (Veličković et al., ICLR'18) | 82.9 ± 0.8 | 71.0 ± 0.3 | 78.9 ± 0.3 |
7 | GCN (Kipf et al., ICLR'17) | 82.3 ± 0.3 | 71.4 ± 0.4 | 79.5 ± 0.2 |
8 | SRGCN | 82.2 ± 0.2 | 72.8 ± 0.2 | 79.0 ± 0.4 |
9 | DGI (Veličković et al., ICLR'19) | 82.0 ± 0.2 | 71.2 ± 0.4 | 76.5 ± 0.6 |
10 | GraphSAGE (Hamilton et al., NeurIPS'17) | 80.1 ± 0.2 | 66.2 ± 0.4 | 77.2 ± 0.7 |
11 | GraphSAGE (unsup) (Hamilton et al., NeurIPS'17) | 78.2 ± 0.9 | 65.8 ± 1.0 | 78.2 ± 0.7 |
12 | Chebyshev (Defferrard et al., NeurIPS'16) | 79.0 ± 1.0 | 69.8 ± 0.5 | 68.6 ± 1.0 |
13 | Graph U-Net (Gao et al., 2019) | 81.8 | 67.1 | 77.3 |
14 | MixHop (Abu-El-Haija et al., ICML'19) | 81.9 ± 0.4 | 71.4 ± 0.8 | 80.8 ± 0.6 |
15 | SGC-PN (Zhao & Akoglu, 2019) | 76.4 ± 0.3 | 64.6 ± 0.6 | 79.6 ± 0.3 |
For multiplex node classification, we use Macro-F1 to evaluate models. We evaluate all models under the experimental setting and on the datasets of GTN.
Rank | Method | DBLP | ACM | IMDB |
---|---|---|---|---|
1 | GTN (Yun et al, NeurIPS'19) | 92.03 | 90.85 | 59.24 |
2 | HAN (Wang et al, WWW'19) | 91.21 | 87.25 | 53.94 |
3 | GCC (Qiu et al, KDD'20) | 79.42 | 86.82 | 55.86 |
4 | PTE (Tang et al, KDD'15) | 78.65 | 87.44 | 48.91 |
5 | Metapath2vec (Dong et al, KDD'17) | 75.18 | 88.79 | 43.10 |
6 | Hin2vec (Fu et al, CIKM'17) | 74.31 | 84.66 | 44.04 |
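Unlike Micro-F1, Macro-F1 computes one F1 score per class and takes their unweighted mean, so rare classes count as much as frequent ones. Below is a minimal sketch for single-label predictions; the function name is hypothetical. scikit-learn's f1_score(..., average="macro") computes the same unweighted mean over the labels that appear in either array.

```python
def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 scores for single-label predictions."""
    classes = sorted(set(y_true) | set(y_pred))
    scores = []
    for c in classes:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)
```

For example, macro_f1([0, 0, 1, 1], [0, 1, 1, 1]) averages a class-0 F1 of 2/3 and a class-1 F1 of 4/5, giving 11/15.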
For link prediction, we adopt the area under the receiver operating characteristic curve (ROC AUC), which represents the probability that the vertices of a random unobserved link are more similar than those of a random nonexistent link. We remove 10% of the edges from each dataset for evaluation, repeat our experiments 10 times, and report the ranked results.
Rank | Method | PPI | Wikipedia |
---|---|---|---|
1 | ProNE (Zhang et al, IJCAI'19) | 79.93 | 82.74 |
2 | NetMF (Qiu et al, WSDM'18) | 79.04 | 73.24 |
3 | Hope (Ou et al, KDD'16) | 80.21 | 68.89 |
4 | LINE (Tang et al, WWW'15) | 73.75 | 66.51 |
5 | Node2vec (Grover et al, KDD'16) | 70.19 | 66.60 |
6 | NetSMF (Qiu et al, WWW'19) | 68.64 | 67.52 |
7 | DeepWalk (Perozzi et al, KDD'14) | 69.65 | 65.93 |
8 | SDNE (Wang et al, KDD'16) | 54.87 | 60.72 |
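The probabilistic reading of ROC AUC used for the link prediction leaderboard can be computed directly. Score every held-out (positive) link and every sampled nonexistent (negative) link, then count how often a positive outranks a negative, with ties counting one half. This is a minimal quadratic-time sketch. How the scores are produced (for example, a similarity between the two endpoint embeddings) is left to the caller and is an assumption here. scikit-learn's roc_auc_score gives the same value from labelled scores using a faster sort-based method.

```python
def roc_auc(pos_scores, neg_scores):
    """P(score of a random true link > score of a random non-link); ties count 1/2."""
    if not pos_scores or not neg_scores:
        raise ValueError("need at least one positive and one negative score")
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))
```

For example, roc_auc([0.9, 0.8], [0.1, 0.85]) returns 0.75 because 3 of the 4 positive/negative pairs are ordered correctly.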
For multiplex link prediction, we also adopt ROC AUC. We remove 15% of the edges from each dataset for evaluation, repeat our experiments 10 times, and report the results on the three datasets.
Rank | Method | Amazon | YouTube | Twitter |
---|---|---|---|---|
1 | GATNE (Cen et al, KDD'19) | 97.44 | 84.61 | 92.30 |
2 | NetMF (Qiu et al, WSDM'18) | 97.72 | 82.53 | 73.75 |
3 | ProNE (Zhang et al, IJCAI'19) | 96.51 | 78.96 | 81.32 |
4 | Node2vec (Grover et al, KDD'16) | 86.86 | 74.01 | 78.30 |
5 | DeepWalk (Perozzi et al, KDD'14) | 92.54 | 74.31 | 60.29 |
6 | LINE (Tang et al, WWW'15) | 92.56 | 73.40 | 60.36 |
7 | Hope (Ou et al, KDD'16) | 94.39 | 74.66 | 70.61 |
8 | GraRep (Cao et al, CIKM'15) | 83.88 | 71.37 | 49.64 |
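Holding out a fraction of the edges for evaluation (15% for this leaderboard, 10% for the single-network link prediction leaderboard) can be sketched as follows. For a multiplex graph, this sketch holds out the fraction separately for each edge type so that every type appears in the test set. That per-type split, the (u, v, edge_type) tuple format, and the function name are assumptions, not a description of CogDL's implementation.

```python
import random
from collections import defaultdict


def split_edges_by_type(edges, test_ratio=0.15, seed=0):
    """Hold out test_ratio of the edges of each type; return (train, test)."""
    rng = random.Random(seed)  # local RNG so the split is reproducible per seed
    by_type = defaultdict(list)
    for u, v, edge_type in edges:
        by_type[edge_type].append((u, v, edge_type))
    train, test = [], []
    for edge_type in sorted(by_type):
        group = by_type[edge_type][:]
        rng.shuffle(group)
        n_test = int(round(len(group) * test_ratio))
        test.extend(group[:n_test])
        train.extend(group[n_test:])
    return train, test
```

Repeating an experiment with different seed values gives a different held-out set each time, which matches repeating the evaluation several times.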
This leaderboard reports the performance of graph classification methods. We run all algorithms on several datasets and report the sorted experimental results.
Rank | Method | MUTAG | IMDB-B | IMDB-M | PROTEINS | COLLAB |
---|---|---|---|---|---|---|
1 | GIN (Xu et al, ICLR'19) | 92.06 | 76.10 | 51.80 | 75.19 | 79.52 |
2 | Infograph (Sun et al, ICLR'20) | 88.95 | 74.50 | 51.33 | 73.93 | 79.4 |
3 | DiffPool (Ying et al, NeurIPS'18) | 85.18 | 72.40 | 50.50 | 75.30 | 79.27 |
4 | SortPool (Zhang et al, AAAI'18) | 87.25 | 75.40 | 50.47 | 73.23 | 80.07 |
5 | Graph2Vec (Narayanan et al, CoRR'17) | 83.68 | 73.90 | 52.27 | 73.30 | 85.58 |
6 | PATCHY_SAN (Niepert et al, ICML'16) | 86.12 | 76.00 | 46.40 | 75.38 | 74.34 |
7 | HGP-SL (Zhang et al, AAAI'20) | 81.93 | 74.00 | 49.53 | 73.94 | 82.08 |
8 | DGCNN (Wang et al, ACM Transactions on Graphics'17) | 83.33 | 69.50 | 46.33 | 66.67 | 77.45 |
9 | SAGPool (Lee et al, ICML'19) | 55.55 | 63.00 | 51.33 | 72.59 | / |
10 | DGK (Yanardag et al, KDD'15) | 83.68 | 55.00 | 40.40 | 72.59 | / |
If you have ANY difficulty getting things working in the above steps, feel free to open an issue. You can expect a reply within 24 hours.
If you have a unique and interesting dataset and are willing to publish it, you can submit it by opening an issue in our repository; we will run all suitable methods on your dataset and update our leaderboard.
If you have a well-performing algorithm and are willing to implement it in our toolkit to help more people, you can create a pull request. Detailed information can be found here.
Before committing your modifications, please first run pre-commit install to set up the git hook that checks code format and style using black and flake8. The pre-commit hooks will then run automatically on git commit. Detailed information about pre-commit can be found here.
For your pull request to be accepted, it must include at least (1) your model script and (2) a unit test.
You might be confused when your pull request is rejected because of a 'Coverage decreased ...' issue even though your model works fine locally. This happens when you have not included a unit test that runs through the extra lines of code you added. The Travis CI service used by this repository runs all unit tests on the code you commit and measures how many lines of code the unit tests exercise; if a significant portion of your code is not exercised (insufficient coverage), the pull request is rejected.
So how do you write a unit test? Suppose you implement a GNN model for node classification in the script models/nn/abcgnn.py. You then need to add a unit test to tests/tasks/test_node_classification.py (or to the test script for whichever task your model addresses). To do so, add a function test_abcgnn_cora() that follows the format of the other unit tests already in that script, fill it with the required arguments, and end it with assert 0 <= ret["Acc"] <= 1, the basic sanity check performed by the unit test. Then call test_abcgnn_cora() in the script's main section. Finally, commit the modified tests/tasks/test_node_classification.py together with models/nn/abcgnn.py, and your pull request should pass.
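The following self-contained sketch shows the shape such a test takes. train_abcgnn is a hypothetical stub standing in for the CogDL calls in the API example earlier in this README (build_dataset, build_model, build_task(...).train()). The argument names in the Namespace are illustrative only. In tests/tasks/test_node_classification.py you would follow the argument setup of the existing tests instead.

```python
from argparse import Namespace


def train_abcgnn(args):
    # Hypothetical stub: in the real test, build the dataset, model, and task
    # with CogDL and return task.train(). It returns a dict of the same shape
    # ({"Acc": ...}) so this sketch runs without CogDL installed.
    return {"Acc": 0.5}


def test_abcgnn_cora():
    # Illustrative arguments; copy the real ones from the neighbouring tests.
    args = Namespace(task="node_classification", dataset="cora", model="abcgnn")
    ret = train_abcgnn(args)
    # The basic sanity check: accuracy must lie in [0, 1].
    assert 0 <= ret["Acc"] <= 1


if __name__ == "__main__":
    # Call the new test from the script's main section.
    test_abcgnn_cora()
```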
It is also a good idea to include an example script examples/gnn_models/abcgnn.py to show how your model can be run with appropriate arguments.