This repository is the official implementation of Node Feature Kernels Increase Graph Convolutional Network Robustness.
It is mainly developed with the help of the PyTorch Geometric library. We also thank the Open Graph Benchmark implementation for providing an example of logger.py.
A virtual environment can be created with conda from the given environments file:
conda env create -f environments.yml
Note that PyTorch Geometric needs to be installed separately via pip:
conda activate RobustGCN
pip install -r requirements.txt
This implementation reproduces the experimental results shown in our paper, which studies the robustness of Graph Convolutional Networks (GCNs) under structural perturbation, including:
- Asymptotic behaviour of random feature GCN with growing hidden dimension.
- GCN with separate noise (modelled by a random graph) added in the message-passing step.
- GCN with noise merged with its original adjacency matrix, e.g. edge deletion/insertion.
- Experiments on multi-layer GCN.
- GCN with perturbed adjacency as well as node feature perturbation.
- Experiments with randomised train/test/valid splits.
To run experiments with vanilla GCN, e.g. on Cora, do
python main.py --dataset Cora --out_dir <out dir> --num_layer 1 --readout mlp --exp_type robust --noise_type none --no-merged --epsilon 1.0
For random feature GCN, do
python main.py --dataset Cora --out_dir <out dir> --num_layer 1 --readout mlp --exp_type random-feature --noise_type none --epsilon 1.0 --hiddim 3000
For experiments in the theoretical case (separate noise), do
python main.py --dataset Cora --out_dir <out dir> --num_layer 1 --readout mlp --exp_type robust --noise_type random --no-merged --noise_ratio 1.0 --epsilon 0.5 --identity
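As a rough illustration of the separate-noise setting, the sketch below samples an Erdős-Rényi noise graph whose density is noise_ratio times that of the original graph, following the description of --noise_ratio. It is not the code used in main.py: the function name `sample_er_noise` and the edge-list representation are assumptions made for this example.

```python
import random
from itertools import combinations


def sample_er_noise(num_nodes, edges, noise_ratio=1.0, seed=0):
    """Sample an undirected Erdos-Renyi graph with density noise_ratio * density(edges).

    Hypothetical helper for illustration; not part of this repository.
    """
    rng = random.Random(seed)
    num_pairs = num_nodes * (num_nodes - 1) // 2
    # Density of the original graph: fraction of node pairs that are connected.
    unique_edges = {tuple(sorted(e)) for e in edges if e[0] != e[1]}
    density = len(unique_edges) / num_pairs
    # Each node pair is kept independently with probability p (capped at 1).
    p = min(1.0, noise_ratio * density)
    # Enumerating all pairs is quadratic in num_nodes; fine for toy graphs.
    return [pair for pair in combinations(range(num_nodes), 2) if rng.random() < p]


# Toy graph: a ring on 6 nodes, so 6 of the 15 possible pairs are connected.
ring = [(i, (i + 1) % 6) for i in range(6)]
noise_edges = sample_er_noise(6, ring, noise_ratio=1.0, seed=0)
print(noise_edges)
```

With noise_ratio set to 1.0, the noise graph has the same expected number of edges as the original graph.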
For experiments in the realistic case (merged noise), do
python main.py --dataset Cora --out_dir <out dir> --num_layer 1 --readout mlp --exp_type robust --noise_type deletion --merged --noise_ratio 0.5 --epsilon 0.5 --add_kernel --add_identity --normalize --nystrom
in the case of edge deletion, and do
python main.py --dataset Cora --out_dir <out dir> --num_layer 1 --readout mlp --exp_type robust --noise_type insertion --merged --noise_ratio 0.5 --epsilon 0.5 --add_kernel --add_identity --normalize --nystrom
in the case of edge insertion.
For deeper architectures, simply change the value of the num_layer parameter.
For experiments with node feature noise, e.g. in the realistic case, do
python main.py --dataset Cora --out_dir <out dir> --num_layer 1 --readout mlp --exp_type robust --noise_type insertion --merged --noise_ratio 0.5 --epsilon 0.5 --add_kernel --add_identity --normalize --nystrom --add_feat_noise --feat_noise_ratio 1.0
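To make the meaning of --feat_noise_ratio concrete, here is a minimal sketch of adding Gaussian noise whose standard deviation is a multiple of the standard deviation of the node features. The helper `add_feature_noise` is hypothetical, and computing one standard deviation over all feature entries (rather than per feature) is an assumption of this example.

```python
import random
import statistics


def add_feature_noise(features, feat_noise_ratio=1.0, seed=0):
    """Add zero-mean Gaussian noise with std = feat_noise_ratio * std(features).

    Hypothetical helper; the std is taken over all entries, which is an assumption.
    """
    rng = random.Random(seed)
    flat = [value for row in features for value in row]
    sigma = feat_noise_ratio * statistics.pstdev(flat)
    return [[value + rng.gauss(0.0, sigma) for value in row] for row in features]


# Three nodes with one-hot features.
features = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
noisy = add_feature_noise(features, feat_noise_ratio=1.0)
print(noisy)
```

A ratio of 0.0 leaves the features unchanged, while 1.0 adds noise at the same scale as the features themselves.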
For experiments with multiple splits (see supplementary material Section E), e.g. in the realistic case, do
python main.py --dataset Cora --out_dir <out dir> --num_layer 1 --readout mlp --exp_type robust --noise_type insertion --merged --noise_ratio 0.5 --epsilon 0.5 --add_kernel --add_identity --normalize --nystrom --splits 10 --split_type random
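The idea behind --splits and --split_type random can be sketched as drawing several independent random partitions of the nodes. The helper `random_splits` below is hypothetical, and the split fractions are placeholders rather than the values used in our experiments.

```python
import random


def random_splits(num_nodes, num_splits=10, train_frac=0.6, valid_frac=0.2, seed=0):
    """Yield (train, valid, test) index lists for several random node splits.

    Hypothetical helper; the fractions are placeholders, not the paper's values.
    """
    rng = random.Random(seed)
    n_train = int(train_frac * num_nodes)
    n_valid = int(valid_frac * num_nodes)
    for _ in range(num_splits):
        perm = list(range(num_nodes))
        rng.shuffle(perm)
        # Consecutive slices of one permutation are disjoint and cover all nodes.
        yield perm[:n_train], perm[n_train:n_train + n_valid], perm[n_train + n_valid:]


splits = list(random_splits(num_nodes=100, num_splits=10))
print(len(splits), [len(part) for part in splits[0]])
```

Results over such splits would typically be reported as a mean and standard deviation across the runs.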
Description of important Model Options:
--hiddim <int>
dimension of the hidden representation of node embedding, default is 128
--num_layer <int>
number of stacked GCN layers, default is 1
--readout <str>
choice of readout functions that output prediction score, default is 'mlp'
--exp_type <str>
choice of different experiment settings, default is 'random-feature'
--noise_type <str>
choice of different (noise) scenario, default is 'none'
--merged <bool>
whether to merge noise into the original adjacency matrix, default is False
--add_feat_noise <bool>
whether to add Gaussian noise to the features, default is False
--add_kernel <bool>
whether to enhance GCN message-passing with kernel, default is False
--random_noise_type <str>
choice of random graph generative model for the noise, default is the Erdős-Rényi graph
--kernel_type <str>
choice of kernel function, default is 'linear'
--noise_ratio <float>
ratio between the random noise graph's density and the original graph's density, default is 1.0
--feat_noise_ratio <float>
ratio between the standard deviation of the added Gaussian noise and that of the original node features, default is 1.0
--standarize <bool>
whether to standardize node features, default is False
--centerize <bool>
whether to center kernel values, default is False
--add_identity <bool>
whether to add self-loops to noise/kernel, default is False
--normalize <bool>
whether to degree normalize noise/kernel, default is False
--rf_norm <bool>
whether to normalize random weights in random feature GCN, default is False
--split_type <str>
choice of train/valid/test split of datasets, default is 'public'
--nystrom <bool>
whether to use the Nyström approximation for computing the kernel, default is False
--epsilon <float>
coefficient of the propagation following original graph in the GCN message-passing step, default is 1.0
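To show how --add_kernel, --kernel_type linear, --add_identity, --normalize and --epsilon might fit together, here is a minimal pure-Python sketch of a kernel-enhanced propagation matrix. This is not the repository's implementation: all function names are hypothetical, the symmetric normalization is an assumption, and so is the convex combination with weight (1 - epsilon) on the kernel term. The options above only state that epsilon is the coefficient of the propagation along the original graph.

```python
def linear_kernel(features):
    """K[i][j] = <x_i, x_j>, a linear kernel on node features."""
    return [[sum(a * b for a, b in zip(xi, xj)) for xj in features] for xi in features]


def add_identity(matrix):
    """Add self-loops, i.e. the identity matrix."""
    return [[v + (1.0 if i == j else 0.0) for j, v in enumerate(row)]
            for i, row in enumerate(matrix)]


def degree_normalize(matrix):
    """Symmetric normalization D^{-1/2} M D^{-1/2}, with D the row sums of M."""
    inv_sqrt = [sum(row) ** -0.5 if sum(row) > 0 else 0.0 for row in matrix]
    return [[inv_sqrt[i] * v * inv_sqrt[j] for j, v in enumerate(row)]
            for i, row in enumerate(matrix)]


def propagation_matrix(adjacency, features, epsilon=0.5):
    """Return epsilon * A_hat + (1 - epsilon) * K_hat (the combination is an assumption)."""
    a_hat = degree_normalize(add_identity(adjacency))
    k_hat = degree_normalize(add_identity(linear_kernel(features)))
    return [[epsilon * a + (1.0 - epsilon) * k for a, k in zip(row_a, row_k)]
            for row_a, row_k in zip(a_hat, k_hat)]


# Path graph on three nodes with two-dimensional features.
adjacency = [[0.0, 1.0, 0.0], [1.0, 0.0, 1.0], [0.0, 1.0, 0.0]]
features = [[1.0, 0.0], [1.0, 1.0], [0.0, 1.0]]
mixed = propagation_matrix(adjacency, features, epsilon=0.5)
print(mixed)
```

Under these assumptions, epsilon equal to 1.0 reduces to propagation along the normalized adjacency matrix alone, which matches the vanilla GCN command above.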
Table 1: Performance of GCN/GIN/GraphSAGE/GAT with node-feature kernel under perturbation on Cora
deletion | insertion | GCN | GCN-k | GIN | GIN-k | SAGE | SAGE-k | GAT | GAT-k |
---|---|---|---|---|---|---|---|---|---|
0.0 | 0.0 | 76.42 ± 1.55 | 75.42 ± 1.65 | 76.94 ± 1.41 | 77.62 ± 1.74 | 74.77 ± 1.98 | 76.00 ± 2.05 | 76.55 ± 2.23 | 77.45 ± 2.00 |
0.5 | 0.0 | 71.46 ± 1.66 | 69.00 ± 2.99 | 70.42 ± 2.03 | 70.23 ± 1.73 | 67.37 ± 1.73 | 70.46 ± 1.86 | 70.86 ± 1.45 | 71.35 ± 1.90 |
0.0 | 1.0 | 60.73 ± 2.20 | 70.55 ± 1.52 | 63.87 ± 2.85 | 67.80 ± 2.27 | 66.53 ± 1.80 | 68.52 ± 1.97 | 59.25 ± 1.99 | 64.92 ± 1.55 |
0.5 | 0.5 | 53.90 ± 1.88 | 63.79 ± 2.26 | 56.36 ± 2.23 | 62.79 ± 1.56 | 62.06 ± 1.73 | 63.80 ± 2.54 | 52.78 ± 2.37 | 58.01 ± 1.96 |
0.5 | 1.0 | 45.04 ± 2.46 | 62.08 ± 2.30 | 49.56 ± 3.40 | 55.24 ± 2.13 | 59.54 ± 1.75 | 62.15 ± 2.32 | 43.97 ± 2.29 | 52.47 ± 1.52 |
In the above table, each row corresponds to one perturbation scenario in which edges are randomly removed and/or added. The scenario is controlled by two parameters, “deletion” and “insertion”, which give the ratio of edges deleted from or inserted into the graph, relative to the number of edges in the original graph. For example, (0.0, 0.0) corresponds to the unperturbed case, and (0.5, 0.5) corresponds to the case where 50% of the original edges are removed and the same number of edges that do not exist in the original graph are added.
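The perturbation scenarios described above can be sketched as follows. The helper `perturb_edges` is hypothetical and works on a plain undirected edge list; it only illustrates how the two ratios are interpreted and is not the code used to produce the table.

```python
import random
from itertools import combinations


def perturb_edges(num_nodes, edges, deletion=0.0, insertion=0.0, seed=0):
    """Randomly delete and insert undirected edges.

    Both ratios (expected in [0, 1] for deletion) are relative to the number
    of edges in the original graph. Hypothetical helper for illustration.
    """
    rng = random.Random(seed)
    original = sorted({tuple(sorted(e)) for e in edges})
    num_delete = round(deletion * len(original))
    num_insert = round(insertion * len(original))
    # Keep a random subset of the original edges.
    kept = rng.sample(original, len(original) - num_delete)
    # Insert only edges that do not exist in the original graph.
    existing = set(original)
    candidates = [p for p in combinations(range(num_nodes), 2) if p not in existing]
    added = rng.sample(candidates, min(num_insert, len(candidates)))
    return sorted(kept + added)


# Ring on 8 nodes: 8 edges. Scenario (0.5, 0.5) removes 4 and adds 4 new ones.
ring = [(i, (i + 1) % 8) for i in range(8)]
perturbed = perturb_edges(8, ring, deletion=0.5, insertion=0.5)
print(perturbed)
```

In the scenario (0.5, 0.5) the perturbed graph has as many edges as the original one, but only half of them are original edges.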
Each column corresponds to a GNN model we considered. The suffix "-k" in the model name indicates that the model contains our proposed node-feature kernel. Each model is composed of a single message-passing layer and an MLP readout layer. For all "-k" models, the coefficient of the perturbed graph propagation, i.e.,
- Mohamed El Amine Seddik
- Changmin Wu
- Johannes F. Lutzeyer
- Michalis Vazirgiannis
If you find our repo useful, please cite:
@misc{seddik2021node,
title={Node Feature Kernels Increase Graph Convolutional Network Robustness},
author={Mohamed El Amine Seddik and Changmin Wu and Johannes F. Lutzeyer and Michalis Vazirgiannis},
year={2021},
eprint={2109.01785},
archivePrefix={arXiv},
primaryClass={cs.LG}
}