This repo provides an implementation of Subgraph Pooling, proposed in the following paper:
Paper: Tackling Negative Transfer on Graphs
In this paper, we systematically analyze the negative transfer issue in graph neural networks and provide a new insight to handle the issue. In particular, we find: for semantically similar graphs, although structural differences lead to significant distribution shift in node embeddings, their impact on subgraph embeddings is marginal. This insight inspires us to propose Subgraph Pooling and Subgraph Pooling++ that transfer subgraph-level knowledge across graphs.
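To make the idea of subgraph-level embeddings concrete, here is a minimal sketch that replaces each node embedding with the mean over its k-hop neighborhood. It is an illustration only, not the code in this repository: the function names (`k_hop_neighbors`, `subgraph_mean_pool`), the choice of mean pooling, and the toy graph are all assumptions, and the actual sampler and pooling function used in the paper may differ.

```python
from collections import deque


def k_hop_neighbors(adj, node, k):
    """Return the set of nodes within k hops of `node`, including itself."""
    seen = {node}
    frontier = deque([(node, 0)])
    while frontier:
        current, depth = frontier.popleft()
        if depth == k:
            continue  # do not expand beyond k hops
        for neighbor in adj[current]:
            if neighbor not in seen:
                seen.add(neighbor)
                frontier.append((neighbor, depth + 1))
    return seen


def subgraph_mean_pool(adj, embeddings, k):
    """Hypothetical pooling step: average embeddings over each k-hop subgraph."""
    pooled = {}
    for node in adj:
        members = k_hop_neighbors(adj, node, k)
        dim = len(embeddings[node])
        pooled[node] = [
            sum(embeddings[m][d] for m in members) / len(members)
            for d in range(dim)
        ]
    return pooled


# Toy path graph 0 - 1 - 2 - 3 with 1-dimensional node embeddings.
adj = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
embeddings = {0: [0.0], 1: [2.0], 2: [4.0], 3: [6.0]}
pooled = subgraph_mean_pool(adj, embeddings, k=1)
```

In this toy example the pooled embeddings vary less across nodes than the raw ones, which is the intuition behind transferring subgraph-level rather than node-level knowledge.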
We use conda for environment setup. Please run the following commands:
conda create -y -n SP python=3.8
conda activate SP
pip install -r requirements.txt
pip install pyg_lib torch_scatter torch_sparse torch_cluster torch_spline_conv -f https://data.pyg.org/whl/torch-2.1.0+cu121.html
These commands create a conda environment named SP and install the relevant requirements from requirements.txt. Be sure to activate the environment via conda activate SP before running the experiments described below.
We include six datasets in this paper. Please download them from the following links and put them under data/, like
.
|- params
|
|- data
| |- twitch
| |- airports
| |- elliptic
| |- ...
- Citation Network (ACM and DBLP): Download the .zip file from here and decompress it under the data folder. More details here.
- Airport Network (USA, Brazil, and Europe): These datasets will be automatically downloaded when running the code. More details here.
- Twitch Network (DE, EN, ES, FR, PT, RU): These datasets are collected in different countries and have varying sizes and distributions. The graphs will be automatically downloaded when running the code. More details here.
- OGB-Arxiv: This is a citation network containing papers published from 2005 to 2020. The dataset has two settings: the first evaluates temporal distribution shift (downloaded automatically), and the second evaluates degree shift (with the dataset here). Please decompress the file under the data folder. More details are presented in OGB Benchmark and GOOD.
- Elliptic Network: A Bitcoin transaction network. Download here and decompress it under the data folder. More details here.
- Facebook100 Network: The dataset contains 14 graphs collected from Facebook. Please download here and decompress it under the data folder. More details here.
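Before launching experiments, it can help to confirm that the manually downloaded datasets landed in the right place. The sketch below checks for the folder names shown in the layout above. The helper `missing_datasets` is hypothetical and not part of this repository, and only the three folder names visible in the layout are listed, since the remaining ones are elided there. Note that the automatically downloaded datasets may legitimately be missing before the first run.

```python
import tempfile
from pathlib import Path

# Folder names copied from the layout above; the elided entries ("...")
# are intentionally not guessed here.
EXPECTED_DIRS = ["twitch", "airports", "elliptic"]


def missing_datasets(data_root, expected=EXPECTED_DIRS):
    """Return the expected dataset folders that do not exist under data_root."""
    root = Path(data_root)
    return [name for name in expected if not (root / name).is_dir()]


# Demonstration on a temporary directory that contains only "elliptic".
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "elliptic").mkdir()
    demo_missing = missing_datasets(tmp)
```

In a real checkout, you would call `missing_datasets("data")` from the repository root.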
To quickly run the model, please run main.py and specify the experiment setting. Here is an example.
python main.py --use_params --source_target acm_dblp --backbone gcn --sampling k_hop --ft_last_layer
Note that we provide three transfer learning settings: freeze (do not fine-tune on the target graph), ft_last_layer (fine-tune the last layer), and ft_whole_model (fine-tune the whole model). You can run these settings jointly, like
python main.py --use_params --source_target acm_dblp --freeze --ft_last_layer --ft_whole_model
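If you launch runs from Python rather than typing commands by hand, the invocation above can be assembled programmatically. The following is a minimal sketch: `build_command` is a hypothetical helper, not part of this repository, and it uses only the flags that appear in the examples above.

```python
import shlex

# The three transfer learning settings named in this README.
SETTINGS = ("freeze", "ft_last_layer", "ft_whole_model")


def build_command(source_target, settings, backbone=None, sampling=None):
    """Build the argument list for main.py from the flags shown above."""
    unknown = [s for s in settings if s not in SETTINGS]
    if unknown:
        raise ValueError(f"unknown transfer setting(s): {unknown}")
    cmd = ["python", "main.py", "--use_params", "--source_target", source_target]
    if backbone:
        cmd += ["--backbone", backbone]
    if sampling:
        cmd += ["--sampling", sampling]
    cmd += [f"--{s}" for s in settings]
    return cmd


cmd = build_command("acm_dblp", ["freeze", "ft_last_layer", "ft_whole_model"])
print(shlex.join(cmd))
```

The resulting list can be passed to `subprocess.run(cmd, check=True)` from the repository root.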
In the following, we give a simpler method for running the code.
To ensure reproducibility of the paper, we provide the detailed hyper-parameters under the params folder. One simple method is to run the bash script under the script folder, like
bash script/run.sh DATASET BACKBONE SAMPLING
where the first argument is the dataset, the second is the backbone, and the last is the subgraph sampling method. Please refer to script/readme.md for more details.
Here are some examples.
bash script/run.sh acm_dblp gcn k_hop
bash script/run.sh dblp_acm gcn k_hop
bash script/run.sh arxiv_1_arxiv_5 gcn rw
bash script/run.sh arxiv_3_arxiv_5 gcn rw
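To run several of these experiments in sequence, a small driver can loop over the argument triples. This is a sketch under assumptions: `run_all` is a hypothetical helper, not part of this repository, and it defaults to a dry run that only prints the commands. Set `dry_run=False` from the repository root to actually execute them.

```python
import subprocess

# (dataset, backbone, sampling) triples taken from the examples above.
EXPERIMENTS = [
    ("acm_dblp", "gcn", "k_hop"),
    ("dblp_acm", "gcn", "k_hop"),
    ("arxiv_1_arxiv_5", "gcn", "rw"),
    ("arxiv_3_arxiv_5", "gcn", "rw"),
]


def run_all(experiments, dry_run=True):
    """Print (dry run) or execute script/run.sh once per experiment."""
    commands = []
    for dataset, backbone, sampling in experiments:
        cmd = ["bash", "script/run.sh", dataset, backbone, sampling]
        commands.append(cmd)
        if dry_run:
            print(" ".join(cmd))
        else:
            # check=True stops the loop at the first failing experiment.
            subprocess.run(cmd, check=True)
    return commands


commands = run_all(EXPERIMENTS)
```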
To extend Subgraph Pooling to your own model, one simple method is to implement your own model under model.py.
Please open an issue or contact zwang43@nd.edu if you have questions.
Please cite the following paper corresponding to the repository:
@article{wang2024tackling,
title={Tackling Negative Transfer on Graphs},
author={Wang, Zehong and Zhang, Zheyuan and Zhang, Chuxu and Ye, Yanfang},
journal={arXiv preprint arXiv:2402.08907},
year={2024}
}