Reference implementation of synthetic coordinates and directional message passing for multiple GNNs, as proposed in
Directional Message Passing on Molecular Graphs via Synthetic Coordinates
by Johannes Gasteiger, Chandan Yeshwanth, Stephan Günnemann
Published at NeurIPS 2021.
Note that the author's name has changed from Johannes Klicpera to Johannes Gasteiger.
The `deepergcn_smp` folder contains DeeperGCN and SMP implementations in PyTorch, and `dimenetpp` contains the TensorFlow implementation of DimeNet++ with synthetic coordinates.
We use separate Anaconda environments for DeeperGCN/SMP and DimeNet++. Run this command inside each folder to create its environment:
conda env create -f environment.yml
We use the `ogbg-molhiv` and `ZINC` datasets from PyTorch Geometric, which are automatically downloaded to the `data` folder. The QM9 dataset is provided in the `data` folder.
Reference training scripts with the best hyperparameters are included. You can select the model (`deepergcn` or `smp`) and the dataset (`ogbg-molhiv`, `QM9`, `ZINC`) in the script.
We provide reference training scripts in the `scripts` folder for:
- the baseline model:
python scripts/train_baseline.py
- the baseline model with distances from either the bounds matrix (BM) or personalized PageRank (PPR):
python scripts/train_sc_basic.py
- the linegraph model with distances and angles, using both BM and PPR:
python scripts/train_sc_linegraph.py
These can be modified to perform other ablations, such as choosing any one of the distance methods, or using only the distance on the linegraph.
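For orientation, the sketch below illustrates two ingredients that the options above refer to: personalized PageRank (PPR) scores between atoms of a molecular graph, and an angle recovered from three pairwise distances. It is not taken from this repository. The function names, the toy graph, the teleport probability `alpha=0.15`, and the use of the law of cosines for the angle are all assumptions made for illustration, and the mapping from PPR scores to distances used in the paper is not reproduced here.

```python
import math


def personalized_pagerank(adj, source, alpha=0.15, iters=200):
    """Approximate PPR scores from `source` by power iteration.

    adj: dict mapping each node to a list of its neighbours (undirected,
         every node needs at least one neighbour).
    alpha: teleport (restart) probability; 0.15 is an assumed value.
    """
    nodes = sorted(adj)
    scores = {n: 0.0 for n in nodes}
    scores[source] = 1.0
    for _ in range(iters):
        nxt = {n: 0.0 for n in nodes}
        nxt[source] += alpha  # mass that restarts at the source
        for n in nodes:
            # the remaining mass is split evenly among the neighbours of n
            share = (1.0 - alpha) * scores[n] / len(adj[n])
            for m in adj[n]:
                nxt[m] += share
        scores = nxt
    return scores


def angle_from_distances(d_ij, d_jk, d_ik):
    """Angle at node j (in radians) of the triangle i-j-k, law of cosines."""
    cos_angle = (d_ij ** 2 + d_jk ** 2 - d_ik ** 2) / (2.0 * d_ij * d_jk)
    # clamp to [-1, 1] to guard against rounding errors
    return math.acos(max(-1.0, min(1.0, cos_angle)))


# Hypothetical toy input: a chain of four atoms, 0-1-2-3.
chain = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
ppr_from_0 = personalized_pagerank(chain, source=0)
print({node: round(score, 4) for node, score in ppr_from_0.items()})

# Angle at the middle node of a triplet with side lengths 3, 4 and 5.
print(math.degrees(angle_from_distances(3.0, 4.0, 5.0)))
```

The PPR scores sum to one and fall off along the chain away from the source's neighbour, which is what makes them usable as a graph-based notion of proximity when no 3D coordinates are available.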
The model hyperparameters and ablations can be configured in the training script.
python run.py
Alternatively, use the config file to train with SEML on a Slurm cluster:
seml <collection> add configs/graph_clsreg.yml
seml <collection> start
The model is evaluated on the validation set during training, and the final test score is printed at the end of training. Logs with losses and metrics are written to TensorBoard. The unique experiment ID is printed to the console and written to the SEML database. You can also use the `results.ipynb` notebook to fetch results from the SEML database: set the collection name and batch IDs in the notebook and run it to fetch the required results.
Checkpoints are saved to a uniquely named folder. This unique name is printed during training and can be used in the `predict.ipynb` notebook to run the model on the test set. The model configuration used during training must be specified in `config_pp.yaml`. The same unique name can be used to view losses and metrics in TensorBoard.
Our models achieve the following results (as reported in the paper):

| Model | MAE |
|---|---|
| DeeperGCN | 0.142 ± 0.006 |
| SMP | 0.109 ± 0.004 |

| Model | Target | MAE (meV) |
|---|---|---|
| DimeNet++ | U0 | 28.7 |
| DimeNet++ | HOMO | 61.7 |
Please contact j.gasteiger@in.tum.de if you have any questions.
Please cite our paper if you use our method or code in your own work:
@inproceedings{gasteiger_2021_dmp,
title={Directional Message Passing on Molecular Graphs via Synthetic Coordinates},
author={Gasteiger, Johannes and Yeshwanth, Chandan and G{\"u}nnemann, Stephan},
booktitle = {Thirty-fifth Conference on Neural Information Processing Systems (NeurIPS)},
year={2021},
}
Hippocratic License v2.1