This repository is the official implementation of RINGER: Rapid Conformer Generation for Macrocycles with Sequence-Conditioned Internal Coordinate Diffusion.
To install requirements:
conda env create -f environment.yaml
conda activate ringer
pip install -e .
Download and extract the CREMP pickle.tar.gz from here. Use train.csv and test.csv to partition the extracted files into training and test data, and put the corresponding files into the train and test directories.
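The partitioning step can be scripted. The following is a minimal sketch, not part of this repository: it assumes each CSV has a header row with a column (hypothetically named sequence) whose values match the pickle file stems, as in H.A.S.V.pickle. Check the actual column name in train.csv and test.csv before using it.

```python
import csv
import shutil
from pathlib import Path


def read_names(csv_path, column="sequence"):
    """Return the values of one CSV column.

    The column name "sequence" is an assumption; adjust it to the real header.
    """
    with open(csv_path, newline="") as f:
        return [row[column] for row in csv.DictReader(f)]


def partition(pickle_dir, csv_path, out_dir, column="sequence"):
    """Copy <name>.pickle into out_dir for every name listed in csv_path.

    Returns the names that had no matching pickle file so they can be inspected.
    """
    pickle_dir, out_dir = Path(pickle_dir), Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    missing = []
    for name in read_names(csv_path, column):
        src = pickle_dir / f"{name}.pickle"
        if src.is_file():
            shutil.copy2(src, out_dir / src.name)
        else:
            missing.append(name)
    return missing
```

For example, partition("pickle", "train.csv", "train") followed by partition("pickle", "test.csv", "test") would fill the two directories, where "pickle" stands for wherever the archive was extracted.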
To train the full conditional model, run this command:
train conditional.json
The config file can be specified by an absolute path or by a path relative to the configs folder. Similarly, within the config file, data_dir can be an absolute path or a path relative to the data folder.
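The lookup rule described above can be pictured with a small sketch. The helper name resolve is hypothetical and this is not the repository's actual code. It only illustrates "absolute paths are used as given, relative paths are joined to a base folder".

```python
from pathlib import Path


def resolve(path, base_dir):
    """Return path unchanged if it is absolute, else interpret it relative to base_dir."""
    path = Path(path)
    return path if path.is_absolute() else Path(base_dir) / path


# A relative config name is looked up under the configs folder,
# and a relative data_dir under the data folder.
config_path = resolve("conditional.json", "configs")
data_path = resolve("cremp/test", "data")
```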
To log a training run with Weights & Biases, set up your configuration in configs/wandb/wandb.json and enable logging with:
train conditional.json --wandb-run <run_name>
The pre-trained model is included in this repository.
To generate samples for the CREMP test set, run:
evaluate \
--model-dir assets/models/conditional \
--data-dir cremp/test \
--split-sizes 0.0 0.0 1.0 \
--sample-only
This creates a sample directory and stores the samples for all molecules in sample/samples.pickle.
Run evaluate --help to see all options available for sampling and evaluation.
The evaluate command can also be used to reconstruct backbones (not including side chains) and to compute evaluation metrics. However, this is not recommended because evaluate does not parallelize well across molecules.
Instead, reconstruction (including side chains) is done most effectively for each molecule individually using scripts/reconstruct_single.py. Parallelization can then be efficiently achieved by submitting a batch job array using an HPC job scheduler (e.g., Slurm) and passing the job array index as the first argument to the script. To reconstruct molecule 0, run:
python scripts/reconstruct_single.py 0 \
cremp/test \
sample/samples.pickle \
sample/reconstructed_mols \
assets/models/conditional/training_mean_distances.json
The script will run the optimization to reconstruct the ring coordinates, followed by a linear (NeRF) reconstruction of the side chains using the conformer samples previously generated, and save the resulting molecule in sample/reconstructed_mols. Note that even though we point the script to cremp/test, it only uses the atom identities and connectivity information from the test molecules; their geometries are entirely set during the reconstruction procedure.
Run python scripts/reconstruct_single.py --help for an overview of other parameters available for reconstruction.
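For the job-array approach mentioned above, a thin Python wrapper can read the array index from the environment and build the reconstruction command. This is a sketch under assumptions: SLURM_ARRAY_TASK_ID is the variable Slurm sets for array jobs, the positional arguments are copied from the example command above, and the function names are hypothetical.

```python
import os
import sys


def task_index(environ=os.environ):
    """Read the job array index that Slurm exposes as SLURM_ARRAY_TASK_ID."""
    return int(environ["SLURM_ARRAY_TASK_ID"])


def reconstruct_command(
    index,
    data_dir="cremp/test",
    samples="sample/samples.pickle",
    out_dir="sample/reconstructed_mols",
    mean_distances="assets/models/conditional/training_mean_distances.json",
):
    """Build the argument list for reconstructing the molecule at the given index."""
    return [
        sys.executable,
        "scripts/reconstruct_single.py",
        str(index),
        data_dir,
        samples,
        out_dir,
        mean_distances,
    ]


# Inside a job array task, the command could then be launched with:
#   import subprocess
#   subprocess.run(reconstruct_command(task_index()), check=True)
```

Each array task then reconstructs exactly one molecule, so the scheduler handles the parallelism across molecules.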
As with reconstruction, computing metrics is best done separately for each molecule using scripts/compute_metrics_single.py followed by aggregation across molecules using scripts/aggregate_metrics.py. For example, to compute metrics for the H.A.S.V macrocycle, run:
python scripts/compute_metrics_single.py \
cremp/test/H.A.S.V.pickle \
sample/reconstructed_mols/H.A.S.V.pickle
Run python scripts/compute_metrics_single.py --help and python scripts/aggregate_metrics.py --help for an overview of other parameters available for computing metrics.
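To run the per-molecule metrics over a whole test set, the reference and reconstructed pickles first need to be paired. The sketch below assumes, as in the H.A.S.V example above, that both directories use the same file name for a given molecule. The function names are hypothetical, and the arguments of scripts/aggregate_metrics.py are deliberately left out because they are not described here.

```python
from pathlib import Path


def metric_jobs(ref_dir, recon_dir):
    """Yield (reference, reconstructed) path pairs that share a file name."""
    ref_dir, recon_dir = Path(ref_dir), Path(recon_dir)
    for recon in sorted(recon_dir.glob("*.pickle")):
        ref = ref_dir / recon.name
        if ref.is_file():
            yield ref, recon


def metrics_command(ref, recon):
    """Build the compute_metrics_single.py call for one molecule."""
    return ["python", "scripts/compute_metrics_single.py", str(ref), str(recon)]


# Example: list the commands, one per reconstructed molecule.
# for ref, recon in metric_jobs("cremp/test", "sample/reconstructed_mols"):
#     print(" ".join(metrics_command(ref, recon)))
```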
Install pre-commit hooks to use automated code formatting before committing changes. Make sure you're in the top-level directory and run:
pre-commit install
After that, your code will be automatically reformatted on every new commit.
To manually reformat all files in the project, use:
pre-commit run -a
To update the hooks defined in .pre-commit-config.yaml, use:
pre-commit autoupdate
Licensed under the MIT License. See LICENSE for additional details.
For the code and/or model, please cite:
@misc{grambow2023ringer,
title={{RINGER}: Rapid Conformer Generation for Macrocycles with Sequence-Conditioned Internal Coordinate Diffusion},
author={Colin A. Grambow and Hayley Weir and Nathaniel L. Diamant and Alex M. Tseng and Tommaso Biancalani and Gabriele Scalia and Kangway V. Chuang},
year={2023},
eprint={2305.19800},
archivePrefix={arXiv},
primaryClass={q-bio.BM}
}
To cite the CREMP dataset, please use:
@article{grambow2024cremp,
title = {{CREMP: Conformer-rotamer ensembles of macrocyclic peptides for machine learning}},
author = {Grambow, Colin A. and Weir, Hayley and Cunningham, Christian N. and Biancalani, Tommaso and Chuang, Kangway V.},
year = {2024},
journal = {Scientific Data},
doi = {10.1038/s41597-024-03698-y},
pages = {859},
number = {1},
volume = {11}
}
You can also cite the CREMP Zenodo repository directly:
@dataset{grambow_colin_a_2023_7931444,
author = {Grambow, Colin A. and
Weir, Hayley and
Cunningham, Christian N. and
Biancalani, Tommaso and
Chuang, Kangway V.},
title = {{CREMP: Conformer-Rotamer Ensembles of Macrocyclic
Peptides for Machine Learning}},
month = may,
year = 2023,
publisher = {Zenodo},
version = {1.0.1},
doi = {10.5281/zenodo.7931444},
url = {https://doi.org/10.5281/zenodo.7931444}
}