Skip to content

Latest commit

 

History

History
159 lines (113 loc) · 5.78 KB

README.md

File metadata and controls

159 lines (113 loc) · 5.78 KB

RINGER

This repository is the official implementation of RINGER: Rapid Conformer Generation for Macrocycles with Sequence-Conditioned Internal Coordinate Diffusion.

cover

Requirements

To install requirements:

conda env create -f environment.yaml
conda activate ringer
pip install -e .

Data

Download and extract the CREMP pickle.tar.gz from here. Use train.csv and test.csv to partition it into training and test data and put the corresponding files into train and test.

Training

To train the full conditional model, run this command:

train conditional.json

The config file can be specified by an absolute path or by a path relative to the configs folder. Similarly, within the config file, data_dir can be an absolute path or a path relative to the data folder.

To log a training run with Weights & Biases, set up your configuration in configs/wandb/wandb.json and set up logging using:

train conditional.json --wandb-run <run_name>

Sampling

The pre-trained model is included in this repository.

To generate samples for the CREMP test set, run:

evaluate \
    --model-dir assets/models/conditional \
    --data-dir cremp/test \
    --split-sizes 0.0 0.0 1.0 \
    --sample-only

This creates a sample directory containing samples for all molecules in sample/samples.pickle.

Run evaluate --help to see all options available for sampling and evaluation.

Reconstruction

The evaluate command can also be used to reconstruct backbones (not including side chains) and to compute evaluation metrics. However, it is not recommended to do so because evaluate does not parallelize well across molecules.

Instead, reconstruction (including side chains) is done most effectively for each molecule individually using scripts/reconstruct_single.py. Parallelization can then be efficiently achieved by submitting a batch job array using an HPC job scheduler (e.g., Slurm) and passing the job array index as the first argument to the script. To reconstruct molecule 0, run:

python scripts/reconstruct_single.py 0 \
    cremp/test \
    sample/samples.pickle \
    sample/reconstructed_mols \
    assets/models/conditional/training_mean_distances.json

The script will run the optimization to reconstruct the ring coordinates, followed by a linear (NeRF) reconstruction of the side chains using the conformer samples previously generated, and save the resulting molecule in sample/reconstructed_mols. Note that even though we point the script to cremp/test, it only uses the atom identities and connectivity information from the test molecules; their geometries are entirely set during the reconstruction procedure.

Run python scripts/reconstruct_single.py --help for an overview of other parameters available for reconstruction.

Evaluation

As with reconstruction, computing metrics is best done separately for each molecule using scripts/compute_metrics_single.py followed by aggregation across molecules using scripts/aggregate_metrics.py. For example, to compute metrics for the H.A.S.V macrocycle, run

python scripts/compute_metrics_single.py \
    cremp/test/H.A.S.V.pickle \
    sample/reconstructed_mols/H.A.S.V.pickle

Run python scripts/compute_metrics_single.py --help and python scripts/aggregate_metrics.py --help for an overview of other parameters available for computing metrics.

Contributing

Install pre-commit hooks to use automated code formatting before committing changes. Make sure you're in the top-level directory and run:

pre-commit install

After that, your code will be automatically reformatted on every new commit.

To manually reformat all files in the project, use:

pre-commit run -a

To update the hooks defined in .pre-commit-config.yaml, use:

pre-commit autoupdate

License

Licensed under the MIT License. See LICENSE for additional details.

Citations

For the code and/or model, please cite:

@misc{grambow2023ringer,
    title={{RINGER}: Rapid Conformer Generation for Macrocycles with Sequence-Conditioned Internal Coordinate Diffusion}, 
    author={Colin A. Grambow and Hayley Weir and Nathaniel L. Diamant and Alex M. Tseng and Tommaso Biancalani and Gabriele Scalia and Kangway V. Chuang},
    year={2023},
    eprint={2305.19800},
    archivePrefix={arXiv},
    primaryClass={q-bio.BM}
}

To cite the CREMP dataset, please use:

@article{grambow2024cremp,
    title = {{CREMP: Conformer-rotamer ensembles of macrocyclic peptides for machine learning}},
    author = {Grambow, Colin A. and Weir, Hayley and Cunningham, Christian N. and Biancalani, Tommaso and Chuang, Kangway V.},
    year = {2024},
    journal = {Scientific Data},
    doi = {10.1038/s41597-024-03698-y},
    pages = {859},
    number = {1},
    volume = {11}
}

You can also cite the CREMP Zenodo repository directly:

@dataset{grambow_colin_a_2023_7931444,
  author       = {Grambow, Colin A. and
                  Weir, Hayley and
                  Cunningham, Christian N. and
                  Biancalani, Tommaso and
                  Chuang, Kangway V.},
  title        = {{CREMP: Conformer-Rotamer Ensembles of Macrocyclic 
                   Peptides for Machine Learning}},
  month        = may,
  year         = 2023,
  publisher    = {Zenodo},
  version      = {1.0.1},
  doi          = {10.5281/zenodo.7931444},
  url          = {https://doi.org/10.5281/zenodo.7931444}
}