bzho3923/ProtLGN

ProtLGN

Protein Engineering with Lightweight Graph Denoising Neural Networks


About The Project

ProtLGN is pre-trained on wild-type proteins with an amino-acid (AA) type denoising task, using equivariant graph neural networks to derive the joint distribution of the recovered AA types (red).

For a protein to be mutated, the predicted probabilities indicate the fitness scores of the associated mutations (blue).

With additional mutation evaluations from wet-lab biochemical assessments, the pre-trained model can be updated to better fit the specific protein and its functionality (green).
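
The link between predicted probabilities and fitness scores can be illustrated with a small sketch. It assumes the model outputs, for each residue position, a probability for each of the 20 AA types, and it scores a substitution by the log-ratio of the mutant and wild-type probabilities. This log-odds rule is a common convention for zero-shot scoring and is an assumption here, not necessarily the exact formula ProtLGN uses; `score_mutation` and the toy probabilities are hypothetical.

```python
import math

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def score_mutation(probs, wild_type, position, mutant):
    """Log-odds score of substituting `wild_type` with `mutant` at `position`.

    probs: list of dicts, one per residue, mapping AA letter -> probability.
    position: 0-based index into `probs`.
    """
    p = probs[position]
    return math.log(p[mutant]) - math.log(p[wild_type])

# Toy prediction for a 2-residue protein (made-up numbers, not model output).
uniform = {aa: 1 / 20 for aa in AMINO_ACIDS}
skewed = dict(uniform, A=0.02, G=0.08)  # A is less likely, G more likely
probs = [uniform, skewed]

# Positive score: at position 1, G is predicted as more likely than A.
print(score_mutation(probs, "A", 1, "G"))
```

Under this convention a positive score means the model considers the mutant residue more plausible than the wild-type residue at that position.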

[Figure: overview of ProtLGN; the red, blue, and green labels above refer to parts of this figure.]

📄 News

  • [2024.06.06] We recently developed two more advanced protein engineering tools named ProtSSN and ProSST for zero-shot prediction. We recommend you try the new models!

Getting Started

Follow these steps to get started! 😊

Conda Environment

Please make sure you have installed Anaconda3 or Miniconda3.

Create and activate the environment:

conda env create -f environment.yaml
conda activate protlgn
pip install torch_scatter torch_sparse torch_cluster -f https://data.pyg.org/whl/torch-2.3.0+cu121.html
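
The `-f` URL in the last command has to match the installed PyTorch version and CUDA build. As a hedged sketch, the helper below builds the wheel index URL following the `torch-<version>+<cuda>.html` pattern used above. `pyg_wheel_index` is a hypothetical name and is not part of ProtLGN.

```python
def pyg_wheel_index(torch_version: str, cuda_tag: str) -> str:
    """Build the PyG wheel index URL for a torch version and CUDA tag.

    torch_version: e.g. "2.3.0"; a local suffix such as "+cu121" is dropped.
    cuda_tag: e.g. "cu121".
    """
    base_version = torch_version.split("+")[0]
    return f"https://data.pyg.org/whl/torch-{base_version}+{cuda_tag}.html"

print(pyg_wheel_index("2.3.0+cu121", "cu121"))
```

In an activated environment, `torch.__version__` and `torch.version.cuda` give the values to plug in.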

Pre-train ProtLGN

Step 1: get raw dataset

We use the CATH 4.2 dataset, which is available from https://www.cathdb.info/. The prepared archive can be downloaded as follows:

mkdir -p data/cath_k10/raw
cd data/cath_k10/raw
wget https://huggingface.co/datasets/tyang816/cath/resolve/main/dompdb.tar
# or wget https://lianglab.sjtu.edu.cn/files/ProtSSN-2024/dompdb.tar
tar -xvf dompdb.tar
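
A download can silently save an HTML page instead of the archive, for example when a URL points at a web view rather than the raw file. The sketch below checks that the file is a real tar archive before extracting it. `extract_if_tar` is a hypothetical helper, not a ProtLGN function.

```python
import tarfile
from pathlib import Path

def extract_if_tar(archive: Path, dest: Path) -> bool:
    """Extract `archive` into `dest` if it is a valid tar file.

    Returns True on success and False if the file is not a tar archive
    (for instance, an HTML page saved under a .tar name).
    """
    if not tarfile.is_tarfile(archive):
        return False
    with tarfile.open(archive) as tar:
        tar.extractall(dest)
    return True
```

From `data/cath_k10/raw`, calling `extract_if_tar(Path("dompdb.tar"), Path("."))` plays the role of the `tar -xvf` step and returns `False` if the download needs to be repeated.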

Step 2: build graph dataset

See script/build_cath_dataset.sh.
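
The directory name `cath_k10` and the `--c_alpha_max_neighbors 10` flag used later suggest that each protein is turned into a k-nearest-neighbor graph over C-alpha atoms with k = 10. The following is a minimal pure-Python sketch of that idea under this assumption; the repository's script may build edges and features differently, and `knn_edges` is a hypothetical name.

```python
import math

def knn_edges(coords, k=10):
    """Return directed edges (i, j) from each residue i to its k nearest residues j.

    coords: list of (x, y, z) C-alpha coordinates, one per residue.
    """
    edges = []
    for i, ci in enumerate(coords):
        # Distance from residue i to every other residue.
        dists = [(math.dist(ci, cj), j) for j, cj in enumerate(coords) if j != i]
        dists.sort()
        edges.extend((i, j) for _, j in dists[:k])
    return edges

# Toy example: four residues on a line, 3.8 angstroms apart, with k = 2.
coords = [(3.8 * i, 0.0, 0.0) for i in range(4)]
print(knn_edges(coords, k=2))
```

This brute-force version is quadratic in the number of residues, which is fine for illustration but slow for large structures.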

Step 3: run pre-train

See run_pretrain.sh.
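
Pre-training uses an AA-type denoising task: residue types are perturbed and the model learns to recover the originals. The sketch below shows one simple way to create such noisy inputs, assuming uniform random replacement at a fixed ratio. The noise scheme ProtLGN actually uses may differ, and `corrupt_sequence` is a hypothetical helper.

```python
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def corrupt_sequence(sequence, noise_ratio, rng):
    """Replace roughly `noise_ratio` of the residues with a uniformly drawn AA type.

    Returns (noisy_sequence, corrupted_positions). A drawn type may happen
    to equal the original residue.
    """
    noisy = list(sequence)
    corrupted = []
    for i in range(len(noisy)):
        if rng.random() < noise_ratio:
            noisy[i] = rng.choice(AMINO_ACIDS)
            corrupted.append(i)
    return "".join(noisy), corrupted

rng = random.Random(0)  # fixed seed for reproducibility
noisy, positions = corrupt_sequence("MKTAYIAKQR", 0.3, rng)
print(noisy, positions)
```

The training target is the original sequence, so the loss compares the model's predicted AA types with the uncorrupted residues.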

Zero-shot prediction for mutant sequences

You can use your own checkpoint for zero-shot inference.

Step 1: Prepare mutant dataset

Data map:

|—— eval_dataset
|——|—— DATASET
|——|——|—— Protein1
|——|——|——|—— Protein1.tsv (DMS file)
|——|——|——|—— Protein1.pdb (pdb file)
|——|——|——|—— Protein1.fasta (sequence)
|——|——|—— Protein2
|——|——|——|...

See script/build_mutant_dataset.sh.
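
Before building the dataset, it can help to confirm that every protein folder matches the data map above. The sketch below assumes each folder contains a `.tsv`, a `.pdb`, and a `.fasta` file named after the folder, as shown in the map. `missing_files` is a hypothetical helper, not part of the repository.

```python
from pathlib import Path

REQUIRED_SUFFIXES = (".tsv", ".pdb", ".fasta")

def missing_files(dataset_dir):
    """Map each protein folder name to the required files it lacks.

    An empty dict means every folder has its .tsv, .pdb, and .fasta file.
    """
    problems = {}
    for protein_dir in sorted(Path(dataset_dir).iterdir()):
        if not protein_dir.is_dir():
            continue
        missing = [
            protein_dir.name + suffix
            for suffix in REQUIRED_SUFFIXES
            if not (protein_dir / (protein_dir.name + suffix)).is_file()
        ]
        if missing:
            problems[protein_dir.name] = missing
    return problems
```

For example, `missing_files("eval_dataset/DATASET")` returns `{}` when the layout is complete.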

Step 2: Zero-shot

See script/mutant_predict.sh, for example:

CUDA_VISIBLE_DEVICES=0 python mutant_predict.py \
    --checkpoint ckpt/ProtLGN.pt \
    --c_alpha_max_neighbors 10 \
    --gnn egnn \
    --use_sasa \
    --layer_num 6 \
    --gnn_config src/Egnnconfig/egnn_mutant.yaml \
    --mutant_dataset data/example

Citation

Please cite our papers:

@article{zhou2024protlgn,
  title={Protein engineering with lightweight graph denoising neural networks},
  author={Zhou, Bingxin and Zheng, Lirong and Wu, Banghao and Tan, Yang and Lv, Outongyi and Yi, Kai and Fan, Guisheng and Hong, Liang},
  journal={Journal of Chemical Information and Modeling},
  volume={64},
  number={9},
  pages={3650--3661},
  year={2024},
  publisher={ACS Publications}
}

@article{tan2023protssn,
  title={Semantical and Topological Protein Encoding Toward Enhanced Bioactivity and Thermostability},
  author={Tan, Yang and Zhou, Bingxin and Zheng, Lirong and Fan, Guisheng and Hong, Liang},
  journal={bioRxiv},
  pages={2023--12},
  year={2023},
  publisher={Cold Spring Harbor Laboratory}
}

License

Distributed under the MIT License. See LICENSE.txt for more information.
