ProtLGN

Protein Engineering with Lightweight Graph Denoising Neural Networks

About The Project

ProtLGN is pre-trained on wild-type proteins with an amino-acid (AA) type denoising task, using equivariant graph neural networks to derive the joint distribution of the recovered AA types (red).

For a protein to be mutated, the predicted probabilities serve as fitness scores for the associated mutations (blue).
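One common way to turn per-position amino-acid probabilities into a mutation score is a log-odds ratio between the mutant and wild-type residue. The sketch below illustrates that idea only. The function name `log_odds_score`, the 1-indexed mutation string format (`A2G`), and the toy probabilities are assumptions for illustration and are not taken from the ProtLGN code.

```python
import math

def log_odds_score(probs, wt_seq, mutation):
    """Score one substitution such as 'A2G' (wild type A, position 2, mutant G).

    probs: list with one dict per residue, mapping amino-acid letter -> probability.
    """
    wt, pos, mut = mutation[0], int(mutation[1:-1]), mutation[-1]
    idx = pos - 1  # assumption: mutation strings are 1-indexed
    if wt_seq[idx] != wt:
        raise ValueError(
            f"wild-type mismatch at position {pos}: sequence has {wt_seq[idx]}, got {wt}"
        )
    # Positive score: the model prefers the mutant over the wild-type residue.
    return math.log(probs[idx][mut]) - math.log(probs[idx][wt])

# Toy probabilities for a 2-residue protein (hypothetical values).
toy_probs = [
    {"M": 0.9, "A": 0.1},
    {"A": 0.2, "G": 0.8},
]
score = log_odds_score(toy_probs, "MA", "A2G")
print(round(score, 3))
```

Under this convention, scores for multi-site mutants are often obtained by summing the single-site scores, which is a further assumption about how the sites interact.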

With additional mutation evaluations from wet-lab biochemical assays, the pre-trained model can be updated to better fit a specific protein and its function (green).

📄 News

[2024.06.06] We recently developed two more advanced protein engineering tools named ProtSSN and ProSST for zero-shot prediction. We recommend you try the new models!

Getting Started

Please follow these simple example steps to get started! 😊

Conda Environment

Please make sure you have installed Anaconda3 or Miniconda3.

Create the environment:

conda env create -f environment.yaml
conda activate protlgn
pip install torch_scatter torch_sparse torch_cluster -f https://data.pyg.org/whl/torch-2.3.0+cu121.html
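After installation, it can help to confirm that the packages are visible to the active interpreter before starting a long job. The following is a minimal sketch using only the standard library. The helper name `missing_packages` is hypothetical, and the check only confirms that each package can be located, not that its CUDA build matches your driver.

```python
import importlib.util

# Import names of the packages installed by the commands above.
REQUIRED = ["torch", "torch_scatter", "torch_sparse", "torch_cluster"]

def missing_packages(names=REQUIRED):
    """Return the packages from `names` that cannot be found in the active environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]

missing = missing_packages()
print("missing:", missing if missing else "none")
```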

Pre-train ProtLGN

Step 1: get raw dataset

We use the CATH 4.2 dataset, which you can download from https://www.cathdb.info/.

mkdir -p data/cath_k10/raw
cd data/cath_k10/raw
wget https://huggingface.co/datasets/tyang816/cath/resolve/main/dompdb.tar
# or wget https://lianglab.sjtu.edu.cn/files/ProtSSN-2024/dompdb.tar
tar -xvf dompdb.tar
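If `tar` fails on the downloaded file, a frequent cause is that an HTML page was saved instead of the archive. The sketch below checks this with the standard `tarfile` module. The helper name `is_valid_tar` is hypothetical, and the path simply mirrors the commands above.

```python
import tarfile
from pathlib import Path

def is_valid_tar(path):
    """Return True if `path` exists and can be opened as a tar archive."""
    p = Path(path)
    return p.is_file() and tarfile.is_tarfile(str(p))

archive = Path("data/cath_k10/raw/dompdb.tar")
if archive.exists():
    if is_valid_tar(archive):
        print("archive ok")
    else:
        print("not a tar archive; check the download URL")
```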

Step 2: build graph dataset

See script/build_cath_dataset.sh.
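The directory name `cath_k10` and the `--c_alpha_max_neighbors 10` option in the prediction command suggest that each protein becomes a k-nearest-neighbour graph over C-alpha atoms with k = 10. That reading is an inference and is not confirmed by the script itself. Under that assumption, the edge construction can be sketched in plain Python as follows; `knn_edges` and the toy coordinates are hypothetical.

```python
import math

def knn_edges(coords, k=10):
    """Return directed edges (i, j) linking each residue i to its k nearest residues j.

    coords: list of (x, y, z) C-alpha coordinates, one per residue.
    """
    edges = []
    for i, ci in enumerate(coords):
        # Rank all other residues by Euclidean distance to residue i.
        others = sorted(
            (j for j in range(len(coords)) if j != i),
            key=lambda j: math.dist(ci, coords[j]),
        )
        edges.extend((i, j) for j in others[:k])
    return edges

# Four toy residues on a line (hypothetical coordinates), with k=1.
toy = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (3.0, 0.0, 0.0), (7.0, 0.0, 0.0)]
print(knn_edges(toy, k=1))
```

This brute-force version is quadratic in the number of residues. It is meant to show the idea, not to replace an optimized graph-building routine.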

Step 3: run pre-train

See run_pretrain.sh.

Zero-shot prediction for mutant sequences

You can use your own checkpoint for zero-shot inference.

Step 1: Prepare mutant dataset

Data map:

|—— eval_dataset
|——|—— DATASET
|——|——|—— Protein1
|——|——|——|—— Protein1.tsv (DMS file)
|——|——|——|—— Protein1.pdb (pdb file)
|——|——|——|—— Protein1.fasta (sequence)
|——|——|—— Protein2
|——|——|——|...

See script/build_mutant_dataset.sh.
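Before building the mutant dataset, it may be worth checking that every protein folder follows the data map above. The sketch below assumes, as in the map, that each folder contains a `.tsv`, a `.pdb`, and a `.fasta` file named after the folder. The helper name `check_layout` and the example path `data/example/DATASET` are illustrative assumptions.

```python
from pathlib import Path

REQUIRED_SUFFIXES = (".tsv", ".pdb", ".fasta")

def check_layout(dataset_dir):
    """Return {protein_name: [missing file names]} for incomplete protein folders."""
    problems = {}
    for protein_dir in sorted(Path(dataset_dir).iterdir()):
        if not protein_dir.is_dir():
            continue
        # Each file is expected to share the folder's name, e.g. Protein1/Protein1.pdb.
        missing = [
            protein_dir.name + suffix
            for suffix in REQUIRED_SUFFIXES
            if not (protein_dir / (protein_dir.name + suffix)).is_file()
        ]
        if missing:
            problems[protein_dir.name] = missing
    return problems

dataset = Path("data/example/DATASET")
if dataset.is_dir():
    print(check_layout(dataset) or "layout ok")
```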

Step 2: Zero-shot

See script/mutant_predict.sh, for example:

CUDA_VISIBLE_DEVICES=0 python mutant_predict.py \
    --checkpoint ckpt/ProtLGN.pt \
    --c_alpha_max_neighbors 10 \
    --gnn egnn \
    --use_sasa \
    --layer_num 6 \
    --gnn_config src/Egnnconfig/egnn_mutant.yaml \
    --mutant_dataset data/example

Citation

If you use ProtLGN, please cite our papers:

@article{zhou2024protlgn,
  title={Protein engineering with lightweight graph denoising neural networks},
  author={Zhou, Bingxin and Zheng, Lirong and Wu, Banghao and Tan, Yang and Lv, Outongyi and Yi, Kai and Fan, Guisheng and Hong, Liang},
  journal={Journal of Chemical Information and Modeling},
  volume={64},
  number={9},
  pages={3650--3661},
  year={2024},
  publisher={ACS Publications}
}

@article{tan2023protssn,
  title={Semantical and Topological Protein Encoding Toward Enhanced Bioactivity and Thermostability},
  author={Tan, Yang and Zhou, Bingxin and Zheng, Lirong and Fan, Guisheng and Hong, Liang},
  journal={bioRxiv},
  pages={2023--12},
  year={2023},
  publisher={Cold Spring Harbor Laboratory}
}

License

Distributed under the MIT License. See LICENSE.txt for more information.
