🌟 Join us on Discussions! 🌟
📚 Have a look at our blog post here! 📚
We developed TemBERTure, a deep-learning package for protein thermostability prediction based on amino acid sequences. It consists of three components:
(i) TemBERTureDB, a large curated database of thermophilic and non-thermophilic sequences;
(ii) TemBERTureCLS, a classifier that predicts the thermal class (thermophilic or non-thermophilic) of a protein sequence;
(iii) TemBERTureTm, a regression model that predicts the melting temperature of a protein from its primary sequence.
Both models are built on the ProtBert-BFD language model [1] and fine-tuned with an adapter-based approach [2], [3].
This repository provides the implementations and weights for both models, allowing users to apply them to protein thermostability prediction tasks.
git clone https://github.com/ibmm-unibe-ch/TemBERTure.git
cd TemBERTure
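# Keep only the temBERTure subdirectory (rewrites the local clone's history)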
git filter-branch --subdirectory-filter temBERTure -- --all
Conda:
conda install --file requirements.txt
pip:
pip install -r requirements.txt
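To verify the installation, a quick sanity check (run from the repository root, assuming the package is importable from there):
python -c "from temBERTure import TemBERTure; print('TemBERTure imported successfully')"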
For example, define a protein sequence to run predictions on:
seq = 'MEKVYGLIGFPVEHSLSPLMHNDAFARLGIPARYHLFSVEPGQVGAAIAGVRALGIAGVNVTIPHKLAVIPFLDEVDEHARRIGAVNTIINNDGRLIGFNTDGPGYVQALEEEMNITLDGKRILVIGAGGGARGIYFSLLSTAAERIDMANRTVEKAERLVREGEGGRSAYFSLAEAETRLDEYDIIINTTSVGMHPRVEVQPLSLERLRPGVIVSNIIYNPLETKWLKEAKARGARVQNGVGMLVYQGALAFEKWTGQWPDVNRMKQLVIEALRR'
from temBERTure import TemBERTure

# Initialize the TemBERTureCLS model with the desired inference parameters
model = TemBERTure(
    adapter_path='./temBERTure_CLS/',  # Path to the model adapter weights
    device='cuda',                     # Device to run the model on
    batch_size=1,                      # Batch size for inference
    task='classification'              # Task type ('classification' for TemBERTureCLS)
)
In [1]: model.predict(seq)
100%|██████████████████████████| 1/1 [00:00<00:00, 22.27it/s]
Predicted thermal class: Thermophilic
Thermophilicity prediction score: 0.999098474215349
Out[1]: ['Thermophilic', 0.999098474215349]
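The prediction can also be used programmatically. A minimal sketch, assuming predict() returns the [label, score] list shown in Out[1] above:
label, score = model.predict(seq)
print(f'Predicted class: {label} (thermophilicity score: {score:.3f})')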
from temBERTure import TemBERTure

# Initialize all three TemBERTureTm replicas with the desired inference parameters
model_replica1 = TemBERTure(
    adapter_path='./temBERTure_TM/replica1/',  # Path to the adapter weights for replica 1
    device='cuda',                             # Device to run the model on
    batch_size=16,                             # Batch size for inference
    task='regression'                          # Task type ('regression' for TemBERTureTm)
)
model_replica2 = TemBERTure(
    adapter_path='./temBERTure_TM/replica2/',  # Path to the adapter weights for replica 2
    device='cuda',
    batch_size=16,
    task='regression'
)
model_replica3 = TemBERTure(
    adapter_path='./temBERTure_TM/replica3/',  # Path to the adapter weights for replica 3
    device='cuda',
    batch_size=16,
    task='regression'
)
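To obtain a single melting-temperature estimate, the outputs of the three replicas can be combined, for example by averaging them. A minimal sketch, assuming each predict() call returns one numeric Tm value for the input sequence (the exact return format may differ):
# Average the three replica predictions into one Tm estimate (sketch, see assumptions above)
replicas = [model_replica1, model_replica2, model_replica3]
predictions = [float(m.predict(seq)) for m in replicas]
avg_tm = sum(predictions) / len(predictions)
print(f'Predicted melting temperature: {avg_tm:.2f} °C')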
The /data folder in this repository contains the sequences used to generate the different datasets for the project. The full TemBERTureDB database, including the protein sequences, is available on Zenodo.
If you use TemBERTure, please cite the following work:
@article{10.1093/bioadv/vbae103,
  author   = {Rodella, Chiara and Lazaridi, Symela and Lemmin, Thomas},
  title    = "{TemBERTure: advancing protein thermostability prediction with deep learning and attention mechanisms}",
  journal  = {Bioinformatics Advances},
  volume   = {4},
  number   = {1},
  pages    = {vbae103},
  year     = {2024},
  month    = {07},
  abstract = "{TemBERTure model and the data are available at: https://github.com/ibmm-unibe-ch/TemBERTure.}",
  issn     = {2635-0041},
  doi      = {10.1093/bioadv/vbae103},
  url      = {https://doi.org/10.1093/bioadv/vbae103},
  eprint   = {https://academic.oup.com/bioinformaticsadvances/article-pdf/4/1/vbae103/58610069/vbae103.pdf},
}
[1] A. Elnaggar et al., “ProtTrans: Toward Understanding the Language of Life Through Self-Supervised Learning,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 44, no. 10, pp. 7112–7127, Oct. 2022, doi: 10.1109/TPAMI.2021.3095381.
[2] N. Houlsby et al., “Parameter-Efficient Transfer Learning for NLP.” arXiv, Jun. 13, 2019. Accessed: Feb. 14, 2024. [Online]. Available: http://arxiv.org/abs/1902.00751
[3] C. Poth et al., “Adapters: A Unified Library for Parameter-Efficient and Modular Transfer Learning,” 2023, doi: 10.48550/ARXIV.2311.11077.
Thanks to Noah Henrik Kleinschmidt for the TemBERTure logo design.