Aligning proteins
First, download the DeepBLAST pretrained model
wget https://users.flatironinstitute.org/jmorton/public_www/deepblast-public-data/checkpoints/deepblast-l8.ckpt
You'll also need to download the ProtTrans model from Hugging Face (the clone below requires Git LFS)
git lfs install
git clone https://huggingface.co/Rostlab/prot_t5_xl_uniref50
Once those two models are downloaded, you can load the DeepBLAST model as follows
from deepblast.utils import load_model
model = load_model("deepblast-l8.ckpt", "prot_t5_xl_uniref50")
If you have a GPU, you may want to move the model onto it via model = model.cuda()
If you are running on a CPU-only machine, load the model with the following command instead.
model = load_model("deepblast-l8.ckpt", "prot_t5_xl_uniref50", device='cpu')
Note that the load_model function also has an alignment_mode option that selects the type of alignment used for inference. You can specify either needleman-wunch for global alignment or smith-waterman for local alignment.
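The loading options above can be combined in one place. The sketch below is only an illustration: load_kwargs and load_local_aligner are hypothetical helper names, and passing alignment_mode as a keyword argument is an assumption based on the description above rather than a confirmed signature. It falls back to device='cpu' when PyTorch reports no CUDA device.

```python
def load_kwargs(alignment_mode=None, force_cpu=False):
    """Build optional keyword arguments for load_model (hypothetical helper)."""
    kwargs = {}
    if alignment_mode is not None:
        # Assumption: the option is accepted as a keyword named alignment_mode.
        kwargs["alignment_mode"] = alignment_mode
    use_cpu = force_cpu
    if not use_cpu:
        try:
            import torch
            use_cpu = not torch.cuda.is_available()
        except ImportError:
            # Without PyTorch we cannot detect a GPU, so stay on the CPU path.
            use_cpu = True
    if use_cpu:
        kwargs["device"] = "cpu"
    return kwargs


def load_local_aligner(ckpt="deepblast-l8.ckpt", lm_dir="prot_t5_xl_uniref50"):
    """Load DeepBLAST for local alignment using the files downloaded above."""
    from deepblast.utils import load_model
    return load_model(ckpt, lm_dir, **load_kwargs(alignment_mode="smith-waterman"))
```

Building the arguments separately keeps the CPU and GPU cases in a single call site, so the same script can run on either kind of machine.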
Once the model is loaded, we can test out DeepBLAST by structurally aligning two proteins using only their sequences
x = 'IGKEEIQQRLAQFVDHWKELKQLAAARGQRLEESLEYQQFVANVEEEEAWINEKMTLVASED'
y = 'QQNKELNFKLREKQNEIFELKKIAETLRSKLEKYVDITKKLEDQNLNLQIKISDLEKKLSDA'
# obtains alignment string specifying structural superposition
pred_alignment = model.align(x, y)
The resulting alignment string specifies which residues are aligned: ':' indicates a match, '1' indicates a residue of sequence 1 aligned to a gap in sequence 2 (an insertion), and '2' indicates a residue of sequence 2 aligned to a gap in sequence 1 (a deletion). To make this more human readable, we can directly visualize the alignment.
from deepblast.dataset.utils import states2alignment
x_aligned, y_aligned = states2alignment(pred_alignment, x, y)
print(x_aligned)
print(pred_alignment)
print(y_aligned)
Output
-IGKEEIQQRLAQFVDHWKELKQLAAARGQRLEESLEYQ-QFVANVEEEEAWINEKMTLVASED
21:::::::::::::::::::::::::::::::::::::2::::::::::::::::::::::1:
Q-QNKELNFKLREKQNEIFELKKIAETLRSKLEKYVDITKKLEDQNLNLQIKISDLEKKLSD-A
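To make the state encoding concrete, here is a minimal pure-Python sketch of how a state string expands into two gapped sequences. It is not the library's states2alignment implementation. render_states is a hypothetical name, and the sketch assumes the state string consumes both sequences completely, as in the output above.

```python
def render_states(states, x, y, gap="-"):
    """Expand a state string into gapped versions of x and y (illustrative sketch)."""
    x_out, y_out = [], []
    i = j = 0  # positions of the next unused residue in x and y
    for s in states:
        if s == ":":    # match: consume one residue from each sequence
            x_out.append(x[i])
            y_out.append(y[j])
            i += 1
            j += 1
        elif s == "1":  # residue of sequence 1 against a gap in sequence 2
            x_out.append(x[i])
            y_out.append(gap)
            i += 1
        elif s == "2":  # residue of sequence 2 against a gap in sequence 1
            x_out.append(gap)
            y_out.append(y[j])
            j += 1
        else:
            raise ValueError(f"unknown state {s!r}")
    if i != len(x) or j != len(y):
        raise ValueError("state string does not cover both sequences")
    return "".join(x_out), "".join(y_out)


# Toy example (made-up sequences, not real proteins)
x_gapped, y_gapped = render_states("2::1:", "ABCD", "WXYZ")
print(x_gapped)  # -ABCD
print(y_gapped)  # WXY-Z
```

Counting the ':' characters in a state string gives the number of aligned residue pairs, which is a quick way to summarize an alignment without rendering it.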