
Aligning proteins

Jamie Morton edited this page Jan 19, 2023 · 17 revisions

First, download the DeepBLAST pretrained model

wget https://users.flatironinstitute.org/jmorton/public_www/deepblast-public-data/checkpoints/deepblast-l8.ckpt

You'll also need to download the ProtTrans model from Hugging Face:

git lfs install
git clone https://huggingface.co/Rostlab/prot_t5_xl_uniref50
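The git lfs install step matters because, without Git LFS, the clone contains small text "pointer" files in place of the model weights. As a quick sanity check, here is a minimal sketch, assuming the standard Git LFS pointer format (a small text file whose first line is "version https://git-lfs.github.com/spec/v1"); find_lfs_pointers is a hypothetical helper, not part of DeepBLAST.

```python
from pathlib import Path

# First bytes of every Git LFS pointer file (per the Git LFS spec).
LFS_POINTER_PREFIX = b"version https://git-lfs.github.com/spec/v1"


def find_lfs_pointers(model_dir):
    """Return files under model_dir that are still LFS pointer stubs (hypothetical helper)."""
    stubs = []
    for path in sorted(Path(model_dir).rglob("*")):
        # Skip git's own metadata and anything that is not a regular file.
        if ".git" in path.parts or not path.is_file():
            continue
        # Pointer files are tiny (under 1 KiB); real weight files are far larger.
        if path.stat().st_size >= 1024:
            continue
        with path.open("rb") as fh:
            if fh.read(len(LFS_POINTER_PREFIX)) == LFS_POINTER_PREFIX:
                stubs.append(path)
    return stubs
```

If find_lfs_pointers("prot_t5_xl_uniref50") returns a non-empty list, the weights were not actually downloaded; running git lfs pull inside the cloned directory should fetch them.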

Once those two models are downloaded, you can load the DeepBLAST model as follows

from deepblast.utils import load_model
model = load_model("deepblast-l8.ckpt", "prot_t5_xl_uniref50")

If you have a GPU, you may want to move the model onto it with model = model.cuda(). If you are running on a CPU only, load the model with the following command instead:

model = load_model("deepblast-l8.ckpt", "prot_t5_xl_uniref50", device='cpu')
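To pick between the two loading paths automatically, a minimal sketch (assuming PyTorch, which DeepBLAST runs on, may or may not be installed in the current environment) is to ask PyTorch whether a CUDA device is visible. has_cuda_gpu is a hypothetical helper name.

```python
def has_cuda_gpu() -> bool:
    """Return True if PyTorch is installed and can see a CUDA device (hypothetical helper)."""
    try:
        import torch
    except ImportError:
        # No PyTorch at all, so certainly no usable GPU.
        return False
    return torch.cuda.is_available()
```

When has_cuda_gpu() is True, load the model as shown earlier and call model.cuda(); otherwise use the device='cpu' form above.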

As another note, the load_model function has an alignment_mode option that specifies which type of alignment to perform at inference time. You can specify either needleman-wunsch for global alignment or smith-waterman for local alignment.
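To illustrate what the two modes mean, here is a toy sketch of the classic dynamic-programming recurrences with a fixed match/mismatch/gap score. This is only an assumption-laden illustration of global versus local alignment: DeepBLAST predicts its alignment scores with a learned model, so align_score and its scoring values are hypothetical and not DeepBLAST's implementation.

```python
def align_score(a: str, b: str, local: bool = False,
                match: int = 1, mismatch: int = -1, gap: int = -2) -> int:
    """Toy Needleman-Wunsch (global) or Smith-Waterman (local) alignment score."""
    rows, cols = len(a) + 1, len(b) + 1
    dp = [[0] * cols for _ in range(rows)]
    if not local:
        # Global alignment: leading gaps in either sequence are penalized.
        for i in range(1, rows):
            dp[i][0] = i * gap
        for j in range(1, cols):
            dp[0][j] = j * gap
    best = 0
    for i in range(1, rows):
        for j in range(1, cols):
            diag = dp[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            score = max(diag, dp[i - 1][j] + gap, dp[i][j - 1] + gap)
            if local:
                # Local alignment: a negative prefix is dropped by restarting at 0.
                score = max(score, 0)
                best = max(best, score)
            dp[i][j] = score
    # Global score ends at the bottom-right cell; local takes the best cell anywhere.
    return best if local else dp[-1][-1]
```

For example, aligning "GGG" against "AAAGGGAAA" scores -9 globally (three matches, but six residues forced against gaps) and 3 locally, because local alignment is free to report only the shared GGG segment.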

Once the model is loaded, we can test out DeepBLAST by structurally aligning two proteins using only their sequences

x = 'IGKEEIQQRLAQFVDHWKELKQLAAARGQRLEESLEYQQFVANVEEEEAWINEKMTLVASED'
y = 'QQNKELNFKLREKQNEIFELKKIAETLRSKLEKYVDITKKLEDQNLNLQIKISDLEKKLSDA'
# obtains alignment string specifying structural superposition
pred_alignment = model.align(x, y)

The resulting alignment string specifies which residues are aligned: ':' indicates a match, '1' indicates a residue from sequence 1 aligned to a gap (an insertion), and '2' indicates a residue from sequence 2 aligned to a gap (a deletion). To make this more human-readable, we can directly visualize the alignment.
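Because every residue of sequence 1 is either matched (':') or an insertion ('1'), and every residue of sequence 2 is either matched or a deletion ('2'), the state string can be summarized with simple counts. This is a minimal sketch; summarize_states is a hypothetical helper, not part of DeepBLAST.

```python
from collections import Counter


def summarize_states(states: str, len_x: int, len_y: int) -> dict:
    """Count matches, insertions and deletions in a DeepBLAST-style state string."""
    counts = Counter(states)
    unknown = set(counts) - {":", "1", "2"}
    if unknown:
        raise ValueError(f"unexpected state symbols: {sorted(unknown)}")
    matches = counts[":"]
    only_x = counts["1"]  # residue from sequence 1 aligned to a gap
    only_y = counts["2"]  # residue from sequence 2 aligned to a gap
    # Each sequence must be fully accounted for by its matched and unmatched residues.
    if matches + only_x != len_x or matches + only_y != len_y:
        raise ValueError("state string does not cover both sequences")
    return {
        "matches": matches,
        "insertions": only_x,
        "deletions": only_y,
        "coverage_x": matches / len_x,  # fraction of sequence 1 that is matched
        "coverage_y": matches / len_y,  # fraction of sequence 2 that is matched
    }
```

With the sequences above, summarize_states(pred_alignment, len(x), len(y)) would report, for the output shown further down, 60 matched positions out of 62 residues in each sequence, with two insertions and two deletions.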

from deepblast.dataset.utils import states2alignment
x_aligned, y_aligned = states2alignment(pred_alignment, x, y)
print(x_aligned)
print(pred_alignment)
print(y_aligned)

Output

-IGKEEIQQRLAQFVDHWKELKQLAAARGQRLEESLEYQ-QFVANVEEEEAWINEKMTLVASED
21:::::::::::::::::::::::::::::::::::::2::::::::::::::::::::::1:
Q-QNKELNFKLREKQNEIFELKKIAETLRSKLEKYVDITKKLEDQNLNLQIKISDLEKKLSD-A
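To make the decoding explicit, the gapped strings can be reproduced with a short loop over the state string. This is a minimal sketch for illustration; states_to_gapped is a hypothetical re-implementation, not the code behind deepblast.dataset.utils.states2alignment.

```python
def states_to_gapped(states: str, x: str, y: str, gap: str = "-") -> tuple[str, str]:
    """Turn a ':'/'1'/'2' state string into two gapped sequences (hypothetical helper)."""
    i = j = 0
    x_out, y_out = [], []
    for s in states:
        if s == ":":    # match: consume one residue from each sequence
            x_out.append(x[i]); y_out.append(y[j]); i += 1; j += 1
        elif s == "1":  # residue only in sequence 1, gap in sequence 2
            x_out.append(x[i]); y_out.append(gap); i += 1
        elif s == "2":  # residue only in sequence 2, gap in sequence 1
            x_out.append(gap); y_out.append(y[j]); j += 1
        else:
            raise ValueError(f"unexpected state symbol {s!r}")
    if i != len(x) or j != len(y):
        raise ValueError("state string does not consume both sequences")
    return "".join(x_out), "".join(y_out)
```

For instance, states_to_gapped("21:::", "IGKE", "QQNK") returns ("-IGKE", "Q-QNK"), matching the first five columns of the output above.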