Aligning proteins

Jamie Morton edited this page Mar 8, 2023 · 17 revisions

First, download the pretrained DeepBLAST model:

wget https://users.flatironinstitute.org/jmorton/public_www/deepblast-public-data/checkpoints/deepblast-l8.ckpt

We recommend downloading the ProtTrans model from Hugging Face so that you have a local copy of it:

git lfs install
git clone https://huggingface.co/Rostlab/prot_t5_xl_uniref50

If you use the command-line version, this step is not necessary, since the ProtTrans model is downloaded automatically by default.

Once those two models are downloaded, you can load the DeepBLAST model.

GPU model loading

from deepblast.utils import load_model
model = load_model("deepblast-l8.ckpt", "prot_t5_xl_uniref50").cuda()

CPU model loading

from deepblast.utils import load_model
model = load_model("deepblast-l8.ckpt", "prot_t5_xl_uniref50", device='cpu')

Note that the load_model function also accepts an alignment_mode option, which selects the type of alignment used for inference. You can specify either needleman-wunsch for global alignment or smith-waterman for local alignment.
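As a rough sketch of how the alignment mode could be chosen at load time, the helper below wraps load_model. The wrapper name load_deepblast and its local flag are hypothetical and exist only for illustration. It relies solely on the alignment_mode and device options mentioned on this page, and leaves the library's default mode untouched unless local alignment is requested.

```python
def load_deepblast(checkpoint, lm_dir, local=False, device="cpu"):
    """Hypothetical convenience wrapper around deepblast.utils.load_model."""
    # Imported lazily so that defining this helper does not require deepblast.
    from deepblast.utils import load_model

    kwargs = {"device": device}
    if local:
        # 'smith-waterman' requests local alignment, as described above.
        kwargs["alignment_mode"] = "smith-waterman"
    # Otherwise the library's default alignment mode is used.
    return load_model(checkpoint, lm_dir, **kwargs)
```

For example, load_deepblast("deepblast-l8.ckpt", "prot_t5_xl_uniref50", local=True) would request a local-alignment model on the CPU, assuming both downloads sit in the working directory.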

Visualizing alignments

Once the model is loaded, we can test DeepBLAST by structurally aligning two proteins using only their sequences:

x = 'IGKEEIQQRLAQFVDHWKELKQLAAARGQRLEESLEYQQFVANVEEEEAWINEKMTLVASED'
y = 'QQNKELNFKLREKQNEIFELKKIAETLRSKLEKYVDITKKLEDQNLNLQIKISDLEKKLSDA'
# Obtain the alignment string that specifies the structural superposition
pred_alignment = model.align(x, y)

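The resulting alignment string specifies which residues are aligned, one character per alignment column. A : indicates a match between a residue in sequence 1 and a residue in sequence 2, a 1 indicates a residue that appears only in sequence 1 and is paired with a gap in sequence 2 (an insertion), and a 2 indicates a residue that appears only in sequence 2 and is paired with a gap in sequence 1 (a deletion). To make this more human-readable, we can directly visualize the alignment.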

from deepblast.dataset.utils import states2alignment
x_aligned, y_aligned = states2alignment(pred_alignment, x, y)
print(x_aligned)
print(pred_alignment)
print(y_aligned)

Output

-IGKEEIQQRLAQFVDHWKELKQLAAARGQRLEESLEYQ-QFVANVEEEEAWINEKMTLVASED
21:::::::::::::::::::::::::::::::::::::2::::::::::::::::::::::1:
Q-QNKELNFKLREKQNEIFELKKIAETLRSKLEKYVDITKKLEDQNLNLQIKISDLEKKLSD-A
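To make the mapping from state string to gapped sequences concrete, here is a minimal pure-Python sketch. It is not the library's implementation of states2alignment. The function names expand_states and summarize_states and the three-residue toy sequences are made up for illustration. It assumes only the three state characters described above.

```python
from collections import Counter


def expand_states(states, x, y):
    """Turn a state string into two gapped sequences.

    ':' consumes one residue from both x and y (an aligned pair),
    '1' consumes a residue from x only (gap in y),
    '2' consumes a residue from y only (gap in x).
    """
    x_out, y_out = [], []
    i = j = 0  # current positions in x and y
    for s in states:
        if s == ":":
            x_out.append(x[i])
            y_out.append(y[j])
            i += 1
            j += 1
        elif s == "1":
            x_out.append(x[i])
            y_out.append("-")
            i += 1
        elif s == "2":
            x_out.append("-")
            y_out.append(y[j])
            j += 1
        else:
            raise ValueError(f"unexpected state {s!r}")
    return "".join(x_out), "".join(y_out)


def summarize_states(states, x, y):
    """Count column types and identical residues among aligned pairs."""
    counts = Counter(states)
    x_al, y_al = expand_states(states, x, y)
    identical = sum(
        a == b for a, b in zip(x_al, y_al) if a != "-" and b != "-"
    )
    return {
        "aligned": counts[":"],
        "only_in_x": counts["1"],
        "only_in_y": counts["2"],
        "identical": identical,
    }


# Toy input, not real model output.
x_al, y_al = expand_states(":1:2", "ACD", "ADE")
print(x_al)  # ACD-
print(y_al)  # A-DE
print(summarize_states(":1:2", "ACD", "ADE"))
```

Read against the output above, the leading 2 pairs Q from the second sequence with a gap in the first, and the following 1 pairs I from the first sequence with a gap in the second.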