Linear Combinations of Template Conformations (LCTC): An Efficient Method to Quantify Structural Distributions in Heterogeneous Cryo-EM Datasets
Flexible biomolecules exist in solution as ensembles of conformations, and these conformations are functionally important. Cryo-EM captures protein conformations frozen in solution and therefore offers a promising way to characterize conformational changes. However, it remains challenging for existing software to resolve multiple conformations and their distributions from a cryo-EM dataset. By analogy with constructing molecular orbitals as linear combinations of atomic orbitals (the “basis set”), we developed a new algorithm, Linear Combinations of Template Conformations (LCTC), to obtain multiple conformations and their populations from cryo-EM datasets. Unlike the clustering-based or maximum-likelihood-based methods widely used in cryo-EM studies, LCTC assigns 2D images to template 3D structures (the “basis set”) obtained by Multi-body Refinement in RELION, using a novel two-stage matching algorithm. The key advantage of the algorithm is an initial rapid assignment of experimental 2D images to template 2D images based on auto-correlation functions of image contours. This first stage efficiently identifies the subset of experimental 2D images that are close to the template images and discards the majority of irrelevant ones. The second stage can then apply an accurate but more expensive pixel-pixel matching to the much smaller set of remaining images.
Our scheme consists of four steps: 1) The best viewing angle for distinguishing conformational changes is identified. 2) Template 3D structures generated by Multi-body Refinement in RELION are projected along a number of viewing angles close to the best viewing angle to generate template 2D images. 3) Cryo-EM 2D images are assigned to template 2D images via a two-stage matching process: pairwise distances are first computed from auto-correlation functions of the contours of masked images, and the images that pass this stage are then compared using pairwise distances based on pixel-pixel matching. 4) The populations of the template structures are obtained.
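The two-stage matching in step 3 can be illustrated with a small, self-contained sketch. This is not the code in TSTM.py: the function names (contour, autocorr, two_stage_match), the threshold-based masking, the Euclidean distances, and the keep_frac parameter are all assumptions chosen for illustration. It only shows the general idea of a cheap contour auto-correlation filter followed by pixel-pixel comparison on the surviving images.

```python
import numpy as np

def contour(img, thresh):
    """Outline of the thresholded mask: mask pixels with a 4-neighbour outside the mask."""
    m = img > thresh
    p = np.pad(m, 1)  # pad with False so the image border counts as outside
    interior = (p[1:-1, 1:-1] & p[:-2, 1:-1] & p[2:, 1:-1]
                & p[1:-1, :-2] & p[1:-1, 2:])
    return (m & ~interior).astype(float)

def autocorr(img):
    """Circular auto-correlation via FFT, scaled to unit norm."""
    f = np.fft.fft2(img)
    ac = np.fft.ifft2(f * np.conj(f)).real
    return ac / (np.linalg.norm(ac) + 1e-12)

def two_stage_match(exp_imgs, tmpl_imgs, thresh=0.5, keep_frac=0.2):
    """Return {experimental index: template index} for images kept by stage 1."""
    tmpl_acf = [autocorr(contour(t, thresh)) for t in tmpl_imgs]
    # Stage 1 (cheap): distance from each image to its nearest template
    # in contour auto-correlation space.
    d1 = np.array([
        min(np.linalg.norm(autocorr(contour(e, thresh)) - a) for a in tmpl_acf)
        for e in exp_imgs
    ])
    n_keep = max(1, int(round(keep_frac * len(exp_imgs))))
    kept = np.argsort(d1)[:n_keep]
    # Stage 2 (expensive): pixel-pixel distance, only for the kept images.
    labels = {}
    for i in kept:
        d2 = [np.linalg.norm(exp_imgs[i] - t) for t in tmpl_imgs]
        labels[int(i)] = int(np.argmin(d2))
    return labels

# Toy data (hypothetical): two template shapes, three noisy copies,
# and three pure-noise images standing in for irrelevant particles.
rng = np.random.default_rng(0)
square = np.zeros((32, 32)); square[10:22, 10:22] = 1.0
bar = np.zeros((32, 32)); bar[14:18, 4:28] = 1.0
templates = [square, bar]

def noisy(t):
    return t + 0.05 * rng.standard_normal(t.shape)

experimental = [noisy(square), noisy(bar), rng.random((32, 32)),
                rng.random((32, 32)), rng.random((32, 32)), noisy(square)]
labels = two_stage_match(experimental, templates, keep_frac=0.5)
print(labels)
```

In this toy setup the pure-noise images have contours unlike either template, so stage 1 drops them and stage 2 only runs on the three noisy copies. Real cryo-EM images would need proper masking, alignment, and noise handling that this sketch does not attempt.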
Requirements: Python, Linux, RELION, and Xmipp.
Each dataset consists of template structures and experimental structures. We use two examples to test our algorithm: a simulated dataset, Taq RNA Polymerase (RNAP), and a real dataset, E. coli RNAP.
Two datasets are provided. Simulated dataset: open.vol, close.vol, intermediate.vol (these serve as both test data and templates). Real dataset: 6P1K.vol (test data) and H_EV2_red.vol, H_EV2_grey.vol, H_EV2_blue.vol (templates). The templates are generated by Multi-body Refinement in RELION.
Preprocessing: you can use the following commands.
xmipp_volume_from_pdb -i open.pdb -o open.vol
converts the PDB file to a .vol file;
xmipp_image_convert -i open.vol -o open.mrc
converts the .vol file to an .mrc file.
Run the algorithm:
python TSTM.py --datatype='sim' --vol_size=128
The output is written to '2nd_stage_brute_force_classification_result.dat'. Then run
python ./two_stage_matching/analyze_population.py
to obtain the populations for analysis.
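As a rough illustration of what a population estimate means here, the sketch below computes the fraction of assigned images per template. It does not reproduce analyze_population.py, and the layout of the .dat result file is not described above, so the input is assumed to be a plain list of template indices, one per assigned image. The function name populations is hypothetical.

```python
from collections import Counter

def populations(assignments, n_templates):
    """Fraction of assigned images per template index (0 .. n_templates - 1)."""
    counts = Counter(assignments)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no images were assigned to any template")
    return [counts.get(k, 0) / total for k in range(n_templates)]

# Hypothetical assignments for six images and three templates
# (for example open, close, intermediate).
pops = populations([0, 0, 1, 2, 0, 1], n_templates=3)
print(pops)
```

Templates that receive no images get a population of zero, which keeps the output aligned with the template list.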