We introduce a novel energy-aware pathfinding algorithm designed to search for the most probable conformational changes in cryo-EM datasets. The approach seeks the shortest pathway on a graph whose edge weights are free-energy-like values. Unlike traditional methods, which typically operate on energy landscapes in two or three dimensions (as in minimum energy path (MEP) searches), our algorithm can operate in higher dimensions. We have tested our method on both synthetic data and the real-world dataset EMPIAR-10076.
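To make the idea of a shortest pathway over free-energy-like edge weights concrete, here is a minimal sketch. It is an illustration only, not the implementation in this repository. It assumes that each graph node has an occupancy count, that a node's energy is `-log(count / max_count)`, and that an edge costs the mean energy of its two endpoints. The names `energy_from_counts` and `shortest_path`, and the toy graph, are hypothetical.

```python
import heapq
import math


def energy_from_counts(counts):
    """Free-energy-like value per node (ASSUMPTION: -log of relative occupancy).

    The most populated node gets energy 0; rarer nodes get higher energy.
    """
    max_count = max(counts.values())
    return {node: -math.log(c / max_count) for node, c in counts.items()}


def shortest_path(neighbors, energy, start, goal):
    """Dijkstra search where the edge u-v costs the mean energy of u and v.

    The edge-weight definition is an assumption made for this sketch.
    Returns (path, total_cost), or (None, inf) if goal is unreachable.
    """
    best = {start: 0.0}
    parent = {}
    heap = [(0.0, start)]
    while heap:
        cost, node = heapq.heappop(heap)
        if node == goal:
            break
        if cost > best.get(node, math.inf):
            continue  # stale heap entry
        for nxt in neighbors[node]:
            new_cost = cost + 0.5 * (energy[node] + energy[nxt])
            if new_cost < best.get(nxt, math.inf):
                best[nxt] = new_cost
                parent[nxt] = node
                heapq.heappush(heap, (new_cost, nxt))
    if goal not in best:
        return None, math.inf
    path = [goal]
    while path[-1] != start:
        path.append(parent[path[-1]])
    return path[::-1], best[goal]


# Toy graph: two routes from A to D, through a rare state B or a common state C.
neighbors = {"A": ["B", "C"], "B": ["A", "D"], "C": ["A", "D"], "D": ["B", "C"]}
counts = {"A": 100, "B": 5, "C": 60, "D": 100}
energy = energy_from_counts(counts)
path, cost = shortest_path(neighbors, energy, "A", "D")
print(path, round(cost, 3))
```

In this toy case the search prefers the route through the well-populated state `C`, since passing through the rarely visited state `B` costs more energy. Nothing in the search depends on the dimensionality of the space the nodes were sampled from, only on the graph and its weights.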
Due to storage limitations, image stacks for the synthetic dataset and details on applying our approach to EMPIAR-10076 can be found here.
We developed our approach on top of cryoDRGN, a well-known model for analyzing heterogeneity in cryo-EM data, and users can run it in the cryoDRGN environment. We also use cryoDRGN to generate 3D volumes for calculating the Fourier shell correlation (FSC), which is one of our metrics for evaluating performance.
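For readers unfamiliar with the FSC metric, the following is a minimal NumPy sketch of a Fourier shell correlation curve between two cubic volumes. It is a generic textbook formulation, not necessarily the computation used in our experiments: it applies no mask, bins frequencies into integer-radius shells, and stops at the Nyquist shell. The function name `fsc_curve` is hypothetical, and NumPy is assumed to be available (it is a cryoDRGN dependency).

```python
import numpy as np


def fsc_curve(vol_a, vol_b):
    """Fourier shell correlation per integer-radius shell (unmasked sketch).

    Both inputs must be cubic arrays of the same shape (n, n, n).
    Returns an array of length n // 2, one value per frequency shell.
    """
    if vol_a.shape != vol_b.shape or len(set(vol_a.shape)) != 1 or vol_a.ndim != 3:
        raise ValueError("expected two cubic 3D volumes of identical shape")
    n = vol_a.shape[0]

    # Centered Fourier transforms: zero frequency sits at index n // 2.
    fa = np.fft.fftshift(np.fft.fftn(vol_a))
    fb = np.fft.fftshift(np.fft.fftn(vol_b))

    # Integer shell index for every voxel in Fourier space.
    coords = np.arange(n) - n // 2
    x, y, z = np.meshgrid(coords, coords, coords, indexing="ij")
    shell = np.rint(np.sqrt(x**2 + y**2 + z**2)).astype(int)

    n_shells = n // 2
    keep = shell < n_shells
    idx = shell[keep]

    # Accumulate cross-correlation and power within each shell.
    cross = np.bincount(idx, weights=(fa * np.conj(fb)).real[keep], minlength=n_shells)
    power_a = np.bincount(idx, weights=(np.abs(fa) ** 2)[keep], minlength=n_shells)
    power_b = np.bincount(idx, weights=(np.abs(fb) ** 2)[keep], minlength=n_shells)

    denom = np.sqrt(power_a * power_b)
    return np.divide(cross, denom, out=np.zeros_like(cross), where=denom > 0)


rng = np.random.default_rng(0)
volume = rng.standard_normal((16, 16, 16))
curve = fsc_curve(volume, volume)  # a volume compared with itself
```

A volume compared with itself gives an FSC of 1 in every shell, and the curve drops toward 0 at frequencies where two volumes no longer agree.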
After successfully installing cryoDRGN, users can navigate to the testing directory and run:
./quicktest.sh
The script takes a few minutes to search for the best pathway on one of our synthetic datasets (Hsp90), with the threshold set at the 0.2 quantile. The results are saved in the testing directory.
The Jupyter notebook `main.ipynb`, located in the `hsp90` directory, details our experiments. We adopted the same workflow for the other datasets in our paper. The notebook also provides more information about how we compared our method to other pathfinding algorithms and the metrics we used.
If you prefer running our approach directly as a Python script, use the command:
python eng_graph_search.py
The inputs are:
- Representation coding (e.g., the latent space from cryoDRGN).
- Quantile for searching, `--search-q`. If not specified, the algorithm searches from 0.1 to 0.9 in increments of 0.1.
- Minimum and maximum zero-energy ratio, `--zero-rato-l` and `--zero-ratio-h`. This range controls the shape of the energy landscape. Refer to our paper for more details. The default range is $(0.01, 0.1)$.
- Output directory, `-o`.
- (Optional) Output all paths. To output all paths that reach the range of zero-energy ratio, add the `--output-all` flag to the command.
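As a sketch of how these inputs fit together, the helper below assembles a command line from the options listed above. The flag names are copied verbatim from the list. Passing the latent file as a positional argument is an assumption, as are the file name `z.pkl`, the directory name `results`, and the helper `build_command` itself. Check the script's argument parser before relying on this layout.

```python
import shlex


def build_command(latent_path, out_dir, search_q=None,
                  zero_ratio=(0.01, 0.1), output_all=False):
    """Assemble an argument list for eng_graph_search.py (hypothetical helper).

    ASSUMPTION: the latent representation is passed as a positional argument.
    Flag spellings are taken verbatim from the input list above.
    """
    cmd = ["python", "eng_graph_search.py", latent_path]
    if search_q is not None:
        # Omitting the flag leaves the documented default sweep in place.
        cmd += ["--search-q", str(search_q)]
    low, high = zero_ratio
    cmd += ["--zero-rato-l", str(low), "--zero-ratio-h", str(high)]
    cmd += ["-o", out_dir]
    if output_all:
        cmd.append("--output-all")
    return cmd


# The documented default sweep when no quantile is given: 0.1 to 0.9, step 0.1.
default_quantiles = [i / 10 for i in range(1, 10)]

# Example: a single search at the 0.2 quantile, as in the quick test.
command = build_command("z.pkl", "results", search_q=0.2)
print(shlex.join(command))
```

The resulting list can be passed to `subprocess.run(command, check=True)` from the directory that contains `eng_graph_search.py`.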
The workflow for generating synthetic datasets can be found at NLRP3 and Hsp90.