Author: | Sean Seyler |
---|---|
Year: | 2015 |
License: | GNU Public Licence, version 3 (or higher) |
Copyright: | © 2015 Sean Seyler |
Citation: | Seyler SL, Kumar A, Thorpe MF, Beckstein O (2015). Path Similarity Analysis: A Method for Quantifying Macromolecular Pathways. PLoS Comput Biol 11 (10): e1004568. doi: 10.1371/journal.pcbi.1004568 |
Path Similarity Analysis (PSA) is a computational framework for the quantitative comparison of macromolecular transition paths [Seyler2015]. This tutorial provides several examples that use PSA to compare closed-to-open adenylate kinase (AdK) transition paths generated by a selection of algorithms [Seyler2014]. Hierarchical clustering is used as a simple but powerful approach to exploratory data analysis: the quantitative comparison is presented as a heat map-dendrogram.
PSA, or PSAnalysis, is based on measuring the geometric similarity of transition paths in configuration space using the Hausdorff and Fréchet path metrics. PSA takes advantage of MDAnalysis [Michaud-Agrawal2011] to provide a seamless interface to Python and NumPy arrays, and a mechanism for performing path comparisons using arbitrary atom selections. MDAnalysis also provides a format-agnostic framework for reading simulation trajectories, allowing rapid comparison of many different computational methods. More information about the method can be found in [Seyler2015].
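The two path metrics named above can be illustrated with a minimal pure-Python sketch. It is not the implementation in MDAnalysis; the function names `rmsd`, `hausdorff`, and `discrete_frechet` and the toy single-atom paths are assumptions made for illustration. Each path is taken to be a list of frames, and each frame a list of (x, y, z) atom positions that are already structurally aligned.

```python
import math

def rmsd(frame_a, frame_b):
    """RMSD between two conformations given as lists of (x, y, z) positions."""
    sq = sum((p - q) ** 2
             for atom_a, atom_b in zip(frame_a, frame_b)
             for p, q in zip(atom_a, atom_b))
    return math.sqrt(sq / len(frame_a))

def hausdorff(path_p, path_q):
    """Symmetric Hausdorff distance: the worst-case nearest-neighbor RMSD."""
    def directed(src, dst):
        return max(min(rmsd(a, b) for b in dst) for a in src)
    return max(directed(path_p, path_q), directed(path_q, path_p))

def discrete_frechet(path_p, path_q):
    """Discrete Frechet distance via dynamic programming over couplings."""
    n, m = len(path_p), len(path_q)
    ca = [[0.0] * m for _ in range(n)]
    for i in range(n):
        for j in range(m):
            d = rmsd(path_p[i], path_q[j])
            if i == 0 and j == 0:
                ca[i][j] = d
            elif i == 0:
                ca[i][j] = max(ca[0][j - 1], d)
            elif j == 0:
                ca[i][j] = max(ca[i - 1][0], d)
            else:
                best_prev = min(ca[i - 1][j], ca[i][j - 1], ca[i - 1][j - 1])
                ca[i][j] = max(best_prev, d)
    return ca[n - 1][m - 1]

# Toy single-atom paths (hypothetical data): the same points, opposite order.
forward = [[(0.0, 0.0, 0.0)], [(1.0, 0.0, 0.0)], [(2.0, 0.0, 0.0)]]
backward = list(reversed(forward))

print(hausdorff(forward, backward))         # 0.0: identical point sets
print(discrete_frechet(forward, backward))  # 2.0: frame order matters
```

The example shows the practical difference between the two metrics: the Hausdorff distance ignores the order in which conformations are visited, whereas the discrete Fréchet distance is sensitive to it.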
This tutorial demonstrates a straightforward application of PSA to a set of transitions of the enzyme adenylate kinase (AdK) generated by a selection of methods (for more background on this particular example, see [Seyler2014]). Two example Python scripts are provided to generate an all-pairs distance comparison between the paths (i.e., all unique pairwise distances): a short version shows how to perform similarity analysis on a set of trajectories that have been pre-processed for proper (frame-by-frame) structural alignment; a full version additionally demonstrates, using the PSA framework, how an alignment procedure would be performed prior to similarity analysis. A third script demonstrates how to perform Hausdorff pairs analyses so that users can explore how paths differ from each other as a function of progress and examine, for each pair of paths, the pair of structures responsible for the Hausdorff distance.
Analyses are performed by executing the `psa_short.py`, `psa_full.py`, or `psa_hausdorff-pairs.py` Python scripts, which automatically read trajectories from the `methods` directory into a PSA object and perform trajectory alignment (in the case of `psa_full.py`). `psa_short.py` and `psa_full.py` generate discrete Hausdorff and Fréchet distance matrices, and produce heat map-dendrograms and annotated heat maps representing the distance matrices after Ward hierarchical clustering. In `psa_hausdorff-pairs.py`, a Hausdorff pairs (nearest neighbor) analysis is performed, with two plots showing the nearest neighbor (structures) as a function of (normalized) frame progress for two pairs of paths (DIMS vs DIMS and DIMS vs rTMD-S).
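The clustering step that precedes the heat map-dendrogram can be sketched with SciPy. This is an illustration under stated assumptions, not the code in the scripts: the path labels and the distance values below are made up, and SciPy and NumPy are assumed to be installed. SciPy documents Ward linkage for Euclidean distances; applying it to path distances here simply mirrors the tutorial's default.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, leaves_list, linkage
from scipy.spatial.distance import squareform

# Hypothetical path labels and a made-up symmetric path-distance matrix.
labels = ["DIMS-1", "DIMS-2", "rTMD-1", "rTMD-2"]
dist = np.array([
    [0.0, 0.5, 3.0, 3.2],
    [0.5, 0.0, 3.1, 3.0],
    [3.0, 3.1, 0.0, 0.4],
    [3.2, 3.0, 0.4, 0.0],
])

# linkage() expects the condensed form (upper triangle as a flat vector).
condensed = squareform(dist)
Z = linkage(condensed, method="ward")

# Leaf order of the dendrogram, used to reorder heat map rows and columns.
order = leaves_list(Z)
reordered = dist[np.ix_(order, order)]

# Cut the tree into two flat clusters to see which paths group together.
clusters = fcluster(Z, t=2, criterion="maxclust")
for name, cluster_id in zip(labels, clusters):
    print(name, cluster_id)
```

Reordering the matrix by the dendrogram leaf order is what places similar paths next to each other in the heat map.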
Also provided are Jupyter notebooks (with the `.ipynb` extension) that give users the option to perform the same analyses as performed by the scripts in an interactive, step-by-step manner.
The notebooks contain optional analyses (not in the scripts) demonstrating how to utilize a convenience class called `PairID` (provided in `pair_id.py`). `PairID` provides an intuitive interface to extract data generated by PSA; the Jupyter notebook called psa_identifier_example.ipynb demonstrates how it's used. All other notebooks make use of the `PairID` class.
The psa_short.ipynb notebook goes through the basic steps of PSA:
- Prepare and superimpose trajectories appropriately.
- Compute Fréchet or Hausdorff distances between all trajectories and generate a clustered distance matrix.
It uses the same data that were used to prepare the comparison of multiple fast transition path sampling methods shown in Figure 6 in [Seyler2015].
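The all-pairs step can be sketched as follows: for N paths there are N(N-1)/2 unique pairwise distances, which fill a symmetric N×N matrix. The helper names `hausdorff_1d` and `all_pairs` are hypothetical, and the paths are reduced to one-dimensional toy data so that the block stays self-contained; in PSA the point distance would be an RMSD between aligned structures.

```python
from itertools import combinations

def hausdorff_1d(path_p, path_q):
    """Hausdorff distance between toy paths whose frames are single numbers."""
    def directed(src, dst):
        return max(min(abs(a - b) for b in dst) for a in src)
    return max(directed(path_p, path_q), directed(path_q, path_p))

def all_pairs(paths, metric):
    """Return (square matrix, condensed list) of all unique pairwise distances."""
    n = len(paths)
    matrix = [[0.0] * n for _ in range(n)]
    condensed = []
    for i, j in combinations(range(n), 2):
        d = metric(paths[i], paths[j])
        matrix[i][j] = matrix[j][i] = d  # the distance matrix is symmetric
        condensed.append(d)
    return matrix, condensed

# Three made-up paths that differ only in their middle frame.
paths = [[0.0, 1.0, 2.0], [0.0, 1.5, 2.0], [0.0, 3.0, 2.0]]
matrix, condensed = all_pairs(paths, hausdorff_1d)

print(condensed)  # [0.5, 1.0, 1.0]
```

The condensed list follows the order (0, 1), (0, 2), (1, 2), which is the layout that common hierarchical clustering routines expect as input.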
The psa_hausdorff-pairs.ipynb notebook demonstrates how to extract molecular detail from a path comparison: It yields the two frames (one from each trajectory) that are responsible for the largest difference between the two trajectories, as described in more detail in [Seyler2015]. It then shows how to compare the distance between trajectories along a common order parameter.
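A Hausdorff pairs analysis of this kind can be sketched in a few lines. This is a simplified illustration, not the notebook's code: frames are reduced to 2D points, the function names `nearest_neighbors`, `hausdorff_pair`, and `progress` are hypothetical, and normalized frame index stands in for the common order parameter.

```python
import math

def nearest_neighbors(src, dst):
    """For each frame in src, return (index of nearest frame in dst, distance)."""
    result = []
    for a in src:
        j = min(range(len(dst)), key=lambda k: math.dist(a, dst[k]))
        result.append((j, math.dist(a, dst[j])))
    return result

def hausdorff_pair(path_p, path_q):
    """Return (i, j, d): frame i of P and frame j of Q realizing the Hausdorff distance."""
    nn_p = nearest_neighbors(path_p, path_q)
    nn_q = nearest_neighbors(path_q, path_p)
    i_p = max(range(len(path_p)), key=lambda i: nn_p[i][1])
    j_q = max(range(len(path_q)), key=lambda j: nn_q[j][1])
    if nn_p[i_p][1] >= nn_q[j_q][1]:
        return i_p, nn_p[i_p][0], nn_p[i_p][1]
    return nn_q[j_q][0], j_q, nn_q[j_q][1]

def progress(path):
    """Normalized frame progress from 0 (first frame) to 1 (last frame)."""
    n = len(path)
    return [k / (n - 1) for k in range(n)]

# Toy 2D paths (hypothetical data) that share endpoints but differ mid-way.
path_p = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]
path_q = [(0.0, 0.0), (1.0, 2.0), (2.0, 0.0)]

print(hausdorff_pair(path_p, path_q))  # (1, 1, 2.0)
for s, (j, d) in zip(progress(path_q), nearest_neighbors(path_q, path_p)):
    print(s, j, d)  # progress along Q, nearest frame in P, its distance
```

Plotting the nearest-neighbor distance against progress shows where along the transition two paths diverge, while the Hausdorff pair identifies the two structures to inspect.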
The scripts can be run directly using, for example,
python psa_short.py
and various settings can be customized, as described below. Furthermore, these scripts can be used as a basis to implement one's own custom analysis.
The user can also try adjusting settings in each file to change, for example, the:
- path metric (default: discrete Fréchet [`discrete_frechet`])
- linkage algorithm for hierarchical clustering (default: `Ward`)
- name and location of the plot (default: `df_ward_psa-[short/full].pdf`)
These examples should serve as a sufficient basis for understanding PSA's framework. Some other techniques and analyses using PSA are described in [Seyler2015].
- MDAnalysis: 0.11.1 or higher
- pandas: 0.16.2 or higher
- seaborn: 0.6.0 or higher
If you have questions or problems using the package, ask on the MDAnalysis user mailing list: http://groups.google.com/group/mdnalysis-discussion
This tutorial is still under revision. Although it will be updated to reflect changes in the `MDAnalysis.analysis.psa` module, improvements can always be made and bugs are likely to be present. Users are encouraged to devise their own analyses using the PSA framework. Feedback and issue reports on the tutorial and PSA are welcome and encouraged!
If you want to write your own code using PSA, use the `MDAnalysis.analysis.psa` module, which has been part of MDAnalysis since release 0.10.0, and have a look at the documentation of the PSA module. This tutorial requires the PSA implementation in MDAnalysis release 0.11.1 or higher for all features to work properly.
[Michaud-Agrawal2011] | N. Michaud-Agrawal, E. J. Denning, T. B. Woolf, and O. Beckstein. MDAnalysis: A toolkit for the analysis of molecular dynamics simulations. J Comp Chem 32:2319-2327, 2011. doi:10.1002/jcc.21787. http://www.mdanalysis.org |
[Seyler2014] | S. L. Seyler and O. Beckstein. Sampling large conformational transitions: adenylate kinase as a testing ground. Mol Simul 40:855–877, 2014. doi:10.1080/08927022.2014.919497 |
[Seyler2015] | S. L. Seyler, A. Kumar, M. F. Thorpe, and O. Beckstein. Path Similarity Analysis: A Method for Quantifying Macromolecular Pathways. PLoS Comput Biol 11(10):e1004568, 2015. doi:10.1371/journal.pcbi.1004568 |