
FRET based conformational sampling

Thomas-Otavio Peulen edited this page Jul 17, 2019 · 8 revisions

Overview

ChiSurf can be used to sample the conformational space of proteins in agreement with experimental FRET data. For that, proteins are represented by the dihedral angles of the peptide backbone as internal coordinates. The peptide backbone is represented atomistically by its N, Cα, C, and O atoms, together with the amide H atoms, which bridge O and N atoms via hydrogen bonds in secondary structure elements such as α-helices and β-sheets. Each side chain is represented by a single centroid positioned at the side chain's center of mass.
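The side-chain centroid is a mass-weighted average of the side-chain atom positions. A minimal Python sketch, assuming the caller supplies the side-chain atom coordinates and their masses; the function name `side_chain_centroid` is hypothetical and not part of ChiSurf:

```python
from typing import Sequence, Tuple

Vec3 = Tuple[float, float, float]


def side_chain_centroid(coords: Sequence[Vec3], masses: Sequence[float]) -> Vec3:
    """Center of mass of a side chain, given one (x, y, z) and one mass per atom."""
    if not coords or len(coords) != len(masses):
        raise ValueError("need at least one atom and exactly one mass per atom")
    total = sum(masses)
    # mass-weighted mean along each Cartesian axis
    x, y, z = (
        sum(m * c[k] for c, m in zip(coords, masses)) / total for k in range(3)
    )
    return (x, y, z)
```

For two atoms at x = 0 and x = 4 with masses 1 and 3, the centroid lies at x = 3, closer to the heavier atom.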

The protein's conformational space is sampled in a Monte Carlo (MC) scheme, in which the dihedral angles of the current conformation are randomly perturbed to yield the conformation of the next MC step. In each MC step, a set of amino acids (AAs) is randomly chosen and their dihedral angles are randomly perturbed. The AAs are chosen at each iteration according to a user-specified "move-map", which assigns each AA in the peptide chain a probability of being selected for a random perturbation (move). After a perturbation, the Cartesian coordinates of the atoms and the side-chain centroids are calculated from the new set of dihedral angles, and a user-specified energy function is evaluated. The energy function is a weighted linear combination of predefined energy terms, which may include statistical potentials, excluded-volume terms, physics-based potentials, or experimental potentials. The new value of the energy function, E_{struct,i+1}, is compared to the previous value, E_{struct,i}. A move is accepted if E_{struct,i+1} is smaller than E_{struct,i} or if:

$$\exp\left(\frac{E_{struct,i} - E_{struct,i+1}}{kT}\right) > r$$

where r is a random number in the range [0,1), and kT is a scaling factor analogous to temperature.
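This acceptance rule is the Metropolis criterion. A minimal sketch with a hypothetical helper `metropolis_accept` (not ChiSurf's own code); energies and `kt` are plain floats in the same arbitrary units:

```python
import math
import random


def metropolis_accept(e_old: float, e_new: float, kt: float, rng: random.Random) -> bool:
    """Return True if a move from energy e_old to e_new is accepted."""
    if e_new < e_old:
        return True  # downhill moves are always accepted
    r = rng.random()  # uniform random number in [0, 1)
    # uphill moves are accepted with probability exp(-(e_new - e_old) / kt)
    return math.exp((e_old - e_new) / kt) > r
```

Larger values of `kt` make uphill moves more likely to be accepted, analogous to a higher temperature.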

FRET is a low-resolution technique; FRET measurements are therefore relatively insensitive to small structural changes. Moreover, comparing a structural model to the experimental data is computationally expensive, because an accurate comparison requires simulating the positional distribution of the fluorophores around their attachment points. Hence, ChiSurf evaluates the agreement of the sampled conformations with the FRET experiments in a second MC scheme, in addition to the MC scheme that samples the conformational space of the protein. This second MC scheme biases the structural sampling towards conformations that agree with the experiments: the first (structure) MC scheme accumulates structural changes over several steps, and the second (FRET) MC scheme then evaluates the accumulated changes and accepts them if they improve the agreement with the experimental data or if

$$\exp\left(\frac{E_{FRET,i} - E_{FRET,i+1}}{kT_{FRET}}\right) > r$$

where r is a random number in the range [0,1), and kT_FRET is a scaling factor, E_{FRET,i} quantifies the disagreement with the experimental data of the previously accepted structural model, and E_{FRET,i+1} is the disagreement with the experimental data of the proposed structural model.
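The nesting of the two MC schemes can be sketched as follows. This is a simplified illustration under stated assumptions, not ChiSurf's implementation: the state, the proposal function, and both energy functions are supplied by the caller; `n_struct_steps` plays the role described for `av_number_protein_mc`; `kt_fret` is assumed to correspond to `ktAv`; and after a rejected FRET step the structure MC is assumed to restart from the last accepted state. All names are hypothetical.

```python
import math
import random
from typing import Callable, TypeVar

S = TypeVar("S")


def accept(e_old: float, e_new: float, kt: float, rng: random.Random) -> bool:
    """Metropolis criterion; exp() is only evaluated for uphill moves."""
    return e_new < e_old or math.exp((e_old - e_new) / kt) > rng.random()


def two_level_mc(
    state: S,
    propose: Callable[[S, random.Random], S],
    e_struct: Callable[[S], float],
    e_fret: Callable[[S], float],
    kt: float,
    kt_fret: float,
    n_struct_steps: int,
    n_fret_steps: int,
    rng: random.Random,
) -> S:
    f_old = e_fret(state)
    for _ in range(n_fret_steps):
        # inner (structure) MC: accumulate changes starting from the last accepted state
        trial = state
        e_old = e_struct(trial)
        for _ in range(n_struct_steps):
            candidate = propose(trial, rng)
            e_new = e_struct(candidate)
            if accept(e_old, e_new, kt, rng):
                trial, e_old = candidate, e_new
        # outer (FRET) MC: accept or reject all accumulated changes at once
        f_new = e_fret(trial)
        if accept(f_old, f_new, kt_fret, rng):
            state, f_old = trial, f_new
    return state
```

For example, with a flat structure energy (`lambda x: 0.0`), a random-walk proposal on a single float, and `e_fret = lambda x: (x - 3.0) ** 2`, the returned state ends up close to 3.0: the expensive FRET term is evaluated only once per `n_struct_steps` structure moves.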

By default, E_{FRET,i} is an error-weighted sum of squared deviations between the experimentally determined distances and the simulated distances. This disagreement, chi^2, is calculated by:

$$\chi^2 = \frac{1}{N-1} \sum_{i=1}^{N} \left( \frac{d_{exp,i} - d_{sim,i}}{err_{exp,i}} \right)^2$$

where N is the number of experimental distances, d_{exp,i} is the i-th experimental distance with associated experimental error err_{exp,i}, and d_{sim,i} is the corresponding distance simulated for the structural model.
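The error-weighted chi^2 (squared deviations divided by the squared errors, normalized by N - 1) can be computed directly. A minimal sketch with a hypothetical function name; it assumes one symmetric error per distance, whereas the constraint files described under Inputs provide separate `error_neg` and `error_pos` values, and how ChiSurf combines those is not described here.

```python
from typing import Sequence


def chi2_fret(d_exp: Sequence[float], d_sim: Sequence[float], err: Sequence[float]) -> float:
    """chi^2 = 1/(N-1) * sum(((d_exp - d_sim) / err) ** 2)."""
    n = len(d_exp)
    # N - 1 in the denominator requires at least two distances
    if n < 2 or len(d_sim) != n or len(err) != n:
        raise ValueError("need at least two distances and sequences of equal length")
    return sum(((de - ds) / e) ** 2 for de, ds, e in zip(d_exp, d_sim, err)) / (n - 1)
```

A model that deviates by exactly one error from two of three measured distances and matches the third gives chi^2 = (1 + 0 + 1) / 2 = 1.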

File formats

Outputs

The structure sampling of ChiSurf uses JSON files to save configurations and parameters. ChiSurf saves MC trajectories as HDF5 files in the MDTraj format (http://mdtraj.org). The values of the FRET bias potential can be saved in ChiSurf as comma-separated files.

Inputs

The format of the JSON file that defines the experimental FRET constraints is described in the repository of the software [Olga](https://github.com/Fluorescence-Tools/Olga/blob/master/doc/JSON%20Types%20and%20Parameters.docx).

Below is an example file for the protein GABARAP:

{
    "Distances": {
        "7_62": {
            "Forster_radius": 52.0,
            "distance": 35.2,
            "distance_type": "RDAMean",
            "error_neg": 5.0,
            "error_pos": 5.0,
            "position1_name": "7",
            "position2_name": "62"
        },
        "13_62": {
            "Forster_radius": 52.0,
            "distance": 32.1,
            "distance_type": "RDAMean",
            "error_neg": 5.0,
            "error_pos": 5.0,
            "position1_name": "13",
            "position2_name": "62"
        },
        "7_73": {
            "Forster_radius": 52.0,
            "distance": 36.6,
            "distance_type": "RDAMean",
            "error_neg": 5.0,
            "error_pos": 5.0,
            "position1_name": "7",
            "position2_name": "73"
        }		
    },
    "Positions": {
        "7": {
            "atom_name": "CB",
            "chain_identifier": "A",
            "linker_length": 20.0,
            "linker_width": 4.5,
            "radius1": 5.0,
            "radius2": 0.0,
            "radius3": 0.0,
            "residue_name": "GLU",
            "residue_seq_number": 7,
            "simulation_grid_resolution": 0.5,
            "simulation_type": "AV1"
        },
        "13": {
            "atom_name": "CB",
            "chain_identifier": "A",
            "linker_length": 20.0,
            "linker_width": 4.5,
            "radius1": 5.0,
            "radius2": 0.0,
            "radius3": 0.0,
            "residue_name": "LYS",
            "residue_seq_number": 13,
            "simulation_grid_resolution": 0.5,
            "simulation_type": "AV1"
        },
        "62": {
            "atom_name": "CB",
            "chain_identifier": "A",
            "linker_length": 20.0,
            "linker_width": 4.5,
            "radius1": 5.0,
            "radius2": 0.0,
            "radius3": 0.0,
            "residue_name": "PHE",
            "residue_seq_number": 62,
            "simulation_grid_resolution": 0.5,
            "simulation_type": "AV1"
        },
        "73": {
            "atom_name": "CB",
            "chain_identifier": "A",
            "linker_length": 20.0,
            "linker_width": 4.5,
            "radius1": 5.0,
            "radius2": 0.0,
            "radius3": 0.0,
            "residue_name": "GLU",
            "residue_seq_number": 73,
            "simulation_grid_resolution": 0.5,
            "simulation_type": "AV1"
        }
    }
}
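A constraint file of this form can be read with Python's standard `json` module. A minimal sketch, assuming only the keys shown in the example above (`Distances`, `Positions`, `position1_name`, `position2_name`, `distance`, `error_neg`, `error_pos`, `distance_type`); the class and function names are hypothetical and not part of ChiSurf or Olga:

```python
import json
from typing import List, NamedTuple


class DistanceConstraint(NamedTuple):
    label: str
    position1: str
    position2: str
    distance: float
    error_neg: float
    error_pos: float
    distance_type: str


def read_distance_constraints(text: str) -> List[DistanceConstraint]:
    """Parse the "Distances" block and check it against the "Positions" block."""
    data = json.loads(text)
    positions = data["Positions"]
    constraints = []
    for label, d in data["Distances"].items():
        p1, p2 = d["position1_name"], d["position2_name"]
        # every distance must refer to labeling positions defined under "Positions"
        for p in (p1, p2):
            if p not in positions:
                raise KeyError(f"distance {label!r} refers to undefined position {p!r}")
        constraints.append(
            DistanceConstraint(
                label, p1, p2,
                float(d["distance"]), float(d["error_neg"]), float(d["error_pos"]),
                d["distance_type"],
            )
        )
    return constraints
```

Typical use is `read_distance_constraints(fh.read())` on an open file handle; the consistency check catches a distance whose labeling position was misspelled.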

Like the FRET constraints, the simulation parameters are stored in a JSON file:

{
    "av_number_protein_mc": 100,
    "do_av_steepest_descent": false,
    "fps_file": "/home/tpeulen/projects/gabarap/01_data/eTCSPC/distance_labeling_open.fps.json",
    "kt": 1.5,
    "ktAv": 0.5,
    "mc_mode": "av_mc",
    "movemap": [
        1.0, 1.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 1.0, 1.0, 0.0, 0.0, 0.0, 0.0,  0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0,
        1.0, 1.0, 1.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0,
        0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 
        0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 
        0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0
    ],
    "number_of_moving_aa": 1,
    "pChi": 0.01,
    "pOmega": 0.0,
    "pPhi": 0.7,
    "pPsi": 0.3,
    "pdb_nOut": 1,
    "potentials": [
        {
            "name": "Clash-Potential",
            "weight": 0.03125
        }
    ],
    "save_filename": "/home/tpeulen/projects/gabarap/02_sampling/open/eTCSPC/run_6.h5",
    "scale": 0.025
}

The most important parameter above is 'movemap', which defines the flexible amino acids.
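Writing a long movemap by hand is error-prone. A minimal sketch of a hypothetical helper that builds it from inclusive, 1-based residue ranges, assuming that entry i of the list corresponds to the i-th residue of the chain (this mapping is an assumption, not documented here):

```python
from typing import Iterable, List, Tuple


def make_movemap(n_residues: int, flexible: Iterable[Tuple[int, int]]) -> List[float]:
    """Return 1.0 for residues inside the inclusive, 1-based ranges and 0.0 elsewhere."""
    movemap = [0.0] * n_residues
    for first, last in flexible:
        if not 1 <= first <= last <= n_residues:
            raise ValueError(f"invalid residue range {first}-{last}")
        for i in range(first - 1, last):  # residue i + 1 -> list index i (assumed mapping)
            movemap[i] = 1.0
    return movemap
```

The ranges (1, 3), (9, 11), and (24, 28) correspond to the 1.0 entries at the start of the example movemap above; the resulting list can be stored under the "movemap" key before the parameter dictionary is written with `json.dump`.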

| Parameter | Meaning | Options / example | Type |
|---|---|---|---|
| av_number_protein_mc | Number of structure MC steps between two FRET MC steps | minimum 1 | int |
| do_av_steepest_descent | If true, only moves that improve the agreement with the FRET data are accepted | true or false | boolean |
| fps_file | JSON file containing the FRET constraints; mandatory in "av_mc" mode, optional for "simple_mc" | path and filename of the FPS JSON file | string |
| kt | Scaling factor for all MC energy terms | mandatory | float |
| ktAv | Scaling factor for the AV (FRET) energy term | mandatory | float |
| mc_mode | Determines the type of MC simulation. "simple_mc": conformational sampling without FRET MC. "av_mc": biases the sampling towards conformations in agreement with the experimental data | "simple_mc", "av_mc" | string |
| movemap | List of non-negative floats, one per amino acid. Each value is proportional to the probability of choosing that amino acid and changing its dihedral angles; the values are normalized to sum to unity | floats in the range [0, inf) | list of float |
| number_of_moving_aa | Number of amino acids whose dihedral angles are changed in each MC step: 1 means one AA per MC step, 2 means two AAs, etc. (recommended: 1) | 1 | int |
| pChi | Probability that the chi (side-chain) angle is changed | 0.1 | float |
| pOmega | Probability that the omega angle is changed (recommended: 0) | 0.0 | float |
| pPhi | Probability that the phi angle is changed | 0.7 | float |
| pPsi | Probability that the psi angle is changed | 0.3 | float |
| pdb_nOut | Only every pdb_nOut-th MC step is written to the output file | 1 | positive int |
| potentials | List of potentials with corresponding weights | See detailed description below | list of dictionaries |
| save_filename | File to which the trajectory is written, as HDF5 in MDTraj format | "output.h5" | string |
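How the movemap and the angle probabilities might combine in a single move can be sketched as below. Residues are drawn with weights proportional to their movemap entries (`random.choices` normalizes the weights implicitly). Treating pPhi, pPsi, pOmega, and pChi as independent per-angle probabilities, and drawing residues with replacement, are assumptions; the function name is hypothetical.

```python
import random
from typing import Dict, List, Sequence, Tuple


def choose_moves(
    movemap: Sequence[float],
    n_moving: int,
    p_angle: Dict[str, float],
    rng: random.Random,
) -> List[Tuple[int, List[str]]]:
    """Pick n_moving residue indices and, for each, the dihedral angles to perturb."""
    if sum(movemap) <= 0:
        raise ValueError("movemap must contain at least one positive entry")
    # weights need not sum to one; random.choices normalizes them
    residues = rng.choices(range(len(movemap)), weights=movemap, k=n_moving)
    moves = []
    for i in residues:
        # each angle is changed independently with its own probability (assumption)
        angles = [name for name, p in p_angle.items() if rng.random() < p]
        moves.append((i, angles))
    return moves
```

With `number_of_moving_aa` = 1, a call with `n_moving=1` corresponds to one move; residues with a movemap entry of 0.0 are never selected, and an angle with probability 0.0 (such as the recommended pOmega) is never changed.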

User interface


Preparing the protein structure

  1. The protein structure should be complete, without missing loops. Use, for instance, MODELLER to fill gaps in the protein structure.
  2. ChiSurf expects the structure to be protonated, with hydrogen atoms named according to the PARSE force field. Process the structure with filled loops with PDB2PQR to ensure that the naming scheme matches what ChiSurf expects, and save the result as a PDB file.

Open the prepared protein structure

  1. Open the prepared structure in ChiSurf: select "Modelling" as the "experiment", select "PDB" as the "setup", and click "add data" to open the prepared PDB file.
  2. Select the PDB file in the dataset window, select "ProteinMC" as the model, and click "add fit" (see figure below).

ProteinMC fit

  1. Prepare the JSON parameter file manually and make sure that all file paths in it point to actual files. Then switch to the "Analysis" window and click the "load" button next to the Settings label at the top. If you do not have a template JSON file for your structure, click the "save" button next to the "load" button at the top of the analysis window to save one.

  2. The "fitting" window on the right contains two plots: Trajectory plot and MolView. The trajectory plot has four subplots for analyzing the simulation: "RMSD" displays the RMSD of the trajectory with respect to the first structure, "dRMSD" displays the RMSD with respect to the previous structure, "Energy" displays the total energy of the MC steps, and "FRET" displays the energy of the FRET potential. To check whether the movemap matches your expectation, switch to the MolView plot, where the protein is colored according to the movemap (red: flexible, blue: rigid). In the plot window, you can use the command-line entry to pass PyMOL commands to the MolView plot.

  3. In the Simulation box of the analysis window, click "start" to begin the sampling. The sampling runs indefinitely until you click "stop" and continuously writes conformations to the hard disk.

  4. Check the "RMSD" checkbox to calculate the RMSD and dRMSD of the structures. Click "3D" to load structures into the MolView plot. For productive runs
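The RMSD and dRMSD traces shown in the trajectory plot can be reproduced offline. A minimal sketch on plain coordinate lists; it assumes identical atom order in all frames and does not superpose the frames, which ChiSurf may handle differently. Function names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Frame = Sequence[Tuple[float, float, float]]


def rmsd(a: Frame, b: Frame) -> float:
    """Root-mean-square deviation between two frames, without superposition."""
    if not a or len(a) != len(b):
        raise ValueError("frames must be non-empty and have the same number of atoms")
    sq = sum((p[k] - q[k]) ** 2 for p, q in zip(a, b) for k in range(3))
    return math.sqrt(sq / len(a))


def rmsd_series(frames: Sequence[Frame]) -> Tuple[List[float], List[float]]:
    """RMSD to the first frame and dRMSD to the previous frame, per frame."""
    first = frames[0]
    to_first = [rmsd(first, f) for f in frames]
    # the first frame has no predecessor, so its dRMSD is set to 0.0
    to_previous = [0.0] + [rmsd(frames[i - 1], frames[i]) for i in range(1, len(frames))]
    return to_first, to_previous
```

A steadily growing RMSD with a small, roughly constant dRMSD indicates many small accepted moves that drift away from the start structure.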

Command line

Two provides a user interface and an

ChiSurf uses the FRET positioning system (FPS) as a forward model that simulates fluorescence observables for the sampled conformations of the studied protein.
