Skip to content

AF2 based MCMC Hallucination Script based on Wicky, Milles, Courbet et al 2022

Notifications You must be signed in to change notification settings

dmitropher/af2_multistate_hallucination

Repository files navigation

Multi-state design using AlphaFold2 MCMC hallucination.

Started by bwicky@uw.edu on 2021-08-11

Re-factored and merged with code from lmilles@uw.edu on 2021-08-23

Summary

  • Designs (hallucinations) are performed by MCMC searches in sequence space and optimizing (user-defined) losses composed of AlphaFold2 metrics, and/or geometric constraints, and/or secondary-structure definitions.
  • Oligomers with arbitrary number of subunits can be designed.
  • Multistate design (either positive or negative) can be specified.
  • MCMC trajectories can either be seeded with input sequence(s), or started randomly (using a frequency-adjusted AA distribution).
  • At each step, position(s) are chosen for mutation based on different options (see modules/seq_mutation.py for details).
  • A 'resfile' (.af2h extension) can be employed to specify designable positions and associated probabilities of mutation.
  • The oligomeric state (number of subunits) for each oligomer (state) can be specified.
  • Repeat proteins (sequence-symmetric monomers) can be designed instead of oligomers by passing the --single_chains flag.
  • Specific amino acids can be exluded.
  • MCMC paramters (initial temperature, annealing half-life, steps, tolerance) can be specified.
  • Currently implemented loss functions are (see modules/losses.py for details):
    • plddt: plDDT seem to have trouble converging to complex formation.
    • ptm: pTM tends to 'melt' input structures.
    • pae: similar to result as ptm?
    • dual: combination of plddt and ptm losses with equal weights.
    • entropy: current implementation unlikely to work.
    • pae_sub_mat: initially made to enforce symmetry, but probably not working.
    • pae_asym: this loss has different weights associated with the means of the different PAE sub-matrices (asymmetric weighting of the different inter-chain contacts). Off-diagonal elements (+/-1 from the diagaonl, and opposite corners) have higher weights.
    • cyclic: new trial loss to enforce symmetry based on pae sub-matrix sampling. Not sure it is working. Needs to be benchmarked.
    • dual_cyclic: dual with an added geometric loss term to enforce symmetry. Seems to work well.
    • dual_dssp: jointly optimises ptm and plddt (equal weights) as well as enforcing a specific secondary structure content as computed by DSSP on the structure.
    • tmalign: loss defined as TM-score to template PDB, given with --template , alignment of template (tmalign -I) can be forced with --template_alignment [alignment].aln if template has multiple chains remove the TER in the pdbfiles.
    • dual_tmalign: jointly optimises ptm, plddt and tmalign (see above) TM-score.
    • pae_asym_tmalign: in development.
    • aspect_ratio: geometric term that enforces protomers with aspect ratios close to 1 (i.e. spherical).

Minimal inputs

  • The number and type of subunits for each oligomer, also indicating whether it is a positive or negative design task.
  • The length of each protomer or one seed sequence per protomer.

Examples

  • ./AF2_multistate_hallucination.py --oligo AAAA+,AB+ --L 50,50 will perform 2-state positive design, concomently optimising for a homo-tetramer and a hetero-dimer.
  • ./AF2_multistate_hallucination.py --oligo ABC+, --L 40,50,60 will perform single-state design of a hetero-trimer with protomers of different lengths.
  • ./AF2_multistate_hallucination.py --oligo AB+,AA-,BB- --L 50,50 will perform multi-state design concomently optimising for the heterodimer and disfavouring the two homo-dimers.
  • ./AF2_multistate_hallucination.py --oligo AAAAAA+ --L 30 --single_chains will perform single-state design of a monomeric repeat proteins containing six repeats, each 30 amino-acids in length.

Example .af2h file

The following config file enables design at all positions set to 1 (equal probability of picking those sites for mutation), and disallow design at all positions that are set to 0.

>A
DEEQEKAEEWLKEAEEMLEQAKRAKDEEELLKLLVRLLELSVELAKIIQKTKDEEKKKELLEINKRLIEVIKELLRRLK
1,1,1,1,1,1,0,1,1,0,1,1,1,0,1,1,0,1,1,1,0,1,1,0,1,1,1,1,1,0,0,1,0,0,0,1,0,0,1,0,0,1,1,0,0,1,0,0,1,1,0,1,1,1,1,1,1,1,1,0,1,1,0,0,1,1,0,1,1,0,0,1,1,0,1,1,0,0,1
>B
QEELAELIELILEVNEWLQRWEEEGLKDSEELVKEYEKIVEKIKELVKMAEEGHDEEEAEEEAKKLKKKAEEILREAEKG
1,1,1,0,0,1,0,0,1,0,0,0,1,0,0,1,0,0,0,1,0,0,1,1,1,0,1,1,0,1,1,0,0,1,1,0,1,1,0,0,1,0,0,1,1,0,0,1,0,0,1,1,1,1,1,1,1,1,0,1,1,1,1,1,1,0,1,1,1,0,1,1,0,1,1,1,0,1,1,0

Example template alignment for tmalign loss

Remove the TER in the template pdbfiles. model1 (do not change the names) ist the template given in --template model2 should be the length of the protomer to be designed (sequence given here is irrelevant) e.g. for a desing of length 130 with motifs placed at N- and C-termini Do not change this order!

>model1
RSMSWDNEVAFN-----------------------------------------------------
----------------------------------------------------QHHLGGAKQAGAV

>model2
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA

Outputs

  • PDB structures for each accepted move of the MCMC trajectory.
  • A file (.out) containing the scores at each step of the MCMC trajectory (accepted and rejected).

To-do

  • A CCE-based loss to enable constrained hallucination based on an input structure?
  • Check if normalising pae and pae-derived losses by their init value is an appropriate scaling method?

About

AF2 based MCMC Hallucination Script based on Wicky, Milles, Courbet et al 2022

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published