Docking benchmark 5 - cleaned and ready to use for HADDOCK

haddocking/BM5-clean

BM5-clean

Docking benchmark 5 (BM5) - cleaned and ready to use for HADDOCK

This is the docking and binding affinity benchmark described in:

T. Vreven, I.H. Moal, A. Vangone, B.G. Pierce, P.L. Kastritis, M. Torchala, R. Chaleil, B. Jimenez-Garcia, P.A. Bates, J. Fernandez-Recio, A.M.J.J. Bonvin and Z. Weng. Updates to the integrated protein-protein interaction benchmarks: Docking benchmark version 5 and affinity benchmark version 2.
J. Mol. Biol. 427(19), 3031-3041 (2015). https://doi.org/10.1016/j.jmb.2015.07.016

The repository contains the following information:

HADDOCK-ready

The directory contains HADDOCK-ready files for each entry of BM5. Each sub-directory (one per complex) contains the following files:

  • XXX_r_u.pdb : Unbound receptor PDB
  • XXX_l_u.pdb : Unbound ligand PDB
  • XXX_r_b-matched.pdb : Matched bound receptor PDB
  • XXX_l_b-matched.pdb : Matched bound ligand PDB
  • XXX_r_u_cg.pdb : Martini v2 coarse-grained model of the unbound receptor
  • XXX_l_u_cg.pdb : Martini v2 coarse-grained model of the unbound ligand
  • XXX_reference.pdb : The reference, matched complex

In addition, each PDB file has an associated .info file providing statistics on its content.
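As a quick sanity check of the layout described above, the following minimal sketch lists which of the expected structure files are absent from an entry sub-directory. The suffixes come from the file list above; the function names are hypothetical, and the sketch assumes that each sub-directory is named after its entry (XXX), which you should verify against your checkout.

```python
from __future__ import annotations

from pathlib import Path

# Suffixes taken from the file list above; XXX stands for the entry name.
STRUCTURE_SUFFIXES = [
    "_r_u.pdb",
    "_l_u.pdb",
    "_r_b-matched.pdb",
    "_l_b-matched.pdb",
    "_r_u_cg.pdb",
    "_l_u_cg.pdb",
    "_reference.pdb",
]


def expected_structure_files(entry: str) -> list[str]:
    """Return the structure file names expected for one entry."""
    return [f"{entry}{suffix}" for suffix in STRUCTURE_SUFFIXES]


def missing_structure_files(entry_dir: Path) -> list[str]:
    """List expected structure files that are absent from an entry directory.

    Assumption: the directory is named after the entry it contains.
    """
    entry = entry_dir.name
    return [
        name
        for name in expected_structure_files(entry)
        if not (entry_dir / name).is_file()
    ]
```

Calling `missing_structure_files(Path("HADDOCK-ready/XXX"))` returns an empty list when all seven structure files are in place.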

Various distance restraints files are present:

  • XXX_ambig.tbl : Ambiguous interaction restraints based on the true interface measured with a 3.9 Å cutoff
  • XXX_ambig5.tbl : Ambiguous interaction restraints based on the true interface measured with a 5.0 Å cutoff
  • XXX_restraint-bodies.tbl : If present, contains a list of distance restraints to keep unconnected bodies together
  • XXX_XXX_*.tbl : If present, contains a list of distance restraints to keep the ligand in place in the structure
  • XXX_hbonds.tbl : the combination of the bodies and ligand distance restraints (used in HADDOCK)
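To illustrate what "true interface measured with a cutoff" can mean in practice, here is a minimal sketch that collects the residues of chains A and B having at least one pair of non-hydrogen atoms within the cutoff. It is not the script used to generate the .tbl files: the heavy-atom distance criterion, the hydrogen filter and the function names are assumptions made for illustration, and it only reads fixed-column PDB ATOM/HETATM records.

```python
import math


def parse_heavy_atoms(pdb_text):
    """Extract (chain, resnum, x, y, z) for non-hydrogen ATOM/HETATM records."""
    atoms = []
    for line in pdb_text.splitlines():
        if not line.startswith(("ATOM", "HETATM")):
            continue
        name = line[12:16].strip()
        element = line[76:78].strip() if len(line) >= 78 else ""
        # Assumption: use the element column when present, else guess from the atom name.
        is_hydrogen = element == "H" or (
            not element and name.lstrip("0123456789").startswith("H")
        )
        if is_hydrogen:
            continue
        atoms.append(
            (
                line[21],
                int(line[22:26]),
                float(line[30:38]),
                float(line[38:46]),
                float(line[46:54]),
            )
        )
    return atoms


def interface_residues(atoms, cutoff=3.9, chains=("A", "B")):
    """Return, per chain, the residue numbers within `cutoff` of the other chain."""
    first = [a for a in atoms if a[0] == chains[0]]
    second = [a for a in atoms if a[0] == chains[1]]
    interface = {chains[0]: set(), chains[1]: set()}
    # Brute-force pairwise search; adequate for a sketch, slow for large complexes.
    for c1, r1, x1, y1, z1 in first:
        for c2, r2, x2, y2, z2 in second:
            if math.dist((x1, y1, z1), (x2, y2, z2)) <= cutoff:
                interface[c1].add(r1)
                interface[c2].add(r2)
    return {chain: sorted(residues) for chain, residues in interface.items()}
```

Passing `cutoff=5.0` instead of the default gives the looser interface definition corresponding to the second restraint file.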

And if there is a co-factor or ligand in the structure:

  • XXX_ligand.param : the ligand parameter file as generated by PRODRG
  • XXX_ligand.top : the ligand topology file as generated by PRODRG

Furthermore, each sub-directory contains an ana_scripts directory with reference files and analysis scripts:

  • target.pdb : the reference, matched complex
  • target-unbound.pdb : the unbound complex built by superimposing the unbound structures onto the reference complex
  • target.contacts5 : intermolecular contacts at 5 Å cutoff used for calculating the fraction of native contacts
  • target.izone : the interface definition for i-RMSD calculations with ProFit (derived from all residue contacts at 10 Å)
  • target.izoneA : same as target.izone but for chain A only
  • target.izoneB : same as target.izone but for chain B only
  • target.lzone : the zone definition for l-RMSD calculations with ProFit
  • i-rmsd_to_xray.csh : csh script to calculate i-RMSDs with ProFit from HADDOCK file.nam files
  • l-rmsd_to_xray.csh : csh script to calculate l-RMSDs with ProFit from HADDOCK file.nam files
  • fraction-native.csh : csh script to calculate the fraction of native contacts from HADDOCK file.nam files
  • cluster-fnat.csh : a script that generates cluster statistics including RMSD and Fnat values
  • run_all.csh : a csh script that runs the complete analysis of all three stages of HADDOCK
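The fraction of native contacts (Fnat) mentioned above is the share of reference contacts that are also found in a docking model. The sketch below shows that calculation on plain Python sets of residue pairs; it does not parse target.contacts5 (whose file format is not described here), and the function name is hypothetical.

```python
def fraction_native(native_contacts, model_contacts):
    """Fraction of reference (native) contacts that are reproduced in a model.

    Each contact is any hashable pair, for example
    (receptor residue number, ligand residue number).
    """
    native = set(native_contacts)
    if not native:
        raise ValueError("the reference has no contacts")
    return len(native & set(model_contacts)) / len(native)


# Illustrative residue pairs only, not taken from any benchmark entry.
native = {(10, 205), (10, 208), (11, 208), (12, 210)}
model = {(10, 205), (11, 208), (50, 300)}
print(fraction_native(native, model))
```

Contacts present only in the model do not affect the value; only the recovered reference contacts count.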

Note that the paths in the various analysis scripts must be adapted to your directory structure. This can be done by running the scripts/setup-ana_scripts.csh script with the directory name of all entries as its argument.

Finally, the HADDOCK-ready directory also contains pre-calculated i-RMSD values for the unbound structures superimposed onto the reference complex, and for each separate interface:

  • i-RMSD.dat : Interface RMSD of the superimposed unbound structures versus the reference, in the order of the directory listing
  • i-RMSD-sorted.dat : Interface RMSD of the superimposed unbound structures versus the reference, sorted from small to large
  • i-RMSD_r.dat : Interface RMSD of the unbound receptor interface versus the reference, in the order of the directory listing
  • i-RMSD_r-sorted.dat : Interface RMSD of the unbound receptor interface versus the reference, sorted from small to large
  • i-RMSD_l.dat : Interface RMSD of the unbound ligand interface versus the reference, in the order of the directory listing
  • i-RMSD_l-sorted.dat : Interface RMSD of the unbound ligand interface versus the reference, sorted from small to large
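The relation between each .dat file and its sorted counterpart can be sketched as follows. The column layout of these files is not documented here, so the parser below rests on an assumption: one entry per line, with the entry name in the first whitespace-separated field and the i-RMSD value in the last. Check a real file before relying on it; the function names are hypothetical.

```python
def parse_irmsd(text):
    """Parse lines of the assumed form '<entry> ... <i-RMSD>' into (entry, value) pairs."""
    records = []
    for line in text.splitlines():
        fields = line.split()
        # Skip blank lines, comments and lines without both a name and a value.
        if len(fields) < 2 or fields[0].startswith("#"):
            continue
        records.append((fields[0], float(fields[-1])))
    return records


def sort_by_irmsd(records):
    """Order entries from the smallest to the largest i-RMSD."""
    return sorted(records, key=lambda record: record[1])
```

Sorting by i-RMSD is a convenient way to order the entries from the smallest to the largest conformational change at the interface.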

Other sub-directories:

  • scripts : directory containing various scripts used for generating restraint files, initial analysis and automation of running HADDOCK. Refer to the README file in that directory for details.
  • data : reference run.cns and patch files to set up HADDOCK runs for various scenarios

structures-matched

This directory contains the matched PDB files for all entries of the benchmark. All structures (bound or unbound) consist of a single chain (A for the receptor, B for the ligand) with non-overlapping residue numbering.

The bound forms have been matched to the unbound, meaning that they have the same residue numbering and only contain residues matching residues in the unbound forms.
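The two properties stated above (matched bound residues all exist in the unbound form, and receptor and ligand numbering do not overlap) can be checked with a short sketch like the one below. It is a consistency check only, not the matching procedure used to build the files; it compares chain ID and residue number from fixed-column PDB records, and the function names are hypothetical.

```python
def residue_ids(pdb_text):
    """Collect (chain, residue number) pairs from ATOM/HETATM records."""
    ids = set()
    for line in pdb_text.splitlines():
        if line.startswith(("ATOM", "HETATM")):
            ids.add((line[21], int(line[22:26])))
    return ids


def residues_missing_from_unbound(bound_text, unbound_text):
    """Residues of the matched bound form that have no counterpart in the unbound form."""
    return sorted(residue_ids(bound_text) - residue_ids(unbound_text))


def overlapping_numbers(receptor_text, ligand_text):
    """Residue numbers used by both receptor and ligand, ignoring chain IDs."""
    receptor_numbers = {number for _, number in residue_ids(receptor_text)}
    ligand_numbers = {number for _, number in residue_ids(ligand_text)}
    return sorted(receptor_numbers & ligand_numbers)
```

For a correctly matched entry, both `residues_missing_from_unbound` (called on XXX_r_b-matched.pdb and XXX_r_u.pdb, or the ligand equivalents) and `overlapping_numbers` (called on the receptor and ligand files) should return empty lists.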

For each entry XXX the following files are present:

  • XXX_r_u.pdb : Unbound receptor PDB
  • XXX_l_u.pdb : Unbound ligand PDB
  • XXX_r_b-matched.pdb : Matched bound receptor PDB
  • XXX_l_b-matched.pdb : Matched bound ligand PDB

In addition, each PDB file has an associated .info file providing statistics on its content.

structures-orig

This directory contains the original PDB files for all entries of the benchmark as downloaded from https://zlab.umassmed.edu/benchmark/

In addition, each PDB file has an associated .info file providing statistics on its content.

scripts

A few basic csh scripts used to prepare the matched PDBs.

Manual intervention and checking were, however, required in several instances.