
Setup annotation for cladification

Pablo Moreno edited this page Jan 11, 2017 · 8 revisions

Set up cladification files

Step 1: Split the KS alignment into per-clade alignments

For this, we need to remove the general KS signal. Requirements:

  • A FASTA ID to clade ID assignment file.
  • A multiple sequence alignment of all KSs in FASTA format, whose sequence identifiers match the FASTA IDs in the assignment file.
python getSeparateAlignmentsPerCladeFromDefinitionFile.py <fasta_id-clade_id-assignment-file> <complete_fasta_alignment> <outputdir> <hmmname>

This creates one FASTA file per clade, with the clade's members aligned exactly as they were in the complete alignment.
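The splitting step can be sketched in Python as follows. This is a minimal illustration, not the code of getSeparateAlignmentsPerCladeFromDefinitionFile.py: it assumes the assignment file holds one tab-separated `fasta_id<TAB>clade_id` pair per line and that each output file is named `<clade_id>.fasta`. Both the file format and the naming are assumptions.

```python
import os
from collections import defaultdict


def read_fasta(path):
    """Return (id, sequence) pairs; the id is the first word of each header line."""
    records = []
    current_id, chunks = None, []
    with open(path) as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue
            if line.startswith(">"):
                if current_id is not None:
                    records.append((current_id, "".join(chunks)))
                current_id, chunks = line[1:].split()[0], []
            else:
                chunks.append(line)
    if current_id is not None:
        records.append((current_id, "".join(chunks)))
    return records


def split_alignment_by_clade(assignment_path, alignment_path, out_dir):
    """Write one FASTA file per clade, copying each aligned row (gaps included) unchanged.

    Assumes a hypothetical tab-separated assignment format: fasta_id<TAB>clade_id.
    Sequences whose ID is not in the assignment file are skipped.
    """
    clade_of = {}
    with open(assignment_path) as handle:
        for line in handle:
            if line.strip():
                fasta_id, clade_id = line.strip().split("\t")[:2]
                clade_of[fasta_id] = clade_id

    per_clade = defaultdict(list)
    for seq_id, seq in read_fasta(alignment_path):
        if seq_id in clade_of:
            per_clade[clade_of[seq_id]].append((seq_id, seq))

    os.makedirs(out_dir, exist_ok=True)
    for clade_id, members in per_clade.items():
        with open(os.path.join(out_dir, clade_id + ".fasta"), "w") as out:
            for seq_id, seq in members:
                out.write(">" + seq_id + "\n" + seq + "\n")
    return dict(per_clade)
```

Because rows are copied verbatim, every per-clade file keeps the column layout of the complete alignment.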

Step 2: Create HMMER models for all KSs and for each individual KS clade

python runCreateHMMERsForPrediction.py path-to-dir-with-clade-fastas \
    path-to-complete-alignment consensus-threshold output-path-hmmer-model hmmer-binary-path

This writes all the HMMER models needed for the KSs to output-path-hmmer-model.
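For orientation, here is a Python sketch of the kind of hmmbuild calls this step involves. It is a sketch under assumptions, not the logic of runCreateHMMERsForPrediction.py: it builds one model per clade FASTA plus one for the complete alignment, names each model after its file stem, and does not reproduce how the script uses consensus-threshold. The `-n` (model name) and `--informat afa` (aligned FASTA input) options are standard HMMER 3 hmmbuild options; the `all_KS` model name is hypothetical.

```python
import os
import subprocess


def hmmbuild_commands(clade_dir, complete_alignment, out_dir, hmmbuild="hmmbuild"):
    """Return hmmbuild command lines: one per clade FASTA and one for the complete alignment.

    Model names come from the FASTA file stems; "all_KS" is a hypothetical name
    for the model built from the complete alignment. Creates out_dir if needed.
    """
    os.makedirs(out_dir, exist_ok=True)
    inputs = [
        (os.path.splitext(name)[0], os.path.join(clade_dir, name))
        for name in sorted(os.listdir(clade_dir))
        if name.endswith((".fasta", ".fa"))
    ]
    inputs.append(("all_KS", complete_alignment))
    return [
        [hmmbuild, "-n", model, "--informat", "afa",
         os.path.join(out_dir, model + ".hmm"), msa]
        for model, msa in inputs
    ]


def run_all(commands):
    """Execute each command, raising CalledProcessError if hmmbuild fails."""
    for cmd in commands:
        subprocess.run(cmd, check=True)
```

Building the command lists separately from running them makes it easy to inspect or log the exact calls before any model is written, e.g. `run_all(hmmbuild_commands("clades", "all_ks.fasta", "models", "/usr/local/bin/hmmbuild"))`.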

Step 3: Create HMMER models for other protein domains

Requirements

  • A FASTA alignment for each desired domain.
  • A domain definition file in which each line gives a FASTA file name and the model name to use for that FASTA file, separated by a tab.
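A minimal Python reader for the domain definition file might look like this. It assumes exactly two tab-separated fields per line and skips blank lines; this is an illustrative sketch, not the parser the software itself uses.

```python
def read_domain_definitions(path):
    """Parse a tab-separated domain definition file into (fasta_file, model_name) pairs.

    Each non-blank line must contain exactly two tab-separated fields.
    """
    entries = []
    with open(path) as handle:
        for line_no, line in enumerate(handle, start=1):
            line = line.rstrip("\r\n")
            if not line.strip():
                continue  # ignore blank lines
            fields = line.split("\t")
            if len(fields) != 2:
                raise ValueError(
                    f"{path}:{line_no}: expected 2 tab-separated fields, got {len(fields)}"
                )
            fasta_file, model_name = (field.strip() for field in fields)
            entries.append((fasta_file, model_name))
    return entries
```

Failing fast on a malformed line (for example, spaces used instead of a tab) is cheaper than discovering a missing model later in the pipeline.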

Currently, the following domains are detected by the software:

  • ACP
  • BR
  • C_Domains
  • DH
  • ER
  • GNAT
  • KR
  • KS_non_transAT
  • MT
  • OMT
  • OX
  • PS
  • TE
  • AT_AH
  • AT
  • AH

Run:

scripts/hmmerBuildOtherDomains.sh path-to-other-domains-alignment output-path

This Bash script should probably be rewritten in Python and made part of the PKSPredictor-python repo.
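As a possible starting point for such a rewrite, the Python sketch below pairs each (FASTA file, model name) entry with an hmmbuild call. It does not reproduce hmmerBuildOtherDomains.sh, whose exact behaviour is not documented here. It assumes the alignments named in the definition entries live in one directory and that each model is written as `<model_name>.hmm`; the function name and the `dry_run` flag are hypothetical. The `-n` and `--informat afa` options are standard HMMER 3 hmmbuild options.

```python
import os
import subprocess


def build_other_domain_models(alignment_dir, definitions, out_dir,
                              hmmbuild="hmmbuild", dry_run=False):
    """Build one HMMER model per (fasta_file, model_name) pair (hypothetical helper).

    definitions: iterable of (fasta_file, model_name) tuples, e.g. read from the
    domain definition file. Returns the commands; runs them unless dry_run is True.
    """
    os.makedirs(out_dir, exist_ok=True)
    commands = []
    for fasta_file, model_name in definitions:
        msa = os.path.join(alignment_dir, fasta_file)
        if not os.path.isfile(msa):
            raise FileNotFoundError(msa)  # stop before building a partial model set
        hmm_out = os.path.join(out_dir, model_name + ".hmm")
        cmd = [hmmbuild, "-n", model_name, "--informat", "afa", hmm_out, msa]
        commands.append(cmd)
        if not dry_run:
            subprocess.run(cmd, check=True)
    return commands
```

Checking that every alignment exists before calling hmmbuild gives a clearer error than a failed subprocess.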

Step 4: KS monomer mol files

Identify the folder that will hold the mol files for the different chemical building blocks. There should be one file per clade, and each mol file name must match the identifier given to that clade in the annotation file.
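A quick consistency check between the mol folder and the clade identifiers can be sketched in Python. It assumes each file is named `<clade_id>.mol` and that the clade identifiers have already been read from the annotation file; the annotation file's format is not described here, so reading it is left out, and the function name is hypothetical.

```python
import os


def missing_mol_files(mol_dir, clade_ids, extension=".mol"):
    """Return, sorted, the clade identifiers with no '<clade_id><extension>' file in mol_dir."""
    present = set(os.listdir(mol_dir))
    return sorted(cid for cid in clade_ids if cid + extension not in present)
```

An empty result means every clade has a matching mol file.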

Step 5: Locate mol files for the detected NRPS amino acids

TODO here

Step 6: Place annotation file

The annotation file for the cladification must be placed in the same directory as the HMMER models for the KS domains, with the same name as the model but with the extension .annot_v2.
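The naming rule can be expressed with `pathlib`. This sketch assumes "same name as the model" means replacing the model file's last extension; the model file name `KS_all.hmm` in the example is hypothetical.

```python
from pathlib import Path


def annotation_path_for(model_path):
    """Return the expected annotation path: same directory and name, extension .annot_v2."""
    return Path(model_path).with_suffix(".annot_v2")


def require_annotation(model_path):
    """Return the annotation path, raising FileNotFoundError if it is not next to the model."""
    path = annotation_path_for(model_path)
    if not path.is_file():
        raise FileNotFoundError(f"expected annotation file next to model: {path}")
    return path
```

For example, `annotation_path_for("models/KS_all.hmm")` gives `models/KS_all.annot_v2`.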

Step 7: Install the Java part with mvn install

Step 8: Install the Python part

Step 9: Set environment variables

Step 10: Set Java Preferences