Setup annotation for cladification
For this we need to remove the general KS signal. Requirements:
- fasta id to clade id assignment file.
- Multiple sequence alignment of all KSs in fasta format, where the sequence identifiers match the fasta ids in the assignment file.
python getSeparateAlignmentsPerCladeFromDefinitionFile.py <fasta_id-clade_id-assignment-file> <complete_fasta_alignment> <outputdir> <hmmname>
This creates one fasta file for each clade, with its members aligned exactly as they were in the general alignment.
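To illustrate what splitting the alignment per clade means, here is a minimal sketch. It is not the code of getSeparateAlignmentsPerCladeFromDefinitionFile.py; it assumes the assignment file has one tab-separated "fasta id, clade id" pair per line, and the function names and example identifiers are made up for illustration. The key point is that the aligned sequences are copied unchanged, gaps included.

```python
from collections import defaultdict


def read_fasta(lines):
    """Yield (identifier, sequence) pairs from FASTA-formatted lines."""
    header, chunks = None, []
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            # Use the first whitespace-delimited token as the identifier.
            tokens = line[1:].split()
            header, chunks = (tokens[0] if tokens else ""), []
        elif header is not None:
            chunks.append(line.strip())
    if header is not None:
        yield header, "".join(chunks)


def split_alignment_by_clade(assignment_lines, alignment_lines):
    """Group aligned sequences by clade without touching the gap columns.

    ASSUMPTION: each assignment line is "<fasta_id>TAB<clade_id>".
    """
    clade_of = {}
    for line in assignment_lines:
        if line.strip():
            fasta_id, clade_id = line.rstrip("\n").split("\t")[:2]
            clade_of[fasta_id] = clade_id

    per_clade = defaultdict(list)
    for fasta_id, aligned_seq in read_fasta(alignment_lines):
        if fasta_id in clade_of:
            per_clade[clade_of[fasta_id]].append((fasta_id, aligned_seq))
    return dict(per_clade)


# Hypothetical example data.
assignment = ["seq1\tClade_1\n", "seq2\tClade_2\n", "seq3\tClade_1\n"]
alignment = [">seq1\n", "MK-V\n", ">seq2\n", "M-AV\n", ">seq3\n", "MKAV\n"]
groups = split_alignment_by_clade(assignment, alignment)
```

Each entry of `groups` would then be written to its own fasta file in the output directory.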
python runCreateHMMERsForPrediction.py path-to-dir-with-clade-fastas \
path-to-complete-alignment consensus-threshold output-path-hmmer-model hmmer-binary-path
This will leave all the needed HMMER models for KSs in output-path-hmmer-model.
Requirements
- Fasta alignment for each desired domain
- A domains definition file, which lists, one per line, the fasta file name and the model name to be used for that fasta, separated by a tab.
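As a rough illustration of the domains definition file, the sketch below parses lines of the form "fasta file name, tab, model name". The file names in the example are hypothetical, and the one-pair-per-line layout is an assumption based on the description above.

```python
def parse_domain_definitions(lines):
    """Return a list of (fasta_file_name, model_name) pairs.

    ASSUMPTION: one pair per line, separated by a single tab.
    """
    pairs = []
    for number, line in enumerate(lines, start=1):
        line = line.rstrip("\n")
        if not line.strip():
            continue  # skip blank lines
        fields = line.split("\t")
        if len(fields) != 2:
            raise ValueError(
                f"line {number}: expected 2 tab-separated fields, got {len(fields)}"
            )
        pairs.append((fields[0].strip(), fields[1].strip()))
    return pairs


# Hypothetical file content; the real fasta file names may differ.
example = ["ACP_alignment.fasta\tACP\n", "\n", "KR_alignment.fasta\tKR\n"]
definitions = parse_domain_definitions(example)
```

Failing on malformed lines early is a design choice here: a space used instead of a tab is an easy mistake to make and hard to spot by eye.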
Currently, the following domains are detected by the software:
- ACP
- BR
- C_Domains
- DH
- ER
- GNAT
- KR
- KS_non_transAT
- MT
- OMT
- OX
- PS
- TE
- AT_AH
- AT
- AH
Run:
scripts/hmmerBuildOtherDomains.sh path-to-other-domains-alignment output-path
This bash script should probably be rewritten as a Python script and made part of the PKSPredictor-python repo.
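A possible starting point for such a rewrite is sketched below. It does not reproduce what hmmerBuildOtherDomains.sh actually does; it assumes one hmmbuild call per (fasta file, model name) pair, uses hmmbuild's -n option to name the model, and assumes a ".hmm" output extension. The function name and paths are hypothetical. The sketch only builds the command lines and leaves running them (for example with subprocess.run) to the caller.

```python
from pathlib import Path


def hmmbuild_commands(definitions, alignment_dir, output_dir, hmmbuild="hmmbuild"):
    """Build one hmmbuild argument list per (fasta_name, model_name) pair.

    ASSUMPTIONS: one model per alignment, output named "<model_name>.hmm".
    """
    commands = []
    for fasta_name, model_name in definitions:
        msa = Path(alignment_dir) / fasta_name
        hmm = Path(output_dir) / f"{model_name}.hmm"
        # hmmbuild usage: hmmbuild [options] <hmmfile_out> <msafile>
        commands.append([hmmbuild, "-n", model_name, str(hmm), str(msa)])
    return commands


# Hypothetical input; in practice this would come from the definitions file.
commands = hmmbuild_commands([("ACP_alignment.fasta", "ACP")], "aln", "models")
```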
Identify the folder that holds the mol files for the different chemical building blocks. There should be one file per clade, and each mol file name needs to match the identifier given to that clade in the annotation file.
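A quick way to check this requirement is sketched below. It assumes the files are named "<clade id>.mol"; the extension, the function name, and the clade identifiers are assumptions for illustration, not taken from the software.

```python
import tempfile
from pathlib import Path


def missing_mol_files(clade_ids, mol_dir, extension=".mol"):
    """Return the clade ids that have no matching mol file in mol_dir.

    ASSUMPTION: files are named "<clade_id><extension>".
    """
    mol_dir = Path(mol_dir)
    return [c for c in clade_ids if not (mol_dir / f"{c}{extension}").is_file()]


# Demonstration with a temporary folder and hypothetical clade ids.
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "Clade_1.mol").touch()
    missing = missing_mol_files(["Clade_1", "Clade_2"], tmp)
```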
TODO here
The annotation file for the cladification needs to be placed next to the HMMER models for the KS domains, with the same name as the model but with extension .annot_v2.
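The naming rule can be expressed as a small helper. This is only a sketch: it assumes the model file has a single extension (".hmm" in the example, which is a guess), and the function name is hypothetical.

```python
from pathlib import Path


def annotation_path_for(model_path):
    """Return the path next to the model, with extension .annot_v2.

    ASSUMPTION: the model file name has a single extension to replace.
    """
    return Path(model_path).with_suffix(".annot_v2")


# Hypothetical model file name.
annotation = annotation_path_for(Path("models") / "KS.hmm")
```

Here `annotation` points to a file named KS.annot_v2 in the same folder as the model.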