Skip to content

pvanheus/lukasa

Repository files navigation

A CWL workflow for protein evidence mapping

The workflow in protein_evidence_mapping.cwl uses metaeuk to find regions of overlap between proteins and a genomic contig, then uses samtools faidx to extract those regions and spaln to refine the alignment between the proteins and the genome.

An example workflow input file in in inputs/workflow1_inputs.yml and maps some C. semilaevis proteins from RefSeq against a contig from Lates calcarifer (the Asian seabass). With the correct data in place that can be run e.g. as:

cwltool --parallel protein_evidence_mapping.cwl inputs/workflow1_inputs.yml

This finds 88 mappings between the 39070 RefSeq proteins and the approximately 2.5 MB contig and runs the region extraction and spaln steps using the (experimental) parallel job support of cwltool.

The diagram below shows the workflow logic (drawn using cwlviewer).

Image of workflow logic