add new workflow scaffold_and_refine_multitaxa #506

Merged
merged 18 commits into from
Feb 5, 2024

dpark01 (Member) commented Jan 9, 2024

This PR adds a new workflow, scaffold_and_refine_multitaxa, which runs scaffold_and_refine on one input sample (contigs + reads) against many reference genomes from different taxa of interest. It attempts to assemble every taxon of interest for every sample and produces partial or empty outputs for each unsuccessful sample x taxon combination. It is intended for high-throughput metagenomic analyses.

This includes a few updates to tasks to make them more resilient to empty fasta inputs/outputs:

  • scaffold
  • run_discordance
  • alignment_metrics
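
As a rough illustration of what "resilient to empty fasta inputs" can mean inside a task's command block, the sketch below checks for at least one FASTA record before doing real work. The helper name `fasta_has_records` and the temp files are hypothetical and not taken from this PR; the actual tasks may handle the empty case differently.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not from the PR): succeeds only if the file
# is non-empty AND contains at least one FASTA header line.
fasta_has_records() {
    local fasta="$1"
    [ -s "$fasta" ] && grep -q '^>' "$fasta"
}

# Demo inputs standing in for real task inputs (assumed file names).
tmpdir="$(mktemp -d)"
: > "$tmpdir/empty.fasta"
printf '>contig1\nACGTNNACGT\n' > "$tmpdir/nonempty.fasta"

for f in "$tmpdir/empty.fasta" "$tmpdir/nonempty.fasta"; do
    if fasta_has_records "$f"; then
        echo "$(basename "$f"): has records, run the real step"
    else
        echo "$(basename "$f"): no records, emit empty outputs and exit cleanly"
    fi
done
```

Guarding on record presence rather than file size alone also covers files that contain only whitespace or a trailing newline.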

@dpark01 dpark01 marked this pull request as ready for review January 29, 2024 21:15
@dpark01 dpark01 requested a review from tomkinsc January 29, 2024 21:15
String sample_id
File reads_unmapped_bam

Array[Pair[Int,Array[String]+]] taxid_to_ref_accessions = [
Member

What about HAdV?
There are quite a few, but if we do want to include them:

            # HAdV reference genomes, via https://www.ncbi.nlm.nih.gov/genomes/GenomesGroup.cgi?taxid=10509&host=human
            (129875,  ["NC_001460.1"]),  # Human mastadenovirus A strain:Huie; serotype:Human adenovirus 12; culture-collection:ATCC:VR-863
            (108098,  ["NC_011203.1"]),  # Human adenovirus B1
            (108098,  ["NC_011202.1"]),  # Human adenovirus B2
            (129951,  ["NC_001405.1"]),  # Human mastadenovirus C serotype:Human adenovirus 2
            (130310,  ["NC_010956.1"]),  # Human mastadenovirus D strain:Hicks; NIAID V-209-003-014; serotype:Human adenovirus 9
            (130308,  ["NC_003266.2"]),  # Human mastadenovirus E strain:vaccine (CL 68578); serotype:human adenovirus 4
            (130309,  ["NC_001454.1"]),  # Human mastadenovirus F strain:Dugan; serotype:Human adenovirus 40
            (310540,  ["NC_006879.1"]),  # Simian adenovirus 1 strain:ATCC VR-195
            (1123958, ["NC_017825.1"]),  # Chimpanzee adenovirus Y25

Member Author

I'll get some proposed edits to this default list from Jillian soon

reference_fasta = scaffold.scaffold_fasta,
sample_name = sample_id
}
# to do: if pre-impute unambig length > some fraction of ref genome, run ncbi.rename_fasta_header and ncbi.align_and_annot_transfer_single
Member

Will "some fraction of ref genome" be a workflow parameter?

Member

Also, maybe we could try Liftoff for annotation transfer (no chain file required; it does the alignment via minimap2).

Member Author

Yeah I think that's right... this is all obviously to do at some future stage.
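
For reference, the check described in the to-do could be sketched roughly as follows. This is only an assumption about how it might look: the function names, the `min_fraction` parameter, and the demo files are hypothetical, and "unambiguous length" is taken here to mean the count of A/C/G/T bases.

```shell
#!/usr/bin/env bash
# Count unambiguous (A/C/G/T) bases in a FASTA file, skipping header lines.
count_unambig() {
    awk '!/^>/ { gsub(/[^ACGTacgt]/, ""); n += length($0) } END { print n + 0 }' "$1"
}

# Count all sequence characters in a FASTA file, skipping header lines.
count_total() {
    awk '!/^>/ { n += length($0) } END { print n + 0 }' "$1"
}

# Succeeds if unambiguous assembly length >= min_fraction * reference length.
# min_fraction is a hypothetical workflow parameter, not one defined in the PR.
passes_length_threshold() {
    local assembly="$1" reference="$2" min_fraction="$3"
    local asm_len ref_len
    asm_len="$(count_unambig "$assembly")"
    ref_len="$(count_total "$reference")"
    awk -v a="$asm_len" -v r="$ref_len" -v f="$min_fraction" \
        'BEGIN { exit !(r > 0 && a >= f * r) }'
}

# Demo: 4 unambiguous bases against a 10-base reference.
tmpdir="$(mktemp -d)"
printf '>asm\nACGTNNNNNN\n' > "$tmpdir/asm.fasta"
printf '>ref\nACGTACGTAC\n' > "$tmpdir/ref.fasta"

if passes_length_threshold "$tmpdir/asm.fasta" "$tmpdir/ref.fasta" 0.3; then
    echo "assembly passes at min_fraction=0.3; annotation transfer could run"
fi
```

An empty assembly yields a count of 0 and simply fails the threshold, which fits the workflow's goal of tolerating empty outputs.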

pipes/WDL/tasks/tasks_assembly.wdl
-f "~{reference_fasta}" "~{reads_aligned_bam}" \
| bcftools call \
-P 0 -m --ploidy 1 \
--threads $(nproc) \
Member

Maybe reserve a core for the system, writes, etc. ($(nproc --ignore=1))?
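
A small sketch of the suggestion, assuming GNU coreutils `nproc` is available in the task's container. The `threads` variable name is illustrative only.

```shell
#!/usr/bin/env bash
# nproc --ignore=1 excludes one processing unit from the count;
# GNU nproc never reports fewer than 1, so single-core machines still get 1.
threads="$(nproc --ignore=1)"
echo "using ${threads} of $(nproc) available cores"

# The value would then replace $(nproc) in the command under review,
# e.g.: bcftools call ... --threads "${threads}"
```

Because the result is floored at 1, the change is safe on single-core runtime environments.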

@dpark01 dpark01 enabled auto-merge February 5, 2024 19:04
@dpark01 dpark01 added this pull request to the merge queue Feb 5, 2024
Merged via the queue into master with commit 5c5f478 Feb 5, 2024
12 checks passed

2 participants