expanded parameterization of align_and_count and additional output metrics #525

tomkinsc · 2024-03-09T00:37:14Z

Summary

This PR adds functionality to optionally filter reads after mapping in the align_and_count task, so the counts of mapped reads are comparable to those following filtering during genome assembly. It also adds new numeric outputs relevant for general QC purposes.

New input parameters

The filtering has the following parameters:

filter_bam_to_proper_primary_mapped_reads: enable filtering
- default: false — no filtering is performed
do_not_require_proper_mapped_pairs_when_filtering: do not exclude reads lacking the "proper pair" bit; set this to true when filtering is enabled and the input is single-end reads, which never carry that bit
- default: false — reads are filtered to proper pairs if filtering is enabled
keep_singletons_when_filtering: singleton reads from paired-end data are kept; this does not affect single-end reads
- default: false — singleton reads are excluded during filtering
keep_duplicates_when_filtering: reads marked as duplicates are kept; this does not supersede exclusion for violations of other criteria
- default: false — duplicate reads are excluded during filtering
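
As a rough illustration of how the four toggles above combine, here is a minimal Python sketch expressed in terms of SAM flag bits. It is not the code in tasks_reports.wdl: the mapping of each toggle to specific bits (for example, treating a "singleton" as a paired read whose mate is unmapped, and "primary mapped" as excluding unmapped, secondary, and supplementary records) is an assumption made for this example, and the helper names filter_masks and passes_filter are hypothetical.

```python
# SAM flag bits, as defined in the SAM format specification.
PROPER_PAIR = 0x2
UNMAPPED = 0x4
MATE_UNMAPPED = 0x8
SECONDARY = 0x100
DUPLICATE = 0x400
SUPPLEMENTARY = 0x800


def filter_masks(
    do_not_require_proper_mapped_pairs_when_filtering=False,
    keep_singletons_when_filtering=False,
    keep_duplicates_when_filtering=False,
):
    """Return (required_bits, excluded_bits) for the assumed filter.

    Keyword names mirror the task inputs; the bit choices are assumptions.
    """
    # Assumption: "primary mapped" means not unmapped, secondary, or supplementary.
    excluded = UNMAPPED | SECONDARY | SUPPLEMENTARY
    required = 0
    if not do_not_require_proper_mapped_pairs_when_filtering:
        required |= PROPER_PAIR
    if not keep_singletons_when_filtering:
        # Assumption: a singleton is a paired read whose mate is unmapped.
        excluded |= MATE_UNMAPPED
    if not keep_duplicates_when_filtering:
        excluded |= DUPLICATE
    return required, excluded


def passes_filter(flag, required, excluded):
    """True if a read with this SAM flag survives the filter."""
    return (flag & required) == required and (flag & excluded) == 0
```

Under these assumptions, a mapped single-end read (flag 0) fails the default filter because it lacks the proper-pair bit, and passes once do_not_require_proper_mapped_pairs_when_filtering is true. A duplicate that is also a secondary alignment stays excluded even when keep_duplicates_when_filtering is true, which matches the note that keeping duplicates does not supersede the other criteria.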

New output metrics

This PR also adds new numeric output metrics to align_and_count:

pct_total_reads_mapped: the percent of input reads mapping to any of the input reference sequences
- this is helpful for assessing the fraction of reads in a sample originating from sources corresponding to the reference sequences
pct_lesser_hits_of_mapped: of the reads mapping to reference sequences input to align_and_count, the percent mapping to hits that are not the top hit
- this is helpful for assessing cross-talk between hits
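
The two metrics can be read as simple ratios. The sketch below is inferred from the descriptions above rather than taken from the task itself; the function name mapping_metrics, its arguments, and the choice of "top hit" as the reference with the most mapped reads are assumptions for illustration.

```python
def mapping_metrics(reads_per_reference, total_input_reads):
    """Compute the two percentages from per-reference mapped read counts.

    reads_per_reference: dict of reference name -> mapped read count (hypothetical input shape)
    total_input_reads:   number of reads given to the aligner
    """
    mapped = sum(reads_per_reference.values())
    # Assumption: the "top hit" is the reference with the most mapped reads.
    top_hit = max(reads_per_reference.values(), default=0)

    # Guard against empty inputs so the sketch never divides by zero.
    pct_total = 100.0 * mapped / total_input_reads if total_input_reads else 0.0
    pct_lesser = 100.0 * (mapped - top_hit) / mapped if mapped else 0.0

    return {
        "pct_total_reads_mapped": pct_total,
        "pct_lesser_hits_of_mapped": pct_lesser,
    }


example = mapping_metrics({"ref_A": 80, "ref_B": 15, "ref_C": 5}, total_input_reads=400)
```

In this made-up example, 100 of 400 input reads map, so pct_total_reads_mapped is 25.0; 20 of the 100 mapped reads fall on references other than ref_A, so pct_lesser_hits_of_mapped is 20.0.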

The new outputs are exposed in several of the workflows that emit a single set of align_and_count outputs. A few other workflows also call align_and_count, but emit an aggregate report that combines information from multiple inputs.

Recommended usage

The following values are recommended for most use cases, to count high-quality read mappings with duplicates included.

filter_bam_to_proper_primary_mapped_reads=true
keep_duplicates_when_filtering=true
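
One way to apply these values is through a workflow inputs JSON. The Python sketch below only builds such a file; the workflow name my_workflow and the call name align_and_count in the keys are placeholders, so substitute the workflow and call names actually in use. The workflow.call.input key pattern is the usual convention for WDL inputs files, but check it against the execution engine being used.

```python
import json

# Hypothetical names: replace with the real workflow and call names.
WORKFLOW = "my_workflow"
CALL = "align_and_count"

# Values recommended in this PR description.
recommended = {
    "filter_bam_to_proper_primary_mapped_reads": True,
    "keep_duplicates_when_filtering": True,
}

# Qualify each task input with the workflow and call it belongs to.
inputs = {f"{WORKFLOW}.{CALL}.{name}": value for name, value in recommended.items()}

inputs_json = json.dumps(inputs, indent=2)
print(inputs_json)
```

For single-end input, do_not_require_proper_mapped_pairs_when_filtering would also need to be set to true, as described under "New input parameters".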

…in tasks_reports.wdl::align_and_count(); make this the default

add functionality to optionally filter reads to include only properly mapped pairs in tasks_reports.wdl::align_and_count(); make this the default for align_and_count by setting the task input filter_bam_to_proper_primary_mapped_reads=true.

…additional metrics

add keep_duplicates_when_filtering toggle to align_and_count task; also have this task output additional metrics for the percent of mapped reads aligning to hits that are not the top hit, and the percent of total input reads that mapped to any of the align_and_count ref seqs (i.e. how much crosstalk, and how much of the total sample, respectively)


tomkinsc added 12 commits March 4, 2024 14:15

update version of viral-core: 2.2.4 -> 2.3.0

415bfe4

bump viral-core 2.3.0 -> 2.3.1

9abd348

set viral-core image to 2.3.0

e64db2f

pin viral-core to 2.3.0 in nextstrain tasks too

61942e3

disable align_and_count filtering by default

cdb1af3

add missing close paren

07816fa

actions/checkout v3 -> v4

00d517d

align commands for readability

a4c528b

actions/setup-python v4 -> v5

66b4804

${} to ~{} in align_and_count wdl, minor corrections where new outputs are added to existing workflows

7eaec98

dpark01 approved these changes Mar 9, 2024

pipes/WDL/tasks/tasks_reports.wdl (review comment, outdated and resolved)

require values for the various filtering-related Boolean inputs in align_and_count

78f8fa0

require values for the various filtering-related Boolean inputs in align_and_count, since the default values guarantee they'll be set

dpark01 added this pull request to the merge queue Mar 11, 2024

Merged via the queue into master with commit 1ce64ab Mar 11, 2024
12 checks passed
