consensus calling major_cutoff knobs and defaults #424
The `major_cutoff` parameter for consensus genome generation (ultimately implemented here) produces IUPAC ambiguity bases in the output assembly at positions where there is significant read support for non-major alleles. The parameter (a value from 0.0 to 1.0) defines the threshold. If the major allele frequency (the frequency of the allele with the highest count) is strictly above this threshold, a single non-ambiguous base is called at that position. If it is equal to or below the threshold, an appropriate IUPAC ambiguity base is chosen that represents the set of (2 or more) alleles seen in the alignment.

This PR:

- Exposes the `major_cutoff` knob at the outer levels of the `assemble_refbased` and `sarscov2_illumina_full` WDL workflows (it was previously exposed only at the task level).
- Changes the default `major_cutoff` for both workflows to 0.75 (for `sarscov2_illumina_full` since it is mostly governed by contractual requirements, and for `assemble_refbased` because it seems to be a more sensible value than the old default of 0.5, which only calls 2-allele ambiguity codes when exact 50-50 read support is observed).
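
To illustrate the threshold rule described above, here is a minimal Python sketch. It is not the actual implementation: the function name `call_base` and the `counts` input are hypothetical, and the sketch assumes that every allele with at least one supporting read contributes to the ambiguity code (the real code may filter low-support alleles differently).

```python
# Hypothetical sketch of the major_cutoff rule; not the real implementation.

# Standard IUPAC nucleotide codes, keyed by the set of bases they represent.
IUPAC = {
    frozenset("AC"): "M", frozenset("AG"): "R", frozenset("AT"): "W",
    frozenset("CG"): "S", frozenset("CT"): "Y", frozenset("GT"): "K",
    frozenset("ACG"): "V", frozenset("ACT"): "H", frozenset("AGT"): "D",
    frozenset("CGT"): "B", frozenset("ACGT"): "N",
}
# A single observed base maps to itself (covers the major_cutoff == 1.0 case).
for _base in "ACGT":
    IUPAC[frozenset(_base)] = _base


def call_base(counts, major_cutoff):
    """Return the consensus base for one position.

    counts: dict mapping base (A/C/G/T) to read count (assumed input shape).
    major_cutoff: threshold between 0.0 and 1.0.
    """
    observed = {b: n for b, n in counts.items() if n > 0}
    total = sum(observed.values())
    if total == 0:
        return "N"  # assumption: no coverage is reported as N
    major_base, major_count = max(observed.items(), key=lambda kv: kv[1])
    # Strictly above the cutoff: call the single major allele.
    if major_count / total > major_cutoff:
        return major_base
    # Equal to or below the cutoff: IUPAC code for all observed alleles.
    return IUPAC[frozenset(observed)]
```

Under these assumptions, a position with 60 reads of A and 40 of G is called `A` with the old default of 0.5 but `R` with the new default of 0.75, while an exact 50-50 split yields `R` under both.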