Docs for M2 filtering #3560
Minor comments, no need to pass it back
\item \code{maxEventsInHaplotype} is the maximum allowable number of called variants co-occurring in a single assembly region. If the number of called variants exceeds this they will all be filtered. Note that this filter is misnamed because it counts the total number of events over all haplotypes in an assembly region.
\item \code{uniqueAltReadCount} is the minimum number of unique (start position, fragment length) pairs required to make a call. This count is a proxy for the number of unique molecules (as opposed to PCR duplicates) supporting an allele. Normally PCR duplicates are marked and filtered by the GATK engine, but in UMI-aware calling this may not be the case, hence the need for this filter.
\item \code{maxAltAllelesThreshold} is the maximum allowable number of alt alleles at a site. By default only biallelic variants pass the filter.
\item \code{max\_germline\_posterior} is the maximum posterior probability, as determined by the above germline probability model, that a variant is a germline event.
I think we should just use lowerCamelCase and not names_with_underscores. Could you also standardize them in `M2FiltersArgumentCollection`? For instance, `STRAND_ARTIFACT_POSTERIOR_PROB_THRESHOLD` should be `strandArtifactPosteriorProbThreshold` and `TUMOR_LOD_THRESHOLD` should be `tumorLodThreshold`.
I agree, but there's discussion about standardizing all GATK arguments to a hyphenated style, e.g. `strand-artifact-posterior-prob-threshold`, and I'm waiting to see how that turns out. Personally I think camel case is superior. If HaplotypeCaller doesn't adopt a standard within a few weeks, then let's do camel case.
sounds good
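To make the two naming options discussed in this thread concrete, here is a minimal Python sketch that converts the current constant-style names to either convention. The helper names `to_lower_camel` and `to_kebab` are hypothetical and not part of GATK; the actual change would be a rename in `M2FiltersArgumentCollection` itself.

```python
def to_lower_camel(name: str) -> str:
    """Convert an underscore-separated name to lowerCamelCase."""
    parts = name.lower().split("_")
    # Keep the first word lowercase and capitalize the first letter of the rest.
    return parts[0] + "".join(part.capitalize() for part in parts[1:])


def to_kebab(name: str) -> str:
    """Convert an underscore-separated name to lowercase hyphenated form."""
    return name.lower().replace("_", "-")


if __name__ == "__main__":
    # Names taken from the review comments above.
    for old in ("STRAND_ARTIFACT_POSTERIOR_PROB_THRESHOLD", "TUMOR_LOD_THRESHOLD"):
        print(old, "->", to_lower_camel(old), "or", to_kebab(old))
```

Either convention can be derived mechanically from the existing names, so the choice between them does not affect how much work the rename takes.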
\begin{itemize}
\item \code{tumor\_lod} is the minimum likelihood of an allele as determined by the somatic likelihoods model required to pass.
\item \code{maxEventsInHaplotype} is the maximum allowable number of called variants co-occurring in a single assembly region. If the number of called variants exceeds this they will all be filtered. Note that this filter is misnamed because it counts the total number of events over all haplotypes in an assembly region.
should we rename the variable then?
Yeah, I think I'll do it after the comms team's upcoming tutorial.
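For readers following the naming question, the behavior described in the docs (counting events per assembly region rather than per haplotype, and filtering all of them when the count exceeds the threshold) can be sketched roughly as below. This is an illustration only, not the Mutect2 implementation: the `(region_id, variant_id)` tuples, the function name `filter_crowded_regions`, and the threshold value used in the example are all assumptions.

```python
from collections import defaultdict


def filter_crowded_regions(calls, max_events):
    """Return the set of variant ids that would be filtered.

    calls: iterable of (region_id, variant_id) pairs (hypothetical layout).
    max_events: maximum allowed number of called variants per assembly region.
    """
    by_region = defaultdict(list)
    for region_id, variant_id in calls:
        by_region[region_id].append(variant_id)

    filtered = set()
    for variants in by_region.values():
        # The count is over the whole region, not per haplotype; if it
        # exceeds the threshold, every call in the region is filtered.
        if len(variants) > max_events:
            filtered.update(variants)
    return filtered


if __name__ == "__main__":
    calls = [("regionA", "v1"), ("regionA", "v2"), ("regionA", "v3"), ("regionB", "v4")]
    # Illustrative threshold only; not the tool's default.
    print(sorted(filter_crowded_regions(calls, max_events=2)))
```

Because the count is taken over the region, a name that refers to the assembly region rather than the haplotype would describe this behavior more accurately.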
Here, for convenience, is a table of \code{Mutect2} filters with their corresponding annotations specified by the \code{-A} argument\footnote{Most of these are default annotations and do not need to be invoked explicitly.}, VCF keys for these annotations, and command line arguments controlling filtering thresholds.
\begin{table}[h!]
Once we update the variable names, we should also update the argument column of this table.
7173e77 to b73436a
Codecov Report
@@             Coverage Diff             @@
##             master     #3560   +/-   ##
===========================================
  Coverage    79.932%   79.932%
  Complexity    17900     17900
===========================================
  Files          1199      1199
  Lines         65015     65015
  Branches      10124     10124
===========================================
  Hits          51968     51968
  Misses         9014      9014
  Partials       4033      4033
@takutosato can you review this?