-
Notifications
You must be signed in to change notification settings - Fork 596
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Adding tool to annotate with pair orientation info #3614
Conversation
@davidbenjamin Please go over the actual annotation code carefully. It seems a bit messy for what it is trying to do. |
Codecov Report
@@ Coverage Diff @@
## master #3614 +/- ##
==============================================
+ Coverage 79.78% 79.789% +0.009%
- Complexity 18237 18297 +60
==============================================
Files 1225 1226 +1
Lines 66983 67113 +130
Branches 10449 10484 +35
==============================================
+ Hits 53439 53549 +110
- Misses 9320 9322 +2
- Partials 4224 4242 +18
|
"\n The output of this tool will only count reads that fully overlap (and match) the variant or reference sequence (this is relevant for indels)." + | ||
"\n IMPORTANT: This tool cannot produce the exact same F1R2/F2R1 counts as you would get from M2, due to the nature of how M2 calls variants (using read likelihoods, wheras this tool uses a base quality filter).", | ||
oneLineSummary = "(EXPERIMENTAL) Annotate a non-M2 VCF (using the associated tumor bam) with pair orientation fields (e.g. " + GATKVCFConstants.F1R2_KEY + " ).", | ||
programGroup = VariantProgramGroup.class |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just FYI, if you want to hide this experimental tool, you can annotated it with omitFromCommandLine = true
in the @CommandLineProgramProperties
, then it wouldn't show up in the help message but still can be used for people knowing its existence.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @SHuang-Broad , but I think I still want it to show up in the list. I just want users to know that it comes with few quality guarantees.
@@ -290,4 +289,42 @@ public static ReadPileup getPileup(final Locatable loc, final Iterable<GATKRead> | |||
|
|||
return new ReadPileup(loc, pile); | |||
} | |||
|
|||
/** This is lifted directly from htsjdk! However, it is a private method there. | |||
* |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
did you open an issue to expose this method? if so, could you link to the issue here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Back to you @LeeTL1220
import java.util.*; | ||
|
||
@CommandLineProgramProperties( | ||
summary = "(Experimental) This will add fields normally provided by M2 to a VCF. There should never be a need to run this tool on a VCF that was produced by M2." + |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
will add -> adds
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
provided -> emitted
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
@CommandLineProgramProperties( | ||
summary = "(Experimental) This will add fields normally provided by M2 to a VCF. There should never be a need to run this tool on a VCF that was produced by M2." + | ||
"\n The output of this tool should be usable with FilterByOrientationBias." + | ||
"\n The output of this tool will only count reads that fully overlap (and match) the variant or reference sequence (this is relevant for indels)." + |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
will only count -> only counts
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
summary = "(Experimental) This will add fields normally provided by M2 to a VCF. There should never be a need to run this tool on a VCF that was produced by M2." + | ||
"\n The output of this tool should be usable with FilterByOrientationBias." + | ||
"\n The output of this tool will only count reads that fully overlap (and match) the variant or reference sequence (this is relevant for indels)." + | ||
"\n IMPORTANT: This tool cannot produce the exact same F1R2/F2R1 counts as you would get from M2, due to the nature of how M2 calls variants (using read likelihoods, wheras this tool uses a base quality filter).", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"This tool does not produce the exact same F1R2/F2R1 counts as M2, "
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
wheras -> whereas
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
fullName = StandardArgumentDefinitions.OUTPUT_LONG_NAME) | ||
protected File outputFile; | ||
|
||
public final static String CUTOFF_SHORT_NAME = "c"; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My vote is not to be too aggressive with short names and use "cutoff."
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
protected File outputFile; | ||
|
||
public final static String CUTOFF_SHORT_NAME = "c"; | ||
public final static String CUTOFF_LONG_NAME = "meanBaseQualityCutoff"; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
M2 is about to switch to mean-base-quality-cutoff
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
} | ||
} | ||
|
||
if (pileupElement.isBeforeDeletionStart()){ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can't this be an else if
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
return; | ||
} | ||
|
||
if (getMeanBaseQualityForAlleleInRead(pileupElement, pileupAllele) < meanBaseQualityCutoff) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I feel like minimum base quality is a more appropriate criterion than average when deciding if alleles match.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
final byte[] alleleBases = allele.getBases(); | ||
final byte[] pileupBaseQualities = ArrayUtils.subarray(pileupElement.getRead().getBaseQualities(), pileupElement.getOffset(), pileupElement.getOffset() + alleleBases.length); | ||
|
||
int totalBaseQuality = 0; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
final int totalBaseQuality = MathUtils.sum(pileupBaseQualities);
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Used min instead of mean.
No action.
} | ||
|
||
private static double getMeanBaseQualityForAlleleInRead(final PileupElement pileupElement, final Allele allele) { | ||
final byte[] alleleBases = allele.getBases(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You only use the length, so this could be final int alleleLength = allele.getBases().length
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
final static String TEST_BAM_TUMOR_INDELS = TEST_BAM_DIR.getAbsolutePath() + "/tumor_3.bam"; | ||
final static String TEST_BAM_NORMAL_INDELS = TEST_BAM_DIR.getAbsolutePath() + "/normal_3.bam"; | ||
|
||
// TODO: Test with multiallelics |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Perhaps include some multiallelics where one allele is symbolic?
@davidbenjamin Back to you! |
@jonn-smith This PR will help you as well. You can get variant type and whether the variant is a complex indel. |
Adding testing of f1r2 and f2r1 annotations. Fixed a lot of changes. Also made an optional output required in the WDL Fixed FilterByOrientationBias to accept the case where it only has one allele. That should not happen very often, but it will just do nothing. Made OxoGReadCounts no longer annotate a null readPileup. Beginning Indel support Basic functionality to do the F1R2 annotation of Indels. Added clunky indel support and testing. Fixed indel counts. Answering PR comments. Fleshing out the variant type functionality. Added one more small test.
bf91acf
to
0335bfd
Compare
Looks good. |
This experimental tool allows users to use
FilterByOrientationBias
with VCF calls from tools other than M2. Indels are supported though not thoroughly tested. (Note thatFilterByOrientationBias
does not ever consider indels). This tool requires: