Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Adding tool to annotate with pair orientation info #3614

Merged
merged 1 commit into from
Sep 29, 2017

Conversation

LeeTL1220
Copy link
Contributor

@LeeTL1220 LeeTL1220 commented Sep 26, 2017

This experimental tool allows users to use FilterByOrientationBias with VCF calls from tools other than M2. Indels are supported though not thoroughly tested. (Note that FilterByOrientationBias does not ever consider indels). This tool requires:

  • VCF
  • A bam file for the normal and tumor samples in the VCF. The sample name in the bam files must match the sample names in the VCF.

@LeeTL1220
Copy link
Contributor Author

@davidbenjamin Please go over the actual annotation code carefully. It seems a bit messy for what it is trying to do.

@codecov-io
Copy link

codecov-io commented Sep 26, 2017

Codecov Report

Merging #3614 into master will increase coverage by 0.009%.
The diff coverage is 82.443%.

@@              Coverage Diff               @@
##             master     #3614       +/-   ##
==============================================
+ Coverage     79.78%   79.789%   +0.009%     
- Complexity    18237     18297       +60     
==============================================
  Files          1225      1226        +1     
  Lines         66983     67113      +130     
  Branches      10449     10484       +35     
==============================================
+ Hits          53439     53549      +110     
- Misses         9320      9322        +2     
- Partials       4224      4242       +18
Impacted Files Coverage Δ Complexity Δ
...ntationbiasvariantfilter/OrientationBiasUtils.java 84.956% <0%> (-0.759%) 56 <1> (+1)
...tionbiasvariantfilter/OrientationBiasFilterer.java 93.956% <0%> (-0.549%) 58 <0> (ø)
...bender/utils/GATKProtectedVariantContextUtils.java 71.429% <79.412%> (+2.742%) 47 <20> (+20) ⬆️
...bender/tools/walkers/annotator/OxoGReadCounts.java 84.444% <80.597%> (-11.208%) 49 <30> (+30)
...tute/hellbender/tools/AnnotatePairOrientation.java 96.429% <96.429%> (ø) 8 <8> (?)
...er/tools/spark/sv/discovery/AlignmentInterval.java 88.889% <0%> (+0.463%) 52% <0%> (+1%) ⬆️
...e/hellbender/engine/spark/SparkContextFactory.java 73.973% <0%> (+2.74%) 11% <0%> (ø) ⬇️

"\n The output of this tool will only count reads that fully overlap (and match) the variant or reference sequence (this is relevant for indels)." +
"\n IMPORTANT: This tool cannot produce the exact same F1R2/F2R1 counts as you would get from M2, due to the nature of how M2 calls variants (using read likelihoods, wheras this tool uses a base quality filter).",
oneLineSummary = "(EXPERIMENTAL) Annotate a non-M2 VCF (using the associated tumor bam) with pair orientation fields (e.g. " + GATKVCFConstants.F1R2_KEY + " ).",
programGroup = VariantProgramGroup.class
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just FYI, if you want to hide this experimental tool, you can annotated it with omitFromCommandLine = true in the @CommandLineProgramProperties, then it wouldn't show up in the help message but still can be used for people knowing its existence.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @SHuang-Broad , but I think I still want it to show up in the list. I just want users to know that it comes with few quality guarantees.

@@ -290,4 +289,42 @@ public static ReadPileup getPileup(final Locatable loc, final Iterable<GATKRead>

return new ReadPileup(loc, pile);
}

/** This is lifted directly from htsjdk! However, it is a private method there.
*
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

did you open an issue to expose this method? if so, could you link to the issue here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor

@davidbenjamin davidbenjamin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Back to you @LeeTL1220

import java.util.*;

@CommandLineProgramProperties(
summary = "(Experimental) This will add fields normally provided by M2 to a VCF. There should never be a need to run this tool on a VCF that was produced by M2." +
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

will add -> adds

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

provided -> emitted

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

@CommandLineProgramProperties(
summary = "(Experimental) This will add fields normally provided by M2 to a VCF. There should never be a need to run this tool on a VCF that was produced by M2." +
"\n The output of this tool should be usable with FilterByOrientationBias." +
"\n The output of this tool will only count reads that fully overlap (and match) the variant or reference sequence (this is relevant for indels)." +
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

will only count -> only counts

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

summary = "(Experimental) This will add fields normally provided by M2 to a VCF. There should never be a need to run this tool on a VCF that was produced by M2." +
"\n The output of this tool should be usable with FilterByOrientationBias." +
"\n The output of this tool will only count reads that fully overlap (and match) the variant or reference sequence (this is relevant for indels)." +
"\n IMPORTANT: This tool cannot produce the exact same F1R2/F2R1 counts as you would get from M2, due to the nature of how M2 calls variants (using read likelihoods, wheras this tool uses a base quality filter).",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"This tool does not produce the exact same F1R2/F2R1 counts as M2, "

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

wheras -> whereas

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

fullName = StandardArgumentDefinitions.OUTPUT_LONG_NAME)
protected File outputFile;

public final static String CUTOFF_SHORT_NAME = "c";
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My vote is not to be too aggressive with short names and use "cutoff."

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

protected File outputFile;

public final static String CUTOFF_SHORT_NAME = "c";
public final static String CUTOFF_LONG_NAME = "meanBaseQualityCutoff";
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

M2 is about to switch to mean-base-quality-cutoff.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

}
}

if (pileupElement.isBeforeDeletionStart()){
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can't this be an else if?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

return;
}

if (getMeanBaseQualityForAlleleInRead(pileupElement, pileupAllele) < meanBaseQualityCutoff) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I feel like minimum base quality is a more appropriate criterion than average when deciding if alleles match.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

final byte[] alleleBases = allele.getBases();
final byte[] pileupBaseQualities = ArrayUtils.subarray(pileupElement.getRead().getBaseQualities(), pileupElement.getOffset(), pileupElement.getOffset() + alleleBases.length);

int totalBaseQuality = 0;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

final int totalBaseQuality = MathUtils.sum(pileupBaseQualities);

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Used min instead of mean.

No action.

}

private static double getMeanBaseQualityForAlleleInRead(final PileupElement pileupElement, final Allele allele) {
final byte[] alleleBases = allele.getBases();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You only use the length, so this could be final int alleleLength = allele.getBases().length

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

final static String TEST_BAM_TUMOR_INDELS = TEST_BAM_DIR.getAbsolutePath() + "/tumor_3.bam";
final static String TEST_BAM_NORMAL_INDELS = TEST_BAM_DIR.getAbsolutePath() + "/normal_3.bam";

// TODO: Test with multiallelics
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps include some multiallelics where one allele is symbolic?

@LeeTL1220
Copy link
Contributor Author

@davidbenjamin Back to you!

@LeeTL1220
Copy link
Contributor Author

@jonn-smith This PR will help you as well. You can get variant type and whether the variant is a complex indel.

Adding testing of f1r2 and f2r1 annotations.

Fixed a lot of changes.  Also made an optional output required in the WDL

Fixed FilterByOrientationBias to accept the case where it only has one allele.  That should not happen very often, but it will just do nothing.  Made OxoGReadCounts no longer annotate a null readPileup.

Beginning Indel support

Basic functionality to do the F1R2 annotation of Indels.

Added clunky indel support and testing.

Fixed indel counts.

Answering PR comments.  Fleshing out the variant type functionality.

Added one more small test.
@LeeTL1220 LeeTL1220 force-pushed the ll_pair_orientation_info branch from bf91acf to 0335bfd Compare September 29, 2017 17:40
@davidbenjamin
Copy link
Contributor

Looks good.

@LeeTL1220 LeeTL1220 merged commit fce3a90 into master Sep 29, 2017
@LeeTL1220 LeeTL1220 deleted the ll_pair_orientation_info branch September 29, 2017 18:47
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants