Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support unmapped reads in Spark. #3369

Merged
merged 1 commit into from
Sep 7, 2017
Merged

Conversation

tomwhite
Copy link
Contributor

Not ready for merge as this relies on HadoopGenomics/Hadoop-BAM#136, hence a new Hadoop-BAM release.

Addresses #2572 and #2571

@tomwhite tomwhite requested a review from droazen July 27, 2017 17:06
Copy link
Member

@lbergelson lbergelson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The changes look good to me, but I think we should a few additional test cases.

return true;
}
if (traversalParameters.traverseUnmappedReads() && record.getReadUnmappedFlag() && record.getAlignmentStart() == SAMRecord.NO_ALIGNMENT_START) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you add a comment for future generations explaining this case.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

{ unmappedBam, null, Arrays.asList(publicTestDir + "org/broadinstitute/hellbender/engine/reads_data_source_test1_unmapped2.intervals"), Arrays.asList("a", "b", "c", "k", "u1", "u2", "u3", "u4", "u5") },
{ ceuSnippet, null, Arrays.asList("unmapped"), Arrays.asList("g", "h", "h", "i", "i") },
{ ceuSnippet, null, Arrays.asList("20:10000009-10000011", "unmapped"), Arrays.asList("a", "b", "c", "d", "e", "g", "h", "h", "i", "i") },
{ ceuSnippet, null, Arrays.asList("20:10000009-10000013", "unmapped"), Arrays.asList("a", "b", "c", "d", "e", "f", "f", "g", "h", "h", "i", "i") },
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should add a test case here that is just an interval and not including unmapped, I'm sure it's covered by existing tests, but it would be good to have one here for completeness.

}

@Test(dataProvider = "UnmappedReadInclusionTestData")
public void testUnmappedReadInclusion( final File input, final String reference, final List<String> intervalStrings, final List<String> expectedReadNames ) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We might want to make a matching test in ReadSparkSourceUnitTest that directly exercises getParallelReads.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

@lbergelson lbergelson assigned tomwhite and unassigned lbergelson Aug 14, 2017
@tomwhite tomwhite force-pushed the tw_spark_unmapped_reads branch 2 times, most recently from 05b51f7 to 8e90c46 Compare August 15, 2017 16:20
@tomwhite
Copy link
Contributor Author

Thanks for the review @lbergelson. I've addressed all your comments. (Note the tests will still fail until there's a new Hadoop-BAM release.)

@tomwhite tomwhite force-pushed the tw_spark_unmapped_reads branch 3 times, most recently from d63466d to f2a5981 Compare August 29, 2017 14:48
@tomwhite
Copy link
Contributor Author

@lbergelson, @droazen are you OK with adding a SNAPSHOT dependency for Hadoop-BAM so we can commit this (and also the GVCF PR)?

@codecov-io
Copy link

codecov-io commented Aug 29, 2017

Codecov Report

Merging #3369 into master will increase coverage by 0.017%.
The diff coverage is 75%.

@@              Coverage Diff               @@
##              master    #3369       +/-   ##
==============================================
+ Coverage     79.923%   79.94%   +0.017%     
- Complexity     17884    17897       +13     
==============================================
  Files           1198     1198               
  Lines          64966    64980       +14     
  Branches       10114    10120        +6     
==============================================
+ Hits           51923    51945       +22     
+ Misses          9010     9002        -8     
  Partials        4033     4033
Impacted Files Coverage Δ Complexity Δ
...tools/spark/validation/CompareDuplicatesSpark.java 82.927% <50%> (-1.883%) 24 <0> (ø)
...der/engine/spark/datasources/ReadsSparkSource.java 80.198% <76.923%> (+10.724%) 38 <10> (+10) ⬆️
...stitute/hellbender/engine/spark/GATKSparkTool.java 85% <85.714%> (+0.789%) 55 <5> (+2) ⬆️
...oadinstitute/hellbender/utils/gcs/BucketUtils.java 78.571% <0%> (+0.649%) 39% <0%> (ø) ⬇️

@tomwhite tomwhite force-pushed the tw_spark_unmapped_reads branch 2 times, most recently from 01816b6 to bffea93 Compare September 6, 2017 16:26
@tomwhite tomwhite force-pushed the tw_spark_unmapped_reads branch from bffea93 to de1a9a1 Compare September 7, 2017 07:36
@tomwhite tomwhite dismissed lbergelson’s stale review September 7, 2017 08:44

Feedback addressed

@tomwhite tomwhite merged commit b06340f into master Sep 7, 2017
@tomwhite tomwhite deleted the tw_spark_unmapped_reads branch September 7, 2017 08:47
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants