Support unmapped reads in Spark. #3369

tomwhite · 2017-07-27T17:05:47Z

Not ready for merge: this relies on HadoopGenomics/Hadoop-BAM#136, and hence on a new Hadoop-BAM release.

Addresses #2572 and #2571

lbergelson

The changes look good to me, but I think we should add a few additional test cases.

lbergelson · 2017-08-02T16:00:04Z

src/main/java/org/broadinstitute/hellbender/engine/spark/datasources/ReadsSparkSource.java

            return true;
        }
+        if (traversalParameters.traverseUnmappedReads() && record.getReadUnmappedFlag() && record.getAlignmentStart() == SAMRecord.NO_ALIGNMENT_START) {


Could you add a comment explaining this case for future generations?
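For readers of this thread, here is one reading of the condition above as a self-contained sketch. It assumes the usual SAM convention that an unmapped read with a mapped mate is "placed" at the mate's position, so an interval query can already return it, while an unplaced unmapped read has no alignment start and is only reached when unmapped traversal is requested. The `Read` class and the `keep` and `isUnplacedUnmapped` helpers are hypothetical stand-ins for illustration, not the code in `ReadsSparkSource`.

```java
final class UnmappedReadFilterSketch {
    // Mirrors htsjdk's SAMRecord.NO_ALIGNMENT_START (0); redefined so the sketch compiles alone.
    static final int NO_ALIGNMENT_START = 0;

    // Hypothetical stand-in for SAMRecord, holding only the fields the check reads.
    static final class Read {
        final String name;
        final boolean unmappedFlag;
        final int alignmentStart;

        Read(String name, boolean unmappedFlag, int alignmentStart) {
            this.name = name;
            this.unmappedFlag = unmappedFlag;
            this.alignmentStart = alignmentStart;
        }
    }

    // True only for reads that are flagged unmapped AND carry no position at all.
    static boolean isUnplacedUnmapped(Read read) {
        return read.unmappedFlag && read.alignmentStart == NO_ALIGNMENT_START;
    }

    // Assumed filter shape: keep a read if it overlaps a requested interval,
    // or if unmapped traversal was requested and the read is unplaced unmapped.
    static boolean keep(Read read, boolean traverseUnmapped, boolean overlapsInterval) {
        if (overlapsInterval) {
            return true;
        }
        return traverseUnmapped && isUnplacedUnmapped(read);
    }

    public static void main(String[] args) {
        Read placedUnmapped = new Read("placed", true, 10000010);
        Read unplaced = new Read("unplaced", true, NO_ALIGNMENT_START);
        System.out.println("placed kept:   " + keep(placedUnmapped, true, false));
        System.out.println("unplaced kept: " + keep(unplaced, true, false));
    }
}
```

Under this reading, the extra `getAlignmentStart()` check is what stops a placed unmapped read that lies outside the requested intervals from being pulled in just because "unmapped" was also requested.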

lbergelson · 2017-08-02T18:34:58Z

...java/org/broadinstitute/hellbender/tools/spark/pipelines/PrintReadsSparkIntegrationTest.java

+                { unmappedBam, null, Arrays.asList(publicTestDir + "org/broadinstitute/hellbender/engine/reads_data_source_test1_unmapped2.intervals"), Arrays.asList("a", "b", "c", "k", "u1", "u2", "u3", "u4", "u5") },
+                { ceuSnippet, null, Arrays.asList("unmapped"), Arrays.asList("g", "h", "h", "i", "i") },
+                { ceuSnippet, null, Arrays.asList("20:10000009-10000011", "unmapped"), Arrays.asList("a", "b", "c", "d", "e", "g", "h", "h", "i", "i") },
+                { ceuSnippet, null, Arrays.asList("20:10000009-10000013", "unmapped"), Arrays.asList("a", "b", "c", "d", "e", "f", "f", "g", "h", "h", "i", "i") },


We should add a test case here that uses just an interval, without "unmapped". I'm sure it's covered by existing tests, but it would be good to have one here for completeness.
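To make the shape of these data rows concrete: each interval list can mix genomic intervals with the literal string "unmapped". The sketch below shows one way such a list could be split into located intervals plus an unmapped-traversal flag. The `TraversalSketch` class and its `parse` method are hypothetical names for illustration and are not GATK's actual parsing code; exact matching on "unmapped" is an assumption.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

final class TraversalSketch {
    final List<String> intervals;      // located intervals, e.g. "20:10000009-10000011"
    final boolean traverseUnmapped;    // true if "unmapped" appeared in the input

    TraversalSketch(List<String> intervals, boolean traverseUnmapped) {
        this.intervals = Collections.unmodifiableList(intervals);
        this.traverseUnmapped = traverseUnmapped;
    }

    // Assumed convention: the exact token "unmapped" requests unplaced unmapped reads;
    // every other string is treated as a genomic interval and passed through untouched.
    static TraversalSketch parse(List<String> intervalStrings) {
        List<String> located = new ArrayList<>();
        boolean unmapped = false;
        for (String s : intervalStrings) {
            if ("unmapped".equals(s)) {
                unmapped = true;
            } else {
                located.add(s);
            }
        }
        return new TraversalSketch(located, unmapped);
    }

    public static void main(String[] args) {
        TraversalSketch both = parse(Arrays.asList("20:10000009-10000011", "unmapped"));
        TraversalSketch intervalOnly = parse(Arrays.asList("20:10000009-10000011"));
        System.out.println(both.intervals + " unmapped=" + both.traverseUnmapped);
        System.out.println(intervalOnly.intervals + " unmapped=" + intervalOnly.traverseUnmapped);
    }
}
```

The interval-only row requested here corresponds to the second call in `main`: one located interval and the unmapped flag left off.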

lbergelson · 2017-08-02T18:36:39Z

...java/org/broadinstitute/hellbender/tools/spark/pipelines/PrintReadsSparkIntegrationTest.java

+    }
+
+    @Test(dataProvider = "UnmappedReadInclusionTestData")
+    public void testUnmappedReadInclusion( final File input, final String reference, final List<String> intervalStrings, final List<String> expectedReadNames ) {


We might want to add a matching test in ReadsSparkSourceUnitTest that directly exercises getParallelReads.

tomwhite · 2017-08-15T16:21:57Z

Thanks for the review @lbergelson. I've addressed all your comments. (Note the tests will still fail until there's a new Hadoop-BAM release.)

tomwhite · 2017-08-29T15:38:58Z

@lbergelson, @droazen are you OK with adding a SNAPSHOT dependency for Hadoop-BAM so we can commit this (and also the GVCF PR)?

codecov-io · 2017-08-29T15:40:05Z

Codecov Report

Merging #3369 into master will increase coverage by 0.017%.
The diff coverage is 75%.

@@              Coverage Diff               @@
##              master    #3369       +/-   ##
==============================================
+ Coverage     79.923%   79.94%   +0.017%     
- Complexity     17884    17897       +13     
==============================================
  Files           1198     1198               
  Lines          64966    64980       +14     
  Branches       10114    10120        +6     
==============================================
+ Hits           51923    51945       +22     
+ Misses          9010     9002        -8     
  Partials        4033     4033

| Impacted Files | Coverage Δ | Complexity Δ | |
|---|---|---|---|
| ...tools/spark/validation/CompareDuplicatesSpark.java | `82.927% <50%> (-1.883%)` | `24 <0> (ø)` | |
| ...der/engine/spark/datasources/ReadsSparkSource.java | `80.198% <76.923%> (+10.724%)` | `38 <10> (+10)` | ⬆️ |
| ...stitute/hellbender/engine/spark/GATKSparkTool.java | `85% <85.714%> (+0.789%)` | `55 <5> (+2)` | ⬆️ |
| ...oadinstitute/hellbender/utils/gcs/BucketUtils.java | `78.571% <0%> (+0.649%)` | `39% <0%> (ø)` | ⬇️ |

Feedback addressed

tomwhite requested a review from droazen July 27, 2017 17:06

lbergelson previously requested changes Aug 2, 2017

droazen assigned lbergelson Aug 9, 2017

lbergelson assigned tomwhite and unassigned lbergelson Aug 14, 2017

tomwhite force-pushed the tw_spark_unmapped_reads branch 2 times, most recently from 05b51f7 to 8e90c46 Compare August 15, 2017 16:20

tomwhite force-pushed the tw_spark_unmapped_reads branch 3 times, most recently from d63466d to f2a5981 Compare August 29, 2017 14:48

tomwhite force-pushed the tw_spark_unmapped_reads branch 2 times, most recently from 01816b6 to bffea93 Compare September 6, 2017 16:26

Support unmapped reads in Spark.

de1a9a1

tomwhite force-pushed the tw_spark_unmapped_reads branch from bffea93 to de1a9a1 Compare September 7, 2017 07:36

tomwhite merged commit b06340f into master Sep 7, 2017

tomwhite deleted the tw_spark_unmapped_reads branch September 7, 2017 08:47
