ReadSparkSource.getHeader silently manufactures a bogus SAMFileHeader for ADAM (and possibly other) files #1280

Open
cmnbroad opened this issue Dec 8, 2015 · 0 comments

cmnbroad commented Dec 8, 2015

This probably affects other kinds of files too. The stack trace below results from handing ReadsSparkSource.getHeader a .ADAM file in the MeanQualityByCycleSparkIntegrationTest.test_ADAM test. The ReadsSparkSource code currently delegates to Hadoop-BAM, which in turn delegates to promiscuous htsjdk code that assumes anything that doesn't look like a known file type must be a SAM file. It then happily creates a bogus SAMFileHeader from the .ADAM stream.

All 3 layers should probably be more discriminating.
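
As a rough illustration of what "more discriminating" could mean, here is a minimal sketch that checks a file's leading magic bytes before anything is handed to a SAM parser. It is not the Hadoop-BAM or htsjdk implementation; the class name ReadsFormatSniffer, the Format enum, and both sniff methods are hypothetical names made up for this example. It assumes the well-known magic numbers: Parquet files (the storage format ADAM uses) start with "PAR1", CRAM files start with "CRAM", BAM is BGZF and so starts with the gzip bytes 0x1F 0x8B, and a SAM text header starts with '@'.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical helper, for illustration only; not part of GATK, Hadoop-BAM, or htsjdk.
class ReadsFormatSniffer {

    enum Format { SAM_TEXT, GZIP_OR_BAM, CRAM, PARQUET, UNKNOWN }

    // Classify a file from its first few bytes instead of defaulting to SAM.
    static Format sniff(byte[] prefix) {
        if (startsWith(prefix, "PAR1")) {
            return Format.PARQUET;       // e.g. the Parquet files inside an ADAM dataset
        }
        if (startsWith(prefix, "CRAM")) {
            return Format.CRAM;
        }
        if (prefix.length >= 2 && (prefix[0] & 0xFF) == 0x1F && (prefix[1] & 0xFF) == 0x8B) {
            return Format.GZIP_OR_BAM;   // BGZF (BAM) is gzip-compatible
        }
        if (prefix.length >= 1 && prefix[0] == '@') {
            return Format.SAM_TEXT;      // assumption: the SAM file has a header section
        }
        return Format.UNKNOWN;           // refuse to guess rather than assume SAM
    }

    // Convenience overload: read up to 4 bytes from a stream and classify them.
    static Format sniff(InputStream in) throws IOException {
        byte[] buf = new byte[4];
        int n = 0;
        while (n < buf.length) {
            int r = in.read(buf, n, buf.length - n);
            if (r < 0) {
                break;                   // stream ended early
            }
            n += r;
        }
        return sniff(Arrays.copyOf(buf, n));
    }

    private static boolean startsWith(byte[] data, String magic) {
        byte[] m = magic.getBytes(StandardCharsets.US_ASCII);
        if (data.length < m.length) {
            return false;
        }
        for (int i = 0; i < m.length; i++) {
            if (data[i] != m[i]) {
                return false;
            }
        }
        return true;
    }
}
```

A caller in any of the three layers could then fail fast when the result is PARQUET or UNKNOWN instead of building a header from whatever bytes happen to be there. A headerless SAM file would come back as UNKNOWN under this sketch, so a real check would need to decide how to treat that case.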

This currently doesn't break any tests. I discovered it when running the HB tests against a local version of htsjdk with a strict setHeader implementation that attempts to resolve all reference names on every setHeader call. That code caused this test to fail because it's using the bogus header.

"main@1" prio=5 tid=0x1 nid=NA runnable
java.lang.Thread.State: RUNNABLE
at htsjdk.samtools.SAMTextHeaderCodec.decode(SAMTextHeaderCodec.java:113)
at htsjdk.samtools.SAMTextReader.readHeader(SAMTextReader.java:200)
at htsjdk.samtools.SAMTextReader.<init>(SAMTextReader.java:63)
at htsjdk.samtools.SAMTextReader.<init>(SAMTextReader.java:73)
at htsjdk.samtools.SAMFileReader.init(SAMFileReader.java:684)
at htsjdk.samtools.SAMFileReader.<init>(SAMFileReader.java:148)
at org.seqdoop.hadoop_bam.util.SAMHeaderReader.readSAMHeaderFrom(SAMHeaderReader.java:66)
at org.seqdoop.hadoop_bam.util.SAMHeaderReader.readSAMHeaderFrom(SAMHeaderReader.java:47)
at org.broadinstitute.hellbender.engine.spark.datasources.ReadsSparkSource.getHeader(ReadsSparkSource.java:195)
at org.broadinstitute.hellbender.engine.spark.GATKSparkTool.initializeReads(GATKSparkTool.java:284)
at org.broadinstitute.hellbender.engine.spark.GATKSparkTool.initializeToolInputs(GATKSparkTool.java:264)
at org.broadinstitute.hellbender.engine.spark.GATKSparkTool.runPipeline(GATKSparkTool.java:255)
at org.broadinstitute.hellbender.engine.spark.SparkCommandLineProgram.doWork(SparkCommandLineProgram.java:36)
at org.broadinstitute.hellbender.cmdline.CommandLineProgram.runTool(CommandLineProgram.java:98)
at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMainPostParseArgs(CommandLineProgram.java:146)
at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:165)
at org.broadinstitute.hellbender.Main.instanceMain(Main.java:66)
at org.broadinstitute.hellbender.Main.instanceMain(Main.java:73)
at org.broadinstitute.hellbender.CommandLineProgramTest.runCommandLine(CommandLineProgramTest.java:68)
at org.broadinstitute.hellbender.tools.spark.pipelines.metrics.MeanQualityByCycleSparkIntegrationTest.test_ADAM(MeanQualityByCycleSparkIntegrationTest.java:96)
at sun.reflect.NativeMethodAccessorImpl.invoke0(NativeMethodAccessorImpl.java:-1)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:497)
at org.testng.internal.MethodInvocationHelper.invokeMethod(MethodInvocationHelper.java:85)
at org.testng.internal.Invoker.invokeMethod(Invoker.java:639)
at org.testng.internal.Invoker.invokeTestMethod(Invoker.java:821)
at org.testng.internal.Invoker.invokeTestMethods(Invoker.java:1131)
at org.testng.internal.TestMethodWorker.invokeTestMethods(TestMethodWorker.java:124)
at org.testng.internal.TestMethodWorker.run(TestMethodWorker.java:108)
at org.testng.TestRunner.privateRun(TestRunner.java:773)
at org.testng.TestRunner.run(TestRunner.java:623)
at org.testng.SuiteRunner.runTest(SuiteRunner.java:357)
at org.testng.SuiteRunner.runSequentially(SuiteRunner.java:352)
at org.testng.SuiteRunner.privateRun(SuiteRunner.java:310)
at org.testng.SuiteRunner.run(SuiteRunner.java:259)
at org.testng.SuiteRunnerWorker.runSuite(SuiteRunnerWorker.java:52)
at org.testng.SuiteRunnerWorker.run(SuiteRunnerWorker.java:86)
at org.testng.TestNG.runSuitesSequentially(TestNG.java:1185)
at org.testng.TestNG.runSuitesLocally(TestNG.java:1110)
at org.testng.TestNG.run(TestNG.java:1018)
at org.testng.IDEARemoteTestNG.run(IDEARemoteTestNG.java:72)
at org.testng.RemoteTestNGStarter.main(RemoteTestNGStarter.java:122)

pjfan pushed a commit that referenced this issue Aug 19, 2016
…, but same issue was present in GATK4) now interval padding works for exclude intervals
pjfan added a commit that referenced this issue Aug 22, 2016
addresses issue #1280 from GATK3 (issue was originally found in GATK3)
@droazen droazen changed the title ReadSparkSource.getHeader silently manufactures a bogus SAMFileHeader for ADAM files ReadSparkSource.getHeader silently manufactures a bogus SAMFileHeader for ADAM (and possibly other) files Mar 22, 2017