Don't try to index BAM files that can't be indexed with HTSJDK. #520

Merged
merged 3 commits into from
Aug 19, 2019

@tfenne (Member) commented Aug 19, 2019

@nh13 The change here is quite simple. HTSJDK blows up if you ask it to index on the fly and hand it reference sequences longer than BAI can handle (anything over GenomicIndexUtil.BIN_GENOMIC_SPAN). There's no CSI writing support yet either.
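
For context, the BAI limit follows from the binning scheme in the SAM specification: the smallest bins cover 2^14 bp and there are five levels below the root, each splitting eight ways, so a single index can address at most 2^(14 + 3*5) = 2^29 bp. The sketch below works through that arithmetic. The names `BaiLimit` and `fitsInBai` are illustrative only, and the value is assumed to match HTSJDK's `GenomicIndexUtil.BIN_GENOMIC_SPAN`.

```scala
object BaiLimit {
  // BAI binning parameters per the SAM specification: smallest bins span
  // 2^14 bp, with 5 levels below the root and 8 children per bin.
  val MinShift: Int = 14
  val Depth: Int    = 5

  // Largest coordinate span one BAI index can address: 2^(14 + 3*5) = 2^29.
  val MaxSpan: Long = 1L << (MinShift + 3 * Depth)

  /** True if a reference sequence of this length can be indexed with BAI. */
  def fitsInBai(length: Long): Boolean = length <= MaxSpan
}
```

Human chromosomes sit well under this limit, but some plant and amphibian assemblies have single sequences that exceed it, which is the case this PR guards against.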

I'd also like to wait until samtools/htsjdk#1410 is merged, and update our HTSJDK dependency in this same PR if possible.

@tfenne tfenne requested a review from nh13 August 19, 2019 20:31
@codecov-io commented Aug 19, 2019

Codecov Report

Merging #520 into master will increase coverage by <.01%.
The diff coverage is 100%.

@@            Coverage Diff             @@
##           master     #520      +/-   ##
==========================================
+ Coverage   95.06%   95.06%   +<.01%     
==========================================
  Files         102      102              
  Lines        5770     5774       +4     
  Branches      411      401      -10     
==========================================
+ Hits         5485     5489       +4     
  Misses        285      285

Impacted Files                                          Coverage Δ
.../scala/com/fulcrumgenomics/bam/api/SamWriter.scala   100% <100%> (ø) ⬆️

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 8dc50dd...f7a70d1.

val hasChromsTooLong = header.getSequenceDictionary.getSequences.exists(_.getSequenceLength > GenomicIndexUtil.BIN_GENOMIC_SPAN)
val wouldHaveIndexed = header.getSortOrder == SortOrder.coordinate && index
if (wouldHaveIndexed && hasChromsTooLong) logger.warning(s"Cannot index $path as one or more chromosomes is too long.")
wouldHaveIndexed && !hasChromsTooLong
Member

if (header.getSortOrder != SortOrder.coordinate || !index) false else {
  // Only inspect sequence lengths when indexing was actually requested and is possible.
  val hasChromsTooLong = header.getSequenceDictionary.getSequences.exists(_.getSequenceLength > GenomicIndexUtil.BIN_GENOMIC_SPAN)
  if (hasChromsTooLong) logger.warning(s"Cannot index $path as one or more chromosomes is too long.")
  !hasChromsTooLong
}
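
The decision discussed in this thread can be isolated as a pure function, which makes the cases easy to check by hand. This is a minimal sketch, not the code in `SamWriter`: `IndexDecision`, `decide` and its parameters are hypothetical names, sequence lengths are passed in directly rather than read from a header, and `MaxBaiSpan` is assumed to equal HTSJDK's `GenomicIndexUtil.BIN_GENOMIC_SPAN`.

```scala
object IndexDecision {
  // Assumed to mirror htsjdk's GenomicIndexUtil.BIN_GENOMIC_SPAN (2^29 bp).
  val MaxBaiSpan: Int = 512 * 1024 * 1024

  /** Returns whether to index on the fly, plus a warning when indexing
    * was requested and otherwise possible but a sequence is too long. */
  def decide(coordinateSorted: Boolean,
             indexRequested: Boolean,
             seqLengths: Seq[Int]): (Boolean, Option[String]) = {
    val wouldHaveIndexed = coordinateSorted && indexRequested
    val hasChromsTooLong = seqLengths.exists(_ > MaxBaiSpan)
    val warning =
      if (wouldHaveIndexed && hasChromsTooLong) Some("Cannot index: one or more chromosomes is too long.")
      else None
    (wouldHaveIndexed && !hasChromsTooLong, warning)
  }
}
```

The warning is only produced when indexing would otherwise have happened, so unsorted output or `index = false` stays silent even when long sequences are present.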

  

@@ -95,6 +95,7 @@ lazy val commonSettings = Seq(
fork in Test := true,
resolvers += Resolver.sonatypeRepo("public"),
resolvers += Resolver.mavenLocal,
resolvers += "broad-snapshots" at "https://artifactory.broadinstitute.org/artifactory/libs-snapshot/",
Member

Is this to get htsjdk snapshots? Sucks, but I understand.

Member Author

It is indeed for htsjdk snapshots.
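
As a rough illustration of why the resolver is needed, a snapshot dependency can only be resolved if sbt knows about a repository that hosts it. The fragment below is a sketch and not the project's actual build definition: the version string is a hypothetical placeholder to be replaced with whatever snapshot is actually published, and the Maven coordinates are assumed to be the usual ones for HTSJDK.

```scala
// build.sbt fragment (illustrative sketch only)

// Hypothetical placeholder: substitute the snapshot version actually published.
val htsjdkSnapshotVersion = "x.y.z-SNAPSHOT"

// Repository taken from the diff above; it hosts snapshot builds.
resolvers += "broad-snapshots" at "https://artifactory.broadinstitute.org/artifactory/libs-snapshot/"

// Assumed coordinates for HTSJDK; the -SNAPSHOT suffix makes sbt look in snapshot repositories.
libraryDependencies += "com.github.samtools" % "htsjdk" % htsjdkSnapshotVersion
```

Once a release containing the needed change is available, the resolver line and the snapshot version can both be dropped in favour of the released version.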

@tfenne tfenne merged commit 73d054d into master Aug 19, 2019
@tfenne tfenne deleted the tf_dont_try_index_longseq_bams branch August 19, 2019 21:48

3 participants