-
Notifications
You must be signed in to change notification settings - Fork 311
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
INDEL realigner cleanup #1412
INDEL realigner cleanup #1412
Conversation
Test PASSed. |
Test PASSed. |
Test PASSed. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Minor doc changes
@@ -77,6 +77,10 @@ class TransformArgs extends Args4jBase with ADAMSaveAnyArgs with ParquetArgs { | |||
var lodThreshold = 5.0 | |||
@Args4jOption(required = false, name = "-max_target_size", usage = "The maximum length of a target region to attempt realigning. Default length is 3000.") | |||
var maxTargetSize = 3000 | |||
@Args4jOption(required = false, name = "-max_reads_per_target", usage = "The maximum number of reads attached to a target that we will attempt realigning. Default length is 20000.") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How about The maximum number of reads attached to a target considered for realignment. Default is 20000.
@@ -421,20 +435,30 @@ private[read] class RealignIndels( | |||
* @param read Read to test. | |||
* @param reference Reference sequence to sweep across. | |||
* @param qualities Integer sequence of phred scaled base quality scores. | |||
* @param originalQuality |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
needs doc comment
@@ -208,7 +208,9 @@ object MdTag { | |||
|
|||
// dirty dancing to recalculate match sets | |||
for (i <- 0 until cigarElement.getLength) { | |||
if (reference(referencePos) == sequence(readPos)) { | |||
println(reference + " @ " + referencePos + ", " + sequence + " @ " + readPos) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Remove
Test FAILed. Build result: FAILURE[...truncated 16 lines...] > /home/jenkins/git2/bin/git rev-parse origin/pr/1412/merge^{commit} # timeout=10 > /home/jenkins/git2/bin/git branch -a --contains b8c4222ca4c78a234f74dfd72089d9343e62fa58 # timeout=10 > /home/jenkins/git2/bin/git rev-parse remotes/origin/pr/1412/merge^{commit} # timeout=10Checking out Revision b8c4222ca4c78a234f74dfd72089d9343e62fa58 (origin/pr/1412/merge) > /home/jenkins/git2/bin/git config core.sparsecheckout # timeout=10 > /home/jenkins/git2/bin/git checkout -f b8c4222ca4c78a234f74dfd72089d9343e62fa58First time build. Skipping changelog.Triggering ADAM-prb ? 2.3.0,2.11,1.6.1,centosTriggering ADAM-prb ? 2.6.0,2.10,1.6.1,centosTriggering ADAM-prb ? 2.3.0,2.11,2.0.0,centosTriggering ADAM-prb ? 2.3.0,2.10,2.0.0,centosTriggering ADAM-prb ? 2.6.0,2.11,2.0.0,centosTriggering ADAM-prb ? 2.6.0,2.10,2.0.0,centosTriggering ADAM-prb ? 2.3.0,2.10,1.6.1,centosTriggering ADAM-prb ? 2.6.0,2.11,1.6.1,centosADAM-prb ? 2.3.0,2.11,1.6.1,centos completed with result FAILUREADAM-prb ? 2.6.0,2.10,1.6.1,centos completed with result FAILUREADAM-prb ? 2.3.0,2.11,2.0.0,centos completed with result FAILUREADAM-prb ? 2.3.0,2.10,2.0.0,centos completed with result FAILUREADAM-prb ? 2.6.0,2.11,2.0.0,centos completed with result FAILUREADAM-prb ? 2.6.0,2.10,2.0.0,centos completed with result FAILUREADAM-prb ? 2.3.0,2.10,1.6.1,centos completed with result FAILUREADAM-prb ? 2.6.0,2.11,1.6.1,centos completed with result FAILURENotifying endpoint 'HTTP:https://webhooks.gitter.im/e/ac8bb6e9f53357bc8aa8'Test FAILed. |
e533c90
to
745df4f
Compare
Test PASSed. |
Test FAILed. Build result: FAILURE[...truncated 16 lines...] > /home/jenkins/git2/bin/git rev-parse origin/pr/1412/merge^{commit} # timeout=10 > /home/jenkins/git2/bin/git branch -a --contains 6d408c4 # timeout=10 > /home/jenkins/git2/bin/git rev-parse remotes/origin/pr/1412/merge^{commit} # timeout=10Checking out Revision 6d408c4 (origin/pr/1412/merge) > /home/jenkins/git2/bin/git config core.sparsecheckout # timeout=10 > /home/jenkins/git2/bin/git checkout -f 6d408c471459434c79630e5724f1a67c964b88cbFirst time build. Skipping changelog.Triggering ADAM-prb ? 2.3.0,2.11,1.6.1,centosTriggering ADAM-prb ? 2.6.0,2.10,1.6.1,centosTriggering ADAM-prb ? 2.3.0,2.11,2.0.0,centosTriggering ADAM-prb ? 2.3.0,2.10,2.0.0,centosTriggering ADAM-prb ? 2.6.0,2.11,2.0.0,centosTriggering ADAM-prb ? 2.6.0,2.10,2.0.0,centosTriggering ADAM-prb ? 2.3.0,2.10,1.6.1,centosTriggering ADAM-prb ? 2.6.0,2.11,1.6.1,centosADAM-prb ? 2.3.0,2.11,1.6.1,centos completed with result FAILUREADAM-prb ? 2.6.0,2.10,1.6.1,centos completed with result FAILUREADAM-prb ? 2.3.0,2.11,2.0.0,centos completed with result FAILUREADAM-prb ? 2.3.0,2.10,2.0.0,centos completed with result FAILUREADAM-prb ? 2.6.0,2.11,2.0.0,centos completed with result FAILUREADAM-prb ? 2.6.0,2.10,2.0.0,centos completed with result FAILUREADAM-prb ? 2.3.0,2.10,1.6.1,centos completed with result FAILUREADAM-prb ? 2.6.0,2.11,1.6.1,centos completed with result FAILURENotifying endpoint 'HTTP:https://webhooks.gitter.im/e/ac8bb6e9f53357bc8aa8'Test FAILed. |
ef92e6c
to
8a63091
Compare
Test FAILed. Build result: FAILURE[...truncated 16 lines...] > /home/jenkins/git2/bin/git rev-parse origin/pr/1412/merge^{commit} # timeout=10 > /home/jenkins/git2/bin/git branch -a --contains 0761d92 # timeout=10 > /home/jenkins/git2/bin/git rev-parse remotes/origin/pr/1412/merge^{commit} # timeout=10Checking out Revision 0761d92 (origin/pr/1412/merge) > /home/jenkins/git2/bin/git config core.sparsecheckout # timeout=10 > /home/jenkins/git2/bin/git checkout -f 0761d9214029fa634f56bc4a72ea7a61e7d29d71First time build. Skipping changelog.Triggering ADAM-prb ? 2.3.0,2.11,1.6.1,centosTriggering ADAM-prb ? 2.6.0,2.10,1.6.1,centosTriggering ADAM-prb ? 2.3.0,2.11,2.0.0,centosTriggering ADAM-prb ? 2.3.0,2.10,2.0.0,centosTriggering ADAM-prb ? 2.6.0,2.11,2.0.0,centosTriggering ADAM-prb ? 2.6.0,2.10,2.0.0,centosTriggering ADAM-prb ? 2.3.0,2.10,1.6.1,centosTriggering ADAM-prb ? 2.6.0,2.11,1.6.1,centosADAM-prb ? 2.3.0,2.11,1.6.1,centos completed with result SUCCESSADAM-prb ? 2.6.0,2.10,1.6.1,centos completed with result SUCCESSADAM-prb ? 2.3.0,2.11,2.0.0,centos completed with result FAILUREADAM-prb ? 2.3.0,2.10,2.0.0,centos completed with result SUCCESSADAM-prb ? 2.6.0,2.11,2.0.0,centos completed with result SUCCESSADAM-prb ? 2.6.0,2.10,2.0.0,centos completed with result SUCCESSADAM-prb ? 2.3.0,2.10,1.6.1,centos completed with result SUCCESSADAM-prb ? 2.6.0,2.11,1.6.1,centos completed with result SUCCESSNotifying endpoint 'HTTP:https://webhooks.gitter.im/e/ac8bb6e9f53357bc8aa8'Test FAILed. |
Jenkins, retest this please. |
Test PASSed. |
8a63091
to
80ad790
Compare
Test PASSed. |
Thanks @heuermh! Glad to have things cleaned up. |
Test PASSed. |
54f328f
to
e6376e8
Compare
Test PASSed. |
Test PASSed. |
Test PASSed. |
Don't unclip. Yes, or do I mean no? |
f82fe65
to
2c868e1
Compare
@heuermh think this is good to go now; rerunning a final verification run. Can you give the second commit a looksee? That should be everything new since your last pass. |
Test PASSed. |
val startClipped = richCigar.softClippedBasesAtStart | ||
val endClipped = richCigar.softClippedBasesAtEnd | ||
|
||
if (unclipReads || |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Getting hard to follow the double negative here. Maybe I need more sleep. :)
builder.setOldPosition(r.getStart()) | ||
builder.setOldCigar(r.getCigar()) | ||
val rec = builder.build() | ||
if (r.getReadName == "H06JUADXX130110:1:1108:12070:36897") { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Wot?
} else { | ||
r | ||
} | ||
} catch { | ||
case t: Throwable => { | ||
log.warn("Realigning read %s failed with %s.".format(r, t)) | ||
log.warn("Realigning read %s failed with %s. At:".format(r, t)) | ||
val stack = t.getStackTrace() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
log.warn(String, Throwable)
?
* @param remappingIdx The location in the consensus sequence where this | ||
* read has been realigned. | ||
* @param consensus The consensus variant to realign against. | ||
* @peturn Returns a tuple with the (read start, read end, cigar). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@peturn
→ @return
} | ||
|
||
// if read is fully before or fully after the consensus, then we | ||
// have a full match |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thanks for the comments here
Test PASSed. |
Test PASSed. |
Resolves bigdatagenomics#1402. Includes fixes to consensus generator and reference scorer. Improve INDEL realigner performance: * Exit early when realigning will not yield a better score. * Eliminate substring call in sweep over reference. * Change datastructures to be immutable wherever possible. * Add bound checking and other error checking. * Rewrite target association code to use array instead of set, and improve load balancing. * Delete high coverage targets with reduceByKey. Additionally: * Improve telemetry/logging to sort out load balance issue. * Support using reference file in INDEL realignment. * Log reads with negative alignment sizes. * Improved test coverage for insertion realignment. * Fix CIGARs on reads that partially overlap INDEL. * Soft clip reads that partially align to an insertion. * Eliminate non-determinism. * Fixed reference file. * Serialization fixes and debug. * Fix bad score. * Clean up clipping code? * Unclip clipped reads.
3efad26
to
b6de9a5
Compare
Squashed down; I say this is good to go from my end. |
Test PASSed. |
Thank you, @fnothaft! |
Fixes #1402, and provides many additional improvements to the IR core. Still a WIP, testing on the cluster as we speak.