-
Notifications
You must be signed in to change notification settings - Fork 596
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Prototype of JunctionTree based haplotype finding #6034
Conversation
8979ee9
to
d2b4c7b
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jamesemery I'm not at any particular stopping point, just submitting what I have so far.
...r/tools/walkers/haplotypecaller/HaplotypeCallerReadThreadingAssemblerArgumentCollection.java
Outdated
Show resolved
Hide resolved
...itute/hellbender/tools/walkers/haplotypecaller/ReadThreadingAssemblerArgumentCollection.java
Outdated
Show resolved
Hide resolved
src/main/java/org/broadinstitute/hellbender/tools/walkers/haplotypecaller/graphs/BaseGraph.java
Outdated
Show resolved
Hide resolved
src/main/java/org/broadinstitute/hellbender/tools/walkers/haplotypecaller/graphs/BaseGraph.java
Outdated
Show resolved
Hide resolved
...nstitute/hellbender/tools/walkers/haplotypecaller/graphs/GraphBasedKBestHaplotypeFinder.java
Outdated
Show resolved
Hide resolved
...java/org/broadinstitute/hellbender/tools/walkers/haplotypecaller/graphs/JTBestHaplotype.java
Outdated
Show resolved
Hide resolved
...java/org/broadinstitute/hellbender/tools/walkers/haplotypecaller/graphs/JTBestHaplotype.java
Show resolved
Hide resolved
...java/org/broadinstitute/hellbender/tools/walkers/haplotypecaller/graphs/JTBestHaplotype.java
Outdated
Show resolved
Hide resolved
...java/org/broadinstitute/hellbender/tools/walkers/haplotypecaller/graphs/JTBestHaplotype.java
Outdated
Show resolved
Hide resolved
...java/org/broadinstitute/hellbender/tools/walkers/haplotypecaller/graphs/JTBestHaplotype.java
Outdated
Show resolved
Hide resolved
@davidbenjamin @jonn-smith I have pushed the latest version of the code. Most of the changes since this was last shown are adding checks to about every level of the code for infinite loops (several places in the dangling end recovery code, and several places the new BestHaplotypeFinder). Additionally tests have been updated to capture these cases as well as several of the changes to functionality (no longer forcing reference start kmer to have a junction tree, limiting the cases where we actually attempt to follow an unsupported reference path when constructing a haplotype, reintroducing reference path weight to the calculation, etc...). |
// Graph to be operated on, in this case cast as an ExperimentalReadThreadingGraph | ||
ExperimentalReadThreadingGraph experimentalReadThreadingGraph; | ||
|
||
public JunctionTreeKBestHaplotypeFinder(final BaseGraph<V, E> graph, final Set<V> sources, Set<V> sinks, final int branchWeightThreshold) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reminder to self: this branchWeightThreshold
in the constructor is currently not actually hooked up, please fix when responding to the comments in the branch.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is actually a somewhat coherent stopping point. I have finished reviewing haplotype finding and have yet to do graph construction. That is, I reviewed the use of junction trees but not their generation.
src/main/java/org/broadinstitute/hellbender/tools/walkers/haplotypecaller/graphs/BaseGraph.java
Outdated
Show resolved
Hide resolved
...nstitute/hellbender/tools/walkers/haplotypecaller/graphs/GraphBasedKBestHaplotypeFinder.java
Outdated
Show resolved
Hide resolved
*/ | ||
public class JTBestHaplotype<V extends BaseVertex, E extends BaseEdge> extends KBestHaplotype<V, E> { | ||
private JunctionTreeView treesInQueue; // An object for storing and managing operations on the queue of junction trees active for this path | ||
private int decisionEdgesTakenSinceLastJunctionTreeEvidence; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note to both of us that we decided to replace this logic with "no revisiting edges without a junction tree telling us to do so"
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have pushed this decision into my next branch dependent on this one. It will come with the additional changes made to recover paths dropped by the junction tree traversals. For what its worth it is already implemented there.
...java/org/broadinstitute/hellbender/tools/walkers/haplotypecaller/graphs/JTBestHaplotype.java
Outdated
Show resolved
Hide resolved
...e/hellbender/tools/walkers/haplotypecaller/readthreading/ExperimentalReadThreadingGraph.java
Outdated
Show resolved
Hide resolved
throw new GATKException("While constructing graph, there was an incongruity between a JunctionTree edge and the edge present on graph traversal"); | ||
} | ||
|
||
// Don't add edges to the symbolic end vertex here at all, thats handled elsewhere, also don't add the same edge again if we pulled it in from a younger tree. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Put an {@ link}
in the comment pointing to the "elsewhere"
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thats -> that's
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
and, actually, why not handle it here?
...java/org/broadinstitute/hellbender/tools/walkers/haplotypecaller/graphs/JTBestHaplotype.java
Outdated
Show resolved
Hide resolved
...java/org/broadinstitute/hellbender/tools/walkers/haplotypecaller/graphs/JTBestHaplotype.java
Outdated
Show resolved
Hide resolved
if (totalOut <= 0) { | ||
treesInQueue.removeEldestTree(); | ||
} else { // Otherwise look at the next tree in the list | ||
currentActiveNodeIndex++; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Given that JunctionTreeView
manages junction tree-based traversal it doesn't seem right to me that you have to keep track of this extra index externally. Is there any way to encapsulate all the state representing traversal inside JunctionTreeView
?
I guess that what I mean is that JunctionTreeView
could encapsulate not just the oldest tree but also the peeking ahead to younger trees above the evidence threshold.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a good point. Why don't I make an iterator that manages deleting empty trees and fetching newer trees?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That would work.
...java/org/broadinstitute/hellbender/tools/walkers/haplotypecaller/graphs/JTBestHaplotype.java
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Second round done. Back to @jamesemery.
...tute/hellbender/tools/walkers/haplotypecaller/readthreading/ReadThreadingGraphInterface.java
Outdated
Show resolved
Hide resolved
...tute/hellbender/tools/walkers/haplotypecaller/readthreading/ReadThreadingGraphInterface.java
Outdated
Show resolved
Hide resolved
* @param kmer the query kmer. | ||
* @return {@code true} if we can start thread the sequence at this kmer, {@code false} otherwise. | ||
*/ | ||
protected boolean isThreadingStart(final Kmer kmer, final boolean startThreadingOnlyAtExistingVertex) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Rather than stretch the definition of uniqueKmers
and nonUniqueKmers
, I would make this abstract
and just implement it differently between the two child classes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Alright. I have done the following. I pushed the code for "tracking" kmers down to the graph implementations. This means that there is now a trackKmer
method as well as getNextKmerVertexForChainExtension
where the actual logical differences between the two paths are resolved. Now there is a global kmer map that is owned by the interface but its up to the implementation what gets added to that map and the concept of having non-unique kmers or kmers that are otherwise disqualified is owned by the implementation classes.
I THINK this involves the least code duplication, many of these abstract methods are tiny methods which I suspect could lead to confusion. To that end I am going to ask @lbergelson to take a look at these 3 classes for his opinion once my response has been pushed.
...tute/hellbender/tools/walkers/haplotypecaller/readthreading/ReadThreadingGraphInterface.java
Outdated
Show resolved
Hide resolved
...tute/hellbender/tools/walkers/haplotypecaller/readthreading/ReadThreadingGraphInterface.java
Outdated
Show resolved
Hide resolved
...e/hellbender/tools/walkers/haplotypecaller/readthreading/ExperimentalReadThreadingGraph.java
Outdated
Show resolved
Hide resolved
...e/hellbender/tools/walkers/haplotypecaller/readthreading/ExperimentalReadThreadingGraph.java
Outdated
Show resolved
Hide resolved
...e/hellbender/tools/walkers/haplotypecaller/readthreading/ExperimentalReadThreadingGraph.java
Outdated
Show resolved
Hide resolved
...e/hellbender/tools/walkers/haplotypecaller/readthreading/ExperimentalReadThreadingGraph.java
Outdated
Show resolved
Hide resolved
...e/hellbender/tools/walkers/haplotypecaller/readthreading/ExperimentalReadThreadingGraph.java
Outdated
Show resolved
Hide resolved
...e/hellbender/tools/walkers/haplotypecaller/readthreading/ExperimentalReadThreadingGraph.java
Outdated
Show resolved
Hide resolved
* @return {@code true} to indicate that the some sink vertex is reachable by {@code currentVertex}, | ||
* {@code false} otherwise. | ||
*/ | ||
private boolean findGuiltyVerticesAndEdgesToRemoveCycles(final BaseGraph<V, E> graph, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree. Indeed I would also comment that the vertexes that don't go to sinks can't actually occur at this stage in the code and would have been pruned before. Ultimately my answer to this would probably be to remove them entirely though since it doesn't matter in the new assembler....
@davidbenjamin Responded to most of your comments. |
7c91e9a
to
fe36a8f
Compare
@davidbenjamin Rebased the branch |
...tute/hellbender/tools/walkers/haplotypecaller/readthreading/ReadThreadingGraphInterface.java
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jamesemery I have to run for now but I have some initial comments. They're mostly nitpicky. Looks sane to me but there are some things that could be cleaner.
protected boolean startThreadingOnlyAtExistingVertex = false; | ||
protected List<MultiDeBruijnVertex> referencePath = null; | ||
|
||
private int maxMismatchesInDanglingHead = -1; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some general comments. There are a bunch of methods here that could be made private. Make them private. There are a some VisibleForTesting methods that could be made package protected, do it.
Group the abstract methods together so it's clear what's implemented in downstream classes.
import java.util.HashMap; | ||
import java.util.Map; | ||
import java.util.Set; | ||
import java.util.*; | ||
import java.util.regex.Matcher; | ||
import java.util.regex.Pattern; | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This test class should go into TestUtils instead of the main source packages I think.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sure, though its going to involve making a bunch of methods and fields protected rather than package protected
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, I see. Eh, either way I guess.
final Collection<Kmer> nonUniquesFromSeq = determineNonUniqueKmers(sequenceForKmers, kmerSize); | ||
if ( nonUniquesFromSeq.isEmpty() ) { | ||
// remove this sequence from future consideration | ||
it.remove(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This remove looks weird and buggy, the iterator is on a copy of the sequences and as far as I can tell nothing looks at it again so the remove is pointless. If it was supposed to do something it probably isn't.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
True, I think my solution would be to avoid touching that code because it was handed down by our forefathers and I would rather not affect the old graph code wherever possible.
* @return a non-null NonUniqueResult | ||
*/ | ||
private NonUniqueResult determineKmerSizeAndNonUniques(final int minKmerSize, final int maxKmerSize) { | ||
private Set<Kmer> determineNonUniques(final int kmerSize) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This method could be simplified to be static and the kmers as input.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sure, though given that its private anyway and will only ever be called from a non-static context being fed all of the sequences i'm not sure what making it static really accomplishes
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It makes it more clear that there's something wrong with the method when you do that. It's fine to do nothing here though. I just noticed this method was weird and wanted to make a note of it.
...roadinstitute/hellbender/tools/walkers/haplotypecaller/readthreading/ReadThreadingGraph.java
Outdated
Show resolved
Hide resolved
...tute/hellbender/tools/walkers/haplotypecaller/readthreading/ReadThreadingGraphInterface.java
Outdated
Show resolved
Hide resolved
...tute/hellbender/tools/walkers/haplotypecaller/readthreading/ReadThreadingGraphInterface.java
Outdated
Show resolved
Hide resolved
...tute/hellbender/tools/walkers/haplotypecaller/readthreading/ReadThreadingGraphInterface.java
Outdated
Show resolved
Hide resolved
...e/hellbender/tools/walkers/haplotypecaller/readthreading/JunctionTreeLinkedDeBruinGraph.java
Show resolved
Hide resolved
...e/hellbender/tools/walkers/haplotypecaller/readthreading/JunctionTreeLinkedDeBruinGraph.java
Outdated
Show resolved
Hide resolved
@jamesemery Good to go. Suggestions for that argument:
I don't have anything particularly catchy. |
…sequence graph simplifications. This is an experimental feature that is intended to help with ongoing deBrujin graph work and should not be advertized widly
566afdb
to
8a45c50
Compare
Given that this branch has been reviewed and that I would like to get this in so the next round of changes can be pushed into a PR this branch can be merged I think. No guarantees that there aren't lurking issues that might cause this code not to work as-is but the argument to enable this is hidden I'm willing to live with the consequences for now. |
This is a prototype of the basic infrastructure that must go in to make the junction tree based Haplotype finding work. I have pulled out a toggle for the HaplotypeCaller that that enables a separate ReadThreadingAssembler codepath for haplotype finding. Right now when this mode is enabled
ExperimentalReadThreadingAssembler
is used in conjunction withJuncitonTreeKBestHalotypeFinder
to extract only haplotypes that show up in our junction trees with evidence of > 3 reads. This still poses problems with dangling end recovery as definitionally those branches never include complete junction tree data.I will continue to work on this branch (as it is in a somewhat rough state still) but I would like to at least get some eyes on it before i get too deep in the weeds to at least validate the structural approach I have chosen.
Currently known issues in this branch:
Resolves #5925