-
Notifications
You must be signed in to change notification settings - Fork 371
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
-fixes a bug in the liftover logic whereby when the alleles couldn't … #1956
-fixes a bug in the liftover logic whereby when the alleles couldn't … #1956
Conversation
…be extended to the "left" because the start of the variant was already at position 1, the resulting VC was corrupt leading to the problem discussed in !1951 - added a unit test (which fails with the same error as the issue prior to the application of the changes) - _someone_ should add an actual live (with vcf etc....) - fixes !1951 - other tests are failing...need to look at this more carefully.
I think I fixed the bug. I managed to recreate the same cryptic error in a testcase and then changes to the code removed it. Do you mind trying this version on the vcf/chain/reference you have already setup? |
Hi @yfarjoun, thanks for looking into this! I can confirm I just built your branch and ran it on the single problematic variant from the original issue, and the liftover completed successfully. I haven't looked at your code too carefully but see there's a test for this case so it seems this would provide a generalizable fix. If you make this a full PR rather than draft someone (I or someone else) can review and try to get it merged. Thanks! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi Yossi, thanks for this! It seems to fix the issue well. I just made some remarks about updating the comments mostly to help make clear what is happening in the code for the next person to try to debug this tool. I know it's not your fault for the original comments, but since we're here it's probably worth spending a few minutes doing. I highlighted a few key places some extra comments could help a bit, I think. In particular, examples of what sort of variant triggers the blocks would be very useful. Thanks!
VariantContext newVc = builder.make(); | ||
final String refString = StringUtil.bytesToString(REFERENCE_WITH_REPEATS.getBases(), newVc.getStart() - 1, newVc.getEnd() - newVc.getStart() + 1); | ||
|
||
Assert.assertEquals(refString,"CGTC"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Add a space after comma
|
||
changesInAlleles = false; | ||
// 2. if alleles end with the same nucleotide then | ||
if (alleleBasesMap.values().stream() | ||
.collect(Collectors.groupingBy(a -> a[a.length - 1], Collectors.toSet())) | ||
.size() == 1 && theEnd > 1) { | ||
// 3. truncate rightmost nucleotide of each allele |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can this be turned into a continuation of comment 2? If I understand right, this block just trims the alleles if they end with the same string (e.g. REF = "ACG", ALT="ATG" -> REF = "AC", ALT="AT" ... maybe even put this example in comment?)
@@ -390,20 +390,29 @@ protected static void leftAlignVariant(final VariantContextBuilder builder, fina | |||
.map(a -> a.length) | |||
.anyMatch(l -> l == 0)) { | |||
// 6. extend alleles 1 nucleotide to the left |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This comment should also merge with 5. I'm also not sure what it means to have an "empty allele." Can this have an example in comment?
@@ -390,20 +390,29 @@ protected static void leftAlignVariant(final VariantContextBuilder builder, fina | |||
.map(a -> a.length) | |||
.anyMatch(l -> l == 0)) { | |||
// 6. extend alleles 1 nucleotide to the left | |||
|
|||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Remove a newline
extendLeft = false; | ||
theEnd++; | ||
} | ||
|
||
for (final Allele allele : alleleBasesMap.keySet()) { | ||
// the first -1 for zero-base (getBases) versus 1-based (variant position) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These comments should be updated/repositioned/removed
|
||
// 7. end if | ||
} | ||
changesInAlleles = theStart!=oldStart || theEnd != oldEnd; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This means repeat the above block of truncating matching right bases, and handling whatever empty alleles mean, whenever there's been a change in the variant. Is a while
block the best way to handle this? Why not just truncate all matching bases on the right in one go, and then handle this other empty edge case as well?
As the comment for the function mentioned, the code follows the paper by G. Abecsis. You can see the pseudo code (including the numbering used here) here: https://genome.sph.umich.edu/wiki/Variant_Normalization I'm happy to make it clearer somehow, but wanted to make sure the context was understood.... :-) |
Ah, good to know. In that case, it makes sense to keep in that format, but maybe this should be emphasized a bit more in the top level comments. There is a reference to the paper, but it's unclear the steps in the comments are those from the paper (at least it was to me). Maybe a sentence or two up top can clarify this. |
thanks for the comments, @rickymagner I think I responded to all of them. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for adding a lot more context in the comments! This will surely help whomever looks at this code in the future. Looks good for merging, and thanks for handling this bug.
…be extended to the "left" because the start of the variant was already at position 1, the resulting VC was corrupt leading to the problem discussed in !1951
Description
Give your PR a concise yet descriptive title
Please explain the changes you made here.
Explain the motivation for making this change. What existing problem does the pull request solve?
Mention any issues fixed, addressed or otherwise related to this pull request, including issue numbers or hard links for issues in other repos.
You can delete these instructions once you have written your PR description.
Checklist (never delete this)
Never delete this, it is our record that procedure was followed. If you find that for whatever reason one of the checklist points doesn't apply to your PR, you can leave it unchecked but please add an explanation below.
Content
Review
For more detailed guidelines, see https://github.com/broadinstitute/picard/wiki/Guidelines-for-pull-requests