-
Notifications
You must be signed in to change notification settings - Fork 372
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Enable LiftOverVcf to emit a variant when the reference base has changed #971
Conversation
… reference allele is taken from the reference file and the allele in the vcf is ignored....
…e it should be considered a failure...
- added option to reverse AF-like INFO annotations - added option to drop annotations that become invalid upon swapping (like MAX_AF)
|
||
public static final String SWAPPED_ALLELES = "SwappedAlleles"; | ||
|
||
public static Log log = Log.getInstance(LiftoverUtils.class); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can be final
@@ -192,6 +207,81 @@ protected static GenotypesContext fixGenotypes(final GenotypesContext originals, | |||
} | |||
|
|||
/** | |||
* method to swap the reference and alt alleles of a bi-allelic, SNP | |||
* | |||
* @param vc |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Provide doc text for @param
and @return
or leave them out of the javadoc
final Map<Allele, Allele> alleleMap = new HashMap<>(); | ||
for (int idx = 0; idx < vc.getAlleles().size(); idx++) { | ||
// The index of the allele changed from 1 to 0 and vice versa, but the bases remain the same. | ||
alleleMap.put(vc.getAlleles().get(idx), swappedBuilder.getAlleles().get(1 - idx)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IMO a loop here is less clear than writing out the two put()
calls you'd otherwise have.
swappedBuilder.rmAttribute(key); | ||
} | ||
|
||
if (annotationsToReverse.contains(key)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
else if
here? Seems like you can't both drop and reverse an annotation.
|
||
if (afLikeAttribute == -1) { | ||
log.warn("Trying to reverse attribute " + key + | ||
"but found value that parsable as a double (" + vc.getAttribute(key, "") + |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
found value that isn't parsable ?
|
||
if (afLikeAttribute < 0 || afLikeAttribute > 1) { | ||
log.warn("Trying to reverse attribute " + key + | ||
"but found value that isn't between 0 and 1 (" + afLikeAttribute + ") in variant " + vc + "results might be wrong."); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Both messages seem to be missing a :
between the error message and the incorrect values. E.g. I'd expect to see "Trying to reverse attribute X but found value that isn't between 0 and 1: (X) in variant V. Results might be wrong."
if (afLikeAttribute == -1) { | ||
log.warn("Trying to reverse attribute " + key + | ||
"but found value that parsable as a double (" + vc.getAttribute(key, "") + | ||
") in variant " + vc + "results might be wrong."); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Missing a space before results
, also looks like a typo, extra might be wrong.
here and below.
final String refString = StringUtil.bytesToString(referenceSequence.getBases(), start-1, end - start + 1); | ||
//make sure that referenceAllele matches reference | ||
final Allele refAllele = alleles.stream().filter(Allele::isReference).findAny().orElse(null); | ||
if (refAllele == null) throw new RuntimeException("How did that happen? No reference allele?"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A shorter way to write this is to use orElseThrow()
instead of orElse()
. e.g. alleles.stream().filter(Allele::isReference).findAny().orElseThrow(new RuntimeException("How did that happen? No reference allele?"))
} | ||
|
||
|
||
@Test(dataProvider = "snpWithChangedRef") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fix indent, this line and next
builder.source("test3"); | ||
for (start = 2; start <= 11; start++) { | ||
for (start = 2; start <= 10; start++) { | ||
builder.source("test3-"+ start); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should have a space between the string and the +
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Made a bunch of comments, but most are cosmetic.
/** | ||
* Attribute used to store the fact that the alt and ref alleles of the variant have been swapped, while all the INFO annotations have not. | ||
*/ | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd remove this line and maybe update the comment above so that it's applicable to both of the static variables below it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(I meant remove the empty line)
|
||
swappedBuilder.attribute(SWAPPED_ALLELES, true); | ||
|
||
// You need to get the baseString in order to forget who's reference and who's variant |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"who's" -> "which is"
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why "forget"?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I tried explaining why I put the getBaseString in the first place...but clearly my comment is unclear...I'll reword.
|
||
final Map<Allele, Allele> alleleMap = new HashMap<>(); | ||
|
||
// The index of the allele changed from 1 to 0 and vice versa, but the bases remain the same. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This comment is confusing. Would just say "A mapping from old allele to new allele"
} | ||
// Flip AD | ||
final GenotypeBuilder builder = new GenotypeBuilder(genotype).alleles(swappedAlleles); | ||
if (genotype.hasAD() && genotype.getAD().length == 2) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If the length is not 2 then I would invalidate the AD (set to null).
} | ||
|
||
//Flip PL | ||
if (genotype.hasPL() && genotype.getPL().length == 3) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same here: If the length is not 3 then I would invalidate the PL.
final String refString = StringUtil.bytesToString(referenceSequence.getBases(), start-1, end - start + 1); | ||
//make sure that referenceAllele matches reference | ||
final Allele refAllele = alleles.stream().filter(Allele::isReference).findAny().orElseThrow(()->new RuntimeException("How did that happen? No reference allele?")); | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we really need this check here? It seems more appropriate to put this check where the ref allele was created.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
hmmm. But that really doesn't fit with the flow of the original algorithm as the actual check against the reference happens at the end, before trying to add the VC...if you feel strongly about this, let's talk more in person.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't feel strongly... it's just weird. As a reviewer of the code it just looks out of place.
Can we move it up to the method that calls this?
@@ -161,13 +164,14 @@ | |||
*/ | |||
public static final String ATTEMPTED_LOCUS = "AttemptedLocus"; | |||
|
|||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
extra space added
|
||
outHeader.addMetaDataLine(new VCFInfoHeaderLine(LiftoverUtils.SWAPPED_ALLELES, 0, VCFHeaderLineType.Flag, | ||
"The REF and the ALT alleles have been swapped due to changes in the reference. " + | ||
"NOTA BENE: No INFO field annotations reflect this swap, and in the genotypes, " + |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Well, TAGS_TO_REVERSE do reflect this swap...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
right....fixing.
@@ -362,6 +373,13 @@ private void tryToAddVariant(final VariantContext vc, final ReferenceSequence re | |||
final String refString = StringUtil.bytesToString(ref, vc.getStart() - 1, vc.getEnd() - vc.getStart() + 1); | |||
|
|||
if (!refString.equalsIgnoreCase(allele.getBaseString())) { | |||
// consider that the ref and the alt may have been swapped in a simple biallelic SNP | |||
if (vc.getAlleles().size() == 2 && | |||
vc.getAlleles().stream().map(Allele::length).allMatch(l -> l == 1) && |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is weird. Why not use isBiallelic() and isSNP()?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
because I was being too technical and forgot about these utility functions....
if (vc.getAlleles().size() == 2 && | ||
vc.getAlleles().stream().map(Allele::length).allMatch(l -> l == 1) && | ||
refString.equalsIgnoreCase(vc.getAlleles().stream().filter(Allele::isNonReference).findFirst().get().getBaseString())) { | ||
sorter.add(LiftoverUtils.swapRefAlt(vc,TAGS_TO_REVERSE, TAGS_TO_DROP)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I feel like there should be one more piece of logic here, but will give you the final say in overriding this.
If the original AF was 1.0 then this isn't really an allele swap; rather it's a reference fix and we should just remove the record (since no samples are polymorphic in the new reference).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So, you would like the VC to be dropped if it turns out to be monomorphic after the allele swapping? My sense is that a monomorphic variant is easily removed post-hoc but that it could be useful for some to see them before being removed...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Okay, so maybe we should keep it in but filter it? Filter could be e.g. ReferenceFix.
Merging this as it was approved by @eitanbanks who only asked for minimal changes (which were done) |
Description
Enables LiftOverVcf to process variants where the reference changed, swapping the reference and the alternate alleles.
This has been implemented for bi-allelic SNPs only.
Checklist (never delete this)
Never delete this, it is our record that procedure was followed. If you find that for whatever reason one of the checklist points doesn't apply to your PR, you can leave it unchecked but please add an explanation below.
Content
Review
For more detailed guidelines, see https://github.com/broadinstitute/picard/wiki/Guidelines-for-pull-requests