Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Enable LiftOverVcf to emit a variant when the reference base has changed #971

Merged
merged 6 commits into from
Nov 18, 2017

Conversation

yfarjoun
Copy link
Contributor

@yfarjoun yfarjoun commented Nov 8, 2017

Description

Enables LiftOverVcf to process variants where the reference changed, swapping the reference and the alternate alleles.

This has been implemented for bi-allelic SNPs only.

Checklist (never delete this)

Never delete this, it is our record that procedure was followed. If you find that for whatever reason one of the checklist points doesn't apply to your PR, you can leave it unchecked but please add an explanation below.

Content

  • Added or modified tests to cover changes and any new functionality
  • Edited the README / documentation (if applicable)
  • All tests passing on Travis

Review

  • Final thumbs-up from reviewer
  • Rebase, squash and reword as applicable

For more detailed guidelines, see https://github.com/broadinstitute/picard/wiki/Guidelines-for-pull-requests

… reference allele is taken from the reference file and the allele in the vcf is ignored....
- added option to reverse AF-like INFO annotations
- added option to drop annotations that become invalid upon swapping (like MAX_AF)
@coveralls
Copy link

coveralls commented Nov 8, 2017

Coverage Status

Coverage decreased (-0.001%) to 77.183% when pulling 2e16d48 on yf_fix_mismatching_allele into 28a441a on master.


public static final String SWAPPED_ALLELES = "SwappedAlleles";

public static Log log = Log.getInstance(LiftoverUtils.class);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can be final

@@ -192,6 +207,81 @@ protected static GenotypesContext fixGenotypes(final GenotypesContext originals,
}

/**
* method to swap the reference and alt alleles of a bi-allelic, SNP
*
* @param vc
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Provide doc text for @param and @return or leave them out of the javadoc

final Map<Allele, Allele> alleleMap = new HashMap<>();
for (int idx = 0; idx < vc.getAlleles().size(); idx++) {
// The index of the allele changed from 1 to 0 and vice versa, but the bases remain the same.
alleleMap.put(vc.getAlleles().get(idx), swappedBuilder.getAlleles().get(1 - idx));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IMO a loop here is less clear than writing out the two put() calls you'd otherwise have.

swappedBuilder.rmAttribute(key);
}

if (annotationsToReverse.contains(key)) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

else if here? Seems like you can't both drop and reverse an annotation.


if (afLikeAttribute == -1) {
log.warn("Trying to reverse attribute " + key +
"but found value that parsable as a double (" + vc.getAttribute(key, "") +
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

found value that isn't parsable ?


if (afLikeAttribute < 0 || afLikeAttribute > 1) {
log.warn("Trying to reverse attribute " + key +
"but found value that isn't between 0 and 1 (" + afLikeAttribute + ") in variant " + vc + "results might be wrong.");
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Both messages seem to be missing a : between the error message and the incorrect values. E.g. I'd expect to see "Trying to reverse attribute X but found value that isn't between 0 and 1: (X) in variant V. Results might be wrong."

if (afLikeAttribute == -1) {
log.warn("Trying to reverse attribute " + key +
"but found value that parsable as a double (" + vc.getAttribute(key, "") +
") in variant " + vc + "results might be wrong.");
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Missing a space before results, also looks like a typo, extra might be wrong. here and below.

final String refString = StringUtil.bytesToString(referenceSequence.getBases(), start-1, end - start + 1);
//make sure that referenceAllele matches reference
final Allele refAllele = alleles.stream().filter(Allele::isReference).findAny().orElse(null);
if (refAllele == null) throw new RuntimeException("How did that happen? No reference allele?");
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A shorter way to write this is to use orElseThrow() instead of orElse(). e.g. alleles.stream().filter(Allele::isReference).findAny().orElseThrow(new RuntimeException("How did that happen? No reference allele?"))

}


@Test(dataProvider = "snpWithChangedRef")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fix indent, this line and next

builder.source("test3");
for (start = 2; start <= 11; start++) {
for (start = 2; start <= 10; start++) {
builder.source("test3-"+ start);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should have a space between the string and the +

@coveralls
Copy link

coveralls commented Nov 8, 2017

Coverage Status

Coverage decreased (-0.005%) to 77.179% when pulling 9fd7512 on yf_fix_mismatching_allele into 28a441a on master.

@yfarjoun yfarjoun requested a review from eitanbanks November 9, 2017 11:17
Copy link
Contributor

@eitanbanks eitanbanks left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Made a bunch of comments, but most are cosmetic.

/**
* Attribute used to store the fact that the alt and ref alleles of the variant have been swapped, while all the INFO annotations have not.
*/

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd remove this line and maybe update the comment above so that it's applicable to both of the static variables below it.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(I meant remove the empty line)


swappedBuilder.attribute(SWAPPED_ALLELES, true);

// You need to get the baseString in order to forget who's reference and who's variant
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"who's" -> "which is"

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why "forget"?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tried explaining why I put the getBaseString in the first place...but clearly my comment is unclear...I'll reword.


final Map<Allele, Allele> alleleMap = new HashMap<>();

// The index of the allele changed from 1 to 0 and vice versa, but the bases remain the same.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This comment is confusing. Would just say "A mapping from old allele to new allele"

}
// Flip AD
final GenotypeBuilder builder = new GenotypeBuilder(genotype).alleles(swappedAlleles);
if (genotype.hasAD() && genotype.getAD().length == 2) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the length is not 2 then I would invalidate the AD (set to null).

}

//Flip PL
if (genotype.hasPL() && genotype.getPL().length == 3) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same here: If the length is not 3 then I would invalidate the PL.

final String refString = StringUtil.bytesToString(referenceSequence.getBases(), start-1, end - start + 1);
//make sure that referenceAllele matches reference
final Allele refAllele = alleles.stream().filter(Allele::isReference).findAny().orElseThrow(()->new RuntimeException("How did that happen? No reference allele?"));

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we really need this check here? It seems more appropriate to put this check where the ref allele was created.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hmmm. But that really doesn't fit with the flow of the original algorithm as the actual check against the reference happens at the end, before trying to add the VC...if you feel strongly about this, let's talk more in person.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't feel strongly... it's just weird. As a reviewer of the code it just looks out of place.
Can we move it up to the method that calls this?

@@ -161,13 +164,14 @@
*/
public static final String ATTEMPTED_LOCUS = "AttemptedLocus";


Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

extra space added


outHeader.addMetaDataLine(new VCFInfoHeaderLine(LiftoverUtils.SWAPPED_ALLELES, 0, VCFHeaderLineType.Flag,
"The REF and the ALT alleles have been swapped due to changes in the reference. " +
"NOTA BENE: No INFO field annotations reflect this swap, and in the genotypes, " +
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well, TAGS_TO_REVERSE do reflect this swap...

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

right....fixing.

@@ -362,6 +373,13 @@ private void tryToAddVariant(final VariantContext vc, final ReferenceSequence re
final String refString = StringUtil.bytesToString(ref, vc.getStart() - 1, vc.getEnd() - vc.getStart() + 1);

if (!refString.equalsIgnoreCase(allele.getBaseString())) {
// consider that the ref and the alt may have been swapped in a simple biallelic SNP
if (vc.getAlleles().size() == 2 &&
vc.getAlleles().stream().map(Allele::length).allMatch(l -> l == 1) &&
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is weird. Why not use isBiallelic() and isSNP()?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

because I was being too technical and forgot about these utility functions....

if (vc.getAlleles().size() == 2 &&
vc.getAlleles().stream().map(Allele::length).allMatch(l -> l == 1) &&
refString.equalsIgnoreCase(vc.getAlleles().stream().filter(Allele::isNonReference).findFirst().get().getBaseString())) {
sorter.add(LiftoverUtils.swapRefAlt(vc,TAGS_TO_REVERSE, TAGS_TO_DROP));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I feel like there should be one more piece of logic here, but will give you the final say in overriding this.
If the original AF was 1.0 then this isn't really an allele swap; rather it's a reference fix and we should just remove the record (since no samples are polymorphic in the new reference).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So, you would like the VC to be dropped if it turns out to be monomorphic after the allele swapping? My sense is that a monomorphic variant is easily removed post-hoc but that it could be useful for some to see them before being removed...

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay, so maybe we should keep it in but filter it? Filter could be e.g. ReferenceFix.

@coveralls
Copy link

coveralls commented Nov 12, 2017

Coverage Status

Coverage decreased (-0.009%) to 77.175% when pulling 616e23c on yf_fix_mismatching_allele into 28a441a on master.

@yfarjoun yfarjoun changed the title Yf fix mismatching allele Enable LiftOverVcf to emit a variant when the reference base has changed Nov 18, 2017
@yfarjoun
Copy link
Contributor Author

Merging this as it was approved by @eitanbanks who only asked for minimal changes (which were done)

@yfarjoun yfarjoun merged commit c6a75ed into master Nov 18, 2017
@yfarjoun yfarjoun deleted the yf_fix_mismatching_allele branch November 18, 2017 18:50
@yfarjoun yfarjoun restored the yf_fix_mismatching_allele branch August 11, 2020 16:23
@yfarjoun yfarjoun deleted the yf_fix_mismatching_allele branch December 2, 2020 20:49
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants