feat(variant index): variant description to summarise variant consequences in transcripts #914

Merged
merged 27 commits into from
Nov 19, 2024

Conversation

DSuveges
Contributor

@DSuveges DSuveges commented Nov 13, 2024

This PR contains:

  1. Variant description
  2. New in silico prediction derived from VEP
  3. In silico prediction normalisation
  4. Ensure to cover all predicted consequences
  5. Update GnomAD and VEP variant parser to normalise in silico predictions.

1. Adding variant description:

For an easier interpretation of variants, certain key fields from the variant annotation table are combined together to create a variantDescription. This is stored and will be shown on the UI. #3623

Fields to combine:

  • biotype - shown if not protein coding; the description always shows the variant impact on the closest protein coding transcript.
  • most severe consequence - always shows the label.
  • approvedSymbol or geneId - always shown for the listed transcripts.
  • distanceFromFootprint - shown if transcripts are not overlapping with the variant
  • aminoAcidChange - shown if variant causes amino acid change.
  • lofteePrediction - shown if the variant is high-confidence loss of function predicted by loftee
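
A minimal sketch of how these fields could be combined, written in plain Python rather than the PySpark column expressions used in the PR. The `Transcript` dataclass, the `describe_transcript` helper and the wording of the generated sentence are assumptions made for illustration; only the field names and the LOFTEE "HC" check come from this PR.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Transcript:
    """Hypothetical, simplified view of one transcript consequence."""

    consequence_label: str  # label of the most severe consequence
    biotype: str
    approved_symbol: Optional[str]
    gene_id: str
    distance_from_footprint: int  # 0 when the variant overlaps the transcript
    amino_acid_change: Optional[str] = None
    loftee_prediction: Optional[str] = None


def describe_transcript(t: Transcript) -> str:
    """Combine the listed fields into one sentence (wording is illustrative)."""
    label = t.consequence_label.replace("_", " ")
    parts = [label[0].upper() + label[1:]]  # the consequence label is always shown
    if t.amino_acid_change:
        parts.append(f"causing {t.amino_acid_change}")
    if t.distance_from_footprint > 0:
        # Distance is only reported when the transcript does not overlap.
        parts.append(f"{t.distance_from_footprint} bp away from")
    else:
        parts.append("overlapping with")
    # Fall back to the gene ID when no approved symbol is available.
    parts.append(t.approved_symbol or t.gene_id)
    if t.biotype != "protein_coding":
        parts.append(f"({t.biotype.replace('_', ' ')})")
    sentence = " ".join(parts) + "."
    if t.loftee_prediction == "HC":
        sentence += " A high-confidence loss-of-function variant by LOFTEE."
    return sentence
```

For example, a non-overlapping lncRNA transcript without an approved symbol would be described by its consequence label, its distance, its gene ID and its biotype, while an overlapping protein coding transcript omits both the distance and the biotype.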

2. New in silico prediction derived from VEP

VEP provides predicted transcript consequences for each overlapping transcript. These consequences can be ordered by their potential severity. The severity scores are already defined and used by l2g as consequenceScore; now the highest score of each variant is captured as a new in silico predictor, with the method name VEP.
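
The selection of the highest score can be sketched as follows. This is an illustration only: the values in `CONSEQUENCE_SCORES` are placeholders rather than the consequenceScore values used by l2g, and the `vep_in_silico_predictor` helper and the shape of the returned dictionary are hypothetical.

```python
from typing import Dict, Iterable, Optional

# Placeholder severity scores keyed by consequence term. The real values are
# the consequenceScore definitions used by l2g and are NOT reproduced here.
CONSEQUENCE_SCORES: Dict[str, float] = {
    "stop_gained": 1.0,
    "missense_variant": 0.66,
    "synonymous_variant": 0.33,
    "intron_variant": 0.1,
}


def vep_in_silico_predictor(consequences: Iterable[str]) -> Optional[dict]:
    """Return the highest-scoring consequence as a predictor with method 'VEP'."""
    scored = [
        (CONSEQUENCE_SCORES[term], term)
        for term in consequences
        if term in CONSEQUENCE_SCORES  # terms without a score are skipped
    ]
    if not scored:
        return None  # no scorable consequence for this variant
    score, term = max(scored)  # tuples compare by score first
    return {"method": "VEP", "assessment": term, "score": score}
```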

3. In silico prediction normalisation

To make predicted variant consequences comparable across different methods, these predictions are normalised. The normalised values range from -1 to 1, where negative values indicate benign and positive values indicate deleterious consequences. 3503
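
One way to picture the normalisation is a piecewise linear rescaling around a method-specific threshold. The sketch below is an assumption about the general approach, not the PR's implementation: the `rescale` and `normalise` helpers are hypothetical, and the threshold and raw score range of each real method are not reproduced here.

```python
def rescale(value: float, in_min: float, in_max: float, out_min: float, out_max: float) -> float:
    """Linearly map value from [in_min, in_max] onto [out_min, out_max]."""
    return (value - in_min) / (in_max - in_min) * (out_max - out_min) + out_min


def normalise(score: float, threshold: float, low: float = 0.0, high: float = 1.0) -> float:
    """Map a raw score onto [-1, 1] so that `threshold` lands on 0.

    Scores below the threshold become negative (towards benign) and scores
    above it positive (towards deleterious). Assumes higher raw scores are
    more deleterious and that low < threshold < high.
    """
    if score >= threshold:
        return rescale(score, threshold, high, 0.0, 1.0)
    return rescale(score, low, threshold, -1.0, 0.0)
```

With a hypothetical threshold of 0.5 on a 0 to 1 scale, a raw score of 0.75 maps to 0.5 and a raw score of 0.25 maps to -0.5.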

4. Ensure to cover all predicted consequences

The UI requires SO codes for variant consequences. When the mapping is incomplete, the variant page fails. #3624
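
A completeness check of this kind could look like the sketch below. The mapping shown is a small illustrative subset, the underscore form of the identifiers is an assumption, and the `unmapped_consequences` helper is hypothetical rather than part of this PR.

```python
from typing import Dict, Iterable, Set

# Illustrative subset of a consequence label -> Sequence Ontology ID mapping.
CONSEQUENCE_TO_SO: Dict[str, str] = {
    "missense_variant": "SO_0001583",
    "synonymous_variant": "SO_0001819",
    "stop_gained": "SO_0001587",
    "intron_variant": "SO_0001627",
}


def unmapped_consequences(
    observed: Iterable[str], mapping: Dict[str, str] = CONSEQUENCE_TO_SO
) -> Set[str]:
    """Return the observed consequence labels that have no SO code."""
    return set(observed) - set(mapping)
```

Running such a check over every consequence label seen in the VEP output would surface gaps in the mapping before they reach the variant page.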

@github-actions github-actions bot added the Step label Nov 13, 2024
@DSuveges DSuveges requested a review from d0choa November 15, 2024 16:44
return f.when(
transcript.getField("lofteePrediction").isNotNull()
& (transcript.getField("lofteePrediction") == "HC"),
f.lit(" A high-confidence loss-of-function variant by loftee."),
Collaborator

Suggested change
- f.lit(" A high-confidence loss-of-function variant by loftee."),
+ f.lit(" A high-confidence loss-of-function variant by LOFTEE."),

@d0choa
Collaborator

d0choa commented Nov 18, 2024

Looks amazing. It should be really simple to understand what needs to be changed when the time comes.

An opportunity for improvement could be to use the variant effect to improve the description. For example, if the max normalised score is > 0.5, report the method that predicts the variant to be likely deleterious. An idea for the future. This looks great for now.

f.abs(score) > 0.14, cls._rescaleColumnValue(f.abs(score), 0.14, 1, 0.0, 1)
).when(
f.abs(score) <= 0.14,
cls._rescaleColumnValue(f.abs(score), 0, 0.14, -1, 0.0),
Collaborator

The description and the implementation don't agree. I think the description is better. Either that or: 0-0.5 and 0.5-1. But I wouldn't make it negative as we are not really sure we are talking about benign variation

.when(method == "SIFT", cls._normalise_sift(score, assessment))
.when(method == "PolyPhen", cls._normalise_polyphen(assessment, score))
.when(method == "AlphaMissense", cls._normalise_alpha_missense(score))
.when(method == "phred scaled CADD", cls._normalise_cadd(score))
Collaborator

can we make it CADD?

@@ -653,13 +686,17 @@ def process_vep_output(
method_name="phred scaled CADD",
Collaborator

can we make it CADD?

@d0choa
Collaborator

d0choa commented Nov 18, 2024

After reviewing the data and looking at the GitHub repo, we should probably leave spliceAI as is, from 0 to 1. It only assesses splice-affecting variants, using a recommended threshold of 0.5, but it has predictive value across the whole continuous range. I would not have negatives here.

@DSuveges
Contributor Author

An opportunity for improvement could be to use the variant effect to improve the description. For example, if the max normalised score is > 0.5, report the method that predicts the variant to be likely deleterious. An idea for the future. This looks great for now.

I was considering this option, but given that the normalised assessments are still method specific, it is not clear what to focus on or how to summarise the range of predictors. However, when things are more solidified, we can take a closer look, see how often the methods are consistent or contradictory, and sort something out.

@DSuveges
Contributor Author

Updates following @d0choa's comments:

  • Removed spliceAI normalisation. Score left as is.
  • Pangolin normalisation updated.
  • CADD used instead of Phred scaled CADD.
  • LOFTEE method name capitalised.

@DSuveges DSuveges merged commit 9f9cfd6 into dev Nov 19, 2024
5 checks passed
@DSuveges DSuveges deleted the ds_3623_variant_description branch November 19, 2024 11:18
2 participants