feat(variant index): variant description to summarise variant consequences in transcripts #914
…targets/gentropy into ds_3623_variant_description
return f.when(
    transcript.getField("lofteePrediction").isNotNull()
    & (transcript.getField("lofteePrediction") == "HC"),
    f.lit(" A high-confidence loss-of-function variant by loftee."),
Suggested change:
- f.lit(" A high-confidence loss-of-function variant by loftee."),
+ f.lit(" A high-confidence loss-of-function variant by LOFTEE."),
Looks amazing. It should be really simple to understand what needs to be changed when the time comes. An opportunity for improvement could be to use the variant effect to improve the description. For example, if the max normalised score is > 0.5, report the method that predicts the variant to be likely deleterious. An idea for the future. This looks great for now.
    f.abs(score) > 0.14, cls._rescaleColumnValue(f.abs(score), 0.14, 1, 0.0, 1)
).when(
    f.abs(score) <= 0.14,
    cls._rescaleColumnValue(f.abs(score), 0, 0.14, -1, 0.0),
The description and the implementation don't agree. I think the description is better. Either that or: 0-0.5 and 0.5-1. But I wouldn't make it negative as we are not really sure we are talking about benign variation
.when(method == "SIFT", cls._normalise_sift(score, assessment))
.when(method == "PolyPhen", cls._normalise_polyphen(assessment, score))
.when(method == "AlphaMissense", cls._normalise_alpha_missense(score))
.when(method == "phred scaled CADD", cls._normalise_cadd(score))
Can we make it CADD?
@@ -653,13 +686,17 @@ def process_vep_output(
    method_name="phred scaled CADD",
can we make it CADD?
After reviewing the data and looking at the GitHub repo, we should probably leave spliceAI as is, from 0 to 1. It only assesses splice-affecting variants, with a recommended threshold of 0.5, but it has predictive value across the whole continuous range. I would not have negatives here.
I was considering this option, but given that the normalised assessments are still method specific, it is not clear what to focus on, or how to summarise the range of predictors. However, when things are more solidified, we can take a closer look, see how often the methods are consistent or contradictory, and sort something out.
Updates following @d0choa's comments:
This PR contains:

1. Adding variant description

For an easier interpretation of variants, certain key fields from the variant annotation table are combined to create a `variantDescription`. This is stored and will be shown on the UI. #3623

Fields to combine:
- `biotype` - shown if not protein coding; the description always shows the variant impact on the closest protein coding transcript.
- most severe consequence - always shows the label.
- `approvedSymbol` or `geneId` - always shown for the listed transcripts.
- `distanceFromFootprint` - shown if transcripts are not overlapping with the variant.
- `aminoAcidChange` - shown if the variant causes an amino acid change.
- `lofteePrediction` - shown if the variant is a high-confidence loss of function predicted by loftee.

2. New in silico prediction derived from VEP
VEP provides predicted transcript consequences for each overlapping transcript. These consequences can be ordered by their potential severity. The severity scores are already defined and used by l2g as `consequenceScore`; now the highest score of each variant is captured as a new in silico predictor, where the method name is VEP.

3. In silico prediction normalisation
To make predicted variant consequences comparable across different methods, these predictions are normalised. The normalised values range from -1 to 1, where negative values indicate benign and positive values indicate deleterious consequences. 3503
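
As a rough illustration of the idea, here is a minimal sketch of a piecewise-linear normalisation in plain Python. It is not the code in this PR, which works on Spark columns; the function names `rescale` and `normalise`, the default score range, and the thresholds used in the example are assumptions made for illustration only.

```python
def rescale(
    value: float, in_min: float, in_max: float, out_min: float, out_max: float
) -> float:
    """Linearly map value from [in_min, in_max] onto [out_min, out_max]."""
    return out_min + (value - in_min) * (out_max - out_min) / (in_max - in_min)


def normalise(
    score: float, threshold: float, score_min: float = 0.0, score_max: float = 1.0
) -> float:
    """Map a raw score onto [-1, 1] so that the threshold lands on 0.

    Scores above the (hypothetical) method-specific threshold are stretched
    onto (0, 1]; scores at or below it are stretched onto [-1, 0].
    """
    if score > threshold:
        return rescale(score, threshold, score_max, 0.0, 1.0)
    return rescale(score, score_min, threshold, -1.0, 0.0)


# Example with an assumed threshold of 0.5 on a 0-1 score.
for raw in (0.0, 0.5, 0.75, 1.0):
    print(raw, normalise(raw, threshold=0.5))
```

With this shape, each method only needs to supply its own raw range and threshold, which is one way to keep the per-method logic small. Whether the lower half should be negative at all is the point under discussion in the review comments above.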
4. Ensure all predicted consequences are covered

The UI requires SO codes for variant consequences. When the mapping was incomplete, the variant page failed. #3624
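
A coverage check of this kind could look like the sketch below. The mapping `LABEL_TO_SO_ID`, the function `find_unmapped_consequences`, and the identifier format are hypothetical and only illustrate the check; the actual mapping in the codebase may be structured differently.

```python
from typing import Dict, Iterable, List

# Hypothetical, deliberately incomplete mapping from consequence label to SO id.
LABEL_TO_SO_ID: Dict[str, str] = {
    "missense_variant": "SO_0001583",
    "intron_variant": "SO_0001627",
}


def find_unmapped_consequences(
    observed: Iterable[str], mapping: Dict[str, str]
) -> List[str]:
    """Return the observed consequence labels that have no SO id in the mapping."""
    return sorted(set(observed) - set(mapping))


# Labels as they might appear in annotated variants (illustrative values).
observed_labels = ["missense_variant", "stop_gained", "intron_variant"]
missing = find_unmapped_consequences(observed_labels, LABEL_TO_SO_ID)
print(missing)
```

Running such a check in a test, and failing when the returned list is non-empty, would surface gaps in the mapping before they reach the UI.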