feat(variant index): variant description to summarise variant consequences in transcripts #914

DSuveges · 2024-11-13T13:43:12Z

This PR contains:

Variant description
New in silico prediction derived from VEP
In silico prediction normalisation
Ensure all predicted consequences are covered
Update the gnomAD and VEP variant parsers to normalise in silico predictions.

1. Adding a variant description

To make variants easier to interpret, key fields from the variant annotation table are combined into a variantDescription. The description is stored in the variant index and will be shown on the UI. #3623

Fields to combine:

biotype - shown if the transcript is not protein coding; the description always reports the variant's impact on the closest protein-coding transcript.
most severe consequence - always shown, as its label.
approvedSymbol or geneId - always shown for the listed transcripts.
distanceFromFootprint - shown if the transcript does not overlap the variant.
aminoAcidChange - shown if the variant causes an amino acid change.
lofteePrediction - shown if LOFTEE predicts the variant to be a high-confidence loss-of-function variant.
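A minimal plain-Python sketch of how these fields might be stitched together for a single transcript record. The dictionary keys follow the field names listed above, with mostSevereConsequence as an assumed key for the consequence label; the function name and sentence templates are hypothetical, and only the LOFTEE sentence mirrors the wording used in the PR. The PR itself builds the description with PySpark column expressions in vep_parser.py, and choosing the closest protein-coding transcript is out of scope here.

```python
from typing import Any, Mapping


def describe_variant(transcript: Mapping[str, Any]) -> str:
    """Build a one-line description from a single transcript consequence record.

    Hypothetical sketch: key names and sentence templates are assumptions;
    only the LOFTEE sentence mirrors the wording used in the PR.
    """
    # Most severe consequence: always shown, as a human-readable label.
    label = transcript["mostSevereConsequence"].replace("_", " ")
    # Gene: approved symbol when available, otherwise the gene identifier.
    gene = transcript.get("approvedSymbol") or transcript["geneId"]
    sentence = f"{label.capitalize()} in {gene}"

    # Biotype: only mentioned for non-protein-coding transcripts.
    biotype = transcript.get("biotype", "protein_coding")
    if biotype != "protein_coding":
        sentence += f" ({biotype.replace('_', ' ')})"

    # Distance: only mentioned when the variant lies outside the transcript footprint.
    distance = transcript.get("distanceFromFootprint") or 0
    if distance > 0:
        sentence += f", {distance:,} bp away"

    # Amino acid change: only mentioned for protein-altering variants.
    amino_acid_change = transcript.get("aminoAcidChange")
    if amino_acid_change:
        sentence += f", causing amino acid change {amino_acid_change}"

    description = sentence + "."
    # LOFTEE: only high-confidence ("HC") loss-of-function calls are reported.
    if transcript.get("lofteePrediction") == "HC":
        description += " A high-confidence loss-of-function variant by LOFTEE."
    return description


print(describe_variant({
    "mostSevereConsequence": "upstream_gene_variant",
    "geneId": "ENSG00000000002",
    "biotype": "lncRNA",
    "distanceFromFootprint": 12345,
}))
# Upstream gene variant in ENSG00000000002 (lncRNA), 12,345 bp away.
```

Each optional field only contributes text when its condition holds, so the same function covers coding, non-coding and non-overlapping cases.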

2. New in silico prediction derived from VEP

VEP provides predicted consequences for each transcript overlapping a variant, and these consequences can be ordered by their potential severity. The severity scores are already defined and used by the locus-to-gene (L2G) model as consequenceScore; the highest score of each variant is now captured as a new in silico predictor with the method name VEP.
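A sketch of the idea in plain Python: score every consequence term across all overlapping transcripts and keep the highest one as a single predictor record. The severity values below are illustrative placeholders, not the consequenceScore values used by gentropy, and both the helper name and storing the winning term as the assessment are assumptions.

```python
from typing import Dict, Iterable, Optional, Sequence

# Illustrative severity scores: placeholders, NOT gentropy's consequenceScore values.
CONSEQUENCE_SCORE: Dict[str, float] = {
    "stop_gained": 1.0,
    "missense_variant": 0.66,
    "synonymous_variant": 0.33,
    "intron_variant": 0.1,
}


def vep_prediction(
    transcript_consequences: Iterable[Sequence[str]],
) -> Optional[Dict[str, object]]:
    """Collapse per-transcript consequence terms into one VEP predictor record."""
    # Score every known term across all overlapping transcripts.
    scored = [
        (CONSEQUENCE_SCORE[term], term)
        for terms in transcript_consequences
        for term in terms
        if term in CONSEQUENCE_SCORE
    ]
    if not scored:
        return None  # nothing scorable: no VEP prediction for this variant
    # Keep the most severe consequence (highest score) for the variant.
    score, term = max(scored)
    return {"method": "VEP", "assessment": term, "score": score}
```

Taking the maximum makes the predictor reflect the worst-case transcript, which is the same "most severe consequence" logic VEP uses to summarise a variant.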

3. In silico prediction normalisation

To make predicted variant consequences comparable across methods, the predictions are normalised. The normalised values range from -1 to 1, where negative values indicate benign and positive values deleterious consequences. #3503
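A minimal sketch of a threshold-anchored normalisation onto [-1, 1], written in plain Python rather than as Spark column expressions. It assumes each method has a known score range and a decision threshold, and that higher raw scores are more deleterious; the function names and the example range and threshold are hypothetical. Methods where lower scores are more deleterious (SIFT, for example) would need their scale flipped first.

```python
def rescale(value: float, from_min: float, from_max: float, to_min: float, to_max: float) -> float:
    """Linearly map value from [from_min, from_max] onto [to_min, to_max]."""
    return to_min + (value - from_min) * (to_max - to_min) / (from_max - from_min)


def normalise(score: float, score_min: float, threshold: float, score_max: float) -> float:
    """Map a raw score onto [-1, 1] so that the method's threshold lands on 0.

    Assumes higher raw scores are more deleterious: scores below the threshold
    become negative (benign side), the threshold itself maps to 0, and scores
    above it become positive (deleterious side).
    """
    if score <= threshold:
        return rescale(score, score_min, threshold, -1.0, 0.0)
    return rescale(score, threshold, score_max, 0.0, 1.0)


# Hypothetical method scoring 0-40 with a deleteriousness threshold of 20.
print(normalise(10, 0, 20, 40))  # -0.5
print(normalise(30, 0, 20, 40))  # 0.5
```

Splitting the range at the threshold keeps the sign meaningful even when the threshold is not in the middle of the raw score range.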

4. Ensure all predicted consequences are covered

The UI requires Sequence Ontology (SO) codes for variant consequences. When the mapping was incomplete, the variant page failed. #3624
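One way to surface such gaps before they reach the UI is to fail loudly on any consequence term that has no SO code. The sketch below uses a hypothetical helper and only a handful of mappings for illustration; a real table would have to cover every term VEP can emit.

```python
from typing import Dict, Iterable, List

# Partial consequence-term -> Sequence Ontology ID table (illustrative subset only).
SO_CODES: Dict[str, str] = {
    "missense_variant": "SO:0001583",
    "stop_gained": "SO:0001587",
    "synonymous_variant": "SO:0001819",
    "intron_variant": "SO:0001627",
    "intergenic_variant": "SO:0001628",
}


def to_so_codes(terms: Iterable[str]) -> List[str]:
    """Map consequence terms to SO IDs, raising on any unmapped term."""
    terms = list(terms)
    # Collect every unmapped term so a single error reports all gaps at once.
    missing = sorted({term for term in terms if term not in SO_CODES})
    if missing:
        raise ValueError(f"No SO code for consequence term(s): {', '.join(missing)}")
    return [SO_CODES[term] for term in terms]
```

Raising at build time turns a silent gap in the mapping into a pipeline failure instead of a broken variant page.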

d0choa · 2024-11-18T17:45:49Z

src/gentropy/datasource/ensembl/vep_parser.py

+        return f.when(
+            transcript.getField("lofteePrediction").isNotNull()
+            & (transcript.getField("lofteePrediction") == "HC"),
+            f.lit(" A high-confidence loss-of-function variant by loftee."),


Suggested change

f.lit(" A high-confidence loss-of-function variant by loftee."),

f.lit(" A high-confidence loss-of-function variant by LOFTEE."),

d0choa · 2024-11-18T17:48:13Z

Looks amazing. It should be really simple to understand what needs to be changed when the time comes.

An opportunity for improvement could be to use the variant effect to improve the description. For example, if the maximum normalised score is > 0.5, report the method that predicts the variant to be likely deleterious. An idea for the future. This looks great for now.
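A minimal sketch of this suggestion, assuming each in silico prediction carries a method and a normalisedScore key (both hypothetical names here): pick the highest normalised score above 0.5 and name its method in an extra sentence. The helper name and sentence wording are illustrative only.

```python
from typing import Any, Mapping, Sequence


def deleterious_note(predictions: Sequence[Mapping[str, Any]], threshold: float = 0.5) -> str:
    """Name the predictor with the highest normalised score above the threshold."""
    # Ignore predictors without a numeric normalised score.
    above = [
        p
        for p in predictions
        if isinstance(p.get("normalisedScore"), (int, float)) and p["normalisedScore"] > threshold
    ]
    if not above:
        return ""  # nothing confidently deleterious: leave the description unchanged
    top = max(above, key=lambda p: p["normalisedScore"])
    return f" Likely deleterious according to {top['method']}."
```

Returning an empty string keeps the note safe to append unconditionally to an existing description.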

d0choa · 2024-11-18T20:03:00Z

src/gentropy/dataset/variant_index.py

+            f.abs(score) > 0.14, cls._rescaleColumnValue(f.abs(score), 0.14, 1, 0.0, 1)
+        ).when(
+            f.abs(score) <= 0.14,
+            cls._rescaleColumnValue(f.abs(score), 0, 0.14, -1, 0.0),


The description and the implementation don't agree. I think the description is better. Alternatively, use 0-0.5 and 0.5-1. But I wouldn't make it negative, as we are not really sure we are talking about benign variation.
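A sketch of the reviewer's non-negative alternative, assuming score magnitudes lie in [0, 1] and reusing the 0.14 threshold from the quoted diff purely as an example value: magnitudes up to the threshold map onto [0, 0.5] and the rest onto [0.5, 1], so the threshold lands on 0.5 and no score becomes negative. The function names are hypothetical.

```python
def rescale(value: float, from_min: float, from_max: float, to_min: float, to_max: float) -> float:
    """Linearly map value from [from_min, from_max] onto [to_min, to_max]."""
    return to_min + (value - from_min) * (to_max - to_min) / (from_max - from_min)


def normalise_non_negative(score: float, threshold: float) -> float:
    """Map |score| in [0, 1] onto [0, 1] with the threshold landing on 0.5."""
    magnitude = abs(score)
    if magnitude <= threshold:
        # Below the threshold: compress onto the lower half of the range.
        return rescale(magnitude, 0.0, threshold, 0.0, 0.5)
    # Above the threshold: stretch onto the upper half of the range.
    return rescale(magnitude, threshold, 1.0, 0.5, 1.0)
```

This keeps the threshold visible as a fixed midpoint without implying that low scores are evidence of benign variation.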

d0choa · 2024-11-18T20:20:13Z

src/gentropy/dataset/variant_index.py

+            .when(method == "SIFT", cls._normalise_sift(score, assessment))
+            .when(method == "PolyPhen", cls._normalise_polyphen(assessment, score))
+            .when(method == "AlphaMissense", cls._normalise_alpha_missense(score))
+            .when(method == "phred scaled CADD", cls._normalise_cadd(score))


Can we make it CADD?

d0choa · 2024-11-18T20:20:44Z

src/gentropy/datasource/ensembl/vep_parser.py

@@ -653,13 +686,17 @@ def process_vep_output(
                            method_name="phred scaled CADD",


Can we make it CADD?

d0choa · 2024-11-18T22:31:33Z

After reviewing the data and looking at the GitHub repo, we should probably leave SpliceAI as is, ranging from 0 to 1. It only assesses splice-affecting variants, with a recommended threshold of 0.5, but it has predictive value across the whole continuous range. I would not have negatives here.

DSuveges · 2024-11-19T09:50:26Z

An opportunity for improvement could be to use the variant effect to improve the description. For example, if the maximum normalised score is > 0.5, report the method that predicts the variant to be likely deleterious. An idea for the future. This looks great for now.

I considered this option, but given that the normalised assessments are still method-specific, it is not clear what to focus on or how to summarise the range of predictors. However, once things are more solidified, we can take a closer look, see how often the methods agree or contradict each other, and sort something out.

DSuveges · 2024-11-19T11:12:07Z

Updates following @d0choa's comments:

Removed spliceAI normalisation. Score left as is.
Pangolin normalisation updated.
CADD used instead of Phred scaled CADD.
LOFTEE method name capitalised.

DSuveges added 3 commits November 12, 2024 15:14

feat: extending the VEP schema

aee75e3

feat(vep parser): adding logic to build variant description based on …

049fcdc

…VEP annotation

fix: remove commented lines

2d53133

github-actions bot added size-M Feature Datasource labels Nov 13, 2024

DSuveges added 3 commits November 13, 2024 13:43

Merge branch 'dev' into ds_3623_variant_description

83967b6

fix: improving consequence to so term mapping

37fb368

Merge branch 'ds_3623_variant_description' of https://github.com/open…

b992555

…targets/gentropy into ds_3623_variant_description

github-actions bot added the Step label Nov 13, 2024

DSuveges and others added 9 commits November 13, 2024 15:36

fix: nullified variant descriptions

264f187

fix: assessment_flag_column_name type fix

96fcd52

chore: pre-commit auto fixes [...]

3fad3be

feat: adding formatting to distances in description

8208ee6

Merge branch 'ds_3623_variant_description' of https://github.com/open…

9b985e6

…targets/gentropy into ds_3623_variant_description

fix: formatting

a7ddb44

fix: variant index schema

84ca18c

fix: conftest for variant index

a741352

feat(variant index): normalising assessments of in-silico predictors

7654a88

github-actions bot added size-L Dataset and removed size-M labels Nov 15, 2024

DSuveges added 3 commits November 15, 2024 15:47

feat: adding VEP predictor

6f2fd97

fix: variant test config

606f689

Merge branch 'dev' into ds_3623_variant_description

8871af5

DSuveges linked an issue Nov 15, 2024 that may be closed by this pull request

Variant pages are failing if most severe consequence is not available opentargets/issues#3624

Closed

fix: variant test config

432a705

This was linked to issues Nov 15, 2024

Create variant description in the variant index opentargets/issues#3623

Closed

In silico predictions widget color-coding scoping opentargets/issues#3503

Closed

Merge branch 'ds_3623_variant_description' of https://github.com/open…

b7da4e6

…targets/gentropy into ds_3623_variant_description

DSuveges added 2 commits November 15, 2024 16:27

fix: schema type

2471579

fix: dropping failing test

75f139b

DSuveges requested a review from d0choa November 15, 2024 16:44

DSuveges added 2 commits November 18, 2024 14:25

fix: variant annotatin

f063fe8

fix: gnomad variant index repartition

5822fa4

d0choa approved these changes Nov 18, 2024

d0choa reviewed Nov 18, 2024

d0choa approved these changes Nov 18, 2024

DSuveges added 3 commits November 19, 2024 09:51

Merge branch 'dev' into ds_3623_variant_description

2747b54

fix: addressing review comments

6682ac3

Merge branch 'ds_3623_variant_description' of https://github.com/open…

a39c575

…targets/gentropy into ds_3623_variant_description

DSuveges merged commit 9f9cfd6 into dev Nov 19, 2024
5 checks passed

DSuveges deleted the ds_3623_variant_description branch November 19, 2024 11:18
