
feat: implement posterior prob filter for COLOC at small overlaps N<10 #977

Open · wants to merge 11 commits into base: dev
Conversation

@xyg123 (Contributor) commented Jan 22, 2025

✨ Context

COLOC is unstable at small overlaps due to inflation of the likelihood term calculation; the problem is described in the linked issue.

🛠 What does this PR implement

We now only pass small overlaps (N < 10) to COLOC if the overlapping variants' posterior probabilities are > 0.9 on both sides, so that the inflated H4 values are likely to reflect true colocalisation. The parameters (PP > 0.9 and N < 10) were chosen to achieve a proportion of significant H4 results comparable to the previous OTG portal.

| | Total colocalisation results | Number of COLOC H4 > 0.8 | Percentage of COLOC H4 > 0.8 |
| --- | --- | --- | --- |
| Gentropy release 24.12 | 23,709,155 | 17,553,867 | 74.04% |
| OTG portal release 22.10 | 7,408,493 | 4,357,079 | 58.81% |
| Gentropy 24.12, N > 10 | 8,566,692 | 4,530,616 | 52.88% |

Tests related to COLOC have been adjusted accordingly to include posterior probabilities > 0.9 on both sides, so that the test overlaps are not filtered out.
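The filtering rule described above can be sketched in plain Python. This is a minimal illustration of the logic, not the actual gentropy implementation; the names `keep_overlap`, `N_CUTOFF`, and `POSTERIOR_CUTOFF` here are illustrative:

```python
# Illustrative sketch of the small-overlap filter (hypothetical names,
# not the actual gentropy code). An overlap is a set of variants shared
# by the left and right credible sets, each with a posterior probability
# (PP) on each side.
N_CUTOFF = 10           # overlaps with fewer variants are candidates for removal
POSTERIOR_CUTOFF = 0.9  # both sides must exceed this PP for a small overlap to survive

def keep_overlap(left_pps: list[float], right_pps: list[float]) -> bool:
    """Return True if this overlap should be passed to COLOC."""
    if len(left_pps) >= N_CUTOFF:
        return True  # large overlaps are always kept
    # Small overlap: keep only if at least one shared variant has a high
    # posterior probability in BOTH studies.
    return any(
        left > POSTERIOR_CUTOFF and right > POSTERIOR_CUTOFF
        for left, right in zip(left_pps, right_pps)
    )

print(keep_overlap([0.95, 0.03], [0.92, 0.05]))  # True: one variant is high on both sides
print(keep_overlap([0.95, 0.03], [0.05, 0.92]))  # False: no variant is high on both sides
```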

🙈 Missing

  • Future work will involve dynamically penalising the prior term in the H4 calculation when overlaps are small.
  • Implement the parameters into the COLOC method.

🚦 Before submitting

  • [x] Do these changes cover one single feature (one change at a time)?
  • [x] Did you read the contributor guideline?
  • [x] Did you make sure to update the documentation with your changes?
  • [x] Did you make sure there is no commented out code in this PR?
  • [x] Did you follow conventional commits standards in PR title and commit messages?
  • [x] Did you make sure the branch is up-to-date with the dev branch?
  • [x] Did you write any new necessary tests?
  • [x] Did you make sure the changes pass local tests (make test)?
  • [x] Did you make sure the changes pass pre-commit rules (e.g. poetry run pre-commit run --all-files)?

@xyg123 (Contributor, Author) commented Jan 22, 2025

@addramir Please let me know if you'd like to discuss changing the parameters (PP > 0.9 and N<10).

@xyg123 (Contributor, Author) commented Jan 23, 2025

Per @addramir's request, investigating the effect of minimum overlap cutoff on the proportion of significant H4s:

| N | Total count | H4 > 0.8 count | Ratio |
| --- | --- | --- | --- |
| 1 | 16,627,720 | 11,303,929 | 0.6798 |
| 2 | 14,355,138 | 9,404,867 | 0.6552 |
| 3 | 12,896,260 | 8,159,638 | 0.6327 |
| 4 | 11,851,917 | 7,271,878 | 0.6136 |
| 5 | 11,081,132 | 6,626,729 | 0.5980 |
| 6 | 10,409,353 | 6,058,734 | 0.5820 |
| 7 | 9,846,088 | 5,586,696 | 0.5674 |
| 8 | 9,345,295 | 5,168,813 | 0.5531 |
| 9 | 8,930,353 | 4,828,156 | 0.5406 |
| 10 | 8,566,692 | 4,530,616 | 0.5289 |
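The Ratio column is simply the H4 > 0.8 count divided by the total count; spot-checking two rows from the table:

```python
# Spot-check the Ratio column above (counts copied from the table).
rows = {
    5: (11_081_132, 6_626_729),
    10: (8_566_692, 4_530_616),
}
for n, (total, h4_count) in rows.items():
    print(n, round(h4_count / total, 4))  # 5 -> 0.598, 10 -> 0.5289
```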

@addramir (Contributor) commented:

Looks good to me. Let's use N=5 as the threshold.

@addramir (Contributor) commented:

@ireneisdoomed the code and logic look good to me. Please have a look at the code; if it is fine, let's approve.

@project-defiant project-defiant self-requested a review January 30, 2025 11:01
```python
f.aggregate(
    f.transform(
        f.arrays_zip(
            fml.vector_to_array(f.col("left_posteriorProbability")),
```
Contributor review comment:

I am not so sure that converting to vectors and back is the correct way to handle the posteriorProbabilities. I wish to understand the logic of why VectorUDT was used here initially. @ireneisdoomed do you know the reason behind the vectorization of the logBF values (is it the sparsity, or a default numpy conversion)?

```python
        ),
        # row["0"] = left PP, row["1"] = right PP, row["tagVariantSourceList"]
        lambda row: f.when(
            (row["tagVariantSourceList"] == "both")
```
@project-defiant (Contributor) commented Jan 30, 2025:

I think checking for both needs to happen before you arrays_zip, otherwise you will end up with mixed results (not coming from the same variant). I may be wrong though; can you provide a specific test for this part? You could add it on top of the colocalisation step test. I would like to see test cases for:

  1. The case when a left (or right) overlap does not exist, so the algorithm orders the lists (arrays_zip) correctly, taking into account only the variants tagged both.
  2. The case when all PIPs from the left side are low and all PIPs from the right side are high.
  3. The case when at least one PIP from the left and one PIP from the right is high.

To do this you would have to make an overlap example with at least 2 variants on one side and 3 variants on the other side.
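The alignment concern above can be illustrated with a plain-Python toy model (variant IDs and PP values are made up for illustration; this is not the gentropy API): restricting to variants tagged "both" before pairing the PPs guarantees that each pair comes from the same variant.

```python
# Toy model of the review comment: pairing left/right posterior
# probabilities (PPs) is only safe after restricting to variants present
# on BOTH sides. Variant IDs and PP values are hypothetical.
left_pps = {"v1": 0.95, "v2": 0.50}                  # 2 variants on the left
right_pps = {"v3": 0.88, "v1": 0.92, "v2": 0.40}     # 3 variants on the right

# Naive zip of the raw PP lists misaligns pairs when one side has extra
# variants (here v3 exists only on the right).
naive_pairs = list(zip(left_pps.values(), right_pps.values()))
print(naive_pairs)     # [(0.95, 0.88), (0.5, 0.92)] <- PPs from different variants

# Safer: intersect first ("both"-tagged variants), then pair by variant ID.
both = sorted(set(left_pps) & set(right_pps))
aligned_pairs = [(left_pps[v], right_pps[v]) for v in both]
print(aligned_pairs)   # [(0.95, 0.92), (0.5, 0.4)]

# Apply the PP > 0.9 filter only to correctly aligned pairs.
high_on_both = [v for v in both if left_pps[v] > 0.9 and right_pps[v] > 0.9]
print(high_on_both)    # ['v1']
```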

Comment on lines +365 to +368
```python
            & (row["0"] > Coloc.POSTERIOR_CUTOFF)
            & (row["1"] > Coloc.POSTERIOR_CUTOFF),
            1.0,
        ).otherwise(0.0),
```
Contributor review comment:

I am not 100% sure you are comparing the same two variants here (since they can be left- or right-oriented as well).

@project-defiant (Contributor) left a review comment:

Please take a look at the comments; as this is a crucial part, let's use this chance to test it a bit more.
