feat(config): extract gwas_significance parameter to step configuration #628

Merged
merged 6 commits into from
Jun 4, 2024

Conversation

project-defiant
Contributor

✨ Context

opentargets/issues#3327

🛠 What does this PR implement

This is the first part of #3327.

  • Extraction of the gwas_significance (p-value) threshold parameter for the window-based-clumping step.
  • Addition of gwas_significance to the airflow config, with the new value (1e-8) overriding the default (5e-8).
  • Unified naming convention: the WindowBasedClumpingStep config is now WindowBasedClumpingStepConfig.

The main reason for this change is that we want to use a threshold of 1e-8 now, and possibly an even more stringent value of 1e-9 in the future.
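
As a rough illustration of the extraction described above, the sketch below uses a plain dataclass. The class body, the `apply_overrides` helper, and the override dictionary are hypothetical stand-ins rather than the actual gentropy code; only the parameter name `gwas_significance` and the values 5e-8 and 1e-8 come from this PR.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class WindowBasedClumpingStepConfig:
    """Hypothetical stand-in for the step configuration (not the real gentropy class)."""

    # Library default, kept at the conventional genome-wide threshold.
    gwas_significance: float = 5e-8


def apply_overrides(config, overrides):
    """Return a copy of the config with values from an orchestration layer applied."""
    return replace(config, **overrides)


default_config = WindowBasedClumpingStepConfig()
# Assumed shape of what the airflow config would supply for the data release.
release_config = apply_overrides(default_config, {"gwas_significance": 1e-8})
```

The point of the design is that the library default and the release value live in different layers, so the release can tighten the threshold without touching the default.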

🙈 Missing

🚦 Before submitting

  • Do these changes cover one single feature (one change at a time)?
  • Did you read the contributor guideline?
  • Did you make sure to update the documentation with your changes?
  • Did you make sure there is no commented out code in this PR?
  • Did you follow conventional commits standards in PR title and commit messages?
  • Did you make sure the branch is up-to-date with the dev branch?
  • Did you write any new necessary tests?
  • Did you make sure the changes pass local tests (make test)?
  • Did you make sure the changes pass pre-commit rules (e.g poetry run pre-commit run --all-files)?

@project-defiant project-defiant force-pushed the 3327-preparation-for-the-2405-data-release branch from 5e6a181 to 323d0c3 Compare May 30, 2024 13:34
src/gentropy/config.py
config/step/ot_window_based_clumping.yaml
src/gentropy/method/window_based_clumping.py (outdated)
src/gentropy/window_based_clumping.py
@project-defiant
Contributor Author

@addramir just to clear things up here:
I wanted to keep the original value of 5e-8 as the default in the gentropy API, since downstream software may rely on it and changing it could be a breaking change. Such a change should be introduced in a major version (1.3 -> 2.0), and it would be good to inform our users beforehand so they can adjust.

The airflow layer that runs the data release will take the value of gwas_significance from config/step/ot_window_based_clumping.yaml, so we will still use 1e-8 in the data release.

If you think we should drop the original value without maintaining backwards compatibility for gentropy users, I will drop the warning and change the default value in WindowBasedClumpingStepConfig to 1e-8.
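
For readers unfamiliar with the pattern, here is a minimal sketch of the backwards-compatible behaviour under discussion, assuming the warning is raised when no explicit threshold is passed. The function name, the constant, and the warning text are hypothetical and do not reproduce the code in this PR.

```python
import warnings

# Original default discussed in this thread.
LEGACY_GWAS_SIGNIFICANCE = 5e-8


def resolve_gwas_significance(value=None):
    """Return the threshold to use, warning when the legacy default is implied."""
    if value is None:
        warnings.warn(
            "gwas_significance not set; falling back to 5e-8. "
            "The default may change in a future major version.",
            FutureWarning,
            stacklevel=2,
        )
        return LEGACY_GWAS_SIGNIFICANCE
    return value
```

Callers that pass an explicit value (as the airflow config would) never see the warning; only code relying on the implicit default is notified.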

Contributor

@ireneisdoomed ireneisdoomed left a comment


Nice clean up! All good, 2 comments:

  • I'd consider removing the warning.
  • We apply a significance threshold also in the GWAS Catalog top hits ingestion. You might want to apply the same variable as well? Check here.

@project-defiant
Contributor Author

@ireneisdoomed and @addramir thank you both for the feedback!

With regard to the open issues, I would like to get a clear statement on the following:

  1. @addramir do we want to make the threshold of 1e-8 the default for all users of gentropy (even outside Open Targets)? If so, it might be good to preserve the warning; otherwise I will drop it.
  2. @addramir as @ireneisdoomed suggested, there is another p-value threshold, found here, which is part of the GWASCatalogStudyInclusionGenerator. That step is part of the GWAS preprocessing DAG, which does not allow the p-value to be changed. Should we keep it in sync with the clumping step?
  3. Searching the code for 5e-8, there are a few more places where the threshold is found:
       • SummaryStatisticsQC
       • SusieFineMapperStep
     Should we update these as well?

@addramir
Contributor

addramir commented Jun 3, 2024

@project-defiant @ireneisdoomed Thank you for the comments!
I think we need to go with 1e-8 everywhere, since it is suggested as a more correct default threshold. See for example this paper from Peter Visscher:
https://genomebiology.biomedcentral.com/articles/10.1186/s13059-017-1216-0#:~:text=We%20therefore%20recommend%20to%20use,increase%20of%20sample%20size%20%5B14%5D

"We therefore recommend to use a threshold of 1e-8 for GWAS with common variants, which might be slightly conservative for current datasets but should be appropriate for data from WGS or imputation-based studies in the future because the number of variants is expected to increase with the increase of sample size [14]"

It is not really something to fight over. The idea behind this data release was to change the threshold only for GWAS Catalog for now. We can change it everywhere else (which is the correct thing to do), but we can do that later.
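
For context on the two values: 5e-8 is commonly derived as a Bonferroni correction of alpha = 0.05 for roughly one million independent tests, and by the same reasoning 1e-8 corresponds to roughly five million. The sketch below shows that arithmetic and how the stricter cut-off shrinks a set of hits. The p-values are made up for illustration, and the inclusive comparison is an assumption, not a statement about how gentropy filters.

```python
def bonferroni_threshold(alpha: float, n_independent_tests: int) -> float:
    """Per-test significance level that keeps the family-wise error rate at alpha."""
    return alpha / n_independent_tests


def count_significant(p_values: list[float], threshold: float) -> int:
    """Count p-values at or below the threshold (inclusive bound is an assumption)."""
    return sum(p <= threshold for p in p_values)


# Made-up p-values, for illustration only.
example_p_values = [3e-9, 2e-8, 4e-8, 6e-8, 1e-5]

conventional = bonferroni_threshold(0.05, 1_000_000)  # approximately 5e-8
stricter = bonferroni_threshold(0.05, 5_000_000)  # approximately 1e-8

hits_conventional = count_significant(example_p_values, conventional)
hits_stricter = count_significant(example_p_values, stricter)
```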

@project-defiant
Contributor Author

project-defiant commented Jun 4, 2024

@ireneisdoomed I have dropped the warning and added the default threshold to the code you mentioned here, as it seems logical to keep the clumping step and GWAS annotation in sync. I also created a separate issue for updating the other thresholds, so we do not forget about it.
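
One way to picture "keeping the two steps in sync" is a single shared default that both configurations read. This is only a sketch: the constant, the two settings classes, and the value chosen are assumptions for illustration and may not match how gentropy organises this.

```python
from dataclasses import dataclass

# Hypothetical shared default; the value and its location are assumptions.
SHARED_GWAS_SIGNIFICANCE = 1e-8


@dataclass
class ClumpingStepSettings:
    """Hypothetical settings for the window-based clumping step."""

    gwas_significance: float = SHARED_GWAS_SIGNIFICANCE


@dataclass
class CatalogIngestionSettings:
    """Hypothetical settings for GWAS Catalog top-hits ingestion."""

    gwas_significance: float = SHARED_GWAS_SIGNIFICANCE


def thresholds_in_sync(*settings) -> bool:
    """True when every settings object carries the same threshold."""
    return len({s.gwas_significance for s in settings}) == 1
```

A check like `thresholds_in_sync` could sit in a test so that a future change to one step's threshold without the other is caught early.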

Contributor

@ireneisdoomed ireneisdoomed left a comment


👌

@project-defiant project-defiant merged commit daa8331 into dev Jun 4, 2024
4 checks passed
@project-defiant project-defiant deleted the 3327-preparation-for-the-2405-data-release branch June 4, 2024 20:09
project-defiant added a commit that referenced this pull request Jun 14, 2024
…on (#628)

* feat(clumping): lower p-value significance threshold for clumping step
* feat(config): extract gwas_significance to config
* feat(config): synced p-value in association parsing

---------

Co-authored-by: Szymon Szyszkowski <ss60@mib117351s.internal.sanger.ac.uk>
Co-authored-by: Yakov <yt4@sanger.ac.uk>
project-defiant pushed a commit that referenced this pull request Jul 12, 2024