1. POP‐GWAS
POP-GWAS takes three sets of GWAS summary statistics as input to conduct valid and powerful ML-assisted GWAS. It can be used for both quantitative and binary phenotypes.
You can use POP-GWAS to perform ML-assisted GWAS on a quantitative phenotype by running the commands below. The example uses head bone mineral density; the data are provided in the test folder.
cd POP-TOOLS
trait=Head_BMD
python3 ./POP-GWAS.py \
--gwas-yhat-unlab ./test/data/${trait}_yhat_unlab.txt.gz \
--gwas-y-lab ./test/data/${trait}_y_lab.txt.gz \
--gwas-yhat-lab ./test/data/${trait}_yhat_lab.txt.gz \
--out ./test/result/${trait}
Here, yhat represents the imputed phenotype, y represents the observed phenotype, lab represents the labeled dataset, and unlab represents the unlabeled dataset. Combining these gives the three required GWAS:
- --gwas-yhat-unlab: GWAS on imputed phenotype in unlabeled data
- --gwas-y-lab: GWAS on observed phenotype in labeled data
- --gwas-yhat-lab: GWAS on imputed phenotype in labeled data
The output is the POP-GWAS result:
head ./test/result/Head_BMD_POP-GWAS.txt
CHR BP SNP A1 A2 EAF BETA SE Z P N_eff
22 16495833 rs79847867 A C 0.07798 0.01157 0.01283 0.901 3.673e-01 42228
22 16496170 rs560288282 A G 0.07798 0.01157 0.01283 0.901 3.673e-01 42228
22 16870108 rs131528 T C 0.31422 -0.00305 0.00742 -0.411 6.809e-01 42179
22 16870162 rs131529 A G 0.31429 -0.00278 0.00742 -0.375 7.078e-01 42184
22 16870214 rs131530 A G 0.31426 -0.00289 0.00742 -0.389 6.972e-01 42182
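As a quick sanity check on this output, the sketch below recomputes Z and P from BETA and SE for the first example row. It assumes that Z is BETA divided by SE and that P is a two-sided p-value from the standard normal distribution. Both assumptions are inferred from the example rows above rather than stated by POP-GWAS, and the function name is hypothetical.

```python
import math

def z_and_p(beta, se):
    """Return (Z, two-sided normal p-value) for an effect estimate and its standard error."""
    z = beta / se
    # For a standard normal Z, P(|Z| > |z|) = erfc(|z| / sqrt(2)).
    p = math.erfc(abs(z) / math.sqrt(2))
    return z, p

# First example row above: BETA = 0.01157, SE = 0.01283 (reported Z = 0.901, P = 3.673e-01).
z, p = z_and_p(0.01157, 0.01283)
print(f"Z = {z:.3f}, P = {p:.3e}")
```

Small differences from the reported values are expected, because BETA and SE are rounded in the file.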
There are several things to note:
- The required input data format for POP-GWAS can be found on this page.
- Only SNPs on chromosome 22 are included in the example data, for demonstration purposes. If you want POP-GWAS to estimate r (the phenotypic correlation), please use full GWAS summary statistics containing SNPs on chromosomes 1-22 as input. In our test, it took only about 3 minutes to produce results for a GWAS with 10 million SNPs.
- BETA in the POP-GWAS summary statistics is the per-allele increase in the phenotype, in standard deviation units. SE is on the same scale as BETA.
- N_eff is the effective sample size of the ML-assisted GWAS.
- We recommend applying the sample overlap correction in POP-GWAS if there are overlapping samples or residual correlations between the input GWAS in labeled and unlabeled data. Such residual correlations can be quantified by the intercept of bivariate LD score regression.
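Because estimating r relies on genome-wide input, it can help to confirm which autosomes a summary statistics file covers before running POP-GWAS. The following is a minimal sketch, not part of POP-TOOLS: the helper name is hypothetical, and it assumes a whitespace-delimited table whose header contains a CHR column, as in the example output above. The actual input format is described on the page referenced in the first note.

```python
def missing_autosomes(lines):
    """Return the autosomes (1-22) that do not appear in the CHR column."""
    rows = iter(lines)
    header = next(rows).split()
    chr_idx = header.index("CHR")  # assumed column name
    seen = set()
    for line in rows:
        fields = line.split()
        if fields:
            seen.add(fields[chr_idx])
    return [c for c in range(1, 23) if str(c) not in seen]

# Two rows from the chromosome 22 example data.
example = [
    "CHR BP SNP A1 A2 EAF BETA SE Z P N_eff",
    "22 16495833 rs79847867 A C 0.07798 0.01157 0.01283 0.901 3.673e-01 42228",
    "22 16870108 rs131528 T C 0.31422 -0.00305 0.00742 -0.411 6.809e-01 42179",
]
print(missing_autosomes(example))  # chromosomes 1 to 21 are reported as missing
```

For a gzipped file, the same function can be given an open text handle, for example one returned by gzip.open(path, "rt").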
Here we provide a few useful examples:
You can apply POP-GWAS to binary phenotypes by simply adding --bt to the command for the quantitative phenotype. Below is a script that applies POP-GWAS to a binary phenotype, using type 2 diabetes as an example.
cd POP-TOOLS
trait=T2D
python3 ./POP-GWAS.py \
--gwas-yhat-unlab ./test/data/${trait}_yhat_unlab.txt.gz \
--gwas-y-lab ./test/data/${trait}_y_lab.txt.gz \
--gwas-yhat-lab ./test/data/${trait}_yhat_lab.txt.gz \
--bt \
--out ./test/result/${trait}
The output is the POP-GWAS result:
head ./test/result/T2D_POP-GWAS.txt
CHR BP SNP A1 A2 EAF OR SE Z P N_eff N_eff_case N_eff_control
22 16495833 rs79847867 A C 0.07836 0.01304 0.16857 0.077 9.383e-01 136416 6032 130384
22 16496170 rs560288282 A G 0.07836 0.01304 0.16857 0.077 9.383e-01 136416 6032 130384
22 16870108 rs131528 T C 0.31340 0.04334 0.09778 0.443 6.576e-01 136269 6021 130248
22 16870162 rs131529 A G 0.31346 0.04984 0.09771 0.510 6.100e-01 136284 6025 130259
22 16870214 rs131530 A G 0.31342 0.04828 0.09770 0.494 6.212e-01 136287 6026 130260
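The example rows suggest two relationships that can be checked directly. In them, N_eff_case plus N_eff_control matches N_eff up to rounding, and Z matches the OR column divided by SE, which would be expected if that column were on the log-odds scale. Both are inferences from the rows shown above, not documented behavior, so treat the sketch below as a sanity check only.

```python
header = "CHR BP SNP A1 A2 EAF OR SE Z P N_eff N_eff_case N_eff_control".split()
rows = [
    "22 16495833 rs79847867 A C 0.07836 0.01304 0.16857 0.077 9.383e-01 136416 6032 130384",
    "22 16870214 rs131530 A G 0.31342 0.04828 0.09770 0.494 6.212e-01 136287 6026 130260",
]

def check_row(line):
    """Return (N_eff difference, Z difference) for one row of the binary output."""
    rec = dict(zip(header, line.split()))
    # Difference between N_eff and the sum of the case and control effective sizes.
    n_diff = float(rec["N_eff"]) - (float(rec["N_eff_case"]) + float(rec["N_eff_control"]))
    # Difference between the reported Z and the OR column divided by SE.
    z_diff = float(rec["Z"]) - float(rec["OR"]) / float(rec["SE"])
    return n_diff, z_diff

for line in rows:
    n_diff, z_diff = check_row(line)
    print(f"N_eff difference: {n_diff:.0f}, Z difference: {z_diff:.4f}")
```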
There are a few things to note:
- When using --bt, the GWAS passed to --gwas-y-lab must be on a binary trait. Otherwise, please use POP-GWAS for quantitative phenotypes.
- The format of the other two input summary statistics depends on whether the imputed phenotype is quantitative or binary: use the quantitative-phenotype format if it is quantitative and the binary-phenotype format if it is binary.
You can apply POP-GWAS with sample overlap correction by adding --sample-overlap, which addresses the dependence between the GWAS in labeled and unlabeled data. Here is an example for a quantitative trait:
cd POP-TOOLS
trait=Head_BMD
python3 ./POP-GWAS.py \
--gwas-yhat-unlab ./test/data/${trait}_yhat_unlab.txt.gz \
--gwas-y-lab ./test/data/${trait}_y_lab.txt.gz \
--gwas-yhat-lab ./test/data/${trait}_yhat_lab.txt.gz \
--sample-overlap \
--out ./test/result/${trait}_ovp
The available flags for running ML-assisted GWAS with POP-GWAS are:
python3 ./POP-GWAS.py \
--gwas-yhat-unlab <Path to GWAS summary statistics file of imputed phenotype in unlabeled data> \
--gwas-y-lab <Path to GWAS summary statistics file of observed phenotype in labeled data> \
--gwas-yhat-lab <Path to GWAS summary statistics file of imputed phenotype in labeled data> \
--out <The prefix of path to output summary statistics> \
# The following flags are optional.
--bt <Whether the phenotype is binary or not> \
--sample-overlap <Whether to apply the sample overlap correction>
where the flags are:
- --gwas-yhat-unlab (required): Full path to the GWAS summary statistics on imputed phenotype in unlabeled data, in the required format.
- --gwas-y-lab (required): Full path to the GWAS summary statistics on observed phenotype in labeled data, in the required format.
- --gwas-yhat-lab (required): Full path to the GWAS summary statistics on imputed phenotype in labeled data, in the required format.
- --out (required): The prefix of the output path. The output consists of the text file <Prefix to the output file>_POP-GWAS.txt, containing the POP-GWAS summary statistics, and <Prefix to the output file>_POP-GWAS.log, for debugging.
- --bt (optional): Indicates that the phenotype is binary.
- --sample-overlap (optional): Applies the sample overlap correction.
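When running POP-GWAS for several traits, it can be convenient to assemble the command programmatically. The sketch below builds the argument list from the flags used on this page and follows the file-naming pattern of the test data. The function name and its parameters are hypothetical, and the code only constructs the command without executing it.

```python
def build_pop_gwas_command(trait, data_dir="./test/data", out_prefix=None,
                           binary=False, sample_overlap=False):
    """Build the argv list for POP-GWAS.py using the flags shown on this page."""
    if out_prefix is None:
        out_prefix = f"./test/result/{trait}"
    cmd = [
        "python3", "./POP-GWAS.py",
        "--gwas-yhat-unlab", f"{data_dir}/{trait}_yhat_unlab.txt.gz",
        "--gwas-y-lab", f"{data_dir}/{trait}_y_lab.txt.gz",
        "--gwas-yhat-lab", f"{data_dir}/{trait}_yhat_lab.txt.gz",
        "--out", out_prefix,
    ]
    if binary:
        cmd.append("--bt")  # binary phenotype
    if sample_overlap:
        cmd.append("--sample-overlap")  # sample overlap correction
    return cmd

cmd = build_pop_gwas_command("T2D", binary=True)
print(" ".join(cmd))
```

To run the command, the list can be passed to subprocess.run(cmd, check=True) from inside the POP-TOOLS directory.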