
1. POP‐GWAS

Jiacheng Miao edited this page Jul 27, 2024 · 9 revisions

POP‐GWAS takes three sets of GWAS summary statistics as input to conduct valid and powerful ML-assisted GWAS. It can be used for both quantitative and binary phenotypes.

TL;DR

You can use POP-GWAS to perform ML-assisted GWAS on a quantitative phenotype (using head bone mineral density as an example, attached in the test folder) by

cd POP-TOOLS

trait=Head_BMD

python3 ./POP-GWAS.py \
--gwas-yhat-unlab ./test/data/${trait}_yhat_unlab.txt.gz \
--gwas-y-lab ./test/data/${trait}_y_lab.txt.gz \
--gwas-yhat-lab ./test/data/${trait}_yhat_lab.txt.gz \
--out ./test/result/${trait}

Here, yhat represents the imputed phenotype, y represents the observed phenotype, lab represents the labeled dataset, and unlab represents the unlabeled dataset. The combination of these is the three required GWAS:

  • --gwas-yhat-unlab: GWAS on imputed phenotype in unlabeled data
  • --gwas-y-lab: GWAS on observed phenotype in labeled data
  • --gwas-yhat-lab: GWAS on imputed phenotype in labeled data

The output is the POP-GWAS result:

head ./test/result/Head_BMD_POP-GWAS.txt

CHR     BP      SNP     A1      A2      EAF     BETA    SE      Z       P       N_eff
22      16495833        rs79847867      A       C       0.07798 0.01157 0.01283 0.901   3.673e-01       42228
22      16496170        rs560288282     A       G       0.07798 0.01157 0.01283 0.901   3.673e-01       42228
22      16870108        rs131528        T       C       0.31422 -0.00305        0.00742 -0.411  6.809e-01       42179
22      16870162        rs131529        A       G       0.31429 -0.00278        0.00742 -0.375  7.078e-01       42184
22      16870214        rs131530        A       G       0.31426 -0.00289        0.00742 -0.389  6.972e-01       42182
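
As a quick sanity check on these columns, the example rows appear consistent with Z = BETA / SE and a two-sided normal P value. The sketch below verifies this for the first and third example rows using only the Python standard library. The function name `two_sided_p` is illustrative, and this is an observation about the printed example, not a description of POP-GWAS internals.

```python
import math

def two_sided_p(z):
    """Two-sided p-value for a standard normal test statistic."""
    return math.erfc(abs(z) / math.sqrt(2.0))

# (BETA, SE, reported Z, reported P) copied from the example output above
rows = [
    (0.01157, 0.01283, 0.901, 3.673e-01),
    (-0.00305, 0.00742, -0.411, 6.809e-01),
]

checks = []
for beta, se, z_reported, p_reported in rows:
    z = beta / se
    p = two_sided_p(z)
    # Tolerances allow for rounding in the printed BETA and SE
    checks.append(abs(z - z_reported) < 5e-3 and abs(p - p_reported) < 1e-3)
    print(f"Z={z:.3f} (reported {z_reported}), P={p:.4f} (reported {p_reported})")
```

Small differences in the last digit are expected because the printed BETA and SE are rounded.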

There are several things to note:

  • The required input data format for POP-GWAS can be found in this page.
  • Only SNPs on chromosome 22 are included in the example data, for demonstration purposes. If you want to use POP-GWAS to estimate r (the phenotypic correlation), please use the full GWAS summary statistics containing SNPs on chr 1-22 as input. In our test, it takes only about 3 minutes to produce results for a GWAS with 10 million SNPs.
  • BETA in the POP-GWAS summary statistics is the per-allele increase in the phenotype, in standard deviation units. SE is on the same scale as BETA.
  • N_eff is the effective sample size of the ML-assisted GWAS.
  • We recommend applying the sample overlap correction in POP-GWAS if there are overlapping samples or residual correlations between the input GWAS in labeled and unlabeled data. Such residual correlations can be quantified by the intercept of bivariate LD score regression.
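
The chromosome coverage mentioned in the notes above can be checked before running the tool. The following is a minimal sketch, assuming the input summary statistics are gzipped, whitespace-delimited, and have a header containing a CHR column (as the output shown earlier does; consult the input format page for the actual column names). `missing_autosomes` is a hypothetical helper and is not part of POP-TOOLS.

```python
import gzip

def missing_autosomes(path, chr_col="CHR"):
    """Return the autosomes (1-22) with no SNP in a gzipped summary statistics file."""
    seen = set()
    with gzip.open(path, "rt") as handle:
        header = handle.readline().split()
        idx = header.index(chr_col)  # assumed column name, not confirmed by this page
        for line in handle:
            fields = line.split()
            if len(fields) > idx:
                seen.add(fields[idx])
    # Chromosomes are compared as strings, e.g. "22"
    return [c for c in range(1, 23) if str(c) not in seen]
```

An empty list means every autosome from chr 1 to chr 22 has at least one SNP in the file.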

Useful examples

Here we provide a few useful examples:

Binary phenotype

You can apply POP-GWAS to binary phenotypes by simply adding --bt to the script for the quantitative phenotype. Below is the script that applies POP-GWAS to a binary phenotype, using type-2 diabetes as an example.

cd POP-TOOLS

trait=T2D

python3 ./POP-GWAS.py \
--gwas-yhat-unlab ./test/data/${trait}_yhat_unlab.txt.gz \
--gwas-y-lab ./test/data/${trait}_y_lab.txt.gz \
--gwas-yhat-lab ./test/data/${trait}_yhat_lab.txt.gz \
--bt \
--out ./test/result/${trait}

The output is the POP-GWAS result:

head ./test/result/T2D_POP-GWAS.txt

CHR     BP      SNP     A1      A2      EAF     OR      SE      Z       P       N_eff   N_eff_case      N_eff_control
22      16495833        rs79847867      A       C       0.07836 0.01304 0.16857 0.077   9.383e-01       136416  6032    130384
22      16496170        rs560288282     A       G       0.07836 0.01304 0.16857 0.077   9.383e-01       136416  6032    130384
22      16870108        rs131528        T       C       0.31340 0.04334 0.09778 0.443   6.576e-01       136269  6021    130248
22      16870162        rs131529        A       G       0.31346 0.04984 0.09771 0.510   6.100e-01       136284  6025    130259
22      16870214        rs131530        A       G       0.31342 0.04828 0.09770 0.494   6.212e-01       136287  6026    130260
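
In the rows above, N_eff appears to equal N_eff_case + N_eff_control up to rounding. The sketch below parses two of the example rows and checks this. It assumes whitespace-delimited columns with the header shown, and `parse_table` is an illustrative helper rather than part of POP-TOOLS.

```python
def parse_table(text):
    """Parse whitespace-delimited text with a header line into a list of dicts."""
    lines = [ln for ln in text.strip().splitlines() if ln.strip()]
    header = lines[0].split()
    return [dict(zip(header, ln.split())) for ln in lines[1:]]

# Header and two rows copied from the example output above
example = """
CHR BP SNP A1 A2 EAF OR SE Z P N_eff N_eff_case N_eff_control
22 16495833 rs79847867 A C 0.07836 0.01304 0.16857 0.077 9.383e-01 136416 6032 130384
22 16870214 rs131530 A G 0.31342 0.04828 0.09770 0.494 6.212e-01 136287 6026 130260
"""

records = parse_table(example)
for rec in records:
    total = int(rec["N_eff_case"]) + int(rec["N_eff_control"])
    # Allow a difference of 1 because the printed counts are rounded
    rec["n_eff_consistent"] = abs(total - int(rec["N_eff"])) <= 1
    print(rec["SNP"], rec["N_eff"], total, rec["n_eff_consistent"])
```

The same parser can be pointed at the contents of `<prefix>_POP-GWAS.txt`, provided the file keeps the column layout shown here.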

There are a few things to note:

Sample overlap correction

You can apply POP-GWAS with sample overlap correction by adding --sample-overlap to address the dependence between GWAS in labeled and unlabeled data. Here is an example for a quantitative trait:

cd POP-TOOLS

trait=Head_BMD

python3 ./POP-GWAS.py \
--gwas-yhat-unlab ./test/data/${trait}_yhat_unlab.txt.gz \
--gwas-y-lab ./test/data/${trait}_y_lab.txt.gz \
--gwas-yhat-lab ./test/data/${trait}_yhat_lab.txt.gz \
--sample-overlap \
--out ./test/result/${trait}_ovp

Available flags

The available flags for POP-GWAS to conduct ML-assisted GWAS are

# The last two flags (--bt and --sample-overlap) are optional.
python3 ./POP-GWAS.py \
--gwas-yhat-unlab <Path to GWAS summary statistics file of imputed phenotype in unlabeled data> \
--gwas-y-lab <Path to GWAS summary statistics file of observed phenotype in labeled data> \
--gwas-yhat-lab <Path to GWAS summary statistics file of imputed phenotype in labeled data> \
--out <The prefix of path to output summary statistics> \
--bt \
--sample-overlap

where the flags in order are

  • --gwas-yhat-unlab (required): Full path to the GWAS summary statistics on imputed phenotype in unlabeled data in the required format
  • --gwas-y-lab (required): Full path to the GWAS summary statistics on observed phenotype in labeled data in the required format
  • --gwas-yhat-lab (required): Full path to the GWAS summary statistics on imputed phenotype in labeled data in the required format
  • --out (required): The prefix of the path to output the summary statistics. The output contains the text file <Prefix to the output file>_POP-GWAS.txt for POP-GWAS summary statistics and <Prefix to the output file>_POP-GWAS.log for debugging.
  • --bt (optional): Indicates that the phenotype is binary. Omit it for a quantitative phenotype.
  • --sample-overlap (optional): Applies the sample overlap correction to address the dependence between GWAS in labeled and unlabeled data.
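
When running POP-GWAS across many traits, the flags above can be assembled programmatically. The following is a minimal sketch: `build_popgwas_command` is a hypothetical wrapper (not part of POP-TOOLS) that uses only the flags documented on this page and assumes the `<trait>_yhat_unlab.txt.gz` style file naming of the test data.

```python
import shlex

def build_popgwas_command(trait, data_dir="./test/data", out_prefix=None,
                          binary=False, sample_overlap=False):
    """Build the argument list for POP-GWAS.py from the flags documented above."""
    if out_prefix is None:
        out_prefix = f"./test/result/{trait}"
    cmd = [
        "python3", "./POP-GWAS.py",
        "--gwas-yhat-unlab", f"{data_dir}/{trait}_yhat_unlab.txt.gz",
        "--gwas-y-lab", f"{data_dir}/{trait}_y_lab.txt.gz",
        "--gwas-yhat-lab", f"{data_dir}/{trait}_yhat_lab.txt.gz",
        "--out", out_prefix,
    ]
    if binary:
        cmd.append("--bt")  # binary phenotype
    if sample_overlap:
        cmd.append("--sample-overlap")  # sample overlap correction
    return cmd

# Print the command for the binary example instead of executing it
print(shlex.join(build_popgwas_command("T2D", binary=True)))
```

The returned list can be passed to `subprocess.run(cmd, check=True)` from inside the POP-TOOLS directory. Building a list rather than a single string avoids shell quoting problems with unusual paths.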