This directory contains the MultiVariate Polygenic Mixture Model (MPVMM) for estimating genetic parameters as described in DeBoever et al. 2019.
MVPMM takes as input GWAS summary statistics (estimated effect sizes and standard errors) for two phenotypes and estimates genetic parameters such as genetic correlation, polygenicity, and scale of genetic effects for the phenotypes. Please see the manuscript for more information on MVPMM.
MVPMM is specified in the mvpmm.stan
file. We've included a helper script
mvpmm.R
for running MVPMM using rstan.
mvpmm.R
has the following arguments:
data.fn
: File containing summary statistics. data.fnout.fn
: File to save output to. out.fnhdf5
: Save to HDF5. If FALSE, save as RData file.mcmc
: Ifmcmc=TRUE
, then do an MCMC run. Otherwise, use Stan's optimization function to perform maximum a posteriori estimation.niter
: Number of MCMC iterations or number of draws for optimization.warmup
: Number of MCMC warmup iterations. Ignored for optimization.chains
: Number of MCMC chains. Ignored for optimization.cores
: Number of cores (should probably be the same as the number of chains). Ignored for optimization.mvpmm.fn
: Stan file to use.
An input file is a tab-separated file with an index column followed by these columns:
ID
: Variant ID (e.g. rsID)BETA
: GWAS effect size estimateSE
: Standard error of GWAS effect sizeP
: Unadjusted p-value from GWAScode
: Phenotype label
The variant IDs contained in the ID
column should not be repeated within a
phenotype. Each variant should be present for each phenotype. Take care to LD
prune variants prior to creating the input file and make sure that the effect
alleles are consistent between GWAS studies.
This is an example of a few lines from an input file:
ID BETA SE P code
1085 rs1001495 -0.072786 0.036405 0.045570 RA
6055 rs1001495 0.029559 0.051766 0.567996 SLE
1530 rs10025152 -0.021224 0.024549 0.387300 RA
6500 rs10025152 -0.020203 0.030855 0.512618 SLE
1385 rs10031922 0.075478 0.046814 0.106900 RA
6355 rs10031922 0.009950 0.103314 0.923273 SLE
There is an example input file included at example/mvp_input.tsv.gz
.
The file example/mvp_input.tsv.gz
is an example input file created from
GWAS summary statistics for family history of high blood pressure and essential
hypertension in the UK Biobank. Note that the variants included in this file
have been LD-pruned.
We can run MVPMM using Stan's optimization function for 100 iterations for these GWAS summary statistics using the following command:
Rscript mvpmm.R example/mvp_input.tsv.gz mvp_output.RData FALSE FALSE 100 0 1 1 mvpmm.stan
We can perform an MCMC with 100 burn-in iterations run as follows:
Rscript mvpmm.R example/mvp_input.tsv.gz ra_sle.RData FALSE TRUE 400 100 1 1 mvpmm.stan
You can change into the example
directory and use bash run.sh
to run an
optimization example.