MultiABEL

Multi-Trait Genome-Wide Association Analysis

Installation

Run the following command in R to install the MultiABEL package:

install.packages("MultiABEL")

or its developer version:

install.packages("MultiABEL", repos="http://R-Forge.R-project.org")

MultiABEL can be loaded in R via:

library(MultiABEL)

or

require(MultiABEL)

Multi-Trait GWAS using Genotypes from SNP Arrays

Let us first load the example data in the R package GenABEL:

data(ge03d2ex.clean)

which is an object of class gwaa.data, generated by the GenABEL package. The *ABEL suite of packages is compatible with different formats of SNP array genotyping data, such as Affymetrix, Illumina, MACH, PLINK, and text files. To convert these formats into the gwaa.data class in R, refer to the convert.snp.FORMAT series of functions in GenABEL.

Once a gwaa.data object is ready, multi-trait GWAS using the MultiABEL package has only two steps. The first step is to load the gwaa.data object as a multi.loaded format object. For example, in the example data, if we want to perform multi-trait GWAS for height, weight and BMI, with sex and age included as covariates, we run:

loaded <- MultiLoad(gwaa.data = ge03d2ex.clean, trait.cols = c(5, 6, 8), covariate.cols = c(2, 3))

which calculates the statistics required for the subsequent multi-trait scan. The variables are indicated by column indices, corresponding to the columns in the phenotypic data stored in the gwaa.data object:

head(ge03d2ex.clean@phdata)
         id sex       age dm2    height     weight diet       bmi
id199 id199   1 59.228721   1 163.91234  80.407462    0 29.927679
id300 id300   1 42.325527   1 177.24822  80.800665    1 25.718829
id403 id403   0 31.225693   1 152.59305 114.842811    1 49.321274
id415 id415   0 54.455580   1 172.83431  97.392762    0 32.603692
id666 id666   1 61.068030   1 172.79525 104.815830    0 35.104539
id689 id689   1 57.455284   1 183.40863  83.480342    0 24.816739
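Because trait.cols and covariate.cols are positional, it can be safer to derive the indices from column names than to count columns by hand. The following is a minimal sketch in base R; the toy data frame ph is hypothetical and only mimics the column layout shown above.

```r
# Toy phenotype table with the same column layout as ge03d2ex.clean@phdata
# (the values are made up for illustration).
ph <- data.frame(
  id = c("id1", "id2"), sex = c(1, 0), age = c(59.2, 42.3), dm2 = c(1, 1),
  height = c(163.9, 177.2), weight = c(80.4, 80.8), diet = c(0, 1),
  bmi = c(29.9, 25.7)
)

# Look up column positions by name instead of hard-coding them.
trait.cols <- match(c("height", "weight", "bmi"), names(ph))
covariate.cols <- match(c("sex", "age"), names(ph))

# Fail early if any requested name is missing from the table.
stopifnot(!anyNA(trait.cols), !anyNA(covariate.cols))

print(trait.cols)      # 5 6 8
print(covariate.cols)  # 2 3
```

With the real data, names(ge03d2ex.clean@phdata) would take the place of names(ph), and the resulting vectors can be passed to MultiLoad() as in the call above.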

Thereafter, the multi-trait scan results can be produced by:

result <- Multivariate(loaded)

The top of the result data frame looks like:

head(result)
             Marker       Beta.S          SE          P  coef.height  coef.weight      coef.bmi
rs1646456 rs1646456 0.0204838238 0.031169284 0.51106532 -0.151990037  0.173668489 -0.1534429989
rs4435802 rs4435802 0.0086583260 0.035745822 0.80861029  0.092143206 -0.202380210  0.1762167094
rs946364   rs946364 0.0214048834 0.031101669 0.49131238 -0.163415525  0.214047011 -0.2005150314
rs299251   rs299251 0.0304247846 0.031102692 0.32797333  0.089547868 -0.253542764  0.2019248957
rs2456488 rs2456488 0.0278942115 0.031014170 0.36843849  0.257256624 -0.541874214  0.4251691269
rs3712159 rs3712159 0.0099905862 0.034535222 0.77236186  0.025174520 -0.026662859 -0.0059335234

where Beta.S and SE are the estimated genetic effect size and its standard error on the phenotype score, which is constructed using the coefficients in the trailing coef.TRAIT columns. P gives the multivariate p-values.
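Since the result is an ordinary data frame, the usual next step is to rank markers by P and apply a significance threshold. The sketch below uses base R only; the small data frame res copies the Marker and P columns from the six rows printed above, and the Bonferroni threshold is an assumption chosen for illustration, not something prescribed by MultiABEL.

```r
# Marker and P columns copied from the head(result) output above.
res <- data.frame(
  Marker = c("rs1646456", "rs4435802", "rs946364",
             "rs299251", "rs2456488", "rs3712159"),
  P = c(0.51106532, 0.80861029, 0.49131238,
        0.32797333, 0.36843849, 0.77236186),
  stringsAsFactors = FALSE
)

# Rank markers from the smallest to the largest multivariate p-value.
top <- res[order(res$P), ]

# Illustrative Bonferroni threshold over the markers tested (an assumption).
threshold <- 0.05 / nrow(res)
hits <- top[top$P < threshold, ]

print(top$Marker[1])  # "rs299251"
print(nrow(hits))     # 0
```

On a full scan, the same two lines applied to result would give the ranked table and the markers passing the chosen threshold.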

Multi-Trait GWAS using Genotypes from Imputed Data

Like other *ABEL packages, MultiABEL works directly with imputed data in DatABEL format, a file-vector format that allows for fast computation. Popular imputed data formats such as IMPUTE can be easily converted using the DatABEL package. Please refer to its documentation for the conversion step if needed.

Let us now convert the example data above in gwaa.data class into DatABEL files and run the same scan. First, write the phenotypic data out as the phenotype file:

write.table(phdata(ge03d2ex.clean), 'pheno.txt', col.names = TRUE, row.names = TRUE, quote = FALSE, sep = '\t')

Convert the genotype data into allelic codings and then into DatABEL data files:

require(DatABEL)
geno <- as.double(ge03d2ex.clean)
matrix2databel(geno, 'geno')
uninames$unique.names = TRUE
uninames$unique.rownames = TRUE
uninames$unique.colnames = TRUE
backingfilename = geno2
cachesizeMb = 64
number of columns (variables) =  3507
number of rows (observations) =  116
usedRowIndex: 1  2  3  4  5  ...
usedColIndex: 1  2  3  4  5  6  7  8  9  10  ...
Upper-left 10 columns and  5 rows:
      rs1646456 rs4435802 rs946364 rs299251 rs2456488 rs3712159 rs4602970 rs175910 rs1919938 rs1116030
id199         1         0        1      NaN         0         0         0        2         0         0
id300         0         0        0        0         0         0         0        2         0         0
id403         1         0        1        0         1       NaN         1        0         1         0
id415         0         1      NaN        0         0         0         0        1         1         1
id666         1         0        1        1         1         0         0        0         2         0

We can then load the imputed data again using the MultiLoad() procedure, but with slightly different input:

loaded <- MultiLoad(phenofile = 'pheno.txt',  genofile = 'geno', trait.cols = c(5, 6, 8), covariate.cols = c(2, 3))

Thereafter, the multi-trait scan results can be produced the same as above:

result <- Multivariate(loaded)

Correcting for Population Structure via GRAMMAR+ Residuals

We suggest using GRAMMAR+ residuals, generated by the polygenic() procedure in the GenABEL package, to correct for population stratification. Unlike the ordinary residuals from linear mixed models used in the GRAMMAR method (Aulchenko et al. 2007 Genetics), GRAMMAR+ residuals are a phenotype transformation that closely approximates a full mixed model solution (Belonogova et al. 2013 PLoS ONE), which makes multivariate analysis of multiple phenotypes in a structured population computationally more feasible.

If we want to correct for population structure in the above example (assuming there is anything to correct for), we first need to construct the genomic kinship matrix via:

gkin <- ibs(ge03d2ex.clean, weight = 'freq')

For imputed data analysis, this step can be simply performed on the array data where we have directly genotyped markers. Then we can loop over the three phenotypes that we want to analyze, to get the corresponding GRAMMAR+ residuals:

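# Start from the original phenotypes; the trait columns are overwritten below
plus <- ge03d2ex.clean@phdata
for (i in c(5, 6, 8)) {
	# Fit the polygenic model for trait i using the genomic kinship matrix
	poly <- polygenic(ge03d2ex.clean@phdata[,i], gkin, ge03d2ex.clean)
	# Replace the trait with its GRAMMAR+ residuals
	plus[,i] <- poly$grresidualY
}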

Now we can save the residuals as a new phenotype file, and run the above multivariate analysis using our prepared “imputed” genotype data:

write.table(plus, 'pheno.plus.txt', col.names = TRUE, row.names = TRUE, quote = FALSE, sep = '\t')
loaded <- MultiLoad(phenofile = 'pheno.plus.txt',  genofile = 'geno', trait.cols = c(5, 6, 8), covariate.cols = c(2, 3))

Thereafter, the multi-trait scan results can be produced the same as above:

result.plus <- Multivariate(loaded)

The top of the new result looks like:

head(result.plus)
             Marker       Beta.S          SE          P  coef.height  coef.weight      coef.bmi
rs1646456 rs1646456 0.0204838238 0.031169284 0.51106532 -0.151990037  0.173668489 -0.1534429989
rs4435802 rs4435802 0.0086583260 0.035745822 0.80861029  0.092143206 -0.202380210  0.1762167094
rs946364   rs946364 0.0214048834 0.031101669 0.49131238 -0.163415525  0.214047011 -0.2005150314
rs299251   rs299251 0.0304247846 0.031102692 0.32797333  0.089547868 -0.253542764  0.2019248957
rs2456488 rs2456488 0.0278942115 0.031014170 0.36843849  0.257256624 -0.541874214  0.4251691269
rs3712159 rs3712159 0.0099905862 0.034535222 0.77236186  0.025174520 -0.026662859 -0.0059335234

There is no substantial population structure in this example data, so we do not expect to see inflated signals being corrected away by linear mixed models.

Multi-Trait GWAS using Summary Association Statistics

Loading multiple GWAS summary statistics

MultiABEL allows convenient and fast GWAS of multiple phenotypes directly from summary association statistics, i.e. genome-wide (or a sufficiently large subset of) association results containing estimated genetic effects, standard errors, and reference allele information. Here, we directly provide an example, and the data can be obtained from: https://www.dropbox.com/sh/2xftha9wcanobo4/AAD6ygCMyUv_gpDtIwRtw-Mta?dl=0

Prior to loading the summary association statistics, you need the names of a set of independent SNPs. These SNPs are used to estimate the phenotypic correlation between phenotypes, accounting for partial sample overlap. The exact number of such SNPs is not important, as long as it is large enough, e.g. thousands. However, LD-pruning might be important. We provide a set of LD-pruned SNPs that can be used for any set of European-ancestry GWAS; it can be loaded as:

load('indep.snps.RData')
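To see why a large set of independent SNPs is enough to estimate the correlation between phenotypes, consider a simplified illustration: across independent, mostly null SNPs, the test statistics of two traits measured on overlapping samples are correlated, and that correlation can be estimated empirically. The sketch below simulates this in base R. It is not the code used inside load.summary(), which according to its log output also applies shrinkage; n.snps and rho are hypothetical values.

```r
set.seed(1)

n.snps <- 5000  # hypothetical number of independent SNPs
rho <- 0.5      # hypothetical correlation between the two traits' z-scores

# Simulate z-scores for two traits with correlation rho across SNPs.
z1 <- rnorm(n.snps)
z2 <- rho * z1 + sqrt(1 - rho^2) * rnorm(n.snps)

# The empirical correlation across SNPs recovers rho approximately.
r.hat <- cor(z1, z2)
print(round(r.hat, 2))
```

With thousands of SNPs the estimate is already stable, which is consistent with the statement above that the exact number of SNPs matters little once it is large enough.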

Thereafter, the summary statistics data can be loaded as:

data <- load.summary(c('height.txt', 'weight.txt', 'bmi.txt'), indep.snps = indep.snps)
loading data ...
Progress: 100%
checking markers ...
Progress: 100%
cleaning data ...
Progress: 100%
correcting parameters ...
Progress: 100%
adjusting sample size ... done.
finalizing summary statistics ...
Progress: 100%
samples partially overlap!
estimating shrinkage phenotypic correlations ... done.

For Help

For direct R documentation of the two functions MultiLoad() and Multivariate(), you can use the question mark in R:

?MultiLoad
?Multivariate

If you have specific questions, you may email the maintainer of MultiABEL via xia dot shen at ed dot ac dot uk.
