Syntax and results of HDL
In this document, you will get a glimpse of the syntax and results of HDL, as well as how to use its parallel version to speed up the computation. For a quick illustration, we use two sets of cleaned UKB GWAS summary statistics for array SNPs as examples. More details about this example can be found later.
Please note that although we use array SNPs as the reference panel here, using imputed SNPs as the reference panel is recommended for more precise estimates (see Reference panels). More real examples and applications can be found in the other pages of the wiki.
You can simply run HDL.run.R as shown below to use the HDL tool:
Rscript /Path/to/HDL/HDL.run.R \
gwas1.df=/Path/to/gwas1/gwas1.array.example.rds \
gwas2.df=/Path/to/gwas2/gwas2.array.example.rds \
LD.path=/Path/to/reference/UKB_array_SVD_eigen90_extraction \
output.file=/Path/to/output/test.Rout
There are several arguments you should pass to HDL.run.R. Please note that when you specify arguments, there must not be a space on either side of =.
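The no-space rule follows from how the shell passes arguments: it splits the command line on whitespace, so an argument written with spaces around = arrives as three separate strings rather than one key=value pair. The sketch below illustrates this with a hypothetical parser. The helper parse.args is an assumption made for illustration only and is not the code used in HDL.run.R.

```r
# Hypothetical helper for illustration only; HDL.run.R may parse its arguments differently.
# Splits each "key=value" string at the first "=" and returns a named character vector.
parse.args <- function(args) {
  has.eq <- grepl("=", args, fixed = TRUE)
  keys <- sub("=.*$", "", args[has.eq])
  values <- sub("^[^=]*=", "", args[has.eq])
  setNames(values, keys)
}

# Written without spaces, the shell passes one string per setting.
good <- c("gwas1.df=/Path/to/gwas1/gwas1.array.example.rds",
          "output.file=/Path/to/output/test.Rout")
parse.args(good)

# Written with spaces around "=", the shell passes three strings,
# and the path is no longer attached to its key.
bad <- c("gwas1.df", "=", "/Path/to/gwas1/gwas1.array.example.rds")
parse.args(bad)
```

In the second call the key gwas1.df never appears among the parsed names, which is the kind of failure the rule is meant to prevent.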
- Mandatory arguments
  - gwas1.df, the path to the file containing the GWAS results for trait 1. Most common file extensions are supported, including .gz files. If a GWAS cannot be loaded successfully, it is recommended to convert it to .txt or .rds;
  - gwas2.df, the path to the file containing the GWAS results for trait 2;
  - LD.path, the path to the directory where the linkage disequilibrium (LD) information is stored (i.e. where the reference panel is located).
- Optional arguments
  - output.file, where the log and results should be written. If you do not specify a file, the log will be printed to the console;
  - Nref, the sample size of the reference sample in which LD is computed. If the default UK Biobank reference sample is used, Nref = 335265;
  - N0, the number of individuals included in both cohorts. The estimated genetic correlation is usually robust to a misspecified N0. If not given, the default value is the minimum sample size across all SNPs in cohort 1 and cohort 2;
  - eigen.cut, which eigenvalues and eigenvectors in each LD score matrix should be used for HDL. Users can specify a numeric value between 0 and 1 for eigen.cut. For example, eigen.cut = 0.99 means using the leading eigenvalues explaining 99% of the variance and their corresponding eigenvectors. If the default 'automatic' is used, the eigen.cut that gives the most stable heritability estimates will be used;
  - jackknife.df, logical, FALSE by default. Should the block-jackknife estimates be returned? If jackknife.df=TRUE is set in the command-line version, the block-jackknife estimates will be written to a file whose name combines output.file and "_jackknife.df.txt".
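To make the meaning of a numeric eigen.cut concrete, the following minimal sketch selects the leading eigenvalues whose cumulative share first reaches the cutoff. The eigenvalues are made up for illustration, and the selection rule shown here (first cumulative proportion at or above the cutoff) is an assumption about the intended meaning, not a copy of HDL's internal code.

```r
# Hypothetical eigenvalues of one LD block, sorted in decreasing order.
lambda <- c(6, 2.5, 1, 0.42, 0.05, 0.03)
eigen.cut <- 0.99

# Cumulative proportion of the total explained by the leading eigenvalues.
cum.prop <- cumsum(lambda) / sum(lambda)

# Smallest number of leading eigenvalues whose cumulative share reaches the cutoff.
n.keep <- which(cum.prop >= eigen.cut)[1]
lambda.keep <- lambda[seq_len(n.keep)]

n.keep
lambda.keep
```

With these values the first three eigenvalues explain 95% and the first four explain 99.2%, so four eigenvalues (and their eigenvectors) would be retained.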
HDL.rg is the R function that performs the HDL method. Its arguments are the same as the arguments above for the command-line implementation; in the example below, the GWAS results are passed as data objects already loaded into R:
library(HDL)
# Load the example GWAS summary statistics
data(gwas1.example)
data(gwas2.example)
# Directory containing the reference panel
LD.path <- "/Path/to/reference/UKB_array_SVD_eigen90_extraction"
res.HDL <- HDL.rg(gwas1.example, gwas2.example, LD.path)
res.HDL
A list is returned with:
- rg, the estimated genetic correlation.
- rg.se, the standard error of the estimated genetic correlation.
- P, the Wald test P-value for rg.
- estimates.df, a detailed matrix that includes the estimates and standard errors of the heritabilities, genetic covariance and genetic correlation.
- eigen.use, the eigen.cut used in the computation.
- jackknife.df, returned only if the argument jackknife.df is TRUE. A matrix that includes the block-jackknife estimates of the heritabilities, genetic covariance and genetic correlation.
The first section of the log shows the specified arguments:
Function arguments:
gwas1.df=/opt/working/wilson/projects/prj_990_ldsc_enrich/hdl_test/HDL/gwas1.array.example.rds
gwas2.df=/opt/working/wilson/projects/prj_990_ldsc_enrich/hdl_test/HDL/gwas2.array.example.rds
LD.path=/opt/storage/wilson/projects/prj_994_UKB_ldscore/UKB_array_SVD_eigen90_extraction/
output.file=/opt/working/wilson/projects/prj_990_ldsc_enrich/hdl_test/test.out
This is followed by some basic information about the installed version of HDL:
HDL: High-definition likelihood inference of genetic correlations (HDL)
Version 1.3.2 (2020-06-06) installed
Author: Zheng Ning, Xia Shen
Maintainer: Zheng Ning <zheng.ning@ki.se>
Tutorial: https://github.com/zhenin/HDL
Use citation("HDL") to know how to cite this work.
In the next section, the proportion of reference panel SNPs that are available in each set of GWAS summary statistics is reported. Because a low SNP overlap leads to poor estimation, HDL will generate a warning if the overlap rate is lower than 99% (i.e. more than 3,075 SNPs missing for the array reference panel or more than 10,299 SNPs missing for the imputed reference panel).
Analysis starts on Sat Jun 6 22:51:47 2020
307519 out of 307519 (100%) SNPs in reference panel are available in GWAS 1.
307519 out of 307519 (100%) SNPs in reference panel are available in GWAS 2.
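The 3,075 threshold for the array reference panel can be reproduced from the panel size printed in the log. The sketch below is only a check of that arithmetic; the helper overlap.rate is a hypothetical name introduced here and is not part of HDL.

```r
# Number of SNPs in the array reference panel, taken from the example log.
n.ref <- 307519

# Largest number of missing SNPs that still keeps the overlap rate at or above 99%.
max.missing <- floor(0.01 * n.ref)
max.missing

# Overlap rate for a GWAS covering n.available reference panel SNPs (hypothetical helper).
overlap.rate <- function(n.available, n.ref) n.available / n.ref

overlap.rate(n.ref - max.missing, n.ref) >= 0.99      # at the limit: still at least 99%
overlap.rate(n.ref - max.missing - 1, n.ref) >= 0.99  # one more SNP missing: below 99%
```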
The last section gives the genetic correlation, its standard error, and P-value based on the Wald test. The estimates and standard errors of heritabilities and genetic covariance are also provided.
Heritability of phenotype 1: 0.1609 (0.0075)
Heritability of phenotype 2: 0.0131 (0.0012)
Genetic Covariance: -0.0101 (0.0018)
Genetic Correlation: -0.2206 (0.0391)
P: 1.70e-08
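As a sanity check, the reported P-value can be approximately reproduced from the printed genetic correlation and its standard error. This sketch assumes a two-sided Wald test against a standard normal distribution, which is the usual form of the test; because the printed inputs are rounded to four decimals, the result should be close to, but not exactly, the reported 1.70e-08.

```r
# Estimate and standard error copied from the example output above (rounded to 4 decimals).
rg <- -0.2206
rg.se <- 0.0391

# Wald statistic and two-sided P-value under a standard normal reference distribution.
z <- rg / rg.se
p <- 2 * pnorm(-abs(z))

z
p
```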
Note: Although estimates of heritabilities and genetic covariance are also provided, they should be interpreted with caution. For LDSC, there have been some concerns about potential bias when estimating these quantities. However, the estimate of genetic correlation is much more robust due to its ratio form. As HDL is a natural extension of LDSC, we suggest focusing the application of HDL on estimating genetic correlations. Please see more details and discussions on this in the HDL paper.
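The ratio form mentioned in the note can be illustrated with the printed estimates, using the standard definition of genetic correlation as the genetic covariance divided by the square root of the product of the two heritabilities. The second half of the sketch rests on a simplifying assumption made here for illustration, namely that a bias acts as one common multiplicative factor on all three quantities; in that case it cancels in the ratio.

```r
# Point estimates copied from the example output (rounded to 4 decimals).
h2.1 <- 0.1609    # heritability of phenotype 1
h2.2 <- 0.0131    # heritability of phenotype 2
gcov <- -0.0101   # genetic covariance

# Genetic correlation as the genetic covariance scaled by both heritabilities.
rg <- gcov / sqrt(h2.1 * h2.2)
rg

# Scaling all three estimates by a common factor leaves the ratio unchanged.
# The factor 1.2 is arbitrary and purely illustrative.
k <- 1.2
rg.scaled <- (k * gcov) / sqrt((k * h2.1) * (k * h2.2))
all.equal(rg, rg.scaled)
```

The computed value is about -0.22, which agrees with the reported -0.2206 up to the rounding of the printed inputs.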
If multiple cores are available on your machine or server, they can be used to greatly speed up HDL. We have prepared a function, HDL.rg.parallel, to make parallelism very simple.
You can run HDL.parallel.run.R as shown below to use the parallel version of HDL with two cores.
Rscript /Path/to/HDL/HDL.parallel.run.R \
gwas1.df=/Path/to/gwas1/gwas1.array.example.rds \
gwas2.df=/Path/to/gwas2/gwas2.array.example.rds \
LD.path=/Path/to/reference/UKB_array_SVD_eigen90_extraction \
output.file=/Path/to/output/test.Rout \
numCores=2
Compared to a non-parallel HDL run, there are only two changes in syntax:
- You should use HDL.parallel.run.R instead of HDL.run.R;
- The number of cores to be used should be specified with the argument numCores.
HDL.rg.parallel is the function that performs parallel HDL. As in the command-line version, the only change is an extra argument, numCores, which specifies the number of cores to be used:
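library(HDL)
# Load the example GWAS summary statistics
data(gwas1.example)
data(gwas2.example)
# Directory containing the reference panel
LD.path <- "/Path/to/reference/UKB_array_SVD_eigen90_extraction"
# Same call as HDL.rg, plus the number of cores to use
res.HDL <- HDL.rg.parallel(gwas1.example, gwas2.example, LD.path, numCores = 2)
res.HDL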
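If you are unsure how many cores to request, one option is to query the machine with the parallel package that ships with R. The sketch below is a common convention rather than a recommendation from HDL: it leaves one core free for the rest of the system, and the resulting value could then be passed as numCores.

```r
# The parallel package is distributed with base R.
library(parallel)

# detectCores() may return NA on some platforms, so fall back to a single core.
n.detected <- detectCores()
if (is.na(n.detected)) n.detected <- 1L

# Leave one core free for the rest of the system, but never go below one.
numCores <- max(1L, n.detected - 1L)
numCores
```

On shared servers the detected number may exceed what your job is allowed to use, so a scheduler-imposed limit should take precedence.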
You can run HDL.run.R as shown below with only one GWAS to estimate heritability:
Rscript /Path/to/HDL/HDL.run.R \
gwas.df=/Path/to/gwas1/gwas1.array.example.rds \
LD.path=/Path/to/reference/UKB_array_SVD_eigen90_extraction \
output.file=/Path/to/output/test.Rout
The arguments are almost identical to those for estimating genetic correlation. The main difference is the use of gwas.df instead of gwas1.df and gwas2.df. Please note that when you specify arguments, there must not be a space on either side of =.
- Mandatory arguments
  - gwas.df, the path to the file containing the GWAS results for the trait. Most common file extensions are supported, including .gz files. If a GWAS cannot be loaded successfully, it is recommended to convert it to .txt or .rds;
  - LD.path, the path to the directory where the linkage disequilibrium (LD) information is stored (i.e. where the reference panel is located).
- Optional arguments
  - output.file, where the log and results should be written. If you do not specify a file, the log will be printed to the console;
  - Nref, the sample size of the reference sample in which LD is computed. If the default UK Biobank reference sample is used, Nref = 335265;
  - eigen.cut, which eigenvalues and eigenvectors in each LD score matrix should be used for HDL. Users can specify a numeric value between 0 and 1 for eigen.cut. For example, eigen.cut = 0.99 means using the leading eigenvalues explaining 99% of the variance and their corresponding eigenvectors. If the default 'automatic' is used, the eigen.cut that gives the most stable heritability estimates will be used.
HDL.h2 is the function that estimates heritability with HDL. The arguments for HDL.h2 are the same as the arguments above for the command-line implementation:
library(HDL)
# Load the example GWAS summary statistics
data(gwas1.example)
# Directory containing the reference panel
LD.path <- "/Path/to/reference/UKB_array_SVD_eigen90_extraction"
res.HDL <- HDL.h2(gwas.df = gwas1.example, LD.path = LD.path)
res.HDL
A list is returned with:
- h2, the estimated heritability.
- h2.se, the standard error of the estimated heritability.
- P, the Wald test P-value for h2.
- eigen.use, the eigen.cut used in the computation.