Format of summary statistics
The extension of the input data file can be either .rds or .txt.
It should include the following columns:
- SNP, SNP ID;
- A1, effect allele;
- A2, reference allele;
- N, sample size;
- Z, z-score.
If Z is not available, you may alternatively provide:
- b, estimate of marginal effect in GWAS; and
- se, standard error of the estimates of marginal effects in GWAS.
The summary statistics should look like this (b and se can be absent since Z is available):
## SNP A1 A2 N b se Z
## 1 rs3131962 G A 205475 0.001004 0.004590 0.2187
## 2 rs12562034 A G 205475 0.005382 0.005011 1.0740
## 3 rs11240779 A G 205475 0.002259 0.003691 0.6119
## 4 rs57181708 G A 205475 0.005401 0.005114 1.0562
## 5 rs4422948 G A 205475 0.005368 0.003604 1.4893
## 6 rs4970383 A C 205475 0.004685 0.003582 1.3080
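HDL accepts b and se directly, so computing Z yourself is optional. If you still want a Z column in a .txt file, the usual relationship Z = b / se can be sketched as below. This is only an illustration: the helper name add_z_column is hypothetical, and it assumes a tab-delimited file with a header row containing columns named b and se.

```shell
# Hypothetical helper (not part of HDL): append a Z column computed as b / se.
# Assumes a tab-delimited text file whose header contains "b" and "se".
add_z_column() {
  awk 'BEGIN { FS = OFS = "\t" }
       NR == 1 {
         # Locate the b and se columns by name.
         for (i = 1; i <= NF; i++) {
           if ($i == "b")  b = i
           if ($i == "se") s = i
         }
         if (!b || !s) {
           print "columns b and se are required" > "/dev/stderr"
           exit 1
         }
         print $0, "Z"
         next
       }
       {
         # Guard against a zero standard error.
         z = ($s != 0) ? sprintf("%.4f", $b / $s) : "NA"
         print $0, z
       }' "$1"
}

# Usage (paths are placeholders):
#   add_z_column /Path/to/gwas/your.gwas.txt > /Path/to/gwas/your.gwas.with.z.txt
```

For the first row of the table above, 0.001004 / 0.004590 rounds to 0.2187, which agrees with the Z value shown.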
If you know how to transform the GWAS into the above format, you can do it yourself.
However, we have prepared an R script, HDL.data.wrangling.R, to make the transformation easier. It (i) extracts the SNPs that overlap between your GWAS and the HDL reference panel, and (ii) extracts the columns that HDL needs. Depending on the source of the GWAS, you may use HDL.data.wrangling.R in two ways:
HDL.data.wrangling.R has some built-in functions for transforming GWASs from typical analyses and consortia. However, the current version supports only the Neale Lab round 2 GWAS of UK Biobank. The performance of HDL.data.wrangling.R for other typical GWASs is still under testing.
Here we take the GWAS results for birth weight as an example (file name: 20002_1223.gwas.imputed_v3.both_sexes.tsv, see this page about downloading the data). You can run HDL.data.wrangling.R as follows:
Rscript HDL.data.wrangling.R \
gwas.path=/Path/to/gwas/20002_1223.gwas.imputed_v3.both_sexes.tsv \
LD.path=/Path/to/reference/UKB_imputed_SVD_eigen99_extraction \
GWAS.type=UKB.Neale \
output.file=/Path/to/output/test
In this case, there are several arguments you should pass to HDL.data.wrangling.R. Please note that when you specify arguments, there must not be any space on either side of =.
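The reason for the no-space rule is that the shell splits gwas.path = /some/file into three separate arguments. A minimal pre-flight check, assuming only that every argument must have the form name=value, might look like this (check_hdl_args is a hypothetical helper, not something shipped with HDL):

```shell
# Hypothetical helper (not part of HDL): verify that every argument has the
# form name=value, i.e. no spaces were typed around the equals sign.
check_hdl_args() {
  for arg in "$@"; do
    case $arg in
      ?*=?*) ;;  # at least one character on each side of "="
      *)
        printf 'Malformed argument (expected name=value): %s\n' "$arg" >&2
        return 1
        ;;
    esac
  done
}

# Usage (paths are placeholders):
#   check_hdl_args gwas.path=/Path/to/gwas/file.tsv GWAS.type=UKB.Neale
```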
- Mandatory arguments
  - gwas.path, the path to the downloaded GWAS results;
  - LD.path, the path to the directory where linkage disequilibrium (LD) information is stored (i.e. where the reference panel is located);
  - GWAS.type, which analysis/consortium the GWAS is from. Here are the values for the analyses and consortia supported by the current version of HDL.data.wrangling.R:
    - UKB.Neale: The Neale Lab round 2 GWAS of UK Biobank
- Optional arguments
  - output.file, the path and file name where the transformed data should be saved. If specified, the transformed data will be saved as output.file.hdl.rds. If not specified, the transformed data will be saved as gwas.path.hdl.rds.
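The naming rule for the saved file can be sketched as a small shell function. It only restates the rule described above for output.file; the function name hdl_output_path is hypothetical and is not part of HDL.

```shell
# Hypothetical helper (not part of HDL): predict where the transformed data
# will be saved, following the rule described for output.file.
#   $1 = value of gwas.path (mandatory)
#   $2 = value of output.file (optional)
hdl_output_path() {
  gwas_path=$1
  output_file=${2:-}
  if [ -n "$output_file" ]; then
    printf '%s.hdl.rds\n' "$output_file"
  else
    printf '%s.hdl.rds\n' "$gwas_path"
  fi
}

# Usage (paths are placeholders):
#   hdl_output_path /Path/to/gwas/your.gwas.txt /Path/to/output/test
#   hdl_output_path /Path/to/gwas/your.gwas.txt
```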
During data wrangling, for the above example, these messages will be printed:
Program starts on Tue Feb 25 12:23:39 2020
Loading GWAS summary statistics from /Path/to/gwas/20002_1223.gwas.imputed_v3.both_sexes.tsv
Data is loaded successfully. Data wrangling starts.
Data wrangling completed.
1029876 out of 1029876 (100%) SNPs in reference panel are available in GWAS.
The output is saved to /Path/to/output/test.hdl.rds
As the last line of the above message suggests, the transformed GWAS is saved as /Path/to/output/test.hdl.rds and is ready as input for HDL.
If your GWAS comes from another source, instead of specifying GWAS.type, you need to explicitly tell HDL.data.wrangling.R how to interpret the variable names in the GWAS. Other than this, the syntax is the same as that in the previous section. For example, if your GWAS looks like this:
## rsid alt ref tstat n_complete_samples beta se
## 1 rs3131962 G A 0.2187 205475 0.001004 0.004590
## 2 rs12562034 A G 1.0740 205475 0.005382 0.005011
## 3 rs11240779 A G 0.6119 205475 0.002259 0.003691
## 4 rs57181708 G A 1.0562 205475 0.005401 0.005114
## 5 rs4422948 G A 1.4893 205475 0.005368 0.003604
## 6 rs4970383 A C 1.3080 205475 0.004685 0.003582
You should use the command below to run HDL.data.wrangling.R:
Rscript HDL.data.wrangling.R \
gwas.path=/Path/to/gwas/your.gwas.txt \
LD.path=/Path/to/reference/UKB_imputed_SVD_eigen99_extraction \
SNP=rsid A1=alt A2=ref N=n_complete_samples Z=tstat \
output.file=/Path/to/output/test
or
Rscript HDL.data.wrangling.R \
gwas.path=/Path/to/gwas/your.gwas.txt \
LD.path=/Path/to/reference/UKB_imputed_SVD_eigen99_extraction \
SNP=rsid A1=alt A2=ref N=n_complete_samples b=beta se=se \
output.file=/Path/to/output/test
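Before writing the SNP=, A1=, A2=, N= and Z= (or b= and se=) mappings, it can help to list the column names actually present in your GWAS file. The sketch below assumes a text file whose first line is a tab- or space-separated header; list_columns is a hypothetical helper, not part of HDL.

```shell
# Hypothetical helper (not part of HDL): print one column name per line from
# the header row of a tab- or space-delimited GWAS text file.
list_columns() {
  head -n 1 "$1" | tr -s ' \t' '\n\n' | sed '/^$/d'
}

# Usage (path is a placeholder):
#   list_columns /Path/to/gwas/your.gwas.txt
```

Each name printed this way is a candidate for the right-hand side of a mapping such as N=n_complete_samples.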