Build a reference panel
Lan Ao edited this page Mar 21, 2024 · 4 revisions
OS: Linux or OSX
Download the repository
git clone https://github.com/zhenin/HDL.git
Compile Fortran functions
cd HDL
rm -f build_ld_ref/utils/bmult.o build_ld_ref/utils/ldscore.o \
build_ld_ref/utils/bmult.so build_ld_ref/utils/ldscore.so
R CMD SHLIB build_ld_ref/utils/bmult.f90
R CMD SHLIB build_ld_ref/utils/ldscore.f90
Install R packages
install.packages(c('tidyr', 'dplyr', 'data.table', 'RSpectra', 'argparser'))
Install HDL (required for the demo example)
Rscript HDL.install.R
The PLINK files for the demo example are generated from 1000 Genomes Project data.
bash build_ld_ref/run_demo.sh
CAUTION:
- The .bim files of ALL chromosomes of the LD reference panel must be merged (cat) into a SINGLE .bim file.
- Variant identifiers (rsids) in the .bim file MUST BE UNIQUE.
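The two requirements above can be checked before running the split script. The following is a minimal sketch, not part of the HDL repository: the helper names merge_bims and dup_ids and the per-chromosome file names are hypothetical, and it assumes the variant identifier is the second whitespace-separated column of each .bim line, as in the standard PLINK layout.

```shell
# merge_bims OUT.bim IN1.bim [IN2.bim ...]
# Concatenate per-chromosome .bim files into a single file, in the order given.
merge_bims() {
  out=$1
  shift
  cat "$@" > "$out"
}

# dup_ids FILE.bim
# Print every variant ID (column 2) that occurs more than once.
# No output means all IDs are unique.
dup_ids() {
  awk '{print $2}' "$1" | sort | uniq -d
}

# Example with hypothetical file names chr1.bim to chr22.bim:
#   merge_bims ALL_SNPS.bim $(for c in $(seq 1 22); do echo "chr$c.bim"; done)
#   dup_ids ALL_SNPS.bim
```

If dup_ids prints anything, those identifiers need to be renamed or removed before the merged file is used.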
Rscript build_ld_ref/1_split_chroms.R <ld_ref_path/ld_ref_name> <ALL_SNPS.bim> --min MIN_AVG_NUM_SNPs --max MAX_AVG_NUM_SNPs
The --min and --max options control the range of the average number of variants in a segment.
Prepare the PLINK data: bfile.bed + bfile.bim + bfile.fam.
bash build_ld_ref/2_cal_ld.sh <path/to/bfile> <ld_ref_path/ld_ref_name> [bandwidth [ld_window [chroms]]] | bash
Optional arguments:
- bandwidth: bandwidth (number of SNPs) for LD calculation, default=500.
- ld_window: window size (kb) for LD calculation, default=1000000 (whole segment).
- chroms: selected chromosomes, separated by commas (,), default=1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22.
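Because the optional arguments are positional and nested ([bandwidth [ld_window [chroms]]]), selecting chromosomes appears to require spelling out bandwidth and ld_window first. The sketch below only assembles and prints such a command line; the paths are the placeholders from the usage line, the two numeric values are the documented defaults, and the choice of chromosomes 21 and 22 is purely illustrative.

```shell
# Placeholders from the usage line above; replace with real paths.
bfile=path/to/bfile
ref=ld_ref_path/ld_ref_name

# Documented defaults, written out so that chroms can follow them.
bandwidth=500
ld_window=1000000

# Illustrative selection: chromosomes 21 and 22 only.
chroms=21,22

cmd="bash build_ld_ref/2_cal_ld.sh $bfile $ref $bandwidth $ld_window $chroms"
echo "$cmd"
```

Appending | bash to the printed command, as in the usage line, runs the generated commands one after another.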
- Or run in parallel using the parallel command:
bash build_ld_ref/2_cal_ld.sh <path/to/bfile> <ld_ref_path/ld_ref_name> | parallel -j n_cores
- Or run in parallel by saving the commands to a file, then splitting it and submitting the parts to your server cluster:
bash build_ld_ref/2_cal_ld.sh <path/to/bfile> <ld_ref_path/ld_ref_name> > jobs.sh
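One way to split the saved commands is sketched below. It assumes that each line of jobs.sh is an independent command, which is consistent with piping the same output to parallel but is not confirmed here. The helper name split_jobs, the chunk size, and the jobs_part_ prefix are hypothetical.

```shell
# split_jobs JOBS_FILE LINES_PER_CHUNK PREFIX
# Split a file of one-command-per-line jobs into chunks named PREFIXaa, PREFIXab, ...
split_jobs() {
  split -l "$2" "$1" "$3"
}

# Example: 50 commands per chunk.
#   split_jobs jobs.sh 50 jobs_part_
# Each chunk is a small shell script; submit it with your scheduler,
# or run one directly with: bash jobs_part_aa
```

How the chunks are submitted depends on the cluster's scheduler, so that step is left out of the sketch.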
bash build_ld_ref/3_build_ld_ref.sh <ld_ref_path/ld_ref_name> | bash
Or run in parallel, as in Step 2 (2_cal_ld.sh).