
Build a reference panel

Lan Ao edited this page Mar 21, 2024 · 4 revisions

Requirements

OS

Linux or macOS

Download the repository

git clone https://github.com/zhenin/HDL.git

Compile Fortran functions

cd HDL
rm -f build_ld_ref/utils/bmult.o build_ld_ref/utils/ldscore.o \
  build_ld_ref/utils/bmult.so build_ld_ref/utils/ldscore.so
R CMD SHLIB build_ld_ref/utils/bmult.f90
R CMD SHLIB build_ld_ref/utils/ldscore.f90

Install R packages

install.packages(c('tidyr', 'dplyr', 'data.table', 'RSpectra', 'argparser'))

Install HDL (required for demo example)

Rscript HDL.install.R

Guide

Demo

The PLINK files for the demo example were generated from 1000 Genomes Project data.

bash build_ld_ref/run_demo.sh

Step 1. Split chromosomes

CAUTION:

  • The .bim files of ALL chromosomes in the LD reference panel must be merged (cat) into a SINGLE .bim file.
  • Variant identifiers (rsids) in the .bim file MUST BE UNIQUE.

Rscript build_ld_ref/1_split_chroms.R <ld_ref_path/ld_ref_name> <ALL_SNPS.bim> --min MIN_AVG_NUM_SNPs --max MAX_AVG_NUM_SNPs

The --min and --max options set the lower and upper bounds on the average number of variants per segment.
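The two CAUTION requirements for Step 1 can be prepared and checked with standard shell tools. The sketch below is a minimal illustration, not part of the HDL scripts: the helper names `merge_bims` and `duplicate_ids` and the per-chromosome file names `chr1.bim` ... `chr22.bim` are assumptions. It relies only on the PLINK .bim layout, where the variant identifier is the second column.

```shell
# merge_bims OUT IN...
# Hypothetical helper: concatenates per-chromosome .bim files into a single file,
# in the order the inputs are given.
merge_bims() {
  out=$1
  shift
  cat "$@" > "$out"
}

# duplicate_ids BIM
# Hypothetical helper: prints every variant identifier (column 2 of the .bim file)
# that occurs more than once. No output means all identifiers are unique.
duplicate_ids() {
  awk '{ print $2 }' "$1" | sort | uniq -d
}

# Example usage (file names are placeholders, brace expansion needs bash):
#   merge_bims ALL_SNPS.bim chr{1..22}.bim
#   duplicate_ids ALL_SNPS.bim
```

If `duplicate_ids` prints anything, those identifiers need to be renamed or removed before running 1_split_chroms.R.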

Step 2. Calculate LD

Prepare the PLINK data: bfile.bed + bfile.bim + bfile.fam.

bash build_ld_ref/2_cal_ld.sh <path/to/bfile> <ld_ref_path/ld_ref_name> [bandwidth [ld_window [chroms]]] | bash

Optional arguments:

bandwidth: bandwidth (number of SNPs) for LD calculation, default=500.

ld_window: window size (kb) for LD calculation, default=1000000 (whole segment).

chroms: chromosomes to process, as a comma-separated list, default=1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22.
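Reading the bracket notation [bandwidth [ld_window [chroms]]] as nested positional arguments, a later option can only be given if the earlier ones are given too. The sketch below illustrates this with a hypothetical helper, `cal_ld_cmd`, which is not part of HDL: it only prints the Step 2 command line with the documented defaults filled in for any omitted argument.

```shell
# cal_ld_cmd BFILE LD_REF [BANDWIDTH [LD_WINDOW [CHROMS]]]
# Hypothetical helper: prints (does not run) the Step 2 command line.
# Defaults are the ones listed in this guide.
cal_ld_cmd() {
  bfile=$1
  ld_ref=$2
  bandwidth=${3:-500}
  ld_window=${4:-1000000}
  chroms=${5:-1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22}
  printf 'bash build_ld_ref/2_cal_ld.sh %s %s %s %s %s\n' \
    "$bfile" "$ld_ref" "$bandwidth" "$ld_window" "$chroms"
}

# Example: restrict the run to chromosomes 21 and 22 while keeping the
# default bandwidth and window (paths are placeholders):
#   cal_ld_cmd path/to/bfile ld_ref_path/ld_ref_name 500 1000000 21,22
```

The printed line can be inspected first and then piped to bash in the same way as the Step 2 command above.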

  • Or run in parallel with the parallel command:

bash build_ld_ref/2_cal_ld.sh <path/to/bfile> <ld_ref_path/ld_ref_name> | parallel -j n_cores

  • Or run in parallel by saving the commands to a file, then splitting the file and submitting the pieces to your cluster:

bash build_ld_ref/2_cal_ld.sh <path/to/bfile> <ld_ref_path/ld_ref_name> > jobs.sh
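One way to split the saved commands is the standard split utility. The sketch below assumes that jobs.sh holds one independent command per line (consistent with piping the same output to parallel, but still an assumption). The helper name `split_jobs` and the chunk prefix are hypothetical, and the actual submission step is left out because it depends on your scheduler.

```shell
# split_jobs JOBS_FILE LINES_PER_CHUNK PREFIX
# Hypothetical helper: splits JOBS_FILE into pieces of at most LINES_PER_CHUNK
# lines, named PREFIXaa, PREFIXab, ... by split.
split_jobs() {
  jobs_file=$1
  lines_per_chunk=$2
  prefix=$3
  split -l "$lines_per_chunk" "$jobs_file" "$prefix"
}

# Example usage (numbers and names are placeholders):
#   split_jobs jobs.sh 50 jobs_chunk_
#   ls jobs_chunk_*      # each file is a small script that can be run with bash
# Submit each chunk with your scheduler's own command (site-specific, not shown).
```

Smaller chunks give finer-grained scheduling at the cost of more jobs to track.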

Step 3. Build LD reference

bash build_ld_ref/3_build_ld_ref.sh <ld_ref_path/ld_ref_name> | bash

Or run in parallel, as in Step 2.