diff --git a/INDEX.Rmd b/INDEX.Rmd index 2762c1c..e58f8e4 100644 --- a/INDEX.Rmd +++ b/INDEX.Rmd @@ -25,7 +25,7 @@ functions easily accessible from within R and allows for automatic evaluation of the results. **plinkQC** generates a per-individual and per-marker quality control report. -A step-by-step guide on how to run these analyses can be found [here](https://hannahvmeyer.github.io/plinkQC/articles/plinkQC.html). +A step-by-step guide on how to run these analyses can be found [here](https://meyer-lab-cshl.github.io/plinkQC/articles/plinkQC.html). Individuals and markers that fail the quality control can subsequently be removed with **plinkQC** to generate a new, clean dataset. diff --git a/doc/AncestryCheck.R b/doc/AncestryCheck.R index 267be26..1a18b40 100644 --- a/doc/AncestryCheck.R +++ b/doc/AncestryCheck.R @@ -1,4 +1,4 @@ -## ----setup knitr, include = FALSE---------------------------------------- +## ----setup knitr, include = FALSE--------------------------------------------- knitr::opts_chunk$set( collapse = TRUE, comment = "#>" @@ -20,6 +20,6 @@ knitr::opts_chunk$set( # sep=""), # interactive=TRUE) -## ----load ancestry, out.width = "500px", echo=FALSE, fig.align='center'---- +## ----load ancestry, out.width = "500px", echo=FALSE, fig.align='center'------- knitr::include_graphics("checkAncestry.png") diff --git a/doc/AncestryCheck.Rmd b/doc/AncestryCheck.Rmd index 42fa7ff..9add4ee 100644 --- a/doc/AncestryCheck.Rmd +++ b/doc/AncestryCheck.Rmd @@ -41,8 +41,8 @@ is provided with $plinkQC$ in `file.path(find.package('plinkQC'),'extdata')`. ## Download reference data A suitable reference dataset should be downloaded and if necessary, re-formated into PLINK format. 
Vignettes -['Processing HapMap III reference data for ancestry estimation'](https://hannahvmeyer.github.io/plinkQC/articles/HapMap.html) and -['Processing 1000Genomes reference data for ancestry estimation'](https://hannahvmeyer.github.io/plinkQC/articles/1000Genomes.html), +['Processing HapMap III reference data for ancestry estimation'](https://meyer-lab-cshl.github.io/plinkQC/articles/HapMap.html) and +['Processing 1000Genomes reference data for ancestry estimation'](https://meyer-lab-cshl.github.io/plinkQC/articles/1000Genomes.html), show the download and processing of the HapMap phase III and 1000Genomes phase III dataset, respectively. In this example, we will use the HammapIII data as the reference dataset. diff --git a/doc/AncestryCheck.pdf b/doc/AncestryCheck.pdf index c62791b..9930aa5 100644 Binary files a/doc/AncestryCheck.pdf and b/doc/AncestryCheck.pdf differ diff --git a/doc/Genomes1000.R b/doc/Genomes1000.R index 08fd8db..2230d50 100644 --- a/doc/Genomes1000.R +++ b/doc/Genomes1000.R @@ -1,4 +1,4 @@ -## ----setup knitr, include = FALSE---------------------------------------- +## ----setup knitr, include = FALSE--------------------------------------------- knitr::opts_chunk$set( collapse = TRUE, comment = "#>" diff --git a/doc/Genomes1000.Rmd b/doc/Genomes1000.Rmd index 49f9891..19db9a4 100644 --- a/doc/Genomes1000.Rmd +++ b/doc/Genomes1000.Rmd @@ -36,7 +36,7 @@ The following vignette shows the processing steps required to use samples of the 1000 Genomes study [@a1000Genomes2015],[@b1000Genomes2015] as a reference dataset. Using the 1000 Genomes reference, population structure down to large-scale continental ancestry can be detected. A step-by-step instruction on -how to conduct this ancestry analysis is described in this [vignette](https://hannahvmeyer.github.io/plinkQC/articles/AncestryCheck.html). +how to conduct this ancestry analysis is described in this [vignette](https://meyer-lab-cshl.github.io/plinkQC/articles/AncestryCheck.html). 
# Workflow diff --git a/doc/Genomes1000.pdf b/doc/Genomes1000.pdf index 7c2efdc..02e9e49 100644 Binary files a/doc/Genomes1000.pdf and b/doc/Genomes1000.pdf differ diff --git a/doc/HapMap.R b/doc/HapMap.R index 08fd8db..2230d50 100644 --- a/doc/HapMap.R +++ b/doc/HapMap.R @@ -1,4 +1,4 @@ -## ----setup knitr, include = FALSE---------------------------------------- +## ----setup knitr, include = FALSE--------------------------------------------- knitr::opts_chunk$set( collapse = TRUE, comment = "#>" diff --git a/doc/HapMap.Rmd b/doc/HapMap.Rmd index 10fa4b3..938870c 100644 --- a/doc/HapMap.Rmd +++ b/doc/HapMap.Rmd @@ -36,7 +36,7 @@ The following vignette shows the processing steps required to use samples of the HapMap study [@HapMap2005][@HapMap2007][@HapMap2010] as a reference dataset. Using this reference, population structure down to large-scale continental ancestry can be detected. A step-by-step instruction on how to conduct this -analysis is described in this [vignette](https://hannahvmeyer.github.io/plinkQC/articles/AncestryCheck.html). +analysis is described in this [vignette](https://meyer-lab-cshl.github.io/plinkQC/articles/AncestryCheck.html). # Workflow @@ -106,12 +106,22 @@ https://genome.ucsc.edu/cgi-bin/hgLiftOver and the appropriate liftover chain fr zero-based [UCSC bed](https://genome.ucsc.edu/FAQ/FAQformat.html#format1) format. +Hapmap chromosome data is encoded numerically, with chrX represented by chr23, +and chrY as chr24. In order to match to data encoded by chrX and chrY, we will +have to rename these hapmap chromosomes. Converting to zero-based UCSC format +and re-coding chromosome codes can be achieved by: + ```{bash prepare liftover, eval=FALSE} awk '{print "chr" $1, $4 -1, $4, $2 }' $refdir/HapMapIII_NCBI36.bim | \ sed 's/chr23/chrX/' | sed 's/chr24/chrY/' > \ $refdir/HapMapIII_NCBI36.tolift ``` +[Note: In the official HapMap release, chromosome codes described above, however +in the orignal download files (link above), no chr24 detected. 
I will keep this +line in for completeness, but note, when inspecting file that no chr24/chrY are +present.] + We use the liftOver tool and the UCSC bed formated annotation file together with the appropriate chain file to do the lift over. @@ -133,7 +143,7 @@ awk '{print $4, $3}' $refdir/HapMapIII_CGRCh37 > $refdir/HapMapIII_CGRCh37.pos ## Update the reference data We can now use PLINK to extract the mappable variants from the old build and update their position. After these steps, the HapMap III dataset can be used -for infering study ancestry as described in the corresponding [vignette](https://hannahvmeyer.github.io/plinkQC/articles/AncestryCheck.html). +for infering study ancestry as described in the corresponding [vignette](https://meyer-lab-cshl.github.io/plinkQC/articles/AncestryCheck.html). ```{bash update annotation, eval=FALSE} plink --bfile $refdir/HapMapIII_NCBI36 \ --extract $refdir/HapMapIII_CGRCh37.snps \ diff --git a/doc/HapMap.pdf b/doc/HapMap.pdf index ddd11f7..8dccd45 100644 Binary files a/doc/HapMap.pdf and b/doc/HapMap.pdf differ diff --git a/doc/plinkQC.R b/doc/plinkQC.R index 213487b..e3d72b5 100644 --- a/doc/plinkQC.R +++ b/doc/plinkQC.R @@ -1,22 +1,22 @@ -## ----setup, include = FALSE---------------------------------------------- +## ----setup, include = FALSE--------------------------------------------------- library(plinkQC) knitr::opts_chunk$set( collapse = TRUE, comment = "#>" ) -## ----set parameters------------------------------------------------------ +## ----set parameters----------------------------------------------------------- package.dir <- find.package('plinkQC') indir <- file.path(package.dir, 'extdata') qcdir <- tempdir() name <- 'data' path2plink <- "/Users/hannah/bin/plink" -## ----copy files---------------------------------------------------------- +## ----copy files--------------------------------------------------------------- system(paste("cp", file.path(package.dir, 'extdata/data.HapMapIII.eigenvec'), qcdir)) -## 
----individual QC, eval=FALSE, fig.height=12, fig.width=9-------------- +## ----individual QC, eval=FALSE, fig.height=12, fig.width=9------------------- # fail_individuals <- perIndividualQC(indir=indir, qcdir=qcdir, name=name, # refSamplesFile=paste(indir, "/HapMap_ID2Pop.txt", # sep=""), @@ -31,41 +31,40 @@ system(paste("cp", file.path(package.dir, 'extdata/data.HapMapIII.eigenvec'), par(mfrow=c(2,1), las=1) knitr::include_graphics("individualQC.png") -## ----overview individual QC,fig.width=7, fig.height=7, eval=FALSE-------- +## ----overview individual QC,fig.width=7, fig.height=7, eval=FALSE------------- # overview_individuals <- overviewPerIndividualQC(fail_individuals, # interactive=TRUE) -## ----load overviewIndividualQC, out.width = "500px", echo=FALSE---------- +## ----load overviewIndividualQC, out.width = "500px", echo=FALSE--------------- par(mfrow=c(2,1), las=1) knitr::include_graphics("overviewQC.png") -knitr::include_graphics("overviewAncestryQC.png") -## ----marker QC, eval=FALSE----------------------------------------------- +## ----marker QC, eval=FALSE---------------------------------------------------- # fail_markers <- perMarkerQC(indir=indir, qcdir=qcdir, name=name, # path2plink=path2plink, # verbose=TRUE, interactive=TRUE, # showPlinkOutput=FALSE) -## ----load markerQC, echo=FALSE, out.width = "500px", fig.align='center'---- +## ----load markerQC, echo=FALSE, out.width = "500px", fig.align='center'------- par(mfrow=c(2,1), las=1) knitr::include_graphics("markerQC.png") -## ----overview marker QC, eval=FALSE-------------------------------------- +## ----overview marker QC, eval=FALSE------------------------------------------- # overview_marker <- overviewPerMarkerQC(fail_markers, interactive=TRUE) -## ----load overviewMarkerQC, out.width = "500px", echo=FALSE-------------- +## ----load overviewMarkerQC, out.width = "500px", echo=FALSE------------------- par(mfrow=c(2,1), las=1) knitr::include_graphics("overviewMarkerQC.png") -## ----clean 
data, eval=FALSE---------------------------------------------- +## ----clean data, eval=FALSE--------------------------------------------------- # Ids <- cleanData(indir=indir, qcdir=qcdir, name=name, path2plink=path2plink, # verbose=TRUE, showPlinkOutput=FALSE) -## ----check sex, eval=FALSE, out.width = "500px", fig.align='center'------ +## ----check sex, eval=FALSE, out.width = "500px", fig.align='center'----------- # fail_sex <- check_sex(indir=indir, qcdir=qcdir, name=name, interactive=TRUE, # verbose=TRUE, path2plink=path2plink) -## ----load checkSex, out.width = "500px", echo=FALSE, fig.align='center'---- +## ----load checkSex, out.width = "500px", echo=FALSE, fig.align='center'------- knitr::include_graphics("checkSex.png") ## ----check het miss, eval=FALSE, fig.height=3, fig.width=5, fig.align='center'---- @@ -93,10 +92,10 @@ knitr::include_graphics("checkRelatedness.png") # path2plink=path2plink, run.check_ancestry = FALSE, # interactive=TRUE) -## ----load ancestry, out.width = "500px", echo=FALSE, fig.align='center'---- +## ----load ancestry, out.width = "500px", echo=FALSE, fig.align='center'------- knitr::include_graphics("checkAncestry.png") -## ----check snp missing, eval=FALSE--------------------------------------- +## ----check snp missing, eval=FALSE-------------------------------------------- # fail_snpmissing <- check_snp_missingness(indir=indir, qcdir=qcdir, name=name, # interactive=TRUE, # path2plink=path2plink, @@ -105,17 +104,17 @@ knitr::include_graphics("checkAncestry.png") ## ----load snp missing, out.width = "500px", echo=FALSE, fig.align='center'---- knitr::include_graphics("snpmissingness.png") -## ----check hwe, eval=FALSE----------------------------------------------- +## ----check hwe, eval=FALSE---------------------------------------------------- # fail_hwe <- check_hwe(indir=indir, qcdir=qcdir, name=name, interactive=TRUE, # path2plink=path2plink, showPlinkOutput=FALSE) -## ----load hwe, out.width = "500px", echo=FALSE, 
fig.align='center'------- +## ----load hwe, out.width = "500px", echo=FALSE, fig.align='center'------------ knitr::include_graphics("hwe.png") -## ----check maf, eval=FALSE----------------------------------------------- +## ----check maf, eval=FALSE---------------------------------------------------- # fail_maf <- check_maf(indir=indir, qcdir=qcdir, name=name, interactive=TRUE, # path2plink=path2plink, showPlinkOutput=FALSE) -## ----load maf, out.width = "500px", echo=FALSE, fig.align='center'------ +## ----load maf, out.width = "500px", echo=FALSE, fig.align='center'----------- knitr::include_graphics("maf.png") diff --git a/doc/plinkQC.Rmd b/doc/plinkQC.Rmd index 4c9f586..8d065b8 100644 --- a/doc/plinkQC.Rmd +++ b/doc/plinkQC.Rmd @@ -115,6 +115,10 @@ individual IDs to the qcdir. These IDs will be removed in the computation of the `perMarkerQC`. If the list is not present, `perMarkerQC` will send a message about conducting the quality control on the entire dataset. +NB: To reduce the data size of the example data in `plinkQC`, +data.genome has already been reduced to the individuals that are related. Thus +the relatedness plots in C only show counts for related individuals only. + NB: To demonstrate the results of the ancestry check, the required eigenvector file of the combined study and reference datasets have been precomputed and for the purpose of this example will be copied to the `qcdir`. In practice, @@ -156,7 +160,6 @@ overview_individuals <- overviewPerIndividualQC(fail_individuals, ```{r load overviewIndividualQC, out.width = "500px", echo=FALSE} par(mfrow=c(2,1), las=1) knitr::include_graphics("overviewQC.png") -knitr::include_graphics("overviewAncestryQC.png") ``` @@ -273,6 +276,10 @@ complex family structures, the unrelated individuals per family are selected (e.g. in a parents-offspring trio, the offspring will be marked as fail, while the parents will be kept in the analysis). 
+NB: To reduce the data size of the example data in `plinkQC`, +data.genome has already been reduced to the individuals that are related. Thus +the relatedness plots in C only show counts for related individuals only. + ```{r check related, eval=FALSE, fig.height=3, fig.width=5, fig.align='center'} exclude_relatedness <- check_relatedness(indir=indir, qcdir=qcdir, name=name, interactive=TRUE, diff --git a/doc/plinkQC.pdf b/doc/plinkQC.pdf index 8be0f3a..c645090 100644 Binary files a/doc/plinkQC.pdf and b/doc/plinkQC.pdf differ diff --git a/docs/CODE_OF_CONDUCT.html b/docs/CODE_OF_CONDUCT.html index 9414e4f..f21c027 100644 --- a/docs/CODE_OF_CONDUCT.html +++ b/docs/CODE_OF_CONDUCT.html @@ -69,7 +69,7 @@ plinkQC - 0.2.3 + 0.3.0 diff --git a/docs/articles/AncestryCheck.html b/docs/articles/AncestryCheck.html index 692084f..862ef00 100644 --- a/docs/articles/AncestryCheck.html +++ b/docs/articles/AncestryCheck.html @@ -106,7 +106,7 @@

Ancestry estimation based on reference samples of known ethnicities

Hannah Meyer

-

2019-10-18

+

2020-03-11

@@ -127,7 +127,7 @@

Download reference data

-

A suitable reference dataset should be downloaded and, if necessary, reformatted into PLINK format. The vignettes ‘Processing HapMap III reference data for ancestry estimation’ and ‘Processing 1000Genomes reference data for ancestry estimation’ show the download and processing of the HapMap phase III and 1000Genomes phase III datasets, respectively. In this example, we will use the HapMap III data as the reference dataset.

+

A suitable reference dataset should be downloaded and, if necessary, reformatted into PLINK format. The vignettes ‘Processing HapMap III reference data for ancestry estimation’ and ‘Processing 1000Genomes reference data for ancestry estimation’ show the download and processing of the HapMap phase III and 1000Genomes phase III datasets, respectively. In this example, we will use the HapMap III data as the reference dataset.

diff --git a/docs/articles/Genomes1000.html b/docs/articles/Genomes1000.html index 582775b..995cb98 100644 --- a/docs/articles/Genomes1000.html +++ b/docs/articles/Genomes1000.html @@ -106,7 +106,7 @@

Processing 1000 Genomes reference data for ancestry estimation

Hannah Meyer

-

2019-10-18

+

2020-03-11

@@ -119,7 +119,7 @@

2019-10-18

Introduction

Genotype quality control for genetic association studies often includes the need for selecting samples of the same ethnic background. To identify individuals of divergent ancestry based on genotypes, the genotypes of the study population can be combined with genotypes of a reference dataset consisting of individuals from known ethnicities. Principal component analysis (PCA) on this combined genotype panel can then be used to detect population structure down to the level of the reference dataset.

-

The following vignette shows the processing steps required to use samples of the 1000 Genomes study [1],[2] as a reference dataset. Using the 1000 Genomes reference, population structure down to large-scale continental ancestry can be detected. A step-by-step instruction on how to conduct this ancestry analysis is described in this vignette.

+

The following vignette shows the processing steps required to use samples of the 1000 Genomes study [1],[2] as a reference dataset. Using the 1000 Genomes reference, population structure down to large-scale continental ancestry can be detected. A step-by-step instruction on how to conduct this ancestry analysis is described in this vignette.

diff --git a/docs/articles/HapMap.html b/docs/articles/HapMap.html index 17e24f6..c1f15f9 100644 --- a/docs/articles/HapMap.html +++ b/docs/articles/HapMap.html @@ -106,7 +106,7 @@

Processing HapMap III reference data for ancestry estimation

Hannah Meyer

-

2019-10-18

+

2020-03-11

@@ -119,7 +119,7 @@

2019-10-18

Introduction

Genotype quality control for genetic association studies often includes the need for selecting samples of the same ethnic background. To identify individuals of divergent ancestry based on genotypes, the genotypes of the study population can be combined with genotypes of a reference dataset consisting of individuals from known ethnicities. Principal component analysis (PCA) on this combined genotype panel can then be used to detect population structure down to the level of the reference dataset.

-

The following vignette shows the processing steps required to use samples of the HapMap study [1][2][3] as a reference dataset. Using this reference, population structure down to large-scale continental ancestry can be detected. A step-by-step instruction on how to conduct this analysis is described in this vignette.

+

The following vignette shows the processing steps required to use samples of the HapMap study [1][2][3] as a reference dataset. Using this reference, population structure down to large-scale continental ancestry can be detected. A step-by-step instruction on how to conduct this analysis is described in this vignette.

@@ -159,9 +159,11 @@

The genome build of the HapMap III data is NCBI36. Most current datasets use GRCh37 or GRCh38. To update the HapMap III data to the desired build, we use the UCSC liftOver tool. The liftOver tool takes the variant annotation in UCSC bed format (a format similar to the PLINK .bim format) and a liftover chain containing the mapping information between the old genome (target) and the new genome (query). It returns the updated annotation (newFile) and a file with unmappable variants (unMapped):

We first need to download the liftOver tool from https://genome.ucsc.edu/cgi-bin/hgLiftOver and the appropriate liftover chain from http://hgdownload.soe.ucsc.edu/goldenPath/hg19/liftOver/. We then convert the PLINK .bim format to the zero-based UCSC bed format.

+

HapMap chromosome data are encoded numerically, with chrX represented by chr23 and chrY by chr24. To match data that encode these chromosomes as chrX and chrY, we have to rename the HapMap chromosomes. Converting to the zero-based UCSC bed format and recoding the chromosome codes can be achieved by:
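As a minimal sketch of this conversion on toy data: the temporary directory, the three-variant .bim file and the variable names below are hypothetical stand-ins for $refdir/HapMapIII_NCBI36.bim. The awk and sed steps follow the vignette's command, except that the sed patterns are anchored to the first column here, an assumption made so that variant IDs are never touched.

```shell
set -eu

# Hypothetical scratch directory and toy input (not part of plinkQC).
workdir=$(mktemp -d)
bim="$workdir/toy_NCBI36.bim"
tolift="$workdir/toy_NCBI36.tolift"

# PLINK .bim columns: chromosome, variant ID, genetic distance,
# 1-based position, allele 1, allele 2 (tab-separated).
printf '%s\t%s\t%s\t%s\t%s\t%s\n' \
  1 rs1 0 100 A G \
  23 rs2 0 200 C T \
  24 rs3 0 300 G A > "$bim"

# bed start is zero-based (position - 1), bed end is the 1-based position.
# chr23/chr24 are then recoded to chrX/chrY in the first column only.
awk '{print "chr" $1, $4 - 1, $4, $2}' "$bim" | \
  sed 's/^chr23 /chrX /; s/^chr24 /chrY /' > "$tolift"

cat "$tolift"
```

Each output line holds chromosome, zero-based start, end and variant ID, which is the four-column bed layout passed to liftOver.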

+

[Note: The official HapMap release describes the chromosome codes as above; however, the original download files (link above) contain no chr24. The chr24 substitution is kept for completeness, but note that no chr24/chrY variants are present when inspecting the file.]
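One way to inspect which chromosome codes a .bim file actually contains is to tabulate its first column. The sketch below runs on a hypothetical three-variant toy file rather than the real HapMap download, and the variable names are illustrative only.

```shell
set -eu

# Hypothetical toy .bim file without any chromosome 24 entries.
workdir=$(mktemp -d)
bim="$workdir/toy_NCBI36.bim"
printf '%s\t%s\t%s\t%s\t%s\t%s\n' \
  1 rs1 0 100 A G \
  1 rs4 0 150 T C \
  23 rs2 0 200 C T > "$bim"

# Number of variants per chromosome code, printed as code:count.
chrom_counts=$(cut -f1 "$bim" | sort -n | uniq -c | awk '{print $2 ":" $1}')
echo "$chrom_counts"

# Number of variants coded as chromosome 24 (0 if none are present).
n_chr24=$(awk '$1 == 24 {n++} END {print n + 0}' "$bim")
echo "chr24 variants: $n_chr24"
```

If the count for chromosome 24 is zero, the chr24 substitution in the conversion step simply has no effect.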

We use the liftOver tool and the UCSC bed-formatted annotation file together with the appropriate chain file to do the liftover.
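The sketch below illustrates what happens around this step, under stated assumptions. The liftOver call is shown only as a comment, using the oldFile/map.chain/newFile/unMapped argument order described above. The lifted file is a hand-written toy stand-in with hypothetical file and variable names, from which the variant list (.snps) and the updated positions (.pos) used by the PLINK update step are derived.

```shell
set -eu

# Hypothetical scratch directory; "$lifted" stands in for liftOver's newFile.
workdir=$(mktemp -d)
lifted="$workdir/toy_GRCh37"

# With the real tool, the lifted file would come from a call of the form
#   liftOver oldFile map.chain newFile unMapped
# Here a toy result is written by hand: chromosome, start, end, variant ID.
printf '%s\t%s\t%s\t%s\n' \
  chr1 109 110 rs1 \
  chrX 219 220 rs2 > "$lifted"

# Variant IDs that could be mapped to the new build (input for --extract).
awk '{print $4}' "$lifted" > "$lifted.snps"

# Variant ID and new 1-based position (input for --update-map).
awk '{print $4, $3}' "$lifted" > "$lifted.pos"

cat "$lifted.pos"
```

The third bed column (end) is used as the new position because it equals the 1-based coordinate of a single-base variant.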

@@ -174,7 +176,7 @@

Update the reference data

-

We can now use PLINK to extract the mappable variants from the old build and update their positions. After these steps, the HapMap III dataset can be used for inferring study ancestry as described in the corresponding vignette.

+

We can now use PLINK to extract the mappable variants from the old build and update their positions. After these steps, the HapMap III dataset can be used for inferring study ancestry as described in the corresponding vignette.

plink --bfile $refdir/HapMapIII_NCBI36 \
     --extract $refdir/HapMapIII_CGRCh37.snps \
     --update-map $refdir/HapMapIII_CGRCh37.pos \
diff --git a/docs/articles/plinkQC.html b/docs/articles/plinkQC.html
index 5ef2e99..947feb4 100644
--- a/docs/articles/plinkQC.html
+++ b/docs/articles/plinkQC.html
@@ -106,10 +106,8 @@
       

Genotype quality control with plinkQC

Hannah Meyer

- -

2019-10-18

+

2020-03-11

- @@ -187,7 +185,7 @@

overviewPerMarkerQC depicts an overview of the marker quality control failures and their overlaps.

overview_marker <- overviewPerMarkerQC(fail_markers, interactive=TRUE)

- +

diff --git a/docs/index.html b/docs/index.html index 8a08642..a1dc5a4 100644 --- a/docs/index.html +++ b/docs/index.html @@ -121,7 +121,7 @@

plinkQC

plinkQC is an R/CRAN package for genotype quality control in genetic association studies. It makes PLINK basic statistics (e.g. missing genotyping rates per individual, allele frequencies per genetic marker) and relationship functions easily accessible from within R and allows for automatic evaluation of the results.

-

plinkQC generates a per-individual and per-marker quality control report. A step-by-step guide on how to run these analyses can be found here.

+

plinkQC generates a per-individual and per-marker quality control report. A step-by-step guide on how to run these analyses can be found here.

Individuals and markers that fail the quality control can subsequently be removed with plinkQC to generate a new, clean dataset.

plinkQC facilitates an ancestry check for study individuals based on comparison to reference datasets. The processing of the reference datasets is documented in detail here.

Removal of individuals based on relationship status via plinkQC is optimised to retain as many individuals as possible in the study.

@@ -132,10 +132,8 @@

Installation

The current GitHub version of plinkQC is: 0.3.0 and can be installed via

library(devtools)
-
 install_github("meyer-lab-cshl/plinkQC")

The current CRAN version of plinkQC is: 0.2.3 and can be installed via

-
install.packages("plinkQC")

A log of version changes can be found here.

@@ -184,10 +182,7 @@

Dev status

  • CRAN_Status_Badge
  • Build Status
  • License: MIT
  • -
  • Downloads
  • - -

    diff --git a/docs/news/index.html b/docs/news/index.html index 6e9b5e2..385381d 100644 --- a/docs/news/index.html +++ b/docs/news/index.html @@ -146,7 +146,7 @@

    Changelog

    -plinkQC 0.3.0 Unreleased
    +plinkQC 0.3.0 2019-10-19

    @@ -186,11 +186,9 @@

  • Include check in case all samples fail perIndividual QC (in 894acc1fa03dadfe0ad2028888142171bcc641eb and 04642246d18ed4eaac5b9d9a6931d1ecb08308e8)
  • Include checks for diagonal derived relationship estimates, and estimate data containing only related individuals; fixes #11
  • Fix command for genotype conversion in 1000Genomes vignette, addressing issue - #10
  • fix missing rownames error for overviewPerIndividualQC, when relatedness check was included (issue #16, fc7a38b1f2b345d9c6c5d69f5dcf0bc57a857f62)
  • fix vignette mismatch (issue #16, 09dcd59e77178b35747aae81a5c1988712e20de9)
  • -

    diff --git a/vignettes/AncestryCheck.Rmd b/vignettes/AncestryCheck.Rmd index 42fa7ff..9add4ee 100644 --- a/vignettes/AncestryCheck.Rmd +++ b/vignettes/AncestryCheck.Rmd @@ -41,8 +41,8 @@ is provided with $plinkQC$ in `file.path(find.package('plinkQC'),'extdata')`. ## Download reference data A suitable reference dataset should be downloaded and if necessary, re-formated into PLINK format. Vignettes -['Processing HapMap III reference data for ancestry estimation'](https://hannahvmeyer.github.io/plinkQC/articles/HapMap.html) and -['Processing 1000Genomes reference data for ancestry estimation'](https://hannahvmeyer.github.io/plinkQC/articles/1000Genomes.html), +['Processing HapMap III reference data for ancestry estimation'](https://meyer-lab-cshl.github.io/plinkQC/articles/HapMap.html) and +['Processing 1000Genomes reference data for ancestry estimation'](https://meyer-lab-cshl.github.io/plinkQC/articles/1000Genomes.html), show the download and processing of the HapMap phase III and 1000Genomes phase III dataset, respectively. In this example, we will use the HammapIII data as the reference dataset.