Skip to content

Commit

Permalink
Update links and build package
Browse files Browse the repository at this point in the history
  • Loading branch information
HannahVMeyer committed Mar 12, 2020
1 parent e8afbb9 commit da987d8
Show file tree
Hide file tree
Showing 21 changed files with 63 additions and 54 deletions.
2 changes: 1 addition & 1 deletion INDEX.Rmd
Original file line number Diff line number Diff line change
Expand Up @@ -25,7 +25,7 @@ functions easily accessible from within R and allows for automatic evaluation
of the results.

**plinkQC** generates a per-individual and per-marker quality control report.
A step-by-step guide on how to run these analyses can be found [here](https://hannahvmeyer.github.io/plinkQC/articles/plinkQC.html).
A step-by-step guide on how to run these analyses can be found [here](https://meyer-lab-cshl.github.io/plinkQC/articles/plinkQC.html).

Individuals and markers that fail the quality control can subsequently be
removed with **plinkQC** to generate a new, clean dataset.
Expand Down
4 changes: 2 additions & 2 deletions doc/AncestryCheck.R
Original file line number Diff line number Diff line change
@@ -1,4 +1,4 @@
## ----setup knitr, include = FALSE----------------------------------------
## ----setup knitr, include = FALSE---------------------------------------------
knitr::opts_chunk$set(
collapse = TRUE,
comment = "#>"
Expand All @@ -20,6 +20,6 @@ knitr::opts_chunk$set(
# sep=""),
# interactive=TRUE)

## ----load ancestry, out.width = "500px", echo=FALSE, fig.align='center'----
## ----load ancestry, out.width = "500px", echo=FALSE, fig.align='center'-------
knitr::include_graphics("checkAncestry.png")

4 changes: 2 additions & 2 deletions doc/AncestryCheck.Rmd
Original file line number Diff line number Diff line change
Expand Up @@ -41,8 +41,8 @@ is provided with $plinkQC$ in `file.path(find.package('plinkQC'),'extdata')`.
## Download reference data
A suitable reference dataset should be downloaded and if necessary, re-formated
into PLINK format. Vignettes
['Processing HapMap III reference data for ancestry estimation'](https://hannahvmeyer.github.io/plinkQC/articles/HapMap.html) and
['Processing 1000Genomes reference data for ancestry estimation'](https://hannahvmeyer.github.io/plinkQC/articles/1000Genomes.html),
['Processing HapMap III reference data for ancestry estimation'](https://meyer-lab-cshl.github.io/plinkQC/articles/HapMap.html) and
['Processing 1000Genomes reference data for ancestry estimation'](https://meyer-lab-cshl.github.io/plinkQC/articles/1000Genomes.html),
show the download and processing of the HapMap phase III and 1000Genomes phase
III dataset, respectively. In this example, we will use the HammapIII data as
the reference dataset.
Expand Down
Binary file modified doc/AncestryCheck.pdf
Binary file not shown.
2 changes: 1 addition & 1 deletion doc/Genomes1000.R
Original file line number Diff line number Diff line change
@@ -1,4 +1,4 @@
## ----setup knitr, include = FALSE----------------------------------------
## ----setup knitr, include = FALSE---------------------------------------------
knitr::opts_chunk$set(
collapse = TRUE,
comment = "#>"
Expand Down
2 changes: 1 addition & 1 deletion doc/Genomes1000.Rmd
Original file line number Diff line number Diff line change
Expand Up @@ -36,7 +36,7 @@ The following vignette shows the processing steps required to use samples of the
1000 Genomes study [@a1000Genomes2015],[@b1000Genomes2015] as a reference
dataset. Using the 1000 Genomes reference, population structure down to
large-scale continental ancestry can be detected. A step-by-step instruction on
how to conduct this ancestry analysis is described in this [vignette](https://hannahvmeyer.github.io/plinkQC/articles/AncestryCheck.html).
how to conduct this ancestry analysis is described in this [vignette](https://meyer-lab-cshl.github.io/plinkQC/articles/AncestryCheck.html).


# Workflow
Expand Down
Binary file modified doc/Genomes1000.pdf
Binary file not shown.
2 changes: 1 addition & 1 deletion doc/HapMap.R
Original file line number Diff line number Diff line change
@@ -1,4 +1,4 @@
## ----setup knitr, include = FALSE----------------------------------------
## ----setup knitr, include = FALSE---------------------------------------------
knitr::opts_chunk$set(
collapse = TRUE,
comment = "#>"
Expand Down
14 changes: 12 additions & 2 deletions doc/HapMap.Rmd
Original file line number Diff line number Diff line change
Expand Up @@ -36,7 +36,7 @@ The following vignette shows the processing steps required to use samples of the
HapMap study [@HapMap2005][@HapMap2007][@HapMap2010] as a reference dataset.
Using this reference, population structure down to large-scale continental
ancestry can be detected. A step-by-step instruction on how to conduct this
analysis is described in this [vignette](https://hannahvmeyer.github.io/plinkQC/articles/AncestryCheck.html).
analysis is described in this [vignette](https://meyer-lab-cshl.github.io/plinkQC/articles/AncestryCheck.html).


# Workflow
Expand Down Expand Up @@ -106,12 +106,22 @@ https://genome.ucsc.edu/cgi-bin/hgLiftOver and the appropriate liftover chain fr
zero-based [UCSC bed](https://genome.ucsc.edu/FAQ/FAQformat.html#format1)
format.

Hapmap chromosome data is encoded numerically, with chrX represented by chr23,
and chrY as chr24. In order to match to data encoded by chrX and chrY, we will
have to rename these hapmap chromosomes. Converting to zero-based UCSC format
and re-coding chromosome codes can be achieved by:

```{bash prepare liftover, eval=FALSE}
awk '{print "chr" $1, $4 -1, $4, $2 }' $refdir/HapMapIII_NCBI36.bim | \
sed 's/chr23/chrX/' | sed 's/chr24/chrY/' > \
$refdir/HapMapIII_NCBI36.tolift
```

[Note: In the official HapMap release, chromosome codes described above, however
in the orignal download files (link above), no chr24 detected. I will keep this
line in for completeness, but note, when inspecting file that no chr24/chrY are
present.]

We use the liftOver tool and the UCSC bed formated annotation file together
with the appropriate chain file to do the lift over.

Expand All @@ -133,7 +143,7 @@ awk '{print $4, $3}' $refdir/HapMapIII_CGRCh37 > $refdir/HapMapIII_CGRCh37.pos
## Update the reference data
We can now use PLINK to extract the mappable variants from the old build and
update their position. After these steps, the HapMap III dataset can be used
for infering study ancestry as described in the corresponding [vignette](https://hannahvmeyer.github.io/plinkQC/articles/AncestryCheck.html).
for infering study ancestry as described in the corresponding [vignette](https://meyer-lab-cshl.github.io/plinkQC/articles/AncestryCheck.html).
```{bash update annotation, eval=FALSE}
plink --bfile $refdir/HapMapIII_NCBI36 \
--extract $refdir/HapMapIII_CGRCh37.snps \
Expand Down
Binary file modified doc/HapMap.pdf
Binary file not shown.
39 changes: 19 additions & 20 deletions doc/plinkQC.R
Original file line number Diff line number Diff line change
@@ -1,22 +1,22 @@
## ----setup, include = FALSE----------------------------------------------
## ----setup, include = FALSE---------------------------------------------------
library(plinkQC)
knitr::opts_chunk$set(
collapse = TRUE,
comment = "#>"
)

## ----set parameters------------------------------------------------------
## ----set parameters-----------------------------------------------------------
package.dir <- find.package('plinkQC')
indir <- file.path(package.dir, 'extdata')
qcdir <- tempdir()
name <- 'data'
path2plink <- "/Users/hannah/bin/plink"

## ----copy files----------------------------------------------------------
## ----copy files---------------------------------------------------------------
system(paste("cp", file.path(package.dir, 'extdata/data.HapMapIII.eigenvec'),
qcdir))

## ----individual QC, eval=FALSE, fig.height=12, fig.width=9--------------
## ----individual QC, eval=FALSE, fig.height=12, fig.width=9-------------------
# fail_individuals <- perIndividualQC(indir=indir, qcdir=qcdir, name=name,
# refSamplesFile=paste(indir, "/HapMap_ID2Pop.txt",
# sep=""),
Expand All @@ -31,41 +31,40 @@ system(paste("cp", file.path(package.dir, 'extdata/data.HapMapIII.eigenvec'),
par(mfrow=c(2,1), las=1)
knitr::include_graphics("individualQC.png")

## ----overview individual QC,fig.width=7, fig.height=7, eval=FALSE--------
## ----overview individual QC,fig.width=7, fig.height=7, eval=FALSE-------------
# overview_individuals <- overviewPerIndividualQC(fail_individuals,
# interactive=TRUE)

## ----load overviewIndividualQC, out.width = "500px", echo=FALSE----------
## ----load overviewIndividualQC, out.width = "500px", echo=FALSE---------------
par(mfrow=c(2,1), las=1)
knitr::include_graphics("overviewQC.png")
knitr::include_graphics("overviewAncestryQC.png")

## ----marker QC, eval=FALSE-----------------------------------------------
## ----marker QC, eval=FALSE----------------------------------------------------
# fail_markers <- perMarkerQC(indir=indir, qcdir=qcdir, name=name,
# path2plink=path2plink,
# verbose=TRUE, interactive=TRUE,
# showPlinkOutput=FALSE)

## ----load markerQC, echo=FALSE, out.width = "500px", fig.align='center'----
## ----load markerQC, echo=FALSE, out.width = "500px", fig.align='center'-------
par(mfrow=c(2,1), las=1)
knitr::include_graphics("markerQC.png")

## ----overview marker QC, eval=FALSE--------------------------------------
## ----overview marker QC, eval=FALSE-------------------------------------------
# overview_marker <- overviewPerMarkerQC(fail_markers, interactive=TRUE)

## ----load overviewMarkerQC, out.width = "500px", echo=FALSE--------------
## ----load overviewMarkerQC, out.width = "500px", echo=FALSE-------------------
par(mfrow=c(2,1), las=1)
knitr::include_graphics("overviewMarkerQC.png")

## ----clean data, eval=FALSE----------------------------------------------
## ----clean data, eval=FALSE---------------------------------------------------
# Ids <- cleanData(indir=indir, qcdir=qcdir, name=name, path2plink=path2plink,
# verbose=TRUE, showPlinkOutput=FALSE)

## ----check sex, eval=FALSE, out.width = "500px", fig.align='center'------
## ----check sex, eval=FALSE, out.width = "500px", fig.align='center'-----------
# fail_sex <- check_sex(indir=indir, qcdir=qcdir, name=name, interactive=TRUE,
# verbose=TRUE, path2plink=path2plink)

## ----load checkSex, out.width = "500px", echo=FALSE, fig.align='center'----
## ----load checkSex, out.width = "500px", echo=FALSE, fig.align='center'-------
knitr::include_graphics("checkSex.png")

## ----check het miss, eval=FALSE, fig.height=3, fig.width=5, fig.align='center'----
Expand Down Expand Up @@ -93,10 +92,10 @@ knitr::include_graphics("checkRelatedness.png")
# path2plink=path2plink, run.check_ancestry = FALSE,
# interactive=TRUE)

## ----load ancestry, out.width = "500px", echo=FALSE, fig.align='center'----
## ----load ancestry, out.width = "500px", echo=FALSE, fig.align='center'-------
knitr::include_graphics("checkAncestry.png")

## ----check snp missing, eval=FALSE---------------------------------------
## ----check snp missing, eval=FALSE--------------------------------------------
# fail_snpmissing <- check_snp_missingness(indir=indir, qcdir=qcdir, name=name,
# interactive=TRUE,
# path2plink=path2plink,
Expand All @@ -105,17 +104,17 @@ knitr::include_graphics("checkAncestry.png")
## ----load snp missing, out.width = "500px", echo=FALSE, fig.align='center'----
knitr::include_graphics("snpmissingness.png")

## ----check hwe, eval=FALSE-----------------------------------------------
## ----check hwe, eval=FALSE----------------------------------------------------
# fail_hwe <- check_hwe(indir=indir, qcdir=qcdir, name=name, interactive=TRUE,
# path2plink=path2plink, showPlinkOutput=FALSE)

## ----load hwe, out.width = "500px", echo=FALSE, fig.align='center'-------
## ----load hwe, out.width = "500px", echo=FALSE, fig.align='center'------------
knitr::include_graphics("hwe.png")

## ----check maf, eval=FALSE-----------------------------------------------
## ----check maf, eval=FALSE----------------------------------------------------
# fail_maf <- check_maf(indir=indir, qcdir=qcdir, name=name, interactive=TRUE,
# path2plink=path2plink, showPlinkOutput=FALSE)

## ----load maf, out.width = "500px", echo=FALSE, fig.align='center'------
## ----load maf, out.width = "500px", echo=FALSE, fig.align='center'-----------
knitr::include_graphics("maf.png")

9 changes: 8 additions & 1 deletion doc/plinkQC.Rmd
Original file line number Diff line number Diff line change
Expand Up @@ -115,6 +115,10 @@ individual IDs to the qcdir. These IDs will be removed in the computation of the
`perMarkerQC`. If the list is not present, `perMarkerQC` will send a message
about conducting the quality control on the entire dataset.

NB: To reduce the data size of the example data in `plinkQC`,
data.genome has already been reduced to the individuals that are related. Thus
the relatedness plots in C only show counts for related individuals only.

NB: To demonstrate the results of the ancestry check, the required eigenvector
file of the combined study and reference datasets have been precomputed and
for the purpose of this example will be copied to the `qcdir`. In practice,
Expand Down Expand Up @@ -156,7 +160,6 @@ overview_individuals <- overviewPerIndividualQC(fail_individuals,
```{r load overviewIndividualQC, out.width = "500px", echo=FALSE}
par(mfrow=c(2,1), las=1)
knitr::include_graphics("overviewQC.png")
knitr::include_graphics("overviewAncestryQC.png")
```


Expand Down Expand Up @@ -273,6 +276,10 @@ complex family structures, the unrelated individuals per family are selected
(e.g. in a parents-offspring trio, the offspring will be marked as fail, while
the parents will be kept in the analysis).

NB: To reduce the data size of the example data in `plinkQC`,
data.genome has already been reduced to the individuals that are related. Thus
the relatedness plots in C only show counts for related individuals only.

```{r check related, eval=FALSE, fig.height=3, fig.width=5, fig.align='center'}
exclude_relatedness <- check_relatedness(indir=indir, qcdir=qcdir, name=name,
interactive=TRUE,
Expand Down
Binary file modified doc/plinkQC.pdf
Binary file not shown.
2 changes: 1 addition & 1 deletion docs/CODE_OF_CONDUCT.html

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

4 changes: 2 additions & 2 deletions docs/articles/AncestryCheck.html

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

4 changes: 2 additions & 2 deletions docs/articles/Genomes1000.html

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

Loading

0 comments on commit da987d8

Please sign in to comment.