Commit

Merge pull request #22 from meyer-lab-cshl/dev
Note in hapmap vignette, update broken links, and version number
HannahVMeyer authored Mar 13, 2020
2 parents 534ec7b + be6b5c9 commit 64f898e
Showing 58 changed files with 159 additions and 118 deletions.
2 changes: 1 addition & 1 deletion DESCRIPTION
@@ -1,7 +1,7 @@
Package: plinkQC
Type: Package
Title: Genotype Quality Control with 'PLINK'
Version: 0.3.0
Version: 0.3.1
Authors@R:
person("Hannah", "Meyer", email = "hannah.v.meyer@gmail.com",
role = c("aut", "cre"), comment = c(ORCID = "0000-0003-4564-0899"))
4 changes: 2 additions & 2 deletions INDEX.Rmd
@@ -25,7 +25,7 @@ functions easily accessible from within R and allows for automatic evaluation
of the results.

**plinkQC** generates a per-individual and per-marker quality control report.
A step-by-step guide on how to run these analyses can be found [here](https://hannahvmeyer.github.io/plinkQC/articles/plinkQC.html).
A step-by-step guide on how to run these analyses can be found [here](https://meyer-lab-cshl.github.io/plinkQC/articles/plinkQC.html).

Individuals and markers that fail the quality control can subsequently be
removed with **plinkQC** to generate a new, clean dataset.
@@ -44,7 +44,7 @@ knitr::include_graphics("docs/qc.png")

## <i class="fa fa-rocket" aria-hidden="true"></i> Installation

The current github version of **plinkQC** is: 0.3.0 and can be
The current github version of **plinkQC** is: 0.3.1 and can be
installed via
```{r github install, eval=FALSE}
library(devtools)
11 changes: 8 additions & 3 deletions INDEX.md
@@ -16,7 +16,7 @@ R and allows for automatic evaluation of the results.

**plinkQC** generates a per-individual and per-marker quality control
report. A step-by-step guide on how to run these analyses can be found
[here](https://hannahvmeyer.github.io/plinkQC/articles/plinkQC.html).
[here](https://meyer-lab-cshl.github.io/plinkQC/articles/plinkQC.html).

Individuals and markers that fail the quality control can subsequently
be removed with **plinkQC** to generate a new, clean dataset.
@@ -29,11 +29,11 @@ datasets is documented in detail
Removal of individuals based on relationship status via **plinkQC** is
optimised to retain as many individuals as possible in the study.

![](docs/qc.png)<!-- -->
<img src="docs/qc.png" width="744" />

## <i class="fa fa-rocket" aria-hidden="true"></i> Installation

The current github version of **plinkQC** is: 0.3.0 and can be installed
The current github version of **plinkQC** is: 0.3.1 and can be installed
via

``` r
@@ -49,3 +49,8 @@ install.packages("plinkQC")
```

A log of version changes can be found [here](news/index.html).

## <i class="fa fa-pencil" aria-hidden="true"></i> Citation

Meyer HV (2018) plinkQC: Genotype quality control in genetic association
studies. <doi:10.5281/zenodo.3373798>
5 changes: 5 additions & 0 deletions NEWS.md
@@ -1,3 +1,8 @@
# plinkQC 0.3.1
## minor changes
* Fixed dead links in vignettes (caused by migration of repository): da987d8f225aa6aca0596b9c4f6a2484b102bdb6
* Added note about chrY in HapMap data (vignette): e8afbb9842ed9421461a8114ac0a00f7955cf0c0

# plinkQC 0.3.0
## major changes
* Relationship filter can handle more complicated relationship scenarios as
2 changes: 1 addition & 1 deletion README.Rmd
@@ -46,7 +46,7 @@ to retain as many individuals as possible in the study.

## <i class="fa fa-rocket" aria-hidden="true"></i> Installation

The current github version of **plinkQC** is: 0.3.0 and can be
The current github version of **plinkQC** is: 0.3.1 and can be
installed via
```{r github install, eval=FALSE}
library(devtools)
12 changes: 5 additions & 7 deletions README.md
@@ -1,12 +1,12 @@

<!-- README.md is generated from README.Rmd. Please edit that file -->

[![CRAN\_Status\_Badge](https://www.r-pkg.org/badges/version/plinkQC)](https://cran.r-project.org/package=plinkQC)
[![CRAN\_Status\_Badge](http://www.r-pkg.org/badges/version/plinkQC)](https://cran.r-project.org/package=plinkQC)
[![Build
Status](https://travis-ci.org/meyer-lab-cshl/plinkQC.svg?branch=master)](https://travis-ci.org/meyer-lab-cshl/plinkQC)
[![License:
MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)
[![Downloads](https://cranlogs.r-pkg.org/badges/grand-total/plinkQC?color=blue)](https://CRAN.R-project.org/package=plinkQC)
[![Downloads](http://cranlogs.r-pkg.org/badges/grand-total/plinkQC?color=blue)](https://cran.r-project.org/package=plinkQC)

## <i class="fa fa-map" aria-hidden="true"></i> plinkQC

@@ -17,10 +17,8 @@ genetic marker) and relationship functions easily accessible from within
R and allows for automatic evaluation of the results.

Full documentation is available at

<http://meyer-lab-cshl.github.io/plinkQC/>.


**plinkQC** generates a per-individual and per-marker quality control
report. A step-by-step guide on how to run these analyses can be found
[here](https://meyer-lab-cshl.github.io/plinkQC/articles/plinkQC.html).
@@ -38,7 +36,7 @@ optimised to retain as many individuals as possible in the study.

## <i class="fa fa-rocket" aria-hidden="true"></i> Installation

The current github version of **plinkQC** is: 0.3.0 and can be installed
The current github version of **plinkQC** is: 0.3.1 and can be installed
via

``` r
@@ -58,6 +56,6 @@ A log of version changes can be found
[here](https://github.com/meyer-lab-cshl/plinkQC/blob/master/NEWS.md).

## <i class="fa fa-pencil" aria-hidden="true"></i> Citation
Meyer HV (2018) plinkQC: Genotype quality control in genetic association
studies. doi:10.5281/zenodo.3373798

Meyer HV (2018) plinkQC: Genotype quality control in genetic association
studies. <doi:10.5281/zenodo.3373798>
4 changes: 2 additions & 2 deletions doc/AncestryCheck.R
@@ -1,4 +1,4 @@
## ----setup knitr, include = FALSE----------------------------------------
## ----setup knitr, include = FALSE---------------------------------------------
knitr::opts_chunk$set(
collapse = TRUE,
comment = "#>"
@@ -20,6 +20,6 @@ knitr::opts_chunk$set(
# sep=""),
# interactive=TRUE)

## ----load ancestry, out.width = "500px", echo=FALSE, fig.align='center'----
## ----load ancestry, out.width = "500px", echo=FALSE, fig.align='center'-------
knitr::include_graphics("checkAncestry.png")

4 changes: 2 additions & 2 deletions doc/AncestryCheck.Rmd
@@ -41,8 +41,8 @@ is provided with $plinkQC$ in `file.path(find.package('plinkQC'),'extdata')`.
## Download reference data
A suitable reference dataset should be downloaded and, if necessary, re-formatted
into PLINK format. Vignettes
['Processing HapMap III reference data for ancestry estimation'](https://hannahvmeyer.github.io/plinkQC/articles/HapMap.html) and
['Processing 1000Genomes reference data for ancestry estimation'](https://hannahvmeyer.github.io/plinkQC/articles/1000Genomes.html),
['Processing HapMap III reference data for ancestry estimation'](https://meyer-lab-cshl.github.io/plinkQC/articles/HapMap.html) and
['Processing 1000Genomes reference data for ancestry estimation'](https://meyer-lab-cshl.github.io/plinkQC/articles/1000Genomes.html),
show the download and processing of the HapMap phase III and 1000Genomes phase
III datasets, respectively. In this example, we will use the HapMap III data as
the reference dataset.
Binary file modified doc/AncestryCheck.pdf
Binary file not shown.
2 changes: 1 addition & 1 deletion doc/Genomes1000.R
@@ -1,4 +1,4 @@
## ----setup knitr, include = FALSE----------------------------------------
## ----setup knitr, include = FALSE---------------------------------------------
knitr::opts_chunk$set(
collapse = TRUE,
comment = "#>"
2 changes: 1 addition & 1 deletion doc/Genomes1000.Rmd
@@ -36,7 +36,7 @@ The following vignette shows the processing steps required to use samples of the
1000 Genomes study [@a1000Genomes2015],[@b1000Genomes2015] as a reference
dataset. Using the 1000 Genomes reference, population structure down to
large-scale continental ancestry can be detected. A step-by-step instruction on
how to conduct this ancestry analysis is described in this [vignette](https://hannahvmeyer.github.io/plinkQC/articles/AncestryCheck.html).
how to conduct this ancestry analysis is described in this [vignette](https://meyer-lab-cshl.github.io/plinkQC/articles/AncestryCheck.html).


# Workflow
Binary file modified doc/Genomes1000.pdf
Binary file not shown.
2 changes: 1 addition & 1 deletion doc/HapMap.R
@@ -1,4 +1,4 @@
## ----setup knitr, include = FALSE----------------------------------------
## ----setup knitr, include = FALSE---------------------------------------------
knitr::opts_chunk$set(
collapse = TRUE,
comment = "#>"
14 changes: 12 additions & 2 deletions doc/HapMap.Rmd
@@ -36,7 +36,7 @@ The following vignette shows the processing steps required to use samples of the
HapMap study [@HapMap2005][@HapMap2007][@HapMap2010] as a reference dataset.
Using this reference, population structure down to large-scale continental
ancestry can be detected. A step-by-step instruction on how to conduct this
analysis is described in this [vignette](https://hannahvmeyer.github.io/plinkQC/articles/AncestryCheck.html).
analysis is described in this [vignette](https://meyer-lab-cshl.github.io/plinkQC/articles/AncestryCheck.html).


# Workflow
@@ -106,12 +106,22 @@ https://genome.ucsc.edu/cgi-bin/hgLiftOver and the appropriate liftover chain fr
zero-based [UCSC bed](https://genome.ucsc.edu/FAQ/FAQformat.html#format1)
format.

HapMap chromosome data is encoded numerically, with chrX represented by chr23
and chrY by chr24. To match data that encodes these chromosomes as chrX and chrY,
we have to rename these HapMap chromosomes. Converting to zero-based UCSC format
and re-coding the chromosome codes can be achieved by:

```{bash prepare liftover, eval=FALSE}
awk '{print "chr" $1, $4 -1, $4, $2 }' $refdir/HapMapIII_NCBI36.bim | \
sed 's/chr23/chrX/' | sed 's/chr24/chrY/' > \
$refdir/HapMapIII_NCBI36.tolift
```

[Note: The official HapMap release documents the chromosome codes described
above; however, the original download files (link above) contain no chr24. I
will keep this line in for completeness, but note that, on inspecting the
files, no chr24/chrY variants are present.]
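
As a minimal sketch of what the conversion above produces, the same `awk`/`sed` pipeline can be run on a small made-up `.bim` file. The file name `example.bim`, the temporary directory and the three variants (`rs1`, `rs2`, `rs3`) are hypothetical and only serve as illustration; they are not part of the HapMap data.

```bash
# Hypothetical three-variant .bim file, created in a temporary directory.
# Columns: chromosome, variant ID, genetic distance, bp position, allele 1, allele 2
refdir=$(mktemp -d)
printf '1\trs1\t0\t1000\tA\tG\n23\trs2\t0\t2000\tC\tT\n24\trs3\t0\t3000\tG\tA\n' \
    > "$refdir/example.bim"

# Same transformation as in the vignette: zero-based start, one-based end,
# variant ID as the name column, chr23/chr24 renamed to chrX/chrY
awk '{print "chr" $1, $4 - 1, $4, $2}' "$refdir/example.bim" | \
    sed 's/chr23/chrX/' | sed 's/chr24/chrY/' > "$refdir/example.tolift"

cat "$refdir/example.tolift"
# chr1 999 1000 rs1
# chrX 1999 2000 rs2
# chrY 2999 3000 rs3
```

Each variant becomes a one-base interval whose start is the PLINK position minus one, which is what the zero-based UCSC bed convention expects.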

We use the liftOver tool and the UCSC bed formatted annotation file together
with the appropriate chain file to do the lift over.

@@ -133,7 +143,7 @@ awk '{print $4, $3}' $refdir/HapMapIII_CGRCh37 > $refdir/HapMapIII_CGRCh37.pos
## Update the reference data
We can now use PLINK to extract the mappable variants from the old build and
update their position. After these steps, the HapMap III dataset can be used
for inferring study ancestry as described in the corresponding [vignette](https://hannahvmeyer.github.io/plinkQC/articles/AncestryCheck.html).
for inferring study ancestry as described in the corresponding [vignette](https://meyer-lab-cshl.github.io/plinkQC/articles/AncestryCheck.html).
```{bash update annotation, eval=FALSE}
plink --bfile $refdir/HapMapIII_NCBI36 \
--extract $refdir/HapMapIII_CGRCh37.snps \
Binary file modified doc/HapMap.pdf
Binary file not shown.
39 changes: 19 additions & 20 deletions doc/plinkQC.R
@@ -1,22 +1,22 @@
## ----setup, include = FALSE----------------------------------------------
## ----setup, include = FALSE---------------------------------------------------
library(plinkQC)
knitr::opts_chunk$set(
collapse = TRUE,
comment = "#>"
)

## ----set parameters------------------------------------------------------
## ----set parameters-----------------------------------------------------------
package.dir <- find.package('plinkQC')
indir <- file.path(package.dir, 'extdata')
qcdir <- tempdir()
name <- 'data'
path2plink <- "/Users/hannah/bin/plink"

## ----copy files----------------------------------------------------------
## ----copy files---------------------------------------------------------------
system(paste("cp", file.path(package.dir, 'extdata/data.HapMapIII.eigenvec'),
qcdir))

## ----individual QC, eval=FALSE, fig.height=12, fig.width=9--------------
## ----individual QC, eval=FALSE, fig.height=12, fig.width=9-------------------
# fail_individuals <- perIndividualQC(indir=indir, qcdir=qcdir, name=name,
# refSamplesFile=paste(indir, "/HapMap_ID2Pop.txt",
# sep=""),
@@ -31,41 +31,40 @@ system(paste("cp", file.path(package.dir, 'extdata/data.HapMapIII.eigenvec'),
par(mfrow=c(2,1), las=1)
knitr::include_graphics("individualQC.png")

## ----overview individual QC,fig.width=7, fig.height=7, eval=FALSE--------
## ----overview individual QC,fig.width=7, fig.height=7, eval=FALSE-------------
# overview_individuals <- overviewPerIndividualQC(fail_individuals,
# interactive=TRUE)

## ----load overviewIndividualQC, out.width = "500px", echo=FALSE----------
## ----load overviewIndividualQC, out.width = "500px", echo=FALSE---------------
par(mfrow=c(2,1), las=1)
knitr::include_graphics("overviewQC.png")
knitr::include_graphics("overviewAncestryQC.png")

## ----marker QC, eval=FALSE-----------------------------------------------
## ----marker QC, eval=FALSE----------------------------------------------------
# fail_markers <- perMarkerQC(indir=indir, qcdir=qcdir, name=name,
# path2plink=path2plink,
# verbose=TRUE, interactive=TRUE,
# showPlinkOutput=FALSE)

## ----load markerQC, echo=FALSE, out.width = "500px", fig.align='center'----
## ----load markerQC, echo=FALSE, out.width = "500px", fig.align='center'-------
par(mfrow=c(2,1), las=1)
knitr::include_graphics("markerQC.png")

## ----overview marker QC, eval=FALSE--------------------------------------
## ----overview marker QC, eval=FALSE-------------------------------------------
# overview_marker <- overviewPerMarkerQC(fail_markers, interactive=TRUE)

## ----load overviewMarkerQC, out.width = "500px", echo=FALSE--------------
## ----load overviewMarkerQC, out.width = "500px", echo=FALSE-------------------
par(mfrow=c(2,1), las=1)
knitr::include_graphics("overviewMarkerQC.png")

## ----clean data, eval=FALSE----------------------------------------------
## ----clean data, eval=FALSE---------------------------------------------------
# Ids <- cleanData(indir=indir, qcdir=qcdir, name=name, path2plink=path2plink,
# verbose=TRUE, showPlinkOutput=FALSE)

## ----check sex, eval=FALSE, out.width = "500px", fig.align='center'------
## ----check sex, eval=FALSE, out.width = "500px", fig.align='center'-----------
# fail_sex <- check_sex(indir=indir, qcdir=qcdir, name=name, interactive=TRUE,
# verbose=TRUE, path2plink=path2plink)

## ----load checkSex, out.width = "500px", echo=FALSE, fig.align='center'----
## ----load checkSex, out.width = "500px", echo=FALSE, fig.align='center'-------
knitr::include_graphics("checkSex.png")

## ----check het miss, eval=FALSE, fig.height=3, fig.width=5, fig.align='center'----
@@ -93,10 +92,10 @@ knitr::include_graphics("checkRelatedness.png")
# path2plink=path2plink, run.check_ancestry = FALSE,
# interactive=TRUE)

## ----load ancestry, out.width = "500px", echo=FALSE, fig.align='center'----
## ----load ancestry, out.width = "500px", echo=FALSE, fig.align='center'-------
knitr::include_graphics("checkAncestry.png")

## ----check snp missing, eval=FALSE---------------------------------------
## ----check snp missing, eval=FALSE--------------------------------------------
# fail_snpmissing <- check_snp_missingness(indir=indir, qcdir=qcdir, name=name,
# interactive=TRUE,
# path2plink=path2plink,
@@ -105,17 +104,17 @@ knitr::include_graphics("checkAncestry.png")
## ----load snp missing, out.width = "500px", echo=FALSE, fig.align='center'----
knitr::include_graphics("snpmissingness.png")

## ----check hwe, eval=FALSE-----------------------------------------------
## ----check hwe, eval=FALSE----------------------------------------------------
# fail_hwe <- check_hwe(indir=indir, qcdir=qcdir, name=name, interactive=TRUE,
# path2plink=path2plink, showPlinkOutput=FALSE)

## ----load hwe, out.width = "500px", echo=FALSE, fig.align='center'-------
## ----load hwe, out.width = "500px", echo=FALSE, fig.align='center'------------
knitr::include_graphics("hwe.png")

## ----check maf, eval=FALSE-----------------------------------------------
## ----check maf, eval=FALSE----------------------------------------------------
# fail_maf <- check_maf(indir=indir, qcdir=qcdir, name=name, interactive=TRUE,
# path2plink=path2plink, showPlinkOutput=FALSE)

## ----load maf, out.width = "500px", echo=FALSE, fig.align='center'------
## ----load maf, out.width = "500px", echo=FALSE, fig.align='center'-----------
knitr::include_graphics("maf.png")

9 changes: 8 additions & 1 deletion doc/plinkQC.Rmd
@@ -115,6 +115,10 @@ individual IDs to the qcdir. These IDs will be removed in the computation of the
`perMarkerQC`. If the list is not present, `perMarkerQC` will send a message
about conducting the quality control on the entire dataset.

NB: To reduce the data size of the example data in `plinkQC`,
data.genome has already been reduced to the individuals that are related. Thus,
the relatedness plots in C show counts for related individuals only.

NB: To demonstrate the results of the ancestry check, the required eigenvector
file of the combined study and reference datasets have been precomputed and
for the purpose of this example will be copied to the `qcdir`. In practice,
@@ -156,7 +160,6 @@ overview_individuals <- overviewPerIndividualQC(fail_individuals,
```{r load overviewIndividualQC, out.width = "500px", echo=FALSE}
par(mfrow=c(2,1), las=1)
knitr::include_graphics("overviewQC.png")
knitr::include_graphics("overviewAncestryQC.png")
```


@@ -273,6 +276,10 @@ complex family structures, the unrelated individuals per family are selected
(e.g. in a parents-offspring trio, the offspring will be marked as fail, while
the parents will be kept in the analysis).

NB: To reduce the data size of the example data in `plinkQC`,
data.genome has already been reduced to the individuals that are related. Thus,
the relatedness plots in C show counts for related individuals only.

```{r check related, eval=FALSE, fig.height=3, fig.width=5, fig.align='center'}
exclude_relatedness <- check_relatedness(indir=indir, qcdir=qcdir, name=name,
interactive=TRUE,
Binary file modified doc/plinkQC.pdf
Binary file not shown.
2 changes: 1 addition & 1 deletion docs/404.html

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

2 changes: 1 addition & 1 deletion docs/CODE_OF_CONDUCT.html

2 changes: 1 addition & 1 deletion docs/LICENSE-text.html
