Merge pull request #22 from meyer-lab-cshl/dev

Note in hapmap vignette, update broken links, and version number
meyer-lab-cshl · Mar 13, 2020 · 64f898e
2 parents 534ec7b + be6b5c9
commit 64f898e
Showing 58 changed files with 159 additions and 118 deletions.
diff --git a/DESCRIPTION b/DESCRIPTION
@@ -1,7 +1,7 @@
 Package: plinkQC
 Type: Package
 Title: Genotype Quality Control with 'PLINK'
-Version: 0.3.0
+Version: 0.3.1
 Authors@R:
     person("Hannah", "Meyer", email = "hannah.v.meyer@gmail.com",
     role = c("aut", "cre"), comment = c(ORCID = "0000-0003-4564-0899"))

diff --git a/INDEX.Rmd b/INDEX.Rmd
@@ -25,7 +25,7 @@ functions easily accessible from within R and allows for automatic evaluation
 of the results.
 
 **plinkQC** generates a per-individual and per-marker quality control report.
-A step-by-step guide on how to run these analyses can be found [here](https://hannahvmeyer.github.io/plinkQC/articles/plinkQC.html).
+A step-by-step guide on how to run these analyses can be found [here](https://meyer-lab-cshl.github.io/plinkQC/articles/plinkQC.html).
 
 Individuals and markers that fail the quality control can subsequently be
 removed with **plinkQC** to generate a new, clean dataset.
@@ -44,7 +44,7 @@ knitr::include_graphics("docs/qc.png")
 
 ## <i class="fa fa-rocket" aria-hidden="true"></i> Installation
 
-The current github version of **plinkQC**  is: 0.3.0 and can be
+The current github version of **plinkQC**  is: 0.3.1 and can be
 installed via
 ```{r github install, eval=FALSE}
 library(devtools)

diff --git a/INDEX.md b/INDEX.md
@@ -16,7 +16,7 @@ R and allows for automatic evaluation of the results.
 
 **plinkQC** generates a per-individual and per-marker quality control
 report. A step-by-step guide on how to run these analyses can be found
-[here](https://hannahvmeyer.github.io/plinkQC/articles/plinkQC.html).
+[here](https://meyer-lab-cshl.github.io/plinkQC/articles/plinkQC.html).
 
 Individuals and markers that fail the quality control can subsequently
 be removed with **plinkQC** to generate a new, clean dataset.
@@ -29,11 +29,11 @@ datasets is documented in detail
 Removal of individuals based on relationship status via **plinkQC** is
 optimised to retain as many individuals as possible in the study.
 
-![](docs/qc.png)<!-- -->
+<img src="docs/qc.png" width="744" />
 
 ## <i class="fa fa-rocket" aria-hidden="true"></i> Installation
 
-The current github version of **plinkQC** is: 0.3.0 and can be installed
+The current github version of **plinkQC** is: 0.3.1 and can be installed
 via
 
 ``` r
@@ -49,3 +49,8 @@ install.packages("plinkQC")
 ```
 
 A log of version changes can be found [here](news/index.html).
+
+## <i class="fa fa-pencil" aria-hidden="true"></i> Citation
+
+Meyer HV (2018) plinkQC: Genotype quality control in genetic association
+studies. <doi:10.5281/zenodo.3373798>
diff --git a/NEWS.md b/NEWS.md
@@ -1,3 +1,8 @@
+# plinkQC 0.3.1
+## minor changes
+* Fixed dead links in vignettes (caused by migration of repository): da987d8f225aa6aca0596b9c4f6a2484b102bdb6
+* Added note about chrY in HapMap data (vignette): e8afbb9842ed9421461a8114ac0a00f7955cf0c0
+
 # plinkQC 0.3.0
 ## major changes
 * Relationship filter can handle more complicated relationship scenarios as

diff --git a/README.Rmd b/README.Rmd
@@ -46,7 +46,7 @@ to retain as many individuals as possible in the study.
 
 ## <i class="fa fa-rocket" aria-hidden="true"></i> Installation
 
-The current github version of **plinkQC**  is: 0.3.0 and can be
+The current github version of **plinkQC**  is: 0.3.1 and can be
 installed via
 ```{r github install, eval=FALSE}
 library(devtools)

diff --git a/README.md b/README.md
@@ -1,12 +1,12 @@
 
 <!-- README.md is generated from README.Rmd. Please edit that file -->
 
-[![CRAN\_Status\_Badge](https://www.r-pkg.org/badges/version/plinkQC)](https://cran.r-project.org/package=plinkQC)
+[![CRAN\_Status\_Badge](http://www.r-pkg.org/badges/version/plinkQC)](https://cran.r-project.org/package=plinkQC)
 [![Build
 Status](https://travis-ci.org/meyer-lab-cshl/plinkQC.svg?branch=master)](https://travis-ci.org/meyer-lab-cshl/plinkQC)
 [![License:
 MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)
-[![Downloads](https://cranlogs.r-pkg.org/badges/grand-total/plinkQC?color=blue)](https://CRAN.R-project.org/package=plinkQC)
+[![Downloads](http://cranlogs.r-pkg.org/badges/grand-total/plinkQC?color=blue)](https://cran.r-project.org/package=plinkQC)
 
 ## <i class="fa fa-map" aria-hidden="true"></i> plinkQC
 
@@ -17,10 +17,8 @@ genetic marker) and relationship functions easily accessible from within
 R and allows for automatic evaluation of the results.
 
 Full documentation is available at
-
 <http://meyer-lab-cshl.github.io/plinkQC/>.
 
-
 **plinkQC** generates a per-individual and per-marker quality control
 report. A step-by-step guide on how to run these analyses can be found
 [here](https://meyer-lab-cshl.github.io/plinkQC/articles/plinkQC.html).
@@ -38,7 +36,7 @@ optimised to retain as many individuals as possible in the study.
 
 ## <i class="fa fa-rocket" aria-hidden="true"></i> Installation
 
-The current github version of **plinkQC** is: 0.3.0 and can be installed
+The current github version of **plinkQC** is: 0.3.1 and can be installed
 via
 
 ``` r
@@ -58,6 +56,6 @@ A log of version changes can be found
 [here](https://github.com/meyer-lab-cshl/plinkQC/blob/master/NEWS.md).
 
 ## <i class="fa fa-pencil" aria-hidden="true"></i> Citation
-Meyer HV (2018) plinkQC: Genotype quality control in genetic association
-studies. doi:10.5281/zenodo.3373798
 
+Meyer HV (2018) plinkQC: Genotype quality control in genetic association
+studies. <doi:10.5281/zenodo.3373798>
diff --git a/doc/AncestryCheck.R b/doc/AncestryCheck.R
@@ -1,4 +1,4 @@
-## ----setup knitr, include = FALSE----------------------------------------
+## ----setup knitr, include = FALSE---------------------------------------------
 knitr::opts_chunk$set(
   collapse = TRUE,
   comment = "#>"
@@ -20,6 +20,6 @@ knitr::opts_chunk$set(
 #                                                  sep=""),
 #                              interactive=TRUE)
 
-## ----load ancestry, out.width = "500px", echo=FALSE, fig.align='center'----
+## ----load ancestry, out.width = "500px", echo=FALSE, fig.align='center'-------
 knitr::include_graphics("checkAncestry.png")
 
diff --git a/doc/AncestryCheck.Rmd b/doc/AncestryCheck.Rmd
@@ -41,8 +41,8 @@ is provided with $plinkQC$ in `file.path(find.package('plinkQC'),'extdata')`.
 ## Download reference data
 A suitable reference dataset should be downloaded and, if necessary, re-formatted
 into PLINK format. Vignettes
-['Processing HapMap III reference data for ancestry estimation'](https://hannahvmeyer.github.io/plinkQC/articles/HapMap.html) and 
-['Processing 1000Genomes  reference data for ancestry estimation'](https://hannahvmeyer.github.io/plinkQC/articles/1000Genomes.html),
+['Processing HapMap III reference data for ancestry estimation'](https://meyer-lab-cshl.github.io/plinkQC/articles/HapMap.html) and 
+['Processing 1000Genomes  reference data for ancestry estimation'](https://meyer-lab-cshl.github.io/plinkQC/articles/1000Genomes.html),
 show the download and processing of the HapMap phase III and 1000Genomes phase
 III dataset, respectively. In this example, we will use the HapMapIII data as
 the reference dataset.

diff --git a/doc/AncestryCheck.pdf b/doc/AncestryCheck.pdf
diff --git a/doc/Genomes1000.R b/doc/Genomes1000.R
@@ -1,4 +1,4 @@
-## ----setup knitr, include = FALSE----------------------------------------
+## ----setup knitr, include = FALSE---------------------------------------------
 knitr::opts_chunk$set(
   collapse = TRUE,
   comment = "#>"

diff --git a/doc/Genomes1000.Rmd b/doc/Genomes1000.Rmd
@@ -36,7 +36,7 @@ The following vignette shows the processing steps required to use samples of the
 1000 Genomes study [@a1000Genomes2015],[@b1000Genomes2015] as a reference
 dataset. Using the 1000 Genomes reference, population structure down to
 large-scale continental ancestry can be detected. A step-by-step instruction on
-how to conduct this ancestry analysis is described in this [vignette](https://hannahvmeyer.github.io/plinkQC/articles/AncestryCheck.html).
+how to conduct this ancestry analysis is described in this [vignette](https://meyer-lab-cshl.github.io/plinkQC/articles/AncestryCheck.html).
 
 
 # Workflow

diff --git a/doc/Genomes1000.pdf b/doc/Genomes1000.pdf
diff --git a/doc/HapMap.R b/doc/HapMap.R
@@ -1,4 +1,4 @@
-## ----setup knitr, include = FALSE----------------------------------------
+## ----setup knitr, include = FALSE---------------------------------------------
 knitr::opts_chunk$set(
   collapse = TRUE,
   comment = "#>"

diff --git a/doc/HapMap.Rmd b/doc/HapMap.Rmd
@@ -36,7 +36,7 @@ The following vignette shows the processing steps required to use samples of the
 HapMap study [@HapMap2005][@HapMap2007][@HapMap2010] as a reference dataset. 
 Using this reference, population structure down to large-scale continental
 ancestry can be detected. A step-by-step instruction on how to conduct this
-analysis is described in this [vignette](https://hannahvmeyer.github.io/plinkQC/articles/AncestryCheck.html).
+analysis is described in this [vignette](https://meyer-lab-cshl.github.io/plinkQC/articles/AncestryCheck.html).
 
 
 # Workflow
@@ -106,12 +106,22 @@ https://genome.ucsc.edu/cgi-bin/hgLiftOver and the appropriate liftover chain fr
 zero-based [UCSC bed](https://genome.ucsc.edu/FAQ/FAQformat.html#format1)
 format.
 
+HapMap chromosome data are encoded numerically, with chrX represented by chr23
+and chrY by chr24. To match data that encode these chromosomes as chrX and chrY,
+we have to rename the HapMap chromosomes. Converting to zero-based UCSC bed
+format and re-coding the chromosome codes can be achieved by:
+
 ```{bash prepare liftover, eval=FALSE}
 awk '{print "chr" $1, $4 -1, $4, $2 }' $refdir/HapMapIII_NCBI36.bim | \
     sed 's/chr23/chrX/' | sed 's/chr24/chrY/' > \
     $refdir/HapMapIII_NCBI36.tolift
 ```
 
+[Note: The official HapMap release uses the chromosome codes described above.
+However, the original download files (link above) contain no chr24. I will keep
+this step in for completeness, but note that on inspection the file contains no
+chr24/chrY entries.]
+
 We use the liftOver tool and the UCSC bed formatted annotation file together
 with the appropriate chain file to do the lift over.
 
@@ -133,7 +143,7 @@ awk '{print $4, $3}' $refdir/HapMapIII_CGRCh37 > $refdir/HapMapIII_CGRCh37.pos
 ## Update the reference data
 We can now use PLINK to extract the mappable variants from the old build and
 update their position. After these steps, the HapMap III dataset can be used 
-for inferring study ancestry as described in the corresponding [vignette](https://hannahvmeyer.github.io/plinkQC/articles/AncestryCheck.html).
+for inferring study ancestry as described in the corresponding [vignette](https://meyer-lab-cshl.github.io/plinkQC/articles/AncestryCheck.html).
 ```{bash update annotation, eval=FALSE}
 plink --bfile $refdir/HapMapIII_NCBI36 \
     --extract $refdir/HapMapIII_CGRCh37.snps \
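
Reviewer sketch (not part of the commit): the chrX/chrY re-coding added to the HapMap vignette above can be tried on a toy file before running it on the real data. The file name `toy.bim`, the temporary directory and the variant rows below are made up for illustration; only the `awk`/`sed` pipeline is taken from the vignette. The final command counts variants per chromosome code, which is one way to check whether any chr24/chrY entries are present.

```bash
# Toy PLINK .bim file: chromosome, variant ID, cM, bp position, allele 1, allele 2.
# All values are hypothetical.
tmpdir=$(mktemp -d)
printf '1\trs1\t0\t1000\tA\tG\n23\trs2\t0\t2000\tC\tT\n' > "$tmpdir/toy.bim"

# Same conversion as in the vignette: zero-based start, one-based end,
# numeric codes 23/24 renamed to chrX/chrY.
awk '{print "chr" $1, $4 -1, $4, $2 }' "$tmpdir/toy.bim" | \
    sed 's/chr23/chrX/' | sed 's/chr24/chrY/' > \
    "$tmpdir/toy.tolift"

# Count variants per chromosome code; a missing chrY row means no chr24 input.
cut -d' ' -f1 "$tmpdir/toy.tolift" | sort | uniq -c
```

On the real data, the same `cut | sort | uniq -c` step could be pointed at `$refdir/HapMapIII_NCBI36.tolift` to confirm the observation in the note.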

diff --git a/doc/HapMap.pdf b/doc/HapMap.pdf
diff --git a/doc/plinkQC.R b/doc/plinkQC.R
@@ -1,22 +1,22 @@
-## ----setup, include = FALSE----------------------------------------------
+## ----setup, include = FALSE---------------------------------------------------
 library(plinkQC)
 knitr::opts_chunk$set(
   collapse = TRUE,
   comment = "#>"
 )
 
-## ----set parameters------------------------------------------------------
+## ----set parameters-----------------------------------------------------------
 package.dir <- find.package('plinkQC')
 indir <- file.path(package.dir, 'extdata')
 qcdir <- tempdir()
 name <- 'data'
 path2plink <- "/Users/hannah/bin/plink"
 
-## ----copy files----------------------------------------------------------
+## ----copy files---------------------------------------------------------------
 system(paste("cp", file.path(package.dir, 'extdata/data.HapMapIII.eigenvec'),
              qcdir))
 
-## ----individual QC,  eval=FALSE, fig.height=12, fig.width=9--------------
+## ----individual QC,  eval=FALSE, fig.height=12, fig.width=9-------------------
 #  fail_individuals <- perIndividualQC(indir=indir, qcdir=qcdir, name=name,
 #                              refSamplesFile=paste(indir, "/HapMap_ID2Pop.txt",
 #                                                   sep=""),
@@ -31,41 +31,40 @@ system(paste("cp", file.path(package.dir, 'extdata/data.HapMapIII.eigenvec'),
 par(mfrow=c(2,1), las=1)
 knitr::include_graphics("individualQC.png")
 
-## ----overview individual QC,fig.width=7, fig.height=7, eval=FALSE--------
+## ----overview individual QC,fig.width=7, fig.height=7, eval=FALSE-------------
 #  overview_individuals <- overviewPerIndividualQC(fail_individuals,
 #                                                  interactive=TRUE)
 
-## ----load overviewIndividualQC, out.width = "500px", echo=FALSE----------
+## ----load overviewIndividualQC, out.width = "500px", echo=FALSE---------------
 par(mfrow=c(2,1), las=1)
 knitr::include_graphics("overviewQC.png")
-knitr::include_graphics("overviewAncestryQC.png")
 
-## ----marker QC, eval=FALSE-----------------------------------------------
+## ----marker QC, eval=FALSE----------------------------------------------------
 #  fail_markers <- perMarkerQC(indir=indir, qcdir=qcdir, name=name,
 #                              path2plink=path2plink,
 #                              verbose=TRUE, interactive=TRUE,
 #                              showPlinkOutput=FALSE)
 
-## ----load markerQC, echo=FALSE, out.width = "500px", fig.align='center'----
+## ----load markerQC, echo=FALSE, out.width = "500px", fig.align='center'-------
 par(mfrow=c(2,1), las=1)
 knitr::include_graphics("markerQC.png")
 
-## ----overview marker QC, eval=FALSE--------------------------------------
+## ----overview marker QC, eval=FALSE-------------------------------------------
 #  overview_marker <- overviewPerMarkerQC(fail_markers, interactive=TRUE)
 
-## ----load overviewMarkerQC, out.width = "500px", echo=FALSE--------------
+## ----load overviewMarkerQC, out.width = "500px", echo=FALSE-------------------
 par(mfrow=c(2,1), las=1)
 knitr::include_graphics("overviewMarkerQC.png")
 
-## ----clean data, eval=FALSE----------------------------------------------
+## ----clean data, eval=FALSE---------------------------------------------------
 #  Ids  <- cleanData(indir=indir, qcdir=qcdir, name=name, path2plink=path2plink,
 #                              verbose=TRUE, showPlinkOutput=FALSE)
 
-## ----check sex, eval=FALSE, out.width = "500px", fig.align='center'------
+## ----check sex, eval=FALSE, out.width = "500px", fig.align='center'-----------
 #  fail_sex <- check_sex(indir=indir, qcdir=qcdir, name=name, interactive=TRUE,
 #                        verbose=TRUE, path2plink=path2plink)
 
-## ----load checkSex, out.width = "500px", echo=FALSE, fig.align='center'----
+## ----load checkSex, out.width = "500px", echo=FALSE, fig.align='center'-------
 knitr::include_graphics("checkSex.png")
 
 ## ----check het miss, eval=FALSE, fig.height=3, fig.width=5, fig.align='center'----
@@ -93,10 +92,10 @@ knitr::include_graphics("checkRelatedness.png")
 #                              path2plink=path2plink, run.check_ancestry = FALSE,
 #                              interactive=TRUE)
 
-## ----load ancestry, out.width = "500px", echo=FALSE, fig.align='center'----
+## ----load ancestry, out.width = "500px", echo=FALSE, fig.align='center'-------
 knitr::include_graphics("checkAncestry.png")
 
-## ----check snp missing, eval=FALSE---------------------------------------
+## ----check snp missing, eval=FALSE--------------------------------------------
 #  fail_snpmissing <- check_snp_missingness(indir=indir, qcdir=qcdir, name=name,
 #                                           interactive=TRUE,
 #                                           path2plink=path2plink,
@@ -105,17 +104,17 @@ knitr::include_graphics("checkAncestry.png")
 ## ----load snp missing, out.width = "500px", echo=FALSE, fig.align='center'----
 knitr::include_graphics("snpmissingness.png")
 
-## ----check hwe, eval=FALSE-----------------------------------------------
+## ----check hwe, eval=FALSE----------------------------------------------------
 #  fail_hwe <- check_hwe(indir=indir, qcdir=qcdir, name=name, interactive=TRUE,
 #                        path2plink=path2plink, showPlinkOutput=FALSE)
 
-## ----load hwe, out.width = "500px", echo=FALSE, fig.align='center'-------
+## ----load hwe, out.width = "500px", echo=FALSE, fig.align='center'------------
 knitr::include_graphics("hwe.png")
 
-## ----check maf, eval=FALSE-----------------------------------------------
+## ----check maf, eval=FALSE----------------------------------------------------
 #  fail_maf <- check_maf(indir=indir, qcdir=qcdir, name=name, interactive=TRUE,
 #                        path2plink=path2plink, showPlinkOutput=FALSE)
 
-## ----load  maf, out.width = "500px", echo=FALSE, fig.align='center'------
+## ----load  maf, out.width = "500px", echo=FALSE, fig.align='center'-----------
 knitr::include_graphics("maf.png")
 
diff --git a/doc/plinkQC.Rmd b/doc/plinkQC.Rmd
@@ -115,6 +115,10 @@ individual IDs to the qcdir. These IDs will be removed in the computation of the
 `perMarkerQC`. If the list is not present, `perMarkerQC` will send a message
 about conducting the quality control on the entire dataset.
 
+NB: To reduce the data size of the example data in `plinkQC`,
+data.genome has already been reduced to the individuals that are related. Thus
+the relatedness plots in C show counts for related individuals only.
+
 NB: To demonstrate the results of the ancestry check, the required eigenvector
 file of the combined study and reference datasets have been precomputed and
 for the purpose of this example will be copied to the `qcdir`. In practice,
@@ -156,7 +160,6 @@ overview_individuals <- overviewPerIndividualQC(fail_individuals,
 ```{r load overviewIndividualQC, out.width = "500px", echo=FALSE}
 par(mfrow=c(2,1), las=1)
 knitr::include_graphics("overviewQC.png")
-knitr::include_graphics("overviewAncestryQC.png")
 ```
 
 
@@ -273,6 +276,10 @@ complex family structures, the unrelated individuals per family are selected
 (e.g. in a parents-offspring trio, the offspring will be marked as fail, while 
 the parents will be kept in the analysis).
 
+NB: To reduce the data size of the example data in `plinkQC`,
+data.genome has already been reduced to the individuals that are related. Thus
+the relatedness plots in C show counts for related individuals only.
+
 ```{r check related, eval=FALSE, fig.height=3, fig.width=5, fig.align='center'}
 exclude_relatedness <- check_relatedness(indir=indir, qcdir=qcdir, name=name,
                                          interactive=TRUE,

diff --git a/doc/plinkQC.pdf b/doc/plinkQC.pdf
diff --git a/docs/404.html b/docs/404.html
diff --git a/docs/CODE_OF_CONDUCT.html b/docs/CODE_OF_CONDUCT.html
diff --git a/docs/LICENSE-text.html b/docs/LICENSE-text.html