Update links and build package

meyer-lab-cshl · Mar 12, 2020 · da987d8 · da987d8
1 parent e8afbb9
commit da987d8
Show file tree

Hide file tree

Showing 21 changed files with 63 additions and 54 deletions.
diff --git a/INDEX.Rmd b/INDEX.Rmd
@@ -25,7 +25,7 @@ functions easily accessible from within R and allows for automatic evaluation
 of the results.
 
 **plinkQC** generates a per-individual and per-marker quality control report.
-A step-by-step guide on how to run these analyses can be found [here](https://hannahvmeyer.github.io/plinkQC/articles/plinkQC.html).
+A step-by-step guide on how to run these analyses can be found [here](https://meyer-lab-cshl.github.io/plinkQC/articles/plinkQC.html).
 
 Individuals and markers that fail the quality control can subsequently be
 removed with **plinkQC** to generate a new, clean dataset.

diff --git a/doc/AncestryCheck.R b/doc/AncestryCheck.R
@@ -1,4 +1,4 @@
-## ----setup knitr, include = FALSE----------------------------------------
+## ----setup knitr, include = FALSE---------------------------------------------
 knitr::opts_chunk$set(
   collapse = TRUE,
   comment = "#>"
@@ -20,6 +20,6 @@ knitr::opts_chunk$set(
 #                                                  sep=""),
 #                              interactive=TRUE)
 
-## ----load ancestry, out.width = "500px", echo=FALSE, fig.align='center'----
+## ----load ancestry, out.width = "500px", echo=FALSE, fig.align='center'-------
 knitr::include_graphics("checkAncestry.png")
 
diff --git a/doc/AncestryCheck.Rmd b/doc/AncestryCheck.Rmd
@@ -41,8 +41,8 @@ is provided with $plinkQC$ in `file.path(find.package('plinkQC'),'extdata')`.
 ## Download reference data
 A suitable reference dataset should be downloaded and if necessary, re-formated
 into PLINK format. Vignettes
-['Processing HapMap III reference data for ancestry estimation'](https://hannahvmeyer.github.io/plinkQC/articles/HapMap.html) and 
-['Processing 1000Genomes  reference data for ancestry estimation'](https://hannahvmeyer.github.io/plinkQC/articles/1000Genomes.html),
+['Processing HapMap III reference data for ancestry estimation'](https://meyer-lab-cshl.github.io/plinkQC/articles/HapMap.html) and 
+['Processing 1000Genomes  reference data for ancestry estimation'](https://meyer-lab-cshl.github.io/plinkQC/articles/1000Genomes.html),
 show the download and processing of the HapMap phase III and 1000Genomes phase
 III dataset, respectively. In this example, we will use the HammapIII data as
 the reference dataset.

diff --git a/doc/AncestryCheck.pdf b/doc/AncestryCheck.pdf
diff --git a/doc/Genomes1000.R b/doc/Genomes1000.R
@@ -1,4 +1,4 @@
-## ----setup knitr, include = FALSE----------------------------------------
+## ----setup knitr, include = FALSE---------------------------------------------
 knitr::opts_chunk$set(
   collapse = TRUE,
   comment = "#>"

diff --git a/doc/Genomes1000.Rmd b/doc/Genomes1000.Rmd
@@ -36,7 +36,7 @@ The following vignette shows the processing steps required to use samples of the
 1000 Genomes study [@a1000Genomes2015],[@b1000Genomes2015] as a reference
 dataset. Using the 1000 Genomes reference, population structure down to
 large-scale continental ancestry can be detected. A step-by-step instruction on
-how to conduct this ancestry analysis is described in this [vignette](https://hannahvmeyer.github.io/plinkQC/articles/AncestryCheck.html).
+how to conduct this ancestry analysis is described in this [vignette](https://meyer-lab-cshl.github.io/plinkQC/articles/AncestryCheck.html).
 
 
 # Workflow

diff --git a/doc/Genomes1000.pdf b/doc/Genomes1000.pdf
diff --git a/doc/HapMap.R b/doc/HapMap.R
@@ -1,4 +1,4 @@
-## ----setup knitr, include = FALSE----------------------------------------
+## ----setup knitr, include = FALSE---------------------------------------------
 knitr::opts_chunk$set(
   collapse = TRUE,
   comment = "#>"

diff --git a/doc/HapMap.Rmd b/doc/HapMap.Rmd
@@ -36,7 +36,7 @@ The following vignette shows the processing steps required to use samples of the
 HapMap study [@HapMap2005][@HapMap2007][@HapMap2010] as a reference dataset. 
 Using this reference, population structure down to large-scale continental
 ancestry can be detected. A step-by-step instruction on how to conduct this
-analysis is described in this [vignette](https://hannahvmeyer.github.io/plinkQC/articles/AncestryCheck.html).
+analysis is described in this [vignette](https://meyer-lab-cshl.github.io/plinkQC/articles/AncestryCheck.html).
 
 
 # Workflow
@@ -106,12 +106,22 @@ https://genome.ucsc.edu/cgi-bin/hgLiftOver and the appropriate liftover chain fr
 zero-based [UCSC bed](https://genome.ucsc.edu/FAQ/FAQformat.html#format1)
 format.
 
+Hapmap chromosome data is encoded numerically, with chrX represented by chr23,
+and chrY as chr24. In order to match to data encoded by chrX and chrY, we will
+have to rename these hapmap chromosomes. Converting to zero-based UCSC format 
+and re-coding chromosome codes can be achieved by:
+
 ```{bash prepare liftover, eval=FALSE}
 awk '{print "chr" $1, $4 -1, $4, $2 }' $refdir/HapMapIII_NCBI36.bim | \
     sed 's/chr23/chrX/' | sed 's/chr24/chrY/' > \
     $refdir/HapMapIII_NCBI36.tolift
 ```
 
+[Note: In the official HapMap release, chromosome codes described above, however
+in the orignal download files (link above), no chr24 detected. I will keep this
+line in for completeness, but note, when inspecting file that no chr24/chrY are
+present.]
+
 We use the liftOver tool and the UCSC bed formated annotation file together
 with the appropriate chain file to do the lift over.
 
@@ -133,7 +143,7 @@ awk '{print $4, $3}' $refdir/HapMapIII_CGRCh37 > $refdir/HapMapIII_CGRCh37.pos
 ## Update the reference data
 We can now use PLINK to extract the mappable variants from the old build and
 update their position. After these steps, the HapMap III dataset can be used 
-for infering study ancestry as described in the corresponding [vignette](https://hannahvmeyer.github.io/plinkQC/articles/AncestryCheck.html).
+for infering study ancestry as described in the corresponding [vignette](https://meyer-lab-cshl.github.io/plinkQC/articles/AncestryCheck.html).
 ```{bash update annotation, eval=FALSE}
 plink --bfile $refdir/HapMapIII_NCBI36 \
     --extract $refdir/HapMapIII_CGRCh37.snps \

diff --git a/doc/HapMap.pdf b/doc/HapMap.pdf
diff --git a/doc/plinkQC.R b/doc/plinkQC.R
@@ -1,22 +1,22 @@
-## ----setup, include = FALSE----------------------------------------------
+## ----setup, include = FALSE---------------------------------------------------
 library(plinkQC)
 knitr::opts_chunk$set(
   collapse = TRUE,
   comment = "#>"
 )
 
-## ----set parameters------------------------------------------------------
+## ----set parameters-----------------------------------------------------------
 package.dir <- find.package('plinkQC')
 indir <- file.path(package.dir, 'extdata')
 qcdir <- tempdir()
 name <- 'data'
 path2plink <- "/Users/hannah/bin/plink"
 
-## ----copy files----------------------------------------------------------
+## ----copy files---------------------------------------------------------------
 system(paste("cp", file.path(package.dir, 'extdata/data.HapMapIII.eigenvec'),
              qcdir))
 
-## ----individual QC,  eval=FALSE, fig.height=12, fig.width=9--------------
+## ----individual QC,  eval=FALSE, fig.height=12, fig.width=9-------------------
 #  fail_individuals <- perIndividualQC(indir=indir, qcdir=qcdir, name=name,
 #                              refSamplesFile=paste(indir, "/HapMap_ID2Pop.txt",
 #                                                   sep=""),
@@ -31,41 +31,40 @@ system(paste("cp", file.path(package.dir, 'extdata/data.HapMapIII.eigenvec'),
 par(mfrow=c(2,1), las=1)
 knitr::include_graphics("individualQC.png")
 
-## ----overview individual QC,fig.width=7, fig.height=7, eval=FALSE--------
+## ----overview individual QC,fig.width=7, fig.height=7, eval=FALSE-------------
 #  overview_individuals <- overviewPerIndividualQC(fail_individuals,
 #                                                  interactive=TRUE)
 
-## ----load overviewIndividualQC, out.width = "500px", echo=FALSE----------
+## ----load overviewIndividualQC, out.width = "500px", echo=FALSE---------------
 par(mfrow=c(2,1), las=1)
 knitr::include_graphics("overviewQC.png")
-knitr::include_graphics("overviewAncestryQC.png")
 
-## ----marker QC, eval=FALSE-----------------------------------------------
+## ----marker QC, eval=FALSE----------------------------------------------------
 #  fail_markers <- perMarkerQC(indir=indir, qcdir=qcdir, name=name,
 #                              path2plink=path2plink,
 #                              verbose=TRUE, interactive=TRUE,
 #                              showPlinkOutput=FALSE)
 
-## ----load markerQC, echo=FALSE, out.width = "500px", fig.align='center'----
+## ----load markerQC, echo=FALSE, out.width = "500px", fig.align='center'-------
 par(mfrow=c(2,1), las=1)
 knitr::include_graphics("markerQC.png")
 
-## ----overview marker QC, eval=FALSE--------------------------------------
+## ----overview marker QC, eval=FALSE-------------------------------------------
 #  overview_marker <- overviewPerMarkerQC(fail_markers, interactive=TRUE)
 
-## ----load overviewMarkerQC, out.width = "500px", echo=FALSE--------------
+## ----load overviewMarkerQC, out.width = "500px", echo=FALSE-------------------
 par(mfrow=c(2,1), las=1)
 knitr::include_graphics("overviewMarkerQC.png")
 
-## ----clean data, eval=FALSE----------------------------------------------
+## ----clean data, eval=FALSE---------------------------------------------------
 #  Ids  <- cleanData(indir=indir, qcdir=qcdir, name=name, path2plink=path2plink,
 #                              verbose=TRUE, showPlinkOutput=FALSE)
 
-## ----check sex, eval=FALSE, out.width = "500px", fig.align='center'------
+## ----check sex, eval=FALSE, out.width = "500px", fig.align='center'-----------
 #  fail_sex <- check_sex(indir=indir, qcdir=qcdir, name=name, interactive=TRUE,
 #                        verbose=TRUE, path2plink=path2plink)
 
-## ----load checkSex, out.width = "500px", echo=FALSE, fig.align='center'----
+## ----load checkSex, out.width = "500px", echo=FALSE, fig.align='center'-------
 knitr::include_graphics("checkSex.png")
 
 ## ----check het miss, eval=FALSE, fig.height=3, fig.width=5, fig.align='center'----
@@ -93,10 +92,10 @@ knitr::include_graphics("checkRelatedness.png")
 #                              path2plink=path2plink, run.check_ancestry = FALSE,
 #                              interactive=TRUE)
 
-## ----load ancestry, out.width = "500px", echo=FALSE, fig.align='center'----
+## ----load ancestry, out.width = "500px", echo=FALSE, fig.align='center'-------
 knitr::include_graphics("checkAncestry.png")
 
-## ----check snp missing, eval=FALSE---------------------------------------
+## ----check snp missing, eval=FALSE--------------------------------------------
 #  fail_snpmissing <- check_snp_missingness(indir=indir, qcdir=qcdir, name=name,
 #                                           interactive=TRUE,
 #                                           path2plink=path2plink,
@@ -105,17 +104,17 @@ knitr::include_graphics("checkAncestry.png")
 ## ----load snp missing, out.width = "500px", echo=FALSE, fig.align='center'----
 knitr::include_graphics("snpmissingness.png")
 
-## ----check hwe, eval=FALSE-----------------------------------------------
+## ----check hwe, eval=FALSE----------------------------------------------------
 #  fail_hwe <- check_hwe(indir=indir, qcdir=qcdir, name=name, interactive=TRUE,
 #                        path2plink=path2plink, showPlinkOutput=FALSE)
 
-## ----load hwe, out.width = "500px", echo=FALSE, fig.align='center'-------
+## ----load hwe, out.width = "500px", echo=FALSE, fig.align='center'------------
 knitr::include_graphics("hwe.png")
 
-## ----check maf, eval=FALSE-----------------------------------------------
+## ----check maf, eval=FALSE----------------------------------------------------
 #  fail_maf <- check_maf(indir=indir, qcdir=qcdir, name=name, interactive=TRUE,
 #                        path2plink=path2plink, showPlinkOutput=FALSE)
 
-## ----load  maf, out.width = "500px", echo=FALSE, fig.align='center'------
+## ----load  maf, out.width = "500px", echo=FALSE, fig.align='center'-----------
 knitr::include_graphics("maf.png")
 
diff --git a/doc/plinkQC.Rmd b/doc/plinkQC.Rmd
@@ -115,6 +115,10 @@ individual IDs to the qcdir. These IDs will be removed in the computation of the
 `perMarkerQC`. If the list is not present, `perMarkerQC` will send a message
 about conducting the quality control on the entire dataset.
 
+NB: To reduce the data size of the example data in `plinkQC`,
+data.genome  has already been reduced to the individuals that are related. Thus
+the relatedness plots in C only show counts for related individuals only.
+
 NB: To demonstrate the results of the ancestry check, the required eigenvector
 file of the combined study and reference datasets have been precomputed and
 for the purpose of this example will be copied to the `qcdir`. In practice,
@@ -156,7 +160,6 @@ overview_individuals <- overviewPerIndividualQC(fail_individuals,
 ```{r load overviewIndividualQC, out.width = "500px", echo=FALSE}
 par(mfrow=c(2,1), las=1)
 knitr::include_graphics("overviewQC.png")
-knitr::include_graphics("overviewAncestryQC.png")
 ```
 
 
@@ -273,6 +276,10 @@ complex family structures, the unrelated individuals per family are selected
 (e.g. in a parents-offspring trio, the offspring will be marked as fail, while 
 the parents will be kept in the analysis).
 
+NB: To reduce the data size of the example data in `plinkQC`,
+data.genome  has already been reduced to the individuals that are related. Thus
+the relatedness plots in C only show counts for related individuals only.
+
 ```{r check related, eval=FALSE, fig.height=3, fig.width=5, fig.align='center'}
 exclude_relatedness <- check_relatedness(indir=indir, qcdir=qcdir, name=name,
                                          interactive=TRUE,

diff --git a/doc/plinkQC.pdf b/doc/plinkQC.pdf
diff --git a/docs/CODE_OF_CONDUCT.html b/docs/CODE_OF_CONDUCT.html
diff --git a/docs/articles/AncestryCheck.html b/docs/articles/AncestryCheck.html
diff --git a/docs/articles/Genomes1000.html b/docs/articles/Genomes1000.html