Skip to content

Latest commit

 

History

History
160 lines (124 loc) · 6.57 KB

README.md

File metadata and controls

160 lines (124 loc) · 6.57 KB

alt text

Workflow Summary

alt text

1. Introduction

ascend (Analysis of Single Cell Expression, Normalisation and Differential expression) is a user-friendly scRNA-seq analysis toolkit implemented in R. Using pre-existing and novel methods, ascend offers quick and robust tools for quality control, normalisation, dimensionality reduction, clustering and differential expression.

2. Preparing data for 'ascend'

ascend takes transcript counts, either as read counts or UMI counts that are loaded into a gene-cell expression matrix. In an expression matrix, rows represent each gene or transcript, while columns represent cells. These matrices are generally produced by most scRNA-seq pipelines. ascend was developed using data from Chromium, but has been tested with data generated by other platforms such as DropSeq and inDrop.

The expression matrix is loaded into a container object known as an Expression and Metadata Set (EMSet). This object is also capable of storing metadata related to cells and genes.

Please refer to the vignettes (browseVignettes("ascend") in R) for more information on how to use this package.

3. Installation

3.1 Preparing the R Environment

Feel free to skip some steps if you have already done those steps.

3.1.1 R installation

Please follow the R installation instructions here. If you are a Windows user, make sure you install Rtools. Please note the ascend package requires R version >= 3.5.0. The latest version of R version 3.6 is best.

3.1.2 Installing Rcpp and RcppArmadillo

Please setup Rcpp and RcppArmadillo before installing ascend. Instructions are operating system-dependant, so please refer to this page for setup instructions.

3.2 Package Installations

You will need to install the following packages to run the development version of ascend. Feel free to skip these steps if you already have these packages.

3.2.1 Packages from CRAN

You can use the install.packages() to install the packages described in this section. The pcakages you require from this repository are as follows:

  1. devtools: This package will be used to load the development version of ascend.
  2. tidyverse: This is a series of R packages for data science and visualisation. This will install packages such as dplyr, ggplot2 and tidyr.
  3. data.table: Please follow the instructions on this page for your operating system.

Remaining packages can be installed as follows:

# List of packages to install
cran_packages <- c("gridExtra","RColorBrewer")

# Easy command to install all at once
install.packages(cran_packages)
3.2.2 Packages from Bioconductor

Bioconductor is a repository for R packages related to the analysis and comprehension of high-throughput genomic data. It uses a separate set of commands for the installation of packages.

3.2.2.1 Setting up Bioconductor

Use the following code to retrieve the latest installer from Bioconductor.

## Get BiocManger from CRAN
install.packages("BiocManager")

You can then install the Bioconductor packages using install.

bioconductor_packages <- c("BiocParallel", "BiocGenerics",
                           "SingleCellExperiment", "GenomeInfoDb", 
                           "GenomeInfoDbData")
BiocManager::install(bioconductor_packages)
3.2.2.2 scater/scran package installation

scater and scran are scRNA-seq analysis toolboxes that provide more in-depth methods for QC and filtering. You may choose to install these packages if you wish to take advantage of the wrappers provided for these packages.

3.2.2.3 Differential expression packages

ascend provides wrappers for DESeq and DESeq2, so you may choose to add them to your installation. However, we will only be using DESeq for the workshop as DESeq2 will require more time than allocated for the workshop.

3.4 Installing 'ascend' via devtools

As ascend is still under development, we will use devtools to install the package.

# Load devtools package
library(devtools)

# Use devtools to install the package
install_github("powellgenomicslab/ascend", build_vignettes = TRUE)

# Load the package in R
library(ascend)

3.5 Configuring BiocParallel

This package makes extensive use of BiocParallel, enabling ascend to make the most of your computer's hardware. As each system is different, BiocParallel needs to be configured by the user. Here are some example configurations.

3.5.1 Unix/Linux/MacOS (Single Machine)

library(BiocParallel)
ncores <- parallel::detectCores() - 1
register(MulticoreParam(workers = ncores, progressbar=TRUE), default = TRUE)

3.5.2 Windows (Single Machine - Quad-core system)

The following commands allows Windows to parallelise functions via BiocParallel. Unlike multicore processing in *nix systems, Snow creates additional R sessions to export tasks to. This requires additional computational resources to run and manage the tasks.

We recomend you bypass this step if your machine has lower specs.

library(BiocParallel)
workers <- 3 # Number of cores on your machine - 1
register(SnowParam(workers = workers,
                   type = "SOCK",
                   progressbar = TRUE), default = TRUE)

4. Updating EMSets created with older versions of ascend

If you have created an EMSet using older versions of the package (< 0.6.0), please update your old objects as follows:

# Import old EMSet stored in RDS file
legacy_EMSet <- readRDS("legacy_EMSet.rds")

# Update EMSet, please make sure you overwrite your old object
legacy_EMSet <- updateObject(legacy_EMSet)

This function will repackage your data into the new SingleCellExperiment-based EMSet. If your data has been normalised, it will load your data into both the counts and normcounts slots.

5. Contact

Please report any bugs on the Issues tracker of this repository. Feel free to send any other queries to a.senabouth@garvan.org.au.