This pipeline is built to evaluate metrics that detect batch effects and/or mixing bias in single-cell RNA-seq datasets. The following metrics are currently included:
- Cell-specific Mixing Score (cms)
- k-nearest neighbour batch effect test (kbet)
- Seurat's mixingMetric (mm)
- Local inverse Simpson's index (lisi)
- Shannon's entropy
- Graph connectivity (graph)
- Average silhouette width (asw)
- Principal component regression (pcr)
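Several of these metrics (e.g. Shannon's entropy and LISI) score how evenly batch labels are represented in each cell's local neighbourhood. The Python sketch below illustrates that idea on a toy embedding; it is not the pipeline's R implementation. It assumes uniform weights over the k nearest neighbours, whereas LISI as published uses perplexity-calibrated Gaussian weights, and the function names `knn_indices` and `neighbourhood_mixing` are hypothetical.

```python
import math
from collections import Counter


def knn_indices(points, k):
    """Brute-force k nearest neighbours of every point (excluding itself), Euclidean distance."""
    neighbours = []
    for i, p in enumerate(points):
        # Sort (distance, index) pairs; ties are broken by index.
        dists = sorted((math.dist(p, q), j) for j, q in enumerate(points) if j != i)
        neighbours.append([j for _, j in dists[:k]])
    return neighbours


def neighbourhood_mixing(points, batches, k):
    """Per cell: (Shannon entropy, inverse Simpson index) of batch labels among its k neighbours.

    Simplification (assumption): every neighbour gets equal weight, unlike LISI's
    perplexity-calibrated Gaussian weights.
    """
    scores = []
    for nbrs in knn_indices(points, k):
        counts = Counter(batches[j] for j in nbrs)
        props = [c / k for c in counts.values()]
        # Natural-log entropy: 0 when all neighbours share one batch.
        entropy = sum(-p * math.log(p) for p in props)
        # Inverse Simpson: 1 for a single batch, up to the number of batches present.
        inv_simpson = 1.0 / sum(p * p for p in props)
        scores.append((entropy, inv_simpson))
    return scores


# Toy 2-D embedding with two well-separated clusters of four cells each.
points = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10), (10, 11), (11, 10), (11, 11)]
interleaved = ["A", "B"] * 4          # batches mixed within each cluster
confounded = ["A"] * 4 + ["B"] * 4    # batch identical to cluster
print(neighbourhood_mixing(points, interleaved, k=3)[0])
print(neighbourhood_mixing(points, confounded, k=3)[0])
```

In the interleaved case, cell 0's three neighbours carry the labels B, A, B, giving an entropy of about 0.64 and an inverse Simpson index of 1.8; in the confounded case every cell scores the minimum (0 and 1). The brute-force neighbour search only keeps the sketch dependency-free; real datasets would need a proper kNN index on a reduced embedding such as PCA.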
Metrics are evaluated against the following criteria. For each criterion, different scenarios are constructed using batch effects in real data and/or simulated batch effects. These batch effects and their simulations have been characterized in detail before (see https://almutlue.github.io/batch_snakemake/).
- Sensitivity
- Scalability
- Comparability
- Flexibility
- Practical measures such as runtime ...
The scenarios include:
- Test each metric's performance on all batch datasets.
- Simulate a relative increase of the batch log fold changes.
- Change the simulated batch effect stepwise.
- Run the metrics on stepwise randomly permuted batch labels.
- Use the same log fold change distributions for different datasets.
- Simulate unbalanced batch effects.
- Simulate batch effects with tuned cell-specificity.
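For the permutation scenario, here is a minimal sketch of how stepwise permuted label sets could be generated. It assumes that "stepwise" means shuffling the batch labels of an increasing fraction of cells; the function names `permute_fraction` and `stepwise_permutations`, the fraction grid, and the fixed seed are hypothetical choices for illustration, not taken from the pipeline.

```python
import random


def permute_fraction(labels, fraction, seed=0):
    """Shuffle the labels of a random subset of round(fraction * n) cells among themselves.

    Assumption: 'stepwise permutation' = shuffling an increasing fraction of labels.
    Cells outside the subset keep their original label.
    """
    rng = random.Random(seed)
    n = len(labels)
    chosen = rng.sample(range(n), round(fraction * n))
    shuffled = [labels[i] for i in chosen]
    rng.shuffle(shuffled)
    permuted = list(labels)
    for i, lab in zip(chosen, shuffled):
        permuted[i] = lab
    return permuted


def stepwise_permutations(labels, steps=(0.0, 0.25, 0.5, 0.75, 1.0), seed=0):
    """Map each permutation fraction (hypothetical grid) to a permuted label vector."""
    return {f: permute_fraction(labels, f, seed=seed) for f in steps}


labels = ["A"] * 50 + ["B"] * 50
for fraction, permuted in stepwise_permutations(labels).items():
    changed = sum(a != b for a, b in zip(permuted, labels))
    print(f"fraction={fraction:.2f} changed_labels={changed}")
```

Shuffling only within the selected subset keeps the number of cells per batch fixed, so only the assignment of labels to cells changes. Because a shuffled cell can receive its original label back, `fraction` is an upper bound on the share of labels that actually change. Each label vector would then be passed to the metrics; a sensitive metric should report better mixing as the fraction grows.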
View results here.
To set up this pipeline, follow these instructions (steps 1-2 describe one possible way to set up and run Snakemake):
- Set up and activate an Anaconda environment with Snakemake >= v5.6.0 (or something equivalent).
- Make sure the path to your R installation is exported so that Snakemake can find it, e.g. by adding
*export PATH="/your/preferred/R/bin:$PATH"*
to your *~/.bashrc*
- Clone this repository
- Caution: if you don't want all the analyses that ship with this repository, remove every file from the *docs* directory except *_site.yaml*.
- Install all required R packages using renv.
- Create **log** and **out** directories.
- Run *snakemake dir_setup* to set up the directory structure that all rules need.
- If you want to view or share your analysis as a website, activate GitHub Pages for your repository and select */docs* as the source directory.
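Before running the pipeline, a quick pre-flight check can catch the most common setup problems. The sketch below is a hypothetical helper, not part of this repository: it assumes that `snakemake` and `Rscript` should be resolvable on `PATH` and that the **log** and **out** directories live at the repository root.

```python
import shutil
from pathlib import Path


def preflight(repo_root=".", executables=("snakemake", "Rscript"), dirs=("log", "out")):
    """Hypothetical helper: return a list of setup problems (empty list = all checks passed).

    Assumes the listed executables must be found on PATH and the listed
    directories must exist directly under repo_root.
    """
    problems = []
    for exe in executables:
        # shutil.which looks the executable up on PATH; None means not found.
        if shutil.which(exe) is None:
            problems.append(f"'{exe}' not found on PATH")
    root = Path(repo_root)
    for d in dirs:
        if not (root / d).is_dir():
            problems.append(f"missing directory: {root / d}")
    return problems


if __name__ == "__main__":
    for problem in preflight():
        print(problem)
```

Run it from the repository root after activating the environment; an empty output means the assumed executables and directories were found.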
To run the entire pipeline:
- Copy your preprocessed *SingleCellExperiment* dataset into */src/datasets/*
- Generate a corresponding metadata file and save it in */src/meta_files/*
- Run Snakemake.
- Push the results to GitHub and refresh its web deployment.