Usage
To get to know zUMIs, we provide an example dataset of 1 million reads generated with the SCRB-seq protocol:
wget https://github.com/sdparekh/zUMIs/raw/zUMIs-version1/ExampleData/barcoderead_HEK.1mio.fq.gz
wget https://github.com/sdparekh/zUMIs/raw/zUMIs-version1/ExampleData/cDNAread_HEK.1mio.fq.gz
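A quick way to check that both downloads are complete is to test the gzip archives and count the FASTQ records. The function `count_fastq_reads` below is a hypothetical helper, not part of zUMIs, and assumes standard four-line FASTQ records:

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of zUMIs): sanity-check a downloaded FASTQ.gz.
# Assumes standard FASTQ, where every record spans exactly four lines.

count_fastq_reads() {
  local fq="$1"
  # gzip -t returns non-zero for truncated or corrupt archives
  gzip -t "$fq" 2>/dev/null || { echo "corrupt or incomplete: $fq" >&2; return 1; }
  # Count lines and divide by four to get the number of reads
  gzip -dc "$fq" | awk 'END { print NR / 4 }'
}
```

For example, `count_fastq_reads barcoderead_HEK.1mio.fq.gz` should report roughly 1 million reads if the download succeeded.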
If you do not have a STAR index yet, we provide a dummy index of chromosome 22 built with STAR 2.7.3a for download from Google Drive:
https://drive.google.com/file/d/1PcaU3uaiaYYivOCLgn0VCmU0gMeyc9_M/view?usp=sharing
Extract the files after downloading:
tar -xvjf chr22_reference.tar.bz2
Now run zUMIs:
bash <path-to-zUMIs>/zUMIs.sh -y <test>.yaml
Starting with zUMIs 2.0, runs are configured through a YAML config file, which makes starting zUMIs simpler. Have a look at this annotated preset. If you are not familiar with YAML files or prefer a graphical user interface, we provide an RShiny application to create YAML files. Run it in your local RStudio...
runApp('zUMIs/zUMIs-config_shiny.R')
...or use the convenient online version of the Shiny app.
Note that you need to provide full paths to each file. Relative paths and the use of ~ are discouraged.
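Because the config file needs full paths, it can help to expand paths in the shell before pasting them into the YAML file. This is a minimal sketch with a hypothetical helper, `expand_path`; it handles a leading `~` and relative paths but does not resolve `..` or symlinks (GNU `realpath` does, where available):

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of zUMIs): turn a path into an absolute path
# before writing it into the YAML config. Does not resolve ".." or symlinks.

expand_path() {
  local p="$1"
  # Expand a leading ~, which is not expanded inside quotes or YAML values
  case "$p" in
    "~") p="$HOME" ;;
    "~/"*) p="$HOME/${p:2}" ;;
  esac
  # Prefix relative paths with the current working directory
  case "$p" in
    /*) printf '%s\n' "$p" ;;
    *) printf '%s\n' "$PWD/$p" ;;
  esac
}
```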
Once you have created your config file, the run is started by calling the zUMIs-master script:
zUMIs.sh -y <myRun.yaml>
Note that the STAR, samtools, pigz and Rscript executables used to be passed on the command line but should now be defined in the YAML file. All possible arguments to this script are listed here:
USAGE: zUMIs.sh [options]
-h Print the usage info.
### Required parameters ##
-y <YAML config file> : Path to the YAML config file. Required.
### Program paths ##
-d <zUMIs-dir> : Directory containing zUMIs scripts. Default: path to this script.
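Since the executables are now set in the YAML file, one option is to look up their absolute paths with `command -v` and print them as YAML lines. This is a sketch under assumptions: the helper names are hypothetical, and the key names `STAR_exec`, `samtools_exec`, `pigz_exec` and `Rscript_exec` are assumed, so check them against the annotated preset before use:

```shell
#!/usr/bin/env bash
# Hypothetical helpers (not part of zUMIs): print YAML lines with absolute
# executable paths. ASSUMPTION: key names must be verified against the preset.

yaml_exec_line() {
  local key="$1" tool="$2" path
  # command -v prints the full path of an executable found on $PATH
  path=$(command -v "$tool") || { echo "not found: $tool" >&2; return 1; }
  printf '%s: %s\n' "$key" "$path"
}

write_exec_block() {
  # Stops at the first tool that cannot be found
  yaml_exec_line STAR_exec STAR &&
  yaml_exec_line samtools_exec samtools &&
  yaml_exec_line pigz_exec pigz &&
  yaml_exec_line Rscript_exec Rscript
}
```

`write_exec_block` prints the four lines, or stops with an error naming the first tool that is not on `$PATH`.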
To find out how you can configure the analysis using the Shiny app, check out the detailed explanations of both Mandatory parameters and Optional parameters.
To generate a STAR genome index, please refer to the STAR manual.
It is not necessary to generate the genome index with a specific overhang or splice-site reference: zUMIs passes the GTF file to STAR during mapping, so splice junctions are inserted on the fly. If your dataset contains spike-ins, they can either be added to the genome or added on the fly during mapping by giving the path to the corresponding FASTA file as an additional reference sequence in the YAML config file.
Here is an example:
STAR --runMode genomeGenerate --runThreadN 16 --genomeDir mm10_STAR5idx_noGTF --limitGenomeGenerateRAM 111000000000 --genomeFastaFiles mm10.fa
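When you build your own index for a small reference (for example a single chromosome), the STAR manual recommends scaling `--genomeSAindexNbases` down to min(14, log2(GenomeLength)/2 - 1). A minimal sketch of that calculation, using hypothetical helper names `genome_length` and `sa_index_nbases`:

```shell
#!/usr/bin/env bash
# Hypothetical helpers: choose --genomeSAindexNbases for STAR genomeGenerate,
# following the STAR manual's min(14, log2(GenomeLength)/2 - 1) rule.

genome_length() {
  # Sum the lengths of all sequence lines, skipping FASTA header lines
  awk '!/^>/ { n += length($0) } END { printf "%.0f\n", n + 0 }' "$1"
}

sa_index_nbases() {
  awk -v len="$1" 'BEGIN {
    v = int(log(len) / log(2) / 2 - 1)  # log2(len)/2 - 1, rounded down
    if (v > 14) v = 14                  # 14 is the STAR default
    print v
  }'
}

# Example (file and directory names are placeholders):
#   len=$(genome_length chr22.fa)
#   STAR --runMode genomeGenerate --runThreadN 16 --genomeDir chr22_STARidx \
#        --genomeFastaFiles chr22.fa --genomeSAindexNbases "$(sa_index_nbases "$len")"
```

For a reference of about 51 Mb, roughly the size of human chromosome 22, `sa_index_nbases` returns 11; for a mammalian-sized genome it returns the default of 14.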
By default, zUMIs maps reads with STAR using the following parameters (conventional or two-pass mapping can be selected in the YAML config file):
STAR --genomeDir "STARidx" --runThreadN "p" --readFilesCommand samtools view --sjdbGTFfile "gtf" --outFileNamePrefix "sample." --outSAMtype BAM Unsorted --outSAMmultNmax 1 --outFilterMultimapNmax 50 --outSAMunmapped Within --quantMode TranscriptomeSAM --sjdbOverhang "readlength - 1" --twopassMode Basic --readFilesIn
Note that the read length is automatically detected by zUMIs.
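zUMIs detects the read length for you; the snippet below only illustrates how `--sjdbOverhang` relates to it (read length minus 1). It is a sketch, not zUMIs' own implementation, and assumes every read in the file has the same length as the first record:

```shell
#!/usr/bin/env bash
# Illustration only (not zUMIs' implementation): derive --sjdbOverhang from a
# gzipped FASTQ. ASSUMPTION: all reads share the length of the first read.

read_length() {
  local len
  # Line 2 of a FASTQ file holds the sequence of the first read
  len=$(gzip -dc "$1" | awk 'NR == 2 { print length($0); exit }')
  [ -n "$len" ] || return 1
  echo "$len"
}

sjdb_overhang() {
  local len
  len=$(read_length "$1") || return 1
  echo $(( len - 1 ))
}
```

For the example data, `sjdb_overhang cDNAread_HEK.1mio.fq.gz` would give the overhang for the cDNA read.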
For optimal results, it may be useful to adjust the mapping parameters to the data and reference at hand. For example, data with many splice junctions (e.g. at sequencing depths of more than 500M reads) may require raising the splice-junction limits in STAR. In this case, add the following line to your zUMIs config file:
additional_STAR_params: --limitOutSJcollapsed 2000000 --limitSjdbInsertNsj 2000000
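To illustrate what supplementing the defaults means, the sketch below assembles a subset of the default STAR arguments listed above and appends the `additional_STAR_params` string, split on whitespace. The function `build_star_cmd` is hypothetical, and that zUMIs appends the extra parameters after the defaults is an assumption; the function only prints the command (a dry run):

```shell
#!/usr/bin/env bash
# Hypothetical dry run (not zUMIs' code): append extra STAR parameters to a
# subset of the default mapping arguments and print the resulting command.
# ASSUMPTION: extra parameters go after the defaults.

build_star_cmd() {
  local idx="$1" threads="$2" gtf="$3" overhang="$4" extra="$5"
  local -a cmd=(STAR --genomeDir "$idx" --runThreadN "$threads"
    --sjdbGTFfile "$gtf" --outSAMtype BAM Unsorted --outSAMmultNmax 1
    --outFilterMultimapNmax 50 --outSAMunmapped Within
    --quantMode TranscriptomeSAM --sjdbOverhang "$overhang")
  local -a extra_args=()
  # Split the YAML string on whitespace (quoted values with spaces are not handled)
  read -r -a extra_args <<< "$extra"
  cmd+=("${extra_args[@]}")
  printf '%s\n' "${cmd[*]}"
}
```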