This repository is a copy of https://github.com/pachterlab/BP_2021 with the modifications described in "A like-for-like comparison of lightweight-mapping pipelines for single-cell RNA-seq data pre-processing". The original repository corresponds to the preprint "Benchmarking of lightweight-mapping based single-cell RNA-seq pre-processing" by A. Sina Booeshaghi and Lior Pachter.
Note: All paths are relative to the root directory of the repository.
- Navigate to ./analysis/scripts
- Run
$ bash make_dirs.sh
- Navigate to ./analysis/scripts
- Run
$ bash gather_cr_barcodes.sh
- Make sure sratools 2.9 is installed on your system (https://ftp-trace.ncbi.nlm.nih.gov/sra/sdk/2.9.6/sratoolkit.2.9.6-ubuntu64.tar.gz)
- Navigate to ./analysis/scripts/
- Run
$ bash gather_data.sh
This script downloads the FASTQ files and moves the files for each sample to a directory called samples/{species}-{sample-name}
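As a quick sanity check after the download, the per-sample layout described above can be inspected with a short script. This is a minimal sketch, not part of the repository: the function name, the list of FASTQ suffixes, and the assumption that the species name contains no hyphen (so the directory name can be split on its first hyphen) are all illustrative.

```python
from pathlib import Path

# Assumed FASTQ file suffixes; adjust to match what gather_data.sh produces.
FASTQ_SUFFIXES = (".fastq", ".fastq.gz", ".fq", ".fq.gz")


def fastq_files_by_sample(samples_root):
    """Map (species, sample_name) to the sorted FASTQ file names in that sample directory."""
    found = {}
    for sample_dir in sorted(Path(samples_root).iterdir()):
        # Skip stray files and directories that do not follow {species}-{sample-name}.
        if not sample_dir.is_dir() or "-" not in sample_dir.name:
            continue
        # Assumption: the species part has no hyphen, so split on the first one.
        species, sample_name = sample_dir.name.split("-", 1)
        found[(species, sample_name)] = sorted(
            p.name for p in sample_dir.iterdir() if p.name.endswith(FASTQ_SUFFIXES)
        )
    return found
```

A sample that maps to an empty list would suggest that its download did not complete.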
- Make sure 'bamtofastq' is downloaded and added to the PATH (https://github.com/10XGenomics/bamtofastq)
- Navigate to ./analysis/scripts/
- Run
$ bash gather_refs.sh
For human, mouse and human_mouse, this script uses mktranscriptome.sh to build the transcriptome from the genome and the annotation files, and uses mkt2g.sh to generate the gene-to-transcript files. For other references, it uses mkt2g_rest.sh to generate the gene-to-transcript files. Then, the FASTA file and the corresponding t2g file for each species are moved to a directory called references/{species}/
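To inspect a generated t2g file, something like the following could be used. It is a sketch under a stated assumption: that the file is tab-separated with the transcript ID in the first column and the gene ID in the second. The output of mkt2g.sh may differ, so check a few lines of the actual file first. The function names are illustrative.

```python
import csv
from collections import defaultdict


def read_t2g(path):
    """Read a tab-separated t2g file into a {transcript_id: gene_id} dict."""
    transcript_to_gene = {}
    with open(path, newline="") as handle:
        for row in csv.reader(handle, delimiter="\t"):
            if len(row) < 2:
                continue  # skip blank or malformed lines
            # Assumption: column 1 is the transcript ID, column 2 the gene ID.
            transcript_to_gene[row[0]] = row[1]
    return transcript_to_gene


def genes_to_transcripts(transcript_to_gene):
    """Invert the mapping: {gene_id: [transcript_id, ...]} in file order."""
    grouped = defaultdict(list)
    for transcript_id, gene_id in transcript_to_gene.items():
        grouped[gene_id].append(transcript_id)
    return dict(grouped)
```

Comparing the number of transcripts in the mapping with the number of records in the FASTA file is one way to spot a mismatched reference.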
- The configuration file is the JSON file analysis/scripts/config_all.json
The JSON files for the samples you wish to process should be placed in a single directory, and the path of that directory should be provided in config_all.json. The JSON files should be similar to the ones located at ./analysis/scripts/configs/
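Before starting a long run, it may be worth confirming that every per-sample JSON file in the chosen directory parses. The sketch below makes no assumption about the keys inside the files; the function name is illustrative and not part of the repository.

```python
import json
from pathlib import Path


def load_sample_configs(config_dir):
    """Return {file stem: parsed JSON} for every *.json file in config_dir.

    json.load raises json.JSONDecodeError on a malformed file, which
    surfaces syntax problems before any pipeline script is started.
    """
    configs = {}
    for json_path in sorted(Path(config_dir).glob("*.json")):
        with json_path.open() as handle:
            configs[json_path.stem] = json.load(handle)
    return configs
```

For example, `load_sample_configs("./analysis/scripts/configs/")` would load the example configurations shipped with the repository.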
- Navigate to ./analysis/scripts/
- Run
$ ./make_indices.sh PATH/TO/CONFIG
- Run
$ ./run_all_salmon.sh PATH/TO/CONFIG
- Run
$ ./run_all_kallisto.sh PATH/TO/CONFIG
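The three commands above can also be chained from a small driver. This is only a sketch: the function names and the `dry_run` switch are illustrative, and it assumes the scripts should run in the order listed and from ./analysis/scripts/. Because the scripts run with that directory as the working directory, an absolute config path is the safer choice.

```python
import subprocess

# Scripts in the order they are listed in this README.
PIPELINE_SCRIPTS = ["./make_indices.sh", "./run_all_salmon.sh", "./run_all_kallisto.sh"]


def pipeline_commands(config_path):
    """Build one argument list per pipeline script."""
    return [[script, str(config_path)] for script in PIPELINE_SCRIPTS]


def run_pipeline(config_path, scripts_dir="./analysis/scripts", dry_run=True):
    """Print the commands (dry run) or execute them in order, stopping on the first failure."""
    commands = pipeline_commands(config_path)
    for command in commands:
        if dry_run:
            print(" ".join(command))
        else:
            # check=True raises CalledProcessError if a script exits non-zero.
            subprocess.run(command, cwd=scripts_dir, check=True)
    return commands
```

Calling `run_pipeline("/abs/path/to/config_all.json", dry_run=False)` would execute the scripts; the default dry run only prints them.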
- Set the config_all variable to the path of the config file
- Execute all the commands in the memtime notebook: ./analysis/notebooks/memtime.ipynb
Make sure the results for all the samples generated by both tools are located at ./data/kallisto_out and ./data/salmon_out
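One way to check that both tools produced results for the same samples is to compare the two output directories. The sketch below assumes each tool writes one subdirectory per sample and that the subdirectory names match between the two tools; neither is confirmed by this README, and the function names are illustrative.

```python
from pathlib import Path


def sample_names(out_dir):
    """Names of the immediate subdirectories of out_dir (assumed: one per sample)."""
    return {p.name for p in Path(out_dir).iterdir() if p.is_dir()}


def compare_outputs(kallisto_out="./data/kallisto_out", salmon_out="./data/salmon_out"):
    """Report samples that appear in only one of the two output directories."""
    kallisto = sample_names(kallisto_out)
    salmon = sample_names(salmon_out)
    return {
        "only_kallisto": sorted(kallisto - salmon),
        "only_salmon": sorted(salmon - kallisto),
    }
```

Two empty lists indicate that both directories cover the same set of samples.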
- Prepare the data for plotting the gene set enrichment analysis
- Make sure Seurat version 4.0 and the other required R packages are installed
- Navigate to the ./analysis/notebooks/ directory
- Run
$ Rscript run_gsea_bar_full.R
- Load the data for making the plots
- Navigate to the ./analysis/scripts/ directory
- Run
$ python mkdata.py -d sample_name -o plotting/output/for/the/sample
- navigate to the
- Generate all the plots
- Run
$ python mkplot.py -d sample_name -i plotting/output/for/the/sample -o output/dir/
- Run
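For several samples, the mkdata.py and mkplot.py invocations shown above could be generated in a loop. This is a sketch only: the per-sample directory layout (`data_root/<sample>` and `plot_root/<sample>`) and the function name are hypothetical, and only the `-d`, `-i` and `-o` flags shown in this README are used.

```python
from pathlib import Path


def plotting_commands(sample_names, data_root, plot_root):
    """Build the mkdata.py and mkplot.py argument lists for each sample."""
    commands = []
    for name in sample_names:
        # Hypothetical layout: one subdirectory per sample under each root.
        data_dir = Path(data_root) / name
        plot_dir = Path(plot_root) / name
        # mkdata.py writes the plotting data that mkplot.py then reads via -i.
        commands.append(["python", "mkdata.py", "-d", name, "-o", str(data_dir)])
        commands.append(
            ["python", "mkplot.py", "-d", name, "-i", str(data_dir), "-o", str(plot_dir)]
        )
    return commands
```

Each returned list can be passed to `subprocess.run(command, check=True)` from the directory that contains the corresponding script.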