Comparison of the output of different versions of the nf-core/rnaseq pipeline for 3 datasets containing ERCC spike-ins

qbic-projects/rnaseq-benchmark

nf-core/rnaseq benchmark: how do tool combinations in different pipeline versions affect the analysis outcome?

A comparison of the output of two versions (v1.4.2 and v3.2) of the nf-core/rnaseq pipeline

Five different pipeline settings were run on three publicly available datasets from different organisms (human, plant, fish) of varying sizes (117 GB, 37 GB, 11 GB), each containing spike-ins from the External RNA Controls Consortium (ERCC).

Pipeline settings: nf-core/rnaseq

Two versions of the nf-core/rnaseq pipeline (v1.4.2 and v3.2) were run in five settings that differ in the alignment and quantification tools. For the older version, v1.4.2, the options --aligner salmon and --aligner hisat2 were used. For the newer version, v3.2, the options --aligner star_salmon and --aligner star_rsem were used, as well as the combination --pseudo_aligner salmon --skip_alignment true.
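The five settings listed above can be written down as a small table and expanded into command lines. The following is a minimal sketch, not the exact invocation used for the benchmark: it relies only on the options named in the text plus Nextflow's `-r` option for selecting a pipeline revision. Input, reference and profile options are not given here and are deliberately left out; the names `SETTINGS` and `build_command` are made up for illustration.

```python
# Settings as described in the text: (pipeline version, setting-specific options).
SETTINGS = [
    ("1.4.2", ["--aligner", "salmon"]),
    ("1.4.2", ["--aligner", "hisat2"]),
    ("3.2", ["--aligner", "star_salmon"]),
    ("3.2", ["--aligner", "star_rsem"]),
    ("3.2", ["--pseudo_aligner", "salmon", "--skip_alignment", "true"]),
]


def build_command(version, options):
    """Return the argument list for one run.

    Input, reference and profile options are omitted on purpose,
    because the text above does not state them.
    """
    return ["nextflow", "run", "nf-core/rnaseq", "-r", version, *options]


commands = [build_command(version, options) for version, options in SETTINGS]

for command in commands:
    print(" ".join(command))
```

Keeping the settings in one list makes it easy to check that every dataset is run with the same five configurations.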

Datasets

Reference genome and annotations:

The iGenomes Ensembl references for Homo sapiens (GRCh37), Arabidopsis thaliana (TAIR10) and Danio rerio (GRCz10) were used for analysis after adding the ERCC sequences and annotations to the .fasta and .gtf files.
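Adding the ERCC records to a reference amounts to appending the spike-in file to the genome .fasta and the spike-in annotation to the .gtf. The sketch below shows one way to do this under the assumption that plain concatenation is sufficient; it is not necessarily how the benchmark references were built, and the function name `append_spike_ins` and any file names passed to it are hypothetical.

```python
def append_spike_ins(reference, spike_in, combined, chunk_size=1 << 20):
    """Write `reference` followed by `spike_in` into `combined`.

    Files are streamed in chunks so that large genome files are not
    loaded into memory. A newline is added after a part that does not
    end with one, so the first spike-in record starts on its own line.
    """
    with open(combined, "wb") as out:
        for part in (reference, spike_in):
            last_byte = b"\n"  # an empty part needs no extra newline
            with open(part, "rb") as src:
                while chunk := src.read(chunk_size):
                    out.write(chunk)
                    last_byte = chunk[-1:]
            if last_byte != b"\n":
                out.write(b"\n")
```

The same function works for both file types, for example once for the genome and ERCC .fasta files and once for the corresponding .gtf files. The chromosome and gene identifiers in the two inputs must not clash, which this sketch does not check.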

Data analysis

The qbic-pipelines/rnadeseq pipeline was used for downstream analysis of the nf-core/rnaseq output; it applies DESeq2 to identify differentially expressed (DE) genes. The DESeq2 output was analyzed and visualized in a Python Jupyter Notebook (6.3.0), mainly with the packages pandas (1.2.4), numpy (1.20.2), scipy.stats (1.7.0) and scikit-learn (1.0). Graphs were generated with the Python packages matplotlib (3.3.4) and seaborn (0.11.2). Venn diagrams were drawn with the R (4.2.2) library VennDiagram (1.7.3).
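A Venn diagram of DE genes shows, for each combination of pipeline settings, how many genes were called DE by exactly that combination. The region sizes can be computed directly from the per-setting gene sets. The following is an illustrative sketch in plain Python, not the notebook code from this repository; the function name, the setting labels and the gene identifiers are made up, and how a gene is called DE (for example, the significance cutoff) is assumed to be decided beforehand.

```python
from collections import Counter


def venn_region_sizes(de_sets):
    """Count genes per exact combination of settings that call them DE.

    `de_sets` maps a setting label to the set of its DE gene IDs.
    The result maps a frozenset of setting labels to the number of
    genes that are DE in exactly those settings.
    """
    membership = {}
    for setting, genes in de_sets.items():
        for gene in genes:
            membership.setdefault(gene, set()).add(setting)
    return Counter(frozenset(settings) for settings in membership.values())


# Toy input with made-up gene identifiers, not results from the study.
toy_sets = {
    "v1.4.2_hisat2": {"geneA", "geneB", "geneC"},
    "v3.2_star_salmon": {"geneB", "geneC", "geneD"},
}
regions = venn_region_sizes(toy_sets)

for settings, count in regions.items():
    print(sorted(settings), count)
```

Counting by exact membership scales to any number of settings, whereas a drawn Venn diagram becomes hard to read beyond five sets.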

Results

The results were submitted to the journal NAR Genomics and Bioinformatics and published as a preprint on bioRxiv: How tool combinations in different pipeline versions affect the outcome in RNA-seq analysis. The Author's Original Version and the supplements can also be found in the Paper/ folder.
