snakemake_RNA-seq

This repo is forked from KoesGroup/Snakemake_hisat-DESeq and customized by me.

A Snakemake pipeline for the analysis of RNA-seq data that makes use of HISAT2 and StringTie.

Aim

To align reads, count them, normalize the counts and compute differentially expressed genes (DEG) between conditions, using single-end or paired-end Illumina RNA-seq data.

Content

Snakefile:
config.yaml:
data/:
envs/:
samples.tsv:

Usage

Download or clone the GitHub repository

You will need a local copy of the snakemake_RNA-seq repository on your machine. You can either:

use git in the shell: `git clone git@github.com:WilliamJeong2/snakemake_RNA-seq.git`
click on "Clone or download" on the GitHub page and select download

Installing and activating a virtual environment

First, you need to create an environment in which Snakemake, the Python pandas package and the other dependencies listed in envs/global_env.yaml will be installed. To do that, we will use the conda package manager.

Create a virtual environment named rna-seq using the global_env.yaml file with the following command: `conda env create --name rna-seq --file envs/global_env.yaml`
Activate this virtual environment with `source activate rna-seq` (on recent conda versions, `conda activate rna-seq`)

The Snakefile will then take care of installing and loading the packages and software required by each step of the pipeline.

Configuration file

Make sure you have changed the parameters in the config.yaml file. It specifies where to find the sample data file, which genomic and transcriptomic reference FASTA files to use, and the parameters for certain rules. Keeping these settings in config.yaml means the Snakefile does not need to be edited when locations or parameters change.
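To illustrate how a sample data file can drive the single-end versus paired-end decision, here is a minimal Python sketch. The column names (sample, condition, fq1, fq2) and the file paths are assumptions made for this example only; check samples.tsv in the repository for the actual layout.

```python
import csv
import io

# Hypothetical sample sheet: the column names and paths below are assumptions
# for illustration, not necessarily those used by this repository.
EXAMPLE_TSV = (
    "sample\tcondition\tfq1\tfq2\n"
    "ctrl_1\tcontrol\tdata/ctrl_1_R1.fastq.gz\tdata/ctrl_1_R2.fastq.gz\n"
    "treat_1\ttreated\tdata/treat_1.fastq.gz\t\n"
)


def read_samples(handle):
    """Return {sample: {"condition": str, "fastqs": list, "paired": bool}}."""
    samples = {}
    for row in csv.DictReader(handle, delimiter="\t"):
        # Keep only the FASTQ columns that are actually filled in.
        fastqs = [row[col] for col in ("fq1", "fq2") if row.get(col)]
        samples[row["sample"]] = {
            "condition": row["condition"],
            "fastqs": fastqs,
            "paired": len(fastqs) == 2,
        }
    return samples


samples = read_samples(io.StringIO(EXAMPLE_TSV))
for name, info in samples.items():
    layout = "paired-end" if info["paired"] else "single-end"
    print(name, info["condition"], layout)
```

With a real file, replace the in-memory example with `open("samples.tsv", newline="")`.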

Snakemake execution

The Snakemake workflow management system reads a master file (often called Snakefile) that lists the steps to be executed and defines their order. It has many rich features. Read more in the Snakemake documentation.

Dry run (recommended)

From the folder containing the Snakefile, use the command `snakemake --use-conda -np` to perform a dry run that prints out the rules and commands without executing them.

Real run

Simply type `snakemake --use-conda` and provide the number of cores with `--cores`, for instance `snakemake --use-conda --cores 60` to use 60 cores.
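If you launch the pipeline from a script, the two invocations above can be assembled programmatically. The helper below is a hypothetical convenience wrapper, not part of this repository; it only uses the flags already mentioned in this README.

```python
import shlex


def snakemake_command(cores=None, dry_run=False):
    """Build the snakemake argument list (hypothetical helper)."""
    cmd = ["snakemake", "--use-conda"]
    if dry_run:
        # -n: dry run, -p: print the shell commands of each rule
        cmd.append("-np")
    if cores is not None:
        cmd += ["--cores", str(cores)]
    return cmd


print(shlex.join(snakemake_command(dry_run=True)))
print(shlex.join(snakemake_command(cores=60)))
```

The returned list can be passed to `subprocess.run(cmd, check=True)` from the folder containing the Snakefile.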

Output files

RNA-seq read alignment files: *.bam (in temp dir)
FastQC report files: *.html (in results dir)
Unscaled RNA-seq read counts: counts.txt (in results dir)
Gene/transcript-level RPKM or FPKM: gene_FPKM.csv (in results dir)
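For readers unfamiliar with FPKM, the sketch below applies the standard definition (fragments per kilobase of transcript per million mapped fragments). It illustrates the unit only; it is not necessarily how this pipeline produces gene_FPKM.csv, and the gene names, lengths and library size are made up.

```python
def fpkm(fragment_count, gene_length_bp, total_mapped_fragments):
    """Standard FPKM: fragments per kilobase per million mapped fragments."""
    if gene_length_bp <= 0 or total_mapped_fragments <= 0:
        raise ValueError("gene length and library size must be positive")
    # 1e9 = 1e3 (bp per kilobase) * 1e6 (fragments per million)
    return fragment_count * 1e9 / (gene_length_bp * total_mapped_fragments)


# Hypothetical numbers: two genes in a library of 10 million mapped fragments.
counts = {"geneA": 500, "geneB": 1000}
lengths_bp = {"geneA": 2000, "geneB": 4000}
library_size = 10_000_000

for gene in counts:
    print(gene, fpkm(counts[gene], lengths_bp[gene], library_size))
```

Both genes get the same FPKM here: geneB has twice the fragments but is also twice as long, which is exactly what the length normalization corrects for.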
