nf-core-wgsnano is a bioinformatics best-practice analysis pipeline for Nanopore Whole Genome Sequencing.
The pipeline is built using Nextflow, a workflow tool to run tasks across multiple compute infrastructures in a very portable manner. It uses Docker/Singularity containers making installation trivial and results highly reproducible.
The pipeline performs the following steps:

- Basecalling (Dorado), with GPU run option. Optional, for pod5/fast5 input formats.
- Basecalling QC (PycoQC)
- Alignment (Dorado with minimap2)
- Merge of all aligned bam files into a single file (samtools)
- Haplotyping and phased variant calling (PEPPER-Margin-DeepVariant)
- Methylation call extraction from bam to bed files (modkit). Optional step.
- Depth calculation (mosdepth)
- Aggregated QC report (MultiQC) for basecalling (PycoQC) and depth (mosdepth)
To get started:

- Install Nextflow (`>=22.10.1`).
- Install any of Docker, Singularity (you can follow this tutorial), Podman, Shifter or Charliecloud for full pipeline reproducibility (this pipeline can NOT be run with conda). This requirement is not needed for running the pipeline on the WashU RIS cluster.
- Download the pipeline and test it on a minimal dataset with a single command (a concrete example is also sketched after this list):

  ```bash
  nextflow run dhslab/nf-core-wgsnano -profile test,YOURPROFILE(S) --outdir <OUTDIR>
  ```
- Start running your own analysis!

  ```bash
  nextflow run dhslab/nf-core-wgsnano --input samplesheet.csv --fasta <FASTA> -profile <docker/singularity/podman/shifter/charliecloud/institute> --outdir <OUTDIR>
  ```
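For example, assuming Docker is installed and the machine has enough resources for the test data, the placeholder profile in the test command above could be filled in as follows (a sketch, not the only valid combination):

```bash
# Test run using the bundled test profile together with the docker profile.
# Any of the other supported container profiles (singularity, podman, ...) works the same way.
nextflow run dhslab/nf-core-wgsnano -profile test,docker --outdir results
```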
Two inputs are required:

- Input: `samplesheet.csv`. This file provides directory/file paths for `fast5` | `pod5` | `bam` reads along with their metadata. It can be specified in a configuration file or supplied directly as a command-line parameter using `--input path/to/samplesheet.csv`. An example samplesheet is available at `assets/samplesheet.csv` (an illustrative sketch is also shown after this list).
- Reference genome fasta file, given either in a configuration file or as the `--fasta path/to/genome.fasta` command-line parameter.
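A purely illustrative sketch of such a samplesheet is shown below; the column names are hypothetical, and the authoritative format is the example shipped with the pipeline at `assets/samplesheet.csv`:

```bash
# Hypothetical samplesheet sketch -- column names are illustrative only;
# consult assets/samplesheet.csv in the repository for the real required columns.
cat > samplesheet.csv << 'EOF'
sample,path
sample_01,/path/to/sample_01/pod5/
sample_02,/path/to/sample_02/reads.bam
EOF
```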
The pipeline also exposes parameters for customizing the workflow sequence and entry points, along with options specifically tailored to the Dorado and PEPPER components within the pipeline. For details, read the usage documentation.
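As a minimal sketch of how parameters can be supplied, the documented `input`, `fasta`, and `outdir` parameters can be collected in a YAML file and passed with Nextflow's generic `-params-file` option; the pipeline-specific Dorado and PEPPER options described in the usage documentation would be added to the same file (their names are not listed here):

```bash
# params.yaml is a hypothetical file name; only parameters documented above are included.
cat > params.yaml << 'EOF'
input: "path/to/samplesheet.csv"
fasta: "path/to/genome.fasta"
outdir: "results"
EOF

# -params-file is a standard Nextflow option for supplying pipeline parameters.
nextflow run dhslab/nf-core-wgsnano -profile docker -params-file params.yaml
```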
For example, the following launcher command submits a test run of the pipeline on the WashU RIS (LSF) cluster:

```bash
NXF_HOME=${PWD}/.nextflow \
LSF_DOCKER_VOLUMES="/storage1/fs1/dspencer/Active:/storage1/fs1/dspencer/Active $HOME:$HOME" \
bsub -g /dspencer/nextflow -G compute-dspencer -q dspencer \
  -e nextflow_launcher.err -o nextflow_launcher.log \
  -We 2:00 -n 2 -M 12GB \
  -R "select[mem>=16000] span[hosts=1] rusage[mem=16000]" \
  -a "docker(ghcr.io/dhslab/docker-nextflow)" \
  nextflow run dhslab/nf-core-wgsnano -r dev -profile test,ris,dhslab --outdir results
```
Notice that three profiles are used here:

- `test` -> provides `input` and `fasta` paths for the test run
- `ris` -> sets general configuration for the RIS LSF cluster
- `dhslab` -> sets lab-specific cluster configuration
Alternatively, clone the repository and run the pipeline from the local copy:

```bash
git clone https://github.com/dhslab/nf-core-wgsnano.git
cd nf-core-wgsnano/
chmod +x bin/*
LSF_DOCKER_VOLUMES="/storage1/fs1/dspencer/Active:/storage1/fs1/dspencer/Active $HOME:$HOME" \
bsub -g /dspencer/nextflow -G compute-dspencer -q dspencer \
  -e nextflow_launcher.err -o nextflow_launcher.log \
  -We 2:00 -n 2 -M 12GB \
  -R "select[mem>=16000] span[hosts=1] rusage[mem=16000]" \
  -a "docker(ghcr.io/dhslab/docker-nextflow)" \
  "NXF_HOME=${PWD}/.nextflow ; nextflow run main.nf -profile test,ris,dhslab --outdir results"
```
Notes:

- The pipeline is developed and optimized to run on the WashU RIS (LSF) HPC, but it can be deployed in any HPC environment supported by Nextflow.
- The pipeline does NOT support conda because some of the tools used are not available as conda packages.
- The pipeline can NOT be fully tested on a personal computer because the basecalling step is computationally intensive even for small test files. For testing/development purposes, the pipeline can be run in stub (dry-run) mode (see the example below).
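A minimal sketch of such a stub run, assuming the `test` profile and Docker are available locally (Nextflow's `-stub-run` option runs the stub block of each process that defines one instead of its real command, so no GPU or heavy compute is needed):

```bash
# Dry-run of the test configuration: processes with stubs are replaced by their stub commands.
nextflow run dhslab/nf-core-wgsnano -profile test,docker -stub-run --outdir results
```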