This pipeline uses snakemake, minimap2 and nanopolish to call poly(A) tails from Oxford Nanopore direct RNA data.
The pipeline-polya-diff pipeline takes the output file tails/filtered_tails.tsv
from multiple control and treated samples and analyses shifts in poly(A) tail lengths.
The input files and parameters are specified in config.yml:

- transcriptome - the input transcriptome.
- fast5_dir - directory with pass FAST5 files.
- fastq_dir - directory with the fastq files.
- summary_dir - directory with the sequencing summary files.
- spikein_fasta - (optional) fasta file with spike-ins of known poly(A) tail length. The sequence names must end in _<tail_length> (for example "_50").
- min_mapping_qual - filter out reads with mapping quality less than this parameter.
- per_transcript_plots - plot the distribution of estimated tail lengths for all transcripts (true or false).
- threads - number of threads to use for the analyses.
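
The naming rule for spike-in sequences can be illustrated with a minimal Python sketch. It is not part of the pipeline: the function names `expected_tail_length` and `spikein_lengths` are hypothetical, and the sketch assumes the tail length is the run of digits after the final underscore of the sequence name.

```python
import re

# Assumption: the expected tail length is the run of digits that follows
# the final underscore of the sequence name (for example "_50").
_TAIL_SUFFIX = re.compile(r"_(\d+)$")


def expected_tail_length(seq_name):
    """Return the expected poly(A) tail length encoded in a spike-in name."""
    match = _TAIL_SUFFIX.search(seq_name)
    if match is None:
        raise ValueError(
            f"spike-in name {seq_name!r} does not end in _<tail_length>"
        )
    return int(match.group(1))


def spikein_lengths(fasta_lines):
    """Map every sequence name found in FASTA lines to its expected length."""
    lengths = {}
    for line in fasta_lines:
        if not line.startswith(">"):
            continue  # sequence line, not a header
        tokens = line[1:].split()
        if not tokens:
            raise ValueError("FASTA header without a sequence name")
        # The name is the first whitespace-delimited token of the header.
        name = tokens[0]
        lengths[name] = expected_tail_length(name)
    return lengths
```

Running `spikein_lengths` over the lines of a spike-in fasta file before starting the pipeline would flag any sequence name that does not follow the `_<tail_length>` convention.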

The pipeline writes the following output directories and files:

- alignment/:
  - aligned_reads_sorted.bam - sorted indexed alignment of reads to the transcriptome.
- input/:
  - reads.fastq* - concatenated input reads and nanopolish index files.
  - reference.fas - reference fasta (including spike-ins).
  - summaries.fofn - list of sequencing summary files.
- reports/:
  - filtering_report.pdf and filtering_report.tsv - nanopolish QC statistics.
  - spikein_medians.tsv - expected and estimated medians of spike-ins.
  - spikein_report.pdf - plots of distribution of tail lengths in spike-ins.
  - tails_report.pdf - global and per-transcript poly(A) tail length distributions.
- tails/:
  - all_tails.tsv - raw nanopolish output.
  - filtered_tails.tsv - nanopolish output - PASS reads only.
  - spikein_tails.tsv - results for reads mapping to spike-ins.
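
As a rough illustration of working with the tails output, the sketch below computes a per-transcript median tail length from a tab-separated file. It is not the pipeline's own reporting code: the function name `median_tail_lengths` is hypothetical, and the column names `contig` and `polya_length` are assumed to follow the nanopolish polya output format.

```python
import csv
import io
from collections import defaultdict
from statistics import median


def median_tail_lengths(tsv_handle, min_reads=1):
    """Return {transcript: median tail length} for transcripts with enough reads."""
    reader = csv.DictReader(tsv_handle, delimiter="\t")
    tails = defaultdict(list)
    for row in reader:
        # Assumed column names, following the nanopolish polya output.
        tails[row["contig"]].append(float(row["polya_length"]))
    return {
        contig: median(lengths)
        for contig, lengths in tails.items()
        if len(lengths) >= min_reads
    }


# Small made-up table standing in for tails/filtered_tails.tsv.
example = io.StringIO(
    "readname\tcontig\tpolya_length\tqc_tag\n"
    "r1\ttx1\t50.0\tPASS\n"
    "r2\ttx1\t70.0\tPASS\n"
    "r3\ttx2\t30.5\tPASS\n"
)
print(median_tail_lengths(example))  # {'tx1': 60.0, 'tx2': 30.5}
```

To use it on real output, pass an open file handle, for example `open("tails/filtered_tails.tsv", newline="")`, in place of the in-memory example.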

Dependencies:

- miniconda - install it according to the instructions.
- snakemake - install using conda.
- The rest of the dependencies are automatically installed using the conda feature of snakemake.

Repository layout:

- README.md
- Snakefile - master snakefile
- config.yml - YAML configuration file
- snakelib/ - snakefiles collection included by the master snakefile
- lib/ - python files included by analysis scripts and snakefiles
- scripts/ - analysis scripts
- data/ - input data needed by pipeline - use with caution to avoid bloated repo
- results/ - pipeline results to be committed - use with caution to avoid bloated repo

Clone the repository:
git clone https://github.com/nanoporetech/pipeline-polya-ng
Edit config.yml to set the input datasets and parameters, then issue:
snakemake --use-conda -j <num_cores> all
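
If the pipeline is launched from a script rather than typed by hand, the command above could be assembled as in the following sketch. The helper `snakemake_command` is hypothetical; it only builds the argument list shown above and assumes that defaulting to the number of CPUs reported by the operating system is acceptable when no core count is given.

```python
import os


def snakemake_command(num_cores=None, target="all"):
    """Build the argument list for: snakemake --use-conda -j <num_cores> <target>."""
    # Assumption: fall back to the CPU count reported by the OS (or 1).
    cores = num_cores if num_cores is not None else (os.cpu_count() or 1)
    if cores < 1:
        raise ValueError("num_cores must be at least 1")
    return ["snakemake", "--use-conda", "-j", str(cores), target]
```

The resulting list can be passed to `subprocess.run(snakemake_command(4), check=True)` from the root of the cloned repository.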
(c) 2019 Oxford Nanopore Technologies Ltd.
This Source Code Form is subject to the terms of the Mozilla Public License, v. 2.0. If a copy of the MPL was not distributed with this file, You can obtain one at http://mozilla.org/MPL/2.0/.
Research releases are provided as technology demonstrators to provide early access to features or stimulate Community development of tools. Support for this software will be minimal and is only provided directly by the developers. Feature requests, improvements, and discussions are welcome and can be implemented by forking and pull requests. However much as we would like to rectify every issue and piece of feedback users may have, the developers may have limited resource for support of this software. Research releases may be unstable and subject to rapid iteration by Oxford Nanopore Technologies.