Run barrnap with fasta #535

Merged
1 change: 1 addition & 0 deletions CHANGELOG.md
@@ -18,6 +18,7 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
- [#519](https://github.com/nf-core/ampliseq/pull/519) - Adding the pipeline reference to the MultiQC report
- [#520](https://github.com/nf-core/ampliseq/pull/520),[#530](https://github.com/nf-core/ampliseq/pull/530) - Fix conda packages
- [#531](https://github.com/nf-core/ampliseq/pull/531) - Update documentation
+- [#535](https://github.com/nf-core/ampliseq/pull/535) - Make sure barrnap runs with fasta input

### `Dependencies`

3 changes: 2 additions & 1 deletion docs/usage.md
@@ -182,7 +182,8 @@ An [example samplesheet](../assets/samplesheet.tsv) has been provided with the p

#### ASV/OTU fasta input

-When pointing at a file ending with `.fasta`, `.fna` or `.fa`, the containing ASV/OTU sequences will be taxonomically classified. All other pipeline steps will be skipped.
+When pointing at a file ending with `.fasta`, `.fna` or `.fa`, the containing ASV/OTU sequences will be taxonomically classified.
+Most of the steps of the pipeline will be skipped, but ITSx & Barrnap & length filtering can be applied before taxonomic classification.

```bash
--input 'path/to/amplicon_sequences.fasta'
4 changes: 3 additions & 1 deletion modules/local/filter_len_asv.nf
@@ -25,14 +25,16 @@ process FILTER_LEN_ASV {
script:
def min_len_asv = params.min_len_asv ?: '1'
def max_len_asv = params.max_len_asv ?: '1000000'
+
+def read_table = table ? "table <- read.table(file = '$table', sep = '\t', comment.char = '', header=TRUE)" : "table <- data.frame(matrix(ncol = 1, nrow = 0))"
"""
#!/usr/bin/env Rscript

#load packages
suppressPackageStartupMessages(library(Biostrings))

#read abundance file, first column is ASV_ID
-table <- read.table(file = "$table", sep = '\t', comment.char = "", header=TRUE)
+$read_table
colnames(table)[1] <- "ASV_ID"

#read fasta file of ASV sequences
22 changes: 12 additions & 10 deletions workflows/ampliseq.nf
@@ -201,7 +201,6 @@ workflow AMPLISEQ {
//
PARSE_INPUT ( params.input, is_fasta_input, single_end, params.multiple_sequencing_runs, params.extension )
ch_reads = PARSE_INPUT.out.reads
-ch_fasta = PARSE_INPUT.out.fasta

//
// MODULE: Rename files
@@ -305,29 +304,35 @@ workflow AMPLISEQ {
// Modules : Filter rRNA
// TODO: FILTER_SSU.out.stats needs to be merged still into "overall_summary.tsv"
//
+if ( is_fasta_input ) {
+ch_unfiltered_fasta = PARSE_INPUT.out.fasta
+} else {
+ch_unfiltered_fasta = DADA2_MERGE.out.fasta
+}
+
if (!params.skip_barrnap && params.filter_ssu) {
-BARRNAP ( DADA2_MERGE.out.fasta )
+BARRNAP ( ch_unfiltered_fasta )
ch_versions = ch_versions.mix(BARRNAP.out.versions.ifEmpty(null))
FILTER_SSU ( DADA2_MERGE.out.fasta, DADA2_MERGE.out.asv, BARRNAP.out.matches )
MERGE_STATS_FILTERSSU ( ch_stats, FILTER_SSU.out.stats )
ch_stats = MERGE_STATS_FILTERSSU.out.tsv
ch_dada2_fasta = FILTER_SSU.out.fasta
ch_dada2_asv = FILTER_SSU.out.asv
} else if (!params.skip_barrnap && !params.filter_ssu) {
-BARRNAP ( DADA2_MERGE.out.fasta )
+BARRNAP ( ch_unfiltered_fasta )
ch_versions = ch_versions.mix(BARRNAP.out.versions.ifEmpty(null))
-ch_dada2_fasta = DADA2_MERGE.out.fasta
+ch_dada2_fasta = ch_unfiltered_fasta
ch_dada2_asv = DADA2_MERGE.out.asv
} else {
-ch_dada2_fasta = DADA2_MERGE.out.fasta
+ch_dada2_fasta = ch_unfiltered_fasta
ch_dada2_asv = DADA2_MERGE.out.asv
}

//
// Modules : amplicon length filtering
//
if (params.min_len_asv || params.max_len_asv) {
-FILTER_LEN_ASV ( ch_dada2_fasta,ch_dada2_asv )
+FILTER_LEN_ASV ( ch_dada2_fasta, ch_dada2_asv.ifEmpty( [] ) )
ch_versions = ch_versions.mix(FILTER_LEN_ASV.out.versions.ifEmpty(null))
MERGE_STATS_FILTERLENASV ( ch_stats, FILTER_LEN_ASV.out.stats )
ch_stats = MERGE_STATS_FILTERLENASV.out.tsv
@@ -338,10 +343,7 @@ workflow AMPLISEQ {
//
// SUBWORKFLOW / MODULES : Taxonomic classification with DADA2 and/or QIIME2
//
-//Alternative entry point for fasta that is being classified
-if ( !is_fasta_input ) {
-ch_fasta = ch_dada2_fasta
-}
+ch_fasta = ch_dada2_fasta

//DADA2
if (!params.skip_taxonomy) {
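
Note on how the two `FILTER_LEN_ASV` changes appear to fit together: the workflow now passes `ch_dada2_asv.ifEmpty( [] )`, and the module picks its R command with a `table ? ... : ...` ternary. Under Groovy truth rules an empty list evaluates to false, so an empty abundance channel would lead to the empty data frame branch. The plain-Groovy sketch below illustrates only that fallback. `readTableCommand` is a hypothetical helper written for this illustration, not pipeline code, and it assumes the process sees `[]` as `table` when no abundance table is available.

```groovy
// Hypothetical helper (not part of nf-core/ampliseq) mirroring the ternary
// that this PR adds to modules/local/filter_len_asv.nf.
String readTableCommand(def table) {
    // Groovy truth: null and empty collections are false,
    // a non-empty file name is true.
    return table ?
        "table <- read.table(file = '${table}', sep = '\t', comment.char = '', header=TRUE)" :
        "table <- data.frame(matrix(ncol = 1, nrow = 0))"
}

// Assumed fasta-input case: the channel was empty, so `[]` is passed.
assert readTableCommand([]) == "table <- data.frame(matrix(ncol = 1, nrow = 0))"

// Regular case: a (hypothetical) abundance table file name is passed.
assert readTableCommand('ASV_table.tsv').contains("file = 'ASV_table.tsv'")
```

Either way the R script ends up with an object named `table`, which is why the following `colnames(table)[1] <- "ASV_ID"` line can stay unchanged in the diff.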