Add workflow for QIIME2 classification of 16S reads #439

Merged — 90 commits, merged Dec 21, 2022

Commits
9186c14
Notes/not
golu099 Sep 15, 2022
a101354
Notes/notes_main.txt
golu099 Sep 15, 2022
c15c0a0
whitespace changes
golu099 Sep 15, 2022
c53dfa0
testing
golu099 Oct 17, 2022
86c691e
qiime_workflow.wdl
golu099 Oct 17, 2022
5830180
test_ATCC_high.bam
golu099 Oct 17, 2022
eee96e8
stop file trasnfer
golu099 Oct 17, 2022
a1ea263
Introducing new WDLs edits
Nov 10, 2022
c52fffb
.
golu099 Nov 13, 2022
0319279
.
golu099 Nov 13, 2022
684320b
test
golu099 Nov 13, 2022
7a6d980
.
golu099 Nov 13, 2022
4567242
.
golu099 Nov 13, 2022
ad024f2
.
golu099 Nov 13, 2022
fca086c
.
golu099 Nov 13, 2022
f48a990
.
golu099 Nov 13, 2022
acc8ebe
updated 12:13 PM 11.14.22
golu099 Nov 14, 2022
4597846
CONDA env dep issue
golu099 Nov 14, 2022
ae770ed
updated to test CONDA env bug
golu099 Nov 14, 2022
0afd54a
fixed CONDA env issue
golu099 Nov 14, 2022
f9a7863
debug
golu099 Nov 14, 2022
916f379
updated bugs
golu099 Nov 14, 2022
050f5ca
update
golu099 Nov 14, 2022
37682b3
Add files via upload
golu099 Nov 14, 2022
bc32133
test_see
golu099 Nov 14, 2022
9660e50
Merge branch 'fnegrete_test' of https://github.com/broadinstitute/vir…
golu099 Nov 14, 2022
648b9ea
passed miniWDL
golu099 Nov 15, 2022
96f57f9
Merge remote-tracking branch 'origin/master' into fnegrete_test
golu099 Nov 15, 2022
7058a03
final
golu099 Nov 15, 2022
d53a4fa
Reviewed per Chris feedback
golu099 Nov 15, 2022
1ab2c66
edits
golu099 Nov 16, 2022
e2cbf85
Deleted extra files from git repo
golu099 Nov 16, 2022
eee8c36
Fixing issue on tasks.wdl
golu099 Nov 16, 2022
ba655c0
Fixing bugs on task.wdl
golu099 Nov 16, 2022
7d75301
Updated
golu099 Nov 16, 2022
56b29e1
Updated
golu099 Nov 16, 2022
56529e0
Optimizing updates
golu099 Nov 16, 2022
0fb8686
Optimizing updates
golu099 Nov 16, 2022
9743496
Optimizing updates
golu099 Nov 16, 2022
c9dd4e9
updating task file
golu099 Nov 16, 2022
6adec92
change rbracket per feedback
golu099 Nov 16, 2022
0d9eb2a
change rbracket per feedback
golu099 Nov 16, 2022
b1f71c5
changed workflow name to match wdl
golu099 Nov 16, 2022
547c8e6
changed name per wormtool test
golu099 Nov 16, 2022
a657a1f
changed name per wormtool test
golu099 Nov 16, 2022
bd2d8cc
Update pipes/WDL/tasks/tasks_16S_amplicon.wdl
golu099 Nov 17, 2022
4342498
Update pipes/WDL/tasks/tasks_16S_amplicon.wdl
golu099 Nov 17, 2022
b0c7d08
Addressing feedback
golu099 Nov 17, 2022
052ef39
Feedback edits
golu099 Nov 17, 2022
b58bc89
Fix commit issue
golu099 Nov 17, 2022
fe08748
Changing description for trim reads per feedback
golu099 Nov 17, 2022
2e94b27
Spacing issues
golu099 Nov 17, 2022
fdae7d1
Adding version 1.0 fix
golu099 Nov 17, 2022
62ee431
Fixing miniwdl testing error on local.json
golu099 Nov 17, 2022
4b75683
Update pipes/WDL/tasks/tasks_16S_amplicon.wdl
golu099 Nov 18, 2022
fbe4a7c
Update pipes/WDL/tasks/tasks_16S_amplicon.wdl
golu099 Nov 18, 2022
2f506d8
Update pipes/WDL/tasks/tasks_16S_amplicon.wdl
golu099 Nov 18, 2022
25800f0
Update pipes/WDL/tasks/tasks_16S_amplicon.wdl
golu099 Nov 19, 2022
29e5a82
Update pipes/WDL/tasks/tasks_16S_amplicon.wdl
golu099 Nov 19, 2022
8206896
Making edits per feedback
golu099 Nov 19, 2022
d613912
local disk edits
golu099 Nov 19, 2022
2967780
Changes to rep-seq files
golu099 Nov 21, 2022
638f4a2
spacing issue fixed
golu099 Nov 21, 2022
7f1c8e2
Better name for outfile
golu099 Nov 30, 2022
e5aa74c
edits
golu099 Dec 1, 2022
66a24c0
Merge branch 'fnegrete_test' of https://github.com/broadinstitute/vir…
golu099 Dec 1, 2022
c55812c
Merge branch 'master' into fnegrete_test
dpark01 Dec 1, 2022
0e463df
Merge branch 'master' into fnegrete_test
dpark01 Dec 2, 2022
a9d9f0a
conda env activation bash
golu099 Dec 7, 2022
d19f499
config shell env
golu099 Dec 8, 2022
2259d6f
shell verbose
golu099 Dec 8, 2022
003ff11
adding bash source
golu099 Dec 8, 2022
cadf667
adding bash source
golu099 Dec 8, 2022
5feafe2
restoring working model to obtain error code on terra
golu099 Dec 12, 2022
9f21841
Merge branch 'master' into fnegrete_test
dpark01 Dec 13, 2022
fcf16b6
Merge branch 'master' into fnegrete_test
dpark01 Dec 14, 2022
db8f398
add input files
dpark01 Dec 14, 2022
779386d
Adding new mini workflow for qiime: qiime_import_bam. Precursor to mi…
golu099 Dec 14, 2022
bd934ec
Changes to qiime_import_bam.wdl
golu099 Dec 14, 2022
748ed8a
calls name may not equal the containing workflow's
golu099 Dec 14, 2022
922ef44
changing wfl name to infile per miniwdl request
golu099 Dec 14, 2022
60d6619
Call's name may not equal the containing workflow's
golu099 Dec 14, 2022
6901f4b
chanigng test_sample_name
golu099 Dec 15, 2022
85a3aa5
.
golu099 Dec 15, 2022
ae6e723
Removing /tasks_qiime_import_bam.wdl since redundant
golu099 Dec 19, 2022
b431491
Merge branch 'master' into fnegrete_test
dpark01 Dec 19, 2022
68edbd6
Updated docker image
golu099 Dec 20, 2022
d77f8dc
Merge branch 'master' into fnegrete_test
dpark01 Dec 21, 2022
ad1b21d
removing CONDA env // command block
golu099 Dec 21, 2022
7b809b0
RAM & Local disk updated for sufficient memory
golu099 Dec 21, 2022
10 changes: 10 additions & 0 deletions .dockstore.yml
@@ -349,3 +349,13 @@ workflows:
    primaryDescriptorPath: /pipes/WDL/workflows/trimal.wdl
    testParameterFiles:
      - empty.json
  - name: amplicon16S_analysis
    subclass: WDL
    primaryDescriptorPath: /pipes/WDL/workflows/amplicon16S_analysis.wdl
    testParameterFiles:
      - empty.json
  - name: qiime_import_bam
    subclass: WDL
    primaryDescriptorPath: /pipes/WDL/workflows/qiime_import_bam.wdl
    testParameterFiles:
      - empty.json
295 changes: 295 additions & 0 deletions pipes/WDL/tasks/tasks_16S_amplicon.wdl
@@ -0,0 +1,295 @@
version 1.0

task qiime_import_from_bam {
meta {
description: "Parse demultiplexed BAM files into QIIME-readable .qza artifacts."
}
input {
File reads_bam
String sample_name
Int memory_mb = 2000
Int cpu = 1
Int disk_size_gb = ceil(2*size(reads_bam, "GiB")) + 5
String docker = "quay.io/broadinstitute/qiime2:conda"
}
parameter_meta {
reads_bam: {description: "Input BAM file"}
}

command <<<
set -ex -o pipefail
#Part 1A | BAM -> FASTQ [Simple samtools command]
samtools fastq -1 $(pwd)/R1.fastq.gz -2 $(pwd)/R2.fastq.gz -0 /dev/null ~{reads_bam}
# Sanitize the sample name: QIIME sample IDs may not contain underscores, so replace '_' with '-'
NEWSAMPLENAME=$(echo "~{sample_name}" | perl -lape 's/[_]/-/g')
# Record the sanitized sample name for downstream use
echo ${NEWSAMPLENAME} > NEWSAMPLENAME.txt
# Build a manifest.tsv with columns: sample-id, forward (R1) path, reverse (R2) path
# '>' creates/overwrites the file; '>>' appends; '\t' separates columns
echo -e "sample-id\tforward-absolute-filepath\treverse-absolute-filepath" > manifest.tsv
echo -e "$NEWSAMPLENAME\t$(pwd)/R1.fastq.gz\t$(pwd)/R2.fastq.gz" >> manifest.tsv

# FASTQ -> .qza (via the qiime tools import function)
qiime tools import \
--type 'SampleData[PairedEndSequencesWithQuality]' \
--input-path manifest.tsv \
--input-format PairedEndFastqManifestPhred33V2 \
--output-path "~{sample_name}.qza"
>>>

output {
File reads_qza = "~{sample_name}.qza"
String samplename_master_sheet = read_string("NEWSAMPLENAME.txt")
}
runtime {
docker: docker
memory: "${memory_mb} MiB"
cpu: cpu
disk: disk_size_gb + " GB"
disks: "local-disk " + disk_size_gb + " HDD"
}
}
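The sample-name sanitization and two-line manifest built inside `qiime_import_from_bam` can be sketched as a standalone shell snippet. This is a hedged illustration only: `sample_01_A` and the `R1`/`R2` paths are invented, and `tr` stands in for the task's equivalent perl one-liner.

```shell
#!/usr/bin/env bash
set -e -o pipefail

sample_name="sample_01_A"   # hypothetical example name

# QIIME sample IDs may not contain underscores: map '_' -> '-'
# (equivalent to the perl -lape 's/[_]/-/g' used in the task)
NEWSAMPLENAME=$(echo "$sample_name" | tr '_' '-')

# Header row, then one row per sample; printf gives portable tab handling
printf 'sample-id\tforward-absolute-filepath\treverse-absolute-filepath\n' > manifest.tsv
printf '%s\t%s\t%s\n' "$NEWSAMPLENAME" "$(pwd)/R1.fastq.gz" "$(pwd)/R2.fastq.gz" >> manifest.tsv

echo "$NEWSAMPLENAME"   # -> sample-01-A
```

The resulting manifest.tsv matches the PairedEndFastqManifestPhred33V2 layout the import step expects: one header line plus one line per sample.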

#Part 1 | Step 2:cutadapt: Trim sequences
#trimreads_trim
#trim = default
task trim_reads {

meta {
description: "Remove adapter sequences, primers, and other unwanted sequences from sequence data."
}

input {
File reads_qza

String qza_basename = basename(reads_qza, '.qza')
#Boolean not_default = false
String forward_adapter = "CTGCTGCCTCCCGTAGGAGT"
String reverse_adapter = "AGAGTTTGATCCTGGCTCAG"
Int min_length = 1
Boolean keep_untrimmed_reads = false
Int memory_mb = 2000
Int cpu = 1
Int disk_size_gb = ceil(2*size(reads_qza, "GiB")) + 5
String docker = "quay.io/broadinstitute/qiime2:conda"
}

command <<<
set -ex -o pipefail
qiime cutadapt trim-paired \
--i-demultiplexed-sequences "~{reads_qza}" \
--p-front-f "~{forward_adapter}" \
--p-front-r "~{reverse_adapter}" \
~{"--p-minimum-length " + min_length} \
~{true='--p-no-discard-untrimmed' false='--p-discard-untrimmed' keep_untrimmed_reads} \
--o-trimmed-sequences "~{qza_basename}_trimmed.qza"

#trim_visual
qiime demux summarize \
--i-data "~{qza_basename}_trimmed.qza" \
--o-visualization "~{qza_basename}_trim_summary.qzv"
>>>

output {
#trimmed_sequences = paired ends for vsearch
File trimmed_reads_qza = "~{qza_basename}_trimmed.qza"
File trimmed_visualization = "~{qza_basename}_trim_summary.qzv"
}

runtime {
docker: docker
memory: "${memory_mb} MiB"
cpu: cpu
disk: disk_size_gb + " GB"
disks: "local-disk " + disk_size_gb + " HDD"
}
}
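The `~{true=… false=…}` interpolation in the cutadapt call above expands to one of two flags depending on `keep_untrimmed_reads`. A minimal sketch of that expansion in plain shell (the variable here is a stand-in, not part of the task):

```shell
keep_untrimmed_reads=false   # WDL Boolean modeled as a shell variable

if [ "$keep_untrimmed_reads" = true ]; then
    untrimmed_flag='--p-no-discard-untrimmed'   # keep reads where no adapter was found
else
    untrimmed_flag='--p-discard-untrimmed'      # drop reads where no adapter was found
fi

echo "$untrimmed_flag"   # -> --p-discard-untrimmed
```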

#Part 1 | Step 3:VSEARCH: Merge sequences
task join_paired_ends {
meta {
description: "Join paired-end sequence reads using vsearch's join-pairs function."
}
input {
#Input File: Merge paired reads
File trimmed_reads_qza
String reads_basename = basename(trimmed_reads_qza, '.qza')
Int memory_mb = 2000
Int cpu = 1
Int disk_size_gb = ceil(2*size(trimmed_reads_qza, "GiB")) + 5
String docker = "quay.io/broadinstitute/qiime2:conda"
}

command <<<
set -ex -o pipefail
qiime vsearch join-pairs \
--i-demultiplexed-seqs ~{trimmed_reads_qza} \
--o-joined-sequences "~{reads_basename}_joined.qza"

qiime demux summarize \
--i-data "~{reads_basename}_joined.qza" \
--o-visualization "~{reads_basename}_visualization.qzv"
>>>
output {
File joined_end_reads_qza = "~{reads_basename}_joined.qza"
File joined_end_visualization = "~{reads_basename}_visualization.qzv"
}
runtime {
docker: docker
memory: "${memory_mb} MiB"
cpu: cpu
disk: disk_size_gb + " GB"
disks: "local-disk " + disk_size_gb + " HDD"
}
}
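The `reads_basename` default above uses WDL's two-argument `basename(f, suffix)`, which behaves like coreutils `basename`. A quick hedged illustration — the path is invented:

```shell
# Strip the directory and the .qza suffix, as basename(trimmed_reads_qza, '.qza') does
reads_basename=$(basename "/cromwell-root/sample-01_trimmed.qza" .qza)

echo "$reads_basename"                # -> sample-01_trimmed
echo "${reads_basename}_joined.qza"   # -> sample-01_trimmed_joined.qza
```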

task deblur {

meta {
description: "Perform sequence quality control for Illumina data using the Deblur workflow with a 16S reference as a positive filter."
}
input {
File joined_end_reads_qza
String joined_end_basename = basename(joined_end_reads_qza, '.qza')
Int trim_length_var = 300
Int memory_mb = 2000
Int cpu = 1
Int disk_size_gb = ceil(2*size(joined_end_reads_qza, "GiB")) + 5
String docker = "quay.io/broadinstitute/qiime2:conda"
}
command <<<
set -ex -o pipefail

qiime deblur denoise-16S \
--i-demultiplexed-seqs ~{joined_end_reads_qza} \
~{"--p-trim-length " + trim_length_var} \
--p-sample-stats \
--o-representative-sequences "~{joined_end_basename}_rep_seqs.qza" \
--o-table "~{joined_end_basename}_table.qza" \
--o-stats "~{joined_end_basename}_stats.qza"

#Generate feature table- give you the number of features per sample
qiime feature-table summarize \
--i-table "~{joined_end_basename}_table.qza" \
--o-visualization "~{joined_end_basename}_table.qzv"
#Generate visualization of deblur stats
qiime deblur visualize-stats \
--i-deblur-stats "~{joined_end_basename}_stats.qza" \
--o-visualization "~{joined_end_basename}_stats.qzv"
>>>
output {
File representative_seqs_qza = "~{joined_end_basename}_rep_seqs.qza"
File representative_table_qza = "~{joined_end_basename}_table.qza"
File feature_table = "~{joined_end_basename}_table.qzv"
File visualize_stats = "~{joined_end_basename}_stats.qzv"

}
runtime {
docker: docker
memory: "${memory_mb} MiB"
cpu: cpu
disk: disk_size_gb + " GB"
disks: "local-disk " + disk_size_gb + " HDD"
}
}
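Every task in this file sizes its disk as `ceil(2*size(input, "GiB")) + 5`: twice the input plus 5 GB of headroom. A hedged awk sketch of that arithmetic (the 12.3 GiB input size is an invented example; since 5 is an integer, `ceil(2s) + 5` equals `ceil(2s + 5)`):

```shell
input_gib=12.3   # hypothetical input size in GiB

# ceil(2*s + 5): take int(), then bump up if any fraction was truncated
disk_size_gb=$(awk -v s="$input_gib" 'BEGIN { d = 2*s + 5; c = int(d); if (c < d) c++; print c }')

echo "$disk_size_gb"   # -> 30
```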
task train_classifier {
meta {
description: "Train a classifier to classify V1-2 amplicon sequences."
}
input {
File otu_ref
File taxanomy_ref
String forward_adapter
String reverse_adapter
Int min_length = 100
Int max_length = 500
Member comment on lines +206 to +207:

These were made to be mandatory but doesn't seem like they're used that way below?

Is 100/500 our/Zoe's custom default, or is it the tool's default if we don't specify the parameter?

String otu_basename = basename(otu_ref, '.qza')
Int memory_mb = 2000
Int cpu = 1
Int disk_size_gb = ceil(2*size(otu_ref, "GiB")) + 5
String docker = "quay.io/broadinstitute/qiime2:conda"
}
command <<<
set -ex -o pipefail
CONDA_ENV_NAME=$(conda info --envs -q | awk -F" " '/qiime.*/{ print $1 }')
conda activate ${CONDA_ENV_NAME}

qiime tools import \
--type 'FeatureData[Sequence]' \
--input-path ~{otu_ref} \
--output-path "~{otu_basename}_seqs.qza"

qiime tools import \
--type 'FeatureData[Taxonomy]' \
--input-format HeaderlessTSVTaxonomyFormat \
--input-path ~{taxanomy_ref} \
--output-path "~{otu_basename}_tax.qza"

qiime feature-classifier extract-reads \
--i-sequences "~{otu_basename}_seqs.qza" \
--p-f-primer "~{forward_adapter}" \
--p-r-primer "~{reverse_adapter}" \
~{"--p-min-length " + min_length} \
~{"--p-max-length " + max_length} \
--o-reads "~{otu_basename}_v1-2-ref-seqs.qza"

qiime feature-classifier fit-classifier-naive-bayes \
--i-reference-reads "~{otu_basename}_v1-2-ref-seqs.qza" \
--i-reference-taxonomy "~{otu_basename}_tax.qza" \
--o-classifier "~{otu_basename}_v1-2-classifier.qza"
>>>
output {
File trained_classifier = "~{otu_basename}_v1-2-classifier.qza"
}
runtime {
docker: docker
memory: "${memory_mb} MiB"
cpu: cpu
disk: disk_size_gb + " GB"
disks: "local-disk " + disk_size_gb + " HDD"
}
}
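The first two lines of the command block above discover the qiime conda environment's name by grepping the env listing. The same awk pattern can be exercised against a canned `conda info --envs`-style listing — a sketch only, with an invented env name (`qiime2-2022.8`) standing in for whatever the docker image actually ships:

```shell
# Canned listing in the format `conda info --envs` prints (hypothetical contents)
env_listing='# conda environments:
#
base                  *  /opt/conda
qiime2-2022.8            /opt/conda/envs/qiime2-2022.8'

# First whitespace-delimited field of the first line matching /qiime/
CONDA_ENV_NAME=$(printf '%s\n' "$env_listing" | awk '/qiime/{ print $1 }')

echo "$CONDA_ENV_NAME"   # -> qiime2-2022.8
```

Note that bare `conda activate` can fail in non-interactive shells unless the conda hook has been sourced first — which matches the back-and-forth visible in the commit log ("conda env activation bash", "adding bash source").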
task tax_analysis {
meta {
description: "Perform taxonomic classification with a naive Bayes classifier trained on the V1-2 regions amplified by our primers."
}
input {
File trained_classifier
File representative_seqs_qza
File representative_table_qza
String basename = basename(trained_classifier, '.qza')
Int memory_mb = 7000
Int cpu = 1
Int disk_size_gb = 375
String docker = "quay.io/broadinstitute/qiime2:conda"
}
command <<<
set -ex -o pipefail
qiime feature-classifier classify-sklearn \
--i-classifier ~{trained_classifier} \
--i-reads ~{representative_seqs_qza} \
--o-classification "~{basename}_tax.qza"

qiime feature-table tabulate-seqs \
--i-data ~{representative_seqs_qza} \
--o-visualization "~{basename}_rep_seqs.qzv"

qiime taxa barplot \
--i-table ~{representative_table_qza} \
--i-taxonomy "~{basename}_tax.qza" \
--o-visualization "~{basename}_bar_plots.qzv"
>>>
output {
File rep_seq_list = "~{basename}_rep_seqs.qzv"
File tax_classification_graph = "~{basename}_bar_plots.qzv"
}
runtime {
docker: docker
memory: "7 GB"
cpu: cpu
disk: disk_size_gb + " GB"
disks: "local-disk " + disk_size_gb + " HDD"
}
}
48 changes: 48 additions & 0 deletions pipes/WDL/workflows/amplicon16S_analysis.wdl
@@ -0,0 +1,48 @@
version 1.0

import "../tasks/tasks_16S_amplicon.wdl" as qiime
Member:

If you are importing a WDL task file as qiime then maybe that file should be renamed to tasks_qiime.wdl?

Contributor Author:

I think I should keep the 16S_amplicon name, so I changed the alias qiime --> 16S_amplicon. Let me know if you agree with this decision; if not, I can rename the task and workflow WDLs to tasks_qiime.wdl and qiime_workflow.wdl.

workflow amplicon16S_analysis {

meta {
description: "Run 16S amplicon sequencing analysis (from BAM input) with QIIME."
author: "fnegrete"
email: "viral_ngs@broadinstitute.org"
allowNestedInputs: true
}
input {
File reads_bam
File trained_classifier
String sample_name
Boolean keep_untrimmed_reads
}

call qiime.qiime_import_from_bam {
input:
reads_bam = reads_bam,
sample_name = sample_name
}
#__________________________________________
call qiime.trim_reads {
input:
reads_qza = qiime_import_from_bam.reads_qza,
keep_untrimmed_reads = keep_untrimmed_reads
}
#__________________________________________
call qiime.join_paired_ends {
input:
trimmed_reads_qza = trim_reads.trimmed_reads_qza
}
#_________________________________________
call qiime.deblur {
input:
joined_end_reads_qza = join_paired_ends.joined_end_reads_qza
}
#_________________________________________
call qiime.tax_analysis {
input:
trained_classifier = trained_classifier,
representative_seqs_qza = deblur.representative_seqs_qza,
representative_table_qza = deblur.representative_table_qza
}
}
23 changes: 23 additions & 0 deletions pipes/WDL/workflows/qiime_import_bam.wdl
@@ -0,0 +1,23 @@
version 1.0

import "../tasks/tasks_16S_amplicon.wdl" as infile

workflow qiime_import_bam {

meta{
description: "Importing BAM files into QIIME"
author: "fnegrete"
email: "viral_ngs@broadinstitute.org"
allowNestedInputs: true
}
input {
File reads_bam
String sample_name
}

call infile.qiime_import_from_bam {
input:
reads_bam = reads_bam,
sample_name = sample_name
}
}
4 changes: 4 additions & 0 deletions test/input/WDL/test_inputs-qiime_import_bam-local.json
@@ -0,0 +1,4 @@
{
"qiime_import_bam.reads_bam": "test/input/G5012.3.subset.bam",
"qiime_import_bam.sample_name": "G5012.3.subset.bam"
}