update modules in GATK's gcnvcaller pipeline #3561

Merged
merged 3 commits into from
Jun 28, 2023
Changes from 2 commits
6 changes: 3 additions & 3 deletions modules/nf-core/gatk4/collectreadcounts/main.nf
@@ -9,9 +9,9 @@ process GATK4_COLLECTREADCOUNTS {

 input:
 tuple val(meta), path(input), path(input_index), path(intervals)
-path(fasta)
-path(fai)
-path(dict)
+tuple val(meta2), path(fasta)
+tuple val(meta2), path(fai)
+tuple val(meta2), path(dict)
Contributor
Suggested change
-tuple val(meta2), path(fai)
-tuple val(meta2), path(dict)
+tuple val(meta3), path(fai)
+tuple val(meta4), path(dict)


 output:
 tuple val(meta), path("*.hdf5"), optional: true, emit: hdf5
9 changes: 7 additions & 2 deletions modules/nf-core/gatk4/collectreadcounts/meta.yml
@@ -24,11 +24,16 @@ input:
 description: |
 Groovy Map containing sample information
 e.g. [ id:'test', single_end:false ]
-- bam:
+- meta2:
Contributor
Can you add the other metas too?

+type: map
+description: |
+Groovy Map containing reference information
+e.g. [ id:'test' ]
+- input:
 type: file
 description: BAM/CRAM/SAM file
 pattern: "*.{bam,cram,sam}"
-- bai:
+- input_index:
 type: file
 description: BAM/CRAM/SAM index file
 pattern: "*.{bai,crai,sai}"
36 changes: 14 additions & 22 deletions modules/nf-core/gatk4/determinegermlinecontigploidy/main.nf
@@ -1,9 +1,10 @@
+
 process GATK4_DETERMINEGERMLINECONTIGPLOIDY {
 tag "$meta.id"
 label 'process_single'

 //Conda is not supported at the moment: https://github.com/broadinstitute/gatk/issues/7811
-container "nf-core/gatk:4.4.0.0" //Biocontainers is missing a package
+container "quay.io/nf-core/gatk:4.4.0.0" //Biocontainers is missing a package
Contributor
Suggested change
-container "quay.io/nf-core/gatk:4.4.0.0" //Biocontainers is missing a package
+container "nf-core/gatk:4.4.0.0" //Biocontainers is missing a package

Quay.io is the default registry in all nf-core pipelines, so we leave it out of the container name for more flexibility.

Contributor Author
When I tried without it, Singularity failed to pull the image: https://github.com/nf-core/modules/actions/runs/5394391319/jobs/9795497440

Contributor
@adamrtalbot can you help with this? :)

Contributor @adamrtalbot, Jun 28, 2023
It's fixed in nf-core/tools#2336 but will require a new release.

Contributor
It is also fixed with Nextflow 23.04+, which adds the singularity.registry option; the nf-core template sets it to quay.io.

Contributor
Awesome thank you!

Contributor Author
Cool! I will keep quay.io in the URI for now and remove it after the next tools release 😄

Contributor
Caused issue: #3668
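
The registry setting mentioned in this thread can be sketched as a Nextflow config fragment. This is a minimal illustration, assuming Nextflow 23.04 or later as stated above. The `docker.registry` line is an assumption added for Docker users, and the file name is only a suggestion.

```groovy
// nextflow.config (fragment, illustrative only)
// Default registry used for container names that do not specify one.
singularity.registry = 'quay.io' // option available from Nextflow 23.04, per the thread
docker.registry      = 'quay.io' // assumption: same default for Docker users

// With these defaults a module can declare the short form
//   container "nf-core/gatk:4.4.0.0"
// and the image is looked up as quay.io/nf-core/gatk:4.4.0.0.
```

Keeping the registry out of the module lets a pipeline point at a mirror by changing one config value, which appears to be the flexibility the reviewer refers to.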


 // Exit if running this module with -profile conda / -profile mamba
 if (workflow.profile.tokenize(',').intersect(['conda', 'mamba']).size() >= 1) {
@@ -12,27 +13,25 @@ process GATK4_DETERMINEGERMLINECONTIGPLOIDY {

 input:
 tuple val(meta), path(counts), path(bed), path(exclude_beds)
+tuple val(meta2), path(ploidy_model)
 path(contig_ploidy_table)
-path(ploidy_model)

 output:
-tuple val(meta), path("*-calls.tar.gz") , emit: calls
-tuple val(meta), path("*-model.tar.gz") , emit: model, optional: true
+tuple val(meta), path("${prefix}-calls"), emit: calls
+tuple val(meta), path("${prefix}-model"), emit: model, optional: true
 path "versions.yml" , emit: versions

 when:
 task.ext.when == null || task.ext.when

 script:
-def args = task.ext.args ?: ''
-def prefix = task.ext.prefix ?: "${meta.id}"
-def input_list = counts.collect(){"--input $it"}.join(" ")
-def intervals = bed ? "--intervals ${bed}" : ""
-def exclude = exclude_beds ? exclude_beds.collect(){"--exclude-intervals $it"}.join(" ") : ""
-def untar_model = ploidy_model ? (ploidy_model.name.endsWith(".tar.gz") ? "tar -xzf ${ploidy_model}" : "") : ""
-def tar_model = ploidy_model ? "" : "tar czf ${prefix}-model.tar.gz ${prefix}-model"
-def model = ploidy_model ? (ploidy_model.name.endsWith(".tar.gz") ? "--model ${ploidy_model.toString().replace(".tar.gz","")}" : "--model ${ploidy_model}") : ""
+def args = task.ext.args ?: ''
+prefix = task.ext.prefix ?: "${meta.id}"
+def intervals = bed ? "--intervals ${bed}" : ""
+def exclude = exclude_beds ? exclude_beds.collect(){"--exclude-intervals $it"}.join(" ") : ""
 def contig_ploidy = contig_ploidy_table ? "--contig-ploidy-priors ${contig_ploidy_table}" : ""
+def model = ploidy_model ? "--model ${ploidy_model}" : ""
+def input_list = counts.collect(){"--input $it"}.join(" ")

 def avail_mem = 3072
 if (!task.memory) {
@@ -41,8 +40,6 @@ process GATK4_DETERMINEGERMLINECONTIGPLOIDY {
 avail_mem = (task.memory.mega*0.8).intValue()
 }
 """
-${untar_model}
-
 gatk --java-options "-Xmx${avail_mem}M" DetermineGermlineContigPloidy \\
 ${input_list} \\
 --output ./ \\
@@ -54,22 +51,17 @@ process GATK4_DETERMINEGERMLINECONTIGPLOIDY {
 --tmp-dir . \\
 ${args}

-tar czf ${prefix}-calls.tar.gz ${prefix}-calls
-${tar_model}
-
 cat <<-END_VERSIONS > versions.yml
 "${task.process}":
 gatk4: \$(echo \$(gatk --version 2>&1) | sed 's/^.*(GATK) v//; s/ .*\$//')
 END_VERSIONS
 """

 stub:
-def prefix = task.ext.prefix ?: "${meta.id}"
+prefix = task.ext.prefix ?: "${meta.id}"
 """
-touch ${prefix}-calls.tar.gz
-touch ${prefix}-model.tar.gz
-touch ${prefix}.tsv
-touch ${prefix}2.tsv
+touch ${prefix}-calls
+touch ${prefix}-model

 cat <<-END_VERSIONS > versions.yml
 "${task.process}":
12 changes: 8 additions & 4 deletions modules/nf-core/gatk4/determinegermlinecontigploidy/meta.yml
@@ -22,6 +22,11 @@ input:
 description: |
 Groovy Map containing sample information
 e.g. [ id:'test', single_end:false ]
+- meta2:
+type: map
+description: |
+Groovy Map containing sample information
+e.g. [ id:'test' ]
 - counts:
 type: file
 description: One or more count TSV files created with gatk/collectreadcounts
@@ -43,8 +48,7 @@ input:
 description: |
 Optional - A folder containing the ploidy model.
 When a model is supplied to tool will run in CASE mode.
-The folder can be tar-zipped.
-pattern: "*(.tar.gz)?"
+pattern: '*-model/'

 output:
 - meta:
@@ -59,13 +63,13 @@ output:
 - calls:
 type: directory
 description: A folder containing the calls from the input files
-pattern: "*.tar.gz"
+pattern: "*-calls/"
 - model:
 type: directory
 description: |
 A folder containing the model from the input files.
 This will only be created in COHORT mode (when no model is supplied to the process).
-pattern: "*.tar.gz"
+pattern: "*-model/"

 authors:
 - "@nvnieuwk"
34 changes: 14 additions & 20 deletions modules/nf-core/gatk4/germlinecnvcaller/main.nf
@@ -3,7 +3,7 @@ process GATK4_GERMLINECNVCALLER {
 label 'process_single'

 //Conda is not supported at the moment: https://github.com/broadinstitute/gatk/issues/7811
-container "nf-core/gatk:4.4.0.0" //Biocontainers is missing a package
+container "quay.io/nf-core/gatk:4.4.0.0" //Biocontainers is missing a package
Contributor
Suggested change
-container "quay.io/nf-core/gatk:4.4.0.0" //Biocontainers is missing a package
+container "nf-core/gatk:4.4.0.0" //Biocontainers is missing a package


 // Exit if running this module with -profile conda / -profile mamba
 if (workflow.profile.tokenize(',').intersect(['conda', 'mamba']).size() >= 1) {
@@ -12,28 +12,25 @@ process GATK4_GERMLINECNVCALLER {

 input:
 tuple val(meta), path(tsv), path(intervals)
-path model
-path ploidy
+tuple val(meta2), path(model)
+tuple val(meta2), path(ploidy)
Contributor
Suggested change
-tuple val(meta2), path(ploidy)
+tuple val(meta3), path(ploidy)

Can you also fix the meta.yml here?


 output:
-tuple val(meta), path("*-cnv-calls.tar.gz"), emit: calls, optional: true
-tuple val(meta), path("*-cnv-model.tar.gz"), emit: model, optional: true
-path "versions.yml" , emit: versions
+tuple val(meta), path("*-cnv-calls/*-calls"), emit: calls, optional: true
+tuple val(meta), path("*-cnv-model/*-model"), emit: model, optional: true
+path "versions.yml" , emit: versions

 when:
 task.ext.when == null || task.ext.when

 script:
-def args = task.ext.args ?: ''
+def args = task.ext.args ?: ''
 def prefix = task.ext.prefix ?: "${meta.id}"
-def intervals_command = intervals ? "--intervals $intervals" : ""
-def untar_ploidy = ploidy ? (ploidy.name.endsWith(".tar.gz") ? "tar -xzf ${ploidy}" : "") : ""
-def untar_model = model ? (model.name.endsWith(".tar.gz") ? "tar -xzf ${model}" : "") : ""
-def ploidy_command = ploidy ? (ploidy.name.endsWith(".tar.gz") ? "--contig-ploidy-calls ${ploidy.toString().replace(".tar.gz","")}" : "--contig-ploidy-calls ${ploidy}") : ""
-def model_command = model ? (model.name.endsWith(".tar.gz") ? "--model ${model.toString().replace(".tar.gz","")}/${prefix}-model" : "--model ${model}/${prefix}-model") : ""
-def input_list = tsv.collect{"--input $it"}.join(' ')
-def output_command = model ? "--output ${prefix}-cnv-calls" : "--output ${prefix}-cnv-model"
-def tar_output = model ? "tar -czf ${prefix}-cnv-calls.tar.gz ${prefix}-cnv-calls" : "tar -czf ${prefix}-cnv-model.tar.gz ${prefix}-cnv-model"
+def intervals_command = intervals ? "--intervals ${intervals}" : ""
+def ploidy_command = ploidy ? "--contig-ploidy-calls ${ploidy}" : ""
+def model_command = model ? "--model ${model}" : ""
+def input_list = tsv.collect{"--input $it"}.join(' ')
+def output_command = model ? "--output ${prefix}-cnv-calls" : "--output ${prefix}-cnv-model"

 def avail_mem = 3072
 if (!task.memory) {
@@ -42,9 +39,6 @@ process GATK4_GERMLINECNVCALLER {
 avail_mem = (task.memory.mega*0.8).intValue()
 }
 """
-${untar_ploidy}
-${untar_model}
-
 gatk --java-options "-Xmx${avail_mem}g" GermlineCNVCaller \\
 $input_list \\
 $ploidy_command \\
@@ -53,7 +47,6 @@ process GATK4_GERMLINECNVCALLER {
 $args \\
 $intervals_command \\
 $model_command
-${tar_output}

 cat <<-END_VERSIONS > versions.yml
 "${task.process}":
@@ -64,7 +57,8 @@ process GATK4_GERMLINECNVCALLER {
 stub:
 def prefix = task.ext.prefix ?: "${meta.id}"
 """
-touch ${prefix}.tar.gz
+mkdir -p ${prefix}-cnv-calls/${prefix}-calls
+mkdir -p ${prefix}-cnv-model/${prefix}-model

 cat <<-END_VERSIONS > versions.yml
 "${task.process}":
17 changes: 11 additions & 6 deletions modules/nf-core/gatk4/germlinecnvcaller/meta.yml
@@ -21,6 +21,11 @@ input:
 description: |
 Groovy Map containing sample information
 e.g. [ id:'test', single_end:false ]
+- meta2:
+type: map
+description: |
+Groovy Map containing sample information
+e.g. [ id:'test', single_end:false ]
 - tsv:
 type: file
 description: One or more count TSV files created with gatk/collectreadcounts
@@ -31,12 +36,12 @@ input:
 pattern: "*.bed"
 - model:
 type: directory
-description: Optional - Tar gzipped directory containing the model produced by germlinecnvcaller cohort mode
-pattern: "*.tar.gz"
+description: Optional - directory containing the model produced by germlinecnvcaller cohort mode
+pattern: "*-cnv-model/*-model"
 - ploidy:
 type: file
-description: Tar gzipped directory containing ploidy calls produced by determinegermlinecontigploidy case or cohort mode
-pattern: "*.tar.gz"
+description: directory containing ploidy calls produced by determinegermlinecontigploidy case or cohort mode
+pattern: "*-calls"

 output:
 - meta:
@@ -51,11 +56,11 @@ output:
 - calls:
 type: file
 description: Tar gzipped directory containing calls produced by germlinecnvcaller case mode
-pattern: "*.tar"
+pattern: "*-cnv-calls/*-calls"
 - model:
 type: directory
 description: Optional - Tar gzipped directory containing the model produced by germlinecnvcaller cohort mode
-pattern: "*.tar.gz"
+pattern: "*-cnv-model/*-model"

 authors:
 - "@ryanjameskennedy"
23 changes: 8 additions & 15 deletions modules/nf-core/gatk4/postprocessgermlinecnvcalls/main.nf
@@ -3,36 +3,33 @@ process GATK4_POSTPROCESSGERMLINECNVCALLS {
 label 'process_single'

 //Conda is not supported at the moment: https://github.com/broadinstitute/gatk/issues/7811
-container "nf-core/gatk:4.4.0.0" //Biocontainers is missing a package
+container "quay.io/nf-core/gatk:4.4.0.0" //Biocontainers is missing a package
Contributor
Suggested change
-container "quay.io/nf-core/gatk:4.4.0.0" //Biocontainers is missing a package
+container "nf-core/gatk:4.4.0.0" //Biocontainers is missing a package


 // Exit if running this module with -profile conda / -profile mamba
 if (workflow.profile.tokenize(',').intersect(['conda', 'mamba']).size() >= 1) {
 exit 1, "GATK4_POSTPROCESSGERMLINECNVCALLS module does not support Conda. Please use Docker / Singularity / Podman instead."
 }

 input:
-tuple val(meta), path(ploidy)
-path model
-path calls
+tuple val(meta), path(ploidy, stageAs:'ploidy')
Contributor
Why do you use stageAs here?

Contributor Author
ploidy and calls are generated by upstream modules, and GATK appends the same suffix (-calls) to both names. If someone running the CNV-calling workflow doesn't customize the prefixes, the names will clash. That's why I used stageAs here.

Contributor
Ah, we had long discussions about this in the past and decided not to use stageAs in these cases, and instead to require the user to set different prefixes (which is best practice anyway).

Contributor Author
Ah, I see. Alright, I will change it 😄
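
The practice described in this thread, giving each module its own prefix instead of using stageAs, is usually set per process in the pipeline config. A minimal sketch, assuming the process names from this PR and hypothetical suffixes (`_ploidy`, `_cnv`); the file name `conf/modules.config` follows the common nf-core layout and is an assumption here.

```groovy
// conf/modules.config (fragment, illustrative only)
process {
    // Hypothetical suffixes: any two distinct values avoid the name clash.
    withName: 'GATK4_DETERMINEGERMLINECONTIGPLOIDY' {
        ext.prefix = { "${meta.id}_ploidy" }
    }
    withName: 'GATK4_GERMLINECNVCALLER' {
        ext.prefix = { "${meta.id}_cnv" }
    }
}
```

With the default prefix (`meta.id`), both upstream directories would be staged under the same name ending in `-calls`. With distinct prefixes the staged names differ, for example `test_ploidy-calls` and `test_cnv-calls`, assuming each module passes its prefix to GATK as the output prefix (that part of the modules is not shown in this diff).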

+tuple val(meta2), path(model)
+tuple val(meta2), path(calls)
Contributor
Suggested change
-tuple val(meta2), path(calls)
+tuple val(meta3), path(calls)

Same here


 output:
 tuple val(meta), path("*_genotyped_intervals.vcf.gz") , emit: intervals, optional: true
 tuple val(meta), path("*_genotyped_segments.vcf.gz") , emit: segments, optional: true
 tuple val(meta), path("*_denoised.vcf.gz") , emit: denoised, optional: true
-path "versions.yml" , emit: versions
+path "versions.yml" , emit: versions

 when:
 task.ext.when == null || task.ext.when

 script:
 def args = task.ext.args ?: ''
 def prefix = task.ext.prefix ?: "${meta.id}"
-def untar_ploidy = ploidy ? (ploidy.name.endsWith(".tar.gz") ? "tar -xzf ${ploidy}" : "") : ""
-def untar_model = model ? (model.name.endsWith(".tar.gz") ? "tar -xzf ${model}" : "") : ""
-def untar_calls = calls ? (calls.name.endsWith(".tar.gz") ? "tar -xzf ${calls}" : "") : ""
-def ploidy_command = ploidy ? (ploidy.name.endsWith(".tar.gz") ? "--contig-ploidy-calls ${ploidy.toString().replace(".tar.gz","")}" : "--contig-ploidy-calls ${ploidy}") : ""
-def model_command = model ? (model.name.endsWith(".tar.gz") ? "--model-shard-path ${model.toString().replace(".tar.gz","")}/${prefix}-model" : "--model-shard-path ${model}/${prefix}-model") : ""
-def calls_command = calls ? (calls.name.endsWith(".tar.gz") ? "--calls-shard-path ${calls.toString().replace(".tar.gz","")}/${prefix}-calls" : "--calls-shard-path ${model}/${prefix}-calls") : ""
+def ploidy_command = ploidy ? "--contig-ploidy-calls ${ploidy}" : ""
+def model_command = model ? "--model-shard-path ${model}" : ""
+def calls_command = calls ? "--calls-shard-path ${calls}" : ""

 def avail_mem = 3072
 if (!task.memory) {
@@ -41,10 +38,6 @@ process GATK4_POSTPROCESSGERMLINECNVCALLS {
 avail_mem = (task.memory.mega*0.8).intValue()
 }
 """
-${untar_ploidy}
-${untar_model}
-${untar_calls}
-
 gatk --java-options "-Xmx${avail_mem}g" PostprocessGermlineCNVCalls \\
 $ploidy_command \\
 $model_command \\
12 changes: 8 additions & 4 deletions modules/nf-core/gatk4/postprocessgermlinecnvcalls/meta.yml
@@ -21,23 +21,27 @@ input:
 description: |
 Groovy Map containing sample information
 e.g. [ id:'test', single_end:false ]
+- meta2:
+type: map
+description: |
+Groovy Map containing sample information
+e.g. [ id:'test', single_end:false ]
 - ploidy:
 type: directory
 description: |
 Optional - A folder containing the ploidy model.
 When a model is supplied to tool will run in CASE mode.
-The folder can be tar-zipped.
-pattern: "*.tar.gz"
+pattern: "*-calls/"
 - calls:
 type: directory
 description: A folder containing the calls from the input files
-pattern: "*.tar.gz"
+pattern: "*-cnv-calls/*-calls"
 - model:
 type: directory
 description: |
 A folder containing the model from the input files.
 This will only be created in COHORT mode (when no model is supplied to the process).
-pattern: "*.tar.gz"
+pattern: "*-cnv-model/*-model"

 output:
 - meta: