Pool inputs #214

Merged 4 commits on Nov 26, 2024
1 change: 1 addition & 0 deletions CHANGELOG.md
@@ -1,6 +1,7 @@
## CHAMPAGNE development version

- The CHAMPAGNE nextflow workflow now has a version entry in `nextflow.config`, in compliance with nf-core. (#213, @kelly-sovacool)
- Pool input (control) reads of the same sample name by default. Any inputs that should not be pooled must have different sample names in the samplesheet. (#214, @kelly-sovacool)

## CHAMPAGNE 0.4.0

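The pooling rule in the changelog entry can be illustrated with a minimal sketch, assuming it amounts to "input rows that share a sample name are grouped together". The helper `group_inputs_by_sample` and the shortened file names are hypothetical; the pipeline itself pools in Nextflow through the `POOL_INPUTS` subworkflow, whose code is not shown in this diff.

```python
import collections
import csv
import io

# Hypothetical, shortened samplesheet in the new style: all input rows
# share one sample name and are distinguished only by `rep`.
SAMPLESHEET = """\
sample,rep,fastq_1,fastq_2,antibody,control
SPT5_T0,1,t0_rep1.fastq.gz,,SPT5,SPT5_INPUT
SPT5_T0,2,t0_rep2.fastq.gz,,SPT5,SPT5_INPUT
SPT5_INPUT,1,input_rep1.fastq.gz,,,
SPT5_INPUT,2,input_rep2.fastq.gz,,,
"""


def group_inputs_by_sample(text):
    """Map each input (control) sample name to the FASTQs that would be pooled.

    A row counts as an input when both `antibody` and `control` are empty,
    mirroring the `is_control_input` check in bin/check_samplesheet.py.
    """
    pooled = collections.defaultdict(list)
    for row in csv.DictReader(io.StringIO(text)):
        if not row["antibody"] and not row["control"]:
            pooled[row["sample"]].append(row["fastq_1"])
    return dict(pooled)


pools = group_inputs_by_sample(SAMPLESHEET)
print(pools)
```

Under this assumption, inputs that should stay separate need distinct names in the `sample` column, as the changelog entry states.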
24 changes: 12 additions & 12 deletions assets/samplesheet_human.csv
@@ -1,13 +1,13 @@
fastq_1,fastq_2,sample,rep,antibody,control
/data/CCBR_Pipeliner/testdata/CHAMPAGNE/human/SRR5129678.fastq.gz,,A549_CTCF,1,CTCF,A549_CTCF_INPUT_1
/data/CCBR_Pipeliner/testdata/CHAMPAGNE/human/SRR5129676.fastq.gz,,A549_CTCF,2,CTCF,A549_CTCF_INPUT_2
/data/CCBR_Pipeliner/testdata/CHAMPAGNE/human/SRR5129677.fastq.gz,,A549_CTCF,3,CTCF,A549_CTCF_INPUT_3
/data/CCBR_Pipeliner/testdata/CHAMPAGNE/human/SRR5129560.fastq.gz,,A549_CTCF_INPUT_1,,,
/data/CCBR_Pipeliner/testdata/CHAMPAGNE/human/SRR5129561.fastq.gz,,A549_CTCF_INPUT_2,,,
/data/CCBR_Pipeliner/testdata/CHAMPAGNE/human/SRR5129562.fastq.gz,,A549_CTCF_INPUT_3,,,
/data/CCBR_Pipeliner/testdata/CHAMPAGNE/human/SRR14636612.fastq.gz,,A549_JUN,1,JUN,A549_JUN_INPUT_1
/data/CCBR_Pipeliner/testdata/CHAMPAGNE/human/SRR14636613.fastq.gz,,A549_JUN,2,JUN,A549_JUN_INPUT_2
/data/CCBR_Pipeliner/testdata/CHAMPAGNE/human/SRR14636614.fastq.gz,,A549_JUN,3,JUN,A549_JUN_INPUT_2
/data/CCBR_Pipeliner/testdata/CHAMPAGNE/human/SRR14638304.fastq.gz,,A549_JUN_INPUT_1,,,
/data/CCBR_Pipeliner/testdata/CHAMPAGNE/human/SRR14638305.fastq.gz,,A549_JUN_INPUT_2,,,
/data/CCBR_Pipeliner/testdata/CHAMPAGNE/human/SRR14638306.fastq.gz,,A549_JUN_INPUT_3,,,
/data/CCBR_Pipeliner/testdata/CHAMPAGNE/human/SRR5129678.fastq.gz,,A549_CTCF,1,CTCF,A549_CTCF_INPUT
/data/CCBR_Pipeliner/testdata/CHAMPAGNE/human/SRR5129676.fastq.gz,,A549_CTCF,2,CTCF,A549_CTCF_INPUT
/data/CCBR_Pipeliner/testdata/CHAMPAGNE/human/SRR5129677.fastq.gz,,A549_CTCF,3,CTCF,A549_CTCF_INPUT
/data/CCBR_Pipeliner/testdata/CHAMPAGNE/human/SRR5129560.fastq.gz,,A549_CTCF_INPUT,1,,
/data/CCBR_Pipeliner/testdata/CHAMPAGNE/human/SRR5129561.fastq.gz,,A549_CTCF_INPUT,2,,
/data/CCBR_Pipeliner/testdata/CHAMPAGNE/human/SRR5129562.fastq.gz,,A549_CTCF_INPUT,3,,
/data/CCBR_Pipeliner/testdata/CHAMPAGNE/human/SRR14636612.fastq.gz,,A549_JUN,1,JUN,A549_JUN_INPUT
/data/CCBR_Pipeliner/testdata/CHAMPAGNE/human/SRR14636613.fastq.gz,,A549_JUN,2,JUN,A549_JUN_INPUT
/data/CCBR_Pipeliner/testdata/CHAMPAGNE/human/SRR14636614.fastq.gz,,A549_JUN,3,JUN,A549_JUN_INPUT
/data/CCBR_Pipeliner/testdata/CHAMPAGNE/human/SRR14638304.fastq.gz,,A549_JUN_INPUT,1,,
/data/CCBR_Pipeliner/testdata/CHAMPAGNE/human/SRR14638305.fastq.gz,,A549_JUN_INPUT,2,,
/data/CCBR_Pipeliner/testdata/CHAMPAGNE/human/SRR14638306.fastq.gz,,A549_JUN_INPUT,3,,
12 changes: 6 additions & 6 deletions assets/samplesheet_test.csv
@@ -1,7 +1,7 @@
sample,rep,fastq_1,fastq_2,antibody,control
SPT5_T0,1,https://raw.githubusercontent.com/nf-core/test-datasets/atacseq/testdata/SRR1822153_1.fastq.gz,https://raw.githubusercontent.com/nf-core/test-datasets/atacseq/testdata/SRR1822153_2.fastq.gz,SPT5,SPT5_INPUT_1
SPT5_T0,2,https://raw.githubusercontent.com/nf-core/test-datasets/atacseq/testdata/SRR1822154_1.fastq.gz,,SPT5,SPT5_INPUT_2
SPT5_T15,1,https://raw.githubusercontent.com/nf-core/test-datasets/atacseq/testdata/SRR1822157_1.fastq.gz,https://raw.githubusercontent.com/nf-core/test-datasets/atacseq/testdata/SRR1822157_2.fastq.gz,SPT5,SPT5_INPUT_1
SPT5_T15,2,https://raw.githubusercontent.com/nf-core/test-datasets/atacseq/testdata/SRR1822158_1.fastq.gz,,SPT5,SPT5_INPUT_2
SPT5_INPUT_1,,https://raw.githubusercontent.com/nf-core/test-datasets/chipseq/testdata/SRR5204809_Spt5-ChIP_Input1_SacCer_ChIP-Seq_ss100k_R1.fastq.gz,https://raw.githubusercontent.com/nf-core/test-datasets/chipseq/testdata/SRR5204809_Spt5-ChIP_Input1_SacCer_ChIP-Seq_ss100k_R2.fastq.gz,,
SPT5_INPUT_2,,https://raw.githubusercontent.com/nf-core/test-datasets/chipseq/testdata/SRR5204810_Spt5-ChIP_Input2_SacCer_ChIP-Seq_ss100k_R1.fastq.gz,,,
SPT5_T0,1,https://raw.githubusercontent.com/nf-core/test-datasets/atacseq/testdata/SRR1822153_1.fastq.gz,https://raw.githubusercontent.com/nf-core/test-datasets/atacseq/testdata/SRR1822153_2.fastq.gz,SPT5,SPT5_INPUT
SPT5_T0,2,https://raw.githubusercontent.com/nf-core/test-datasets/atacseq/testdata/SRR1822154_1.fastq.gz,,SPT5,SPT5_INPUT
SPT5_T15,1,https://raw.githubusercontent.com/nf-core/test-datasets/atacseq/testdata/SRR1822157_1.fastq.gz,https://raw.githubusercontent.com/nf-core/test-datasets/atacseq/testdata/SRR1822157_2.fastq.gz,SPT5,SPT5_INPUT
SPT5_T15,2,https://raw.githubusercontent.com/nf-core/test-datasets/atacseq/testdata/SRR1822158_1.fastq.gz,,SPT5,SPT5_INPUT
SPT5_INPUT,1,https://raw.githubusercontent.com/nf-core/test-datasets/chipseq/testdata/SRR5204809_Spt5-ChIP_Input1_SacCer_ChIP-Seq_ss100k_R1.fastq.gz,https://raw.githubusercontent.com/nf-core/test-datasets/chipseq/testdata/SRR5204809_Spt5-ChIP_Input1_SacCer_ChIP-Seq_ss100k_R2.fastq.gz,,
SPT5_INPUT,2,https://raw.githubusercontent.com/nf-core/test-datasets/chipseq/testdata/SRR5204810_Spt5-ChIP_Input2_SacCer_ChIP-Seq_ss100k_R1.fastq.gz,https://raw.githubusercontent.com/nf-core/test-datasets/chipseq/testdata/SRR5204810_Spt5-ChIP_Input2_SacCer_ChIP-Seq_ss100k_R2.fastq.gz,,
15 changes: 10 additions & 5 deletions bin/check_samplesheet.py
@@ -1,9 +1,9 @@
#!/usr/bin/env python3

"""
source: https://github.com/nf-core/chipseq/blob/51eba00b32885c4d0bec60db3cb0a45eb61e34c5/bin/check_samplesheet.py
adapted from: https://github.com/nf-core/chipseq/blob/51eba00b32885c4d0bec60db3cb0a45eb61e34c5/bin/check_samplesheet.py
"""

import collections
import os
import errno
import argparse
@@ -52,6 +52,7 @@ def check_samplesheet(file_in, file_out):
"""

sample_mapping_dict = {}
input_dict = collections.defaultdict(list)
with open(file_in, "r", encoding="utf-8-sig") as fin:
## Check header
MIN_COLS = 2
@@ -144,6 +145,7 @@ def check_samplesheet(file_in, file_out):
"Line",
line,
)
is_control_input = not antibody and not control

## Auto-detect paired-end/single-end
if not sample or not fastq_1:
@@ -172,7 +174,9 @@ def check_samplesheet(file_in, file_out):
print_error("Samplesheet contains duplicate rows!", "Line", line)
else:
sample_mapping_dict[sample].append(sample_info)
# pprint.pprint(sample_mapping_dict)
if is_control_input:
input_dict[sample_basename].append(sample_info)

## Write validated samplesheet with appropriate columns
if len(sample_mapping_dict) > 0:
out_dir = os.path.dirname(file_out)
@@ -205,11 +209,12 @@ def check_samplesheet(file_in, file_out):
sample,
)

# check that the control/input exists
for idx, val in enumerate(sample_mapping_dict[sample]):
control = val[-1]
if control and control not in sample_mapping_dict.keys():
if control and control not in input_dict.keys():
print_error(
f"Control identifier has to match a provided sample identifier!",
"Control identifier has to match a provided sample identifier!",
"Control",
control,
)
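The control check in this hunk can be sketched in isolation. This is a simplified illustration, not the script itself: rows are reduced to hypothetical `(sample, antibody, control)` tuples, the function name `find_missing_controls` is made up, and the dictionary is keyed on the sample name directly, whereas the diff keys `input_dict` on `sample_basename`.

```python
import collections


def find_missing_controls(rows):
    """Return control names that do not match any input (control) row.

    `rows` is a list of (sample, antibody, control) tuples; this reduced
    shape is an assumption made for the sketch.
    """
    input_dict = collections.defaultdict(list)
    for sample, antibody, control in rows:
        # Same rule as the diff: no antibody and no control means an input row.
        if not antibody and not control:
            input_dict[sample].append((sample, antibody, control))

    missing = []
    for sample, antibody, control in rows:
        # A membership test on a defaultdict does not create new keys.
        if control and control not in input_dict and control not in missing:
            missing.append(control)
    return missing


rows = [
    ("A549_CTCF", "CTCF", "A549_CTCF_INPUT"),
    ("A549_CTCF_INPUT", "", ""),
    ("A549_JUN", "JUN", "A549_JUN_INPUT"),  # no matching input row
    ("A549_MYC", "MYC", "A549_CTCF"),       # control names a ChIP sample
]
missing = find_missing_controls(rows)
print(missing)
```

Checking against a dictionary that holds only input rows means a control naming a ChIP sample is reported as missing, which a check against all sample names would accept.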
21 changes: 11 additions & 10 deletions main.nf
@@ -21,6 +21,7 @@ log.info """\
include { FASTQ_DOWNLOAD_PREFETCH_FASTERQDUMP_SRATOOLS as DOWNLOAD_FASTQ } from './subworkflows/nf-core/fastq_download_prefetch_fasterqdump_sratools'
include { INPUT_CHECK } from './subworkflows/local/input_check.nf'
include { PREPARE_GENOME } from './subworkflows/local/prepare_genome.nf'
include { POOL_INPUTS } from './subworkflows/local/pool_inputs/'
include { FILTER_BLACKLIST } from './subworkflows/CCBR/filter_blacklist/'
include { ALIGN_GENOME } from "./subworkflows/local/align.nf"
include { DEDUPLICATE } from "./subworkflows/local/deduplicate.nf"
@@ -74,28 +75,27 @@ workflow CHIPSEQ {
INPUT_CHECK(file(params.input, checkIfExists: true), params.seq_center, contrast_sheet)

INPUT_CHECK.out.reads.set { raw_fastqs }
raw_fastqs | CUTADAPT
CUTADAPT.out.reads.set{ trimmed_fastqs }
CUTADAPT(raw_fastqs).reads | POOL_INPUTS
trimmed_fastqs = POOL_INPUTS.out.reads

PREPARE_GENOME()
chrom_sizes = PREPARE_GENOME.out.chrom_sizes

effective_genome_size = PREPARE_GENOME.out.effective_genome_size

FILTER_BLACKLIST(trimmed_fastqs, PREPARE_GENOME.out.blacklist_index)
ALIGN_GENOME(FILTER_BLACKLIST.out.reads, PREPARE_GENOME.out.reference_index)
ALIGN_GENOME.out.bam.set{ aligned_bam }
aligned_bam = ALIGN_GENOME.out.bam

DEDUPLICATE(aligned_bam, chrom_sizes, effective_genome_size)
DEDUPLICATE.out.bam.set{ deduped_bam }
DEDUPLICATE.out.tag_align.set{ deduped_tagalign }
deduped_bam = DEDUPLICATE.out.bam
deduped_tagalign = DEDUPLICATE.out.tag_align

deduped_bam | PHANTOM_PEAKS
PHANTOM_PEAKS.out.fraglen | PPQT_PROCESS
PPQT_PROCESS.out.fraglen.set { frag_lengths }
PHANTOM_PEAKS(deduped_bam).fraglen | PPQT_PROCESS
frag_lengths = PPQT_PROCESS.out.fraglen

ch_multiqc = Channel.of()
if (params.run.qc) {
QC(raw_fastqs, trimmed_fastqs, FILTER_BLACKLIST.out.n_surviving_reads,
QC(raw_fastqs, CUTADAPT.out.reads, FILTER_BLACKLIST.out.n_surviving_reads,
aligned_bam, ALIGN_GENOME.out.aligned_flagstat, ALIGN_GENOME.out.filtered_flagstat,
deduped_bam, DEDUPLICATE.out.flagstat,
PHANTOM_PEAKS.out.spp, frag_lengths,
@@ -157,6 +157,7 @@ workflow CHIPSEQ {
)

}

}

if (!workflow.stubRun) {
5 changes: 5 additions & 0 deletions modules.json
@@ -100,6 +100,11 @@
"git_sha": "8fc1d24c710ebe1d5de0f2447ec9439fd3d9d66a",
"installed_by": ["modules"]
},
"cat/cat": {
"branch": "master",
"git_sha": "666652151335353eef2fcd58880bcef5bc2928e1",
"installed_by": ["modules"]
},
"custom/sratoolsncbisettings": {
"branch": "master",
"git_sha": "20e78a9868eaa69c8cac91152397def32374b807",
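The `POOL_INPUTS` subworkflow is not shown in this diff, but the newly installed `cat/cat` module suggests that pooling concatenates the input FASTQ files. Assuming that is the case, the sketch below shows why plain concatenation is enough for gzipped reads: joining gzip streams byte for byte yields a valid multi-member gzip stream. The function name `pool_gzipped` and the toy reads are hypothetical.

```python
import gzip


def pool_gzipped(chunks):
    """Concatenate already-gzipped byte strings into one gzip stream."""
    return b"".join(chunks)


# Two toy single-read FASTQ records standing in for input replicates.
rep1 = gzip.compress(b"@read1\nACGT\n+\nIIII\n")
rep2 = gzip.compress(b"@read2\nTTGA\n+\nIIII\n")

pooled = pool_gzipped([rep1, rep2])

# gzip.decompress reads every member of a multi-member stream.
print(gzip.decompress(pooled).decode())
```

No decompression or recompression is needed for the pooled file to stay readable by tools that accept gzipped FASTQ.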
5 changes: 5 additions & 0 deletions modules/nf-core/cat/cat/environment.yml
78 changes: 78 additions & 0 deletions modules/nf-core/cat/cat/main.nf
43 changes: 43 additions & 0 deletions modules/nf-core/cat/cat/meta.yml

The three generated nf-core module files above are not rendered in this diff view.