Get rid of oras from module images #415

Merged
merged 8 commits into from
Sep 30, 2024
3 changes: 2 additions & 1 deletion CHANGELOG.md
@@ -3,7 +3,7 @@
The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/)
and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).

## v2.1.0dev - [date]
## v2.1.0 - [date]

### Enhancements & fixes

@@ -31,6 +31,7 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
- [[#347](https://github.com/nf-core/chipseq/issues/347)] - Add read group tag to bam files processed by bowtie2.
- [[PR #406](https://github.com/nf-core/chipseq/pull/406)] - Update metro map to show macs3 instead of macs2.
- [[#409](https://github.com/nf-core/chipseq/issues/409)] - Bulk modules and subworkflows update.
- [[PR #415](https://github.com/nf-core/chipseq/pull/415)] - Get rid of `oras` in modules.

### Software dependencies

27 changes: 13 additions & 14 deletions README.md
@@ -74,17 +74,17 @@ You can find numerous talks on the [nf-core events page](https://nf-co.re/events

To run on your data, prepare a comma-separated samplesheet with your input data. Please follow the [documentation on samplesheets](https://nf-co.re/chipseq/usage#samplesheet-input) for more details. An example samplesheet for running the pipeline looks as follows:

```csv
sample,fastq_1,fastq_2,antibody,control
WT_BCATENIN_IP_REP1,BLA203A1_S27_L006_R1_001.fastq.gz,,BCATENIN,WT_INPUT_REP1
WT_BCATENIN_IP_REP2,BLA203A25_S16_L001_R1_001.fastq.gz,,BCATENIN,WT_INPUT_REP2
WT_BCATENIN_IP_REP2,BLA203A25_S16_L002_R1_001.fastq.gz,,BCATENIN,WT_INPUT_REP2
WT_BCATENIN_IP_REP2,BLA203A25_S16_L003_R1_001.fastq.gz,,BCATENIN,WT_INPUT_REP2
WT_BCATENIN_IP_REP3,BLA203A49_S40_L001_R1_001.fastq.gz,,BCATENIN,WT_INPUT_REP3
WT_INPUT_REP1,BLA203A6_S32_L006_R1_001.fastq.gz,,,
WT_INPUT_REP2,BLA203A30_S21_L001_R1_001.fastq.gz,,,
WT_INPUT_REP2,BLA203A30_S21_L002_R1_001.fastq.gz,,,
WT_INPUT_REP3,BLA203A31_S21_L003_R1_001.fastq.gz,,,
```csv title="samplesheet.csv"
sample,fastq_1,fastq_2,replicate,antibody,control,control_replicate
WT_BCATENIN_IP,BLA203A1_S27_L006_R1_001.fastq.gz,,1,BCATENIN,WT_INPUT,1
WT_BCATENIN_IP,BLA203A25_S16_L001_R1_001.fastq.gz,,2,BCATENIN,WT_INPUT,2
WT_BCATENIN_IP,BLA203A25_S16_L002_R1_001.fastq.gz,,2,BCATENIN,WT_INPUT,2
WT_BCATENIN_IP,BLA203A25_S16_L003_R1_001.fastq.gz,,2,BCATENIN,WT_INPUT,2
WT_BCATENIN_IP,BLA203A49_S40_L001_R1_001.fastq.gz,,3,BCATENIN,WT_INPUT,3
WT_INPUT,BLA203A6_S32_L006_R1_001.fastq.gz,,1,,,
WT_INPUT,BLA203A30_S21_L001_R1_001.fastq.gz,,2,,,
WT_INPUT,BLA203A30_S21_L002_R1_001.fastq.gz,,2,,,
WT_INPUT,BLA203A31_S21_L003_R1_001.fastq.gz,,3,,,
```
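
As a rough illustration of the seven-column layout shown above, the following shell sketch writes a two-row samplesheet and checks that the header matches and that every data row has seven comma-separated fields. The check is an assumption made for illustration only; it is not part of the pipeline.

```shell
#!/bin/sh
set -eu

# Write a small samplesheet using rows taken from the example above.
cat > samplesheet.csv <<'EOF'
sample,fastq_1,fastq_2,replicate,antibody,control,control_replicate
WT_BCATENIN_IP,BLA203A1_S27_L006_R1_001.fastq.gz,,1,BCATENIN,WT_INPUT,1
WT_INPUT,BLA203A6_S32_L006_R1_001.fastq.gz,,1,,,
EOF

# The header expected by the updated README example.
expected='sample,fastq_1,fastq_2,replicate,antibody,control,control_replicate'
header=$(head -n 1 samplesheet.csv)
if [ "$header" != "$expected" ]; then
    echo "unexpected samplesheet header: $header" >&2
    exit 1
fi

# Count data rows that do not have exactly seven fields
# (empty trailing fields still count as fields for awk).
bad_rows=$(awk -F',' 'NR > 1 && NF != 7 { n++ } END { print n + 0 }' samplesheet.csv)
echo "rows with wrong field count: $bad_rows"
```

Control rows leave `antibody`, `control` and `control_replicate` empty, so they end in three consecutive commas but still have seven fields.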

Now, you can run the pipeline using:
@@ -96,8 +96,7 @@ nextflow run nf-core/chipseq --input samplesheet.csv --outdir <OUTDIR> --genome
See [usage docs](https://nf-co.re/chipseq/usage) for all of the available options when running the pipeline.

> [!WARNING]
> Please provide pipeline parameters via the CLI or Nextflow `-params-file` option. Custom config files including those provided by the `-c` Nextflow option can be used to provide any configuration _**except for parameters**_;
> see [docs](https://nf-co.re/usage/configuration#custom-configuration-files).
> Please provide pipeline parameters via the CLI or Nextflow `-params-file` option. Custom config files including those provided by the `-c` Nextflow option can be used to provide any configuration _**except for parameters**_; see the [docs](https://nf-co.re/usage/configuration#custom-configuration-files) here.

For more details and further functionality, please refer to the [usage documentation](https://nf-co.re/chipseq/usage) and the [parameter documentation](https://nf-co.re/chipseq/parameters).
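
One way to follow the warning above is to keep parameters in a YAML file passed with `-params-file`. The sketch below writes such a file and prints the launch command rather than running it. The `results` output directory, the `GRCh38` genome key and the `docker` profile are placeholder values chosen for illustration, not values prescribed by this README.

```shell
#!/bin/sh
set -eu

# Pipeline parameters go in a params file, not in a -c config file.
# All values below are placeholders for illustration.
cat > params.yaml <<'EOF'
input: samplesheet.csv
outdir: results
genome: GRCh38
EOF

# -params-file supplies the parameters; -profile selects the software environment.
launch_cmd="nextflow run nf-core/chipseq -profile docker -params-file params.yaml"

# Print the command for review instead of executing it.
echo "$launch_cmd"
```

A custom config passed with `-c` can still be added to the same command for non-parameter settings such as resource limits.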

@@ -113,7 +112,7 @@ These scripts were originally written by Chuan Wang ([@chuan-wang](https://githu

The pipeline workflow diagram was designed by Sarah Guinchard ([@G-Sarah](https://github.com/G-Sarah)).

Many thanks to others who have helped out and contributed along the way too, including (but not limited to): [@apeltzer](https://github.com/apeltzer), [@bc2zb](https://github.com/bc2zb), [@crickbabs](https://github.com/crickbabs), [@drejom](https://github.com/drejom), [@houghtos](https://github.com/houghtos), [@KevinMenden](https://github.com/KevinMenden), [@mashehu](https://github.com/mashehu), [@pditommaso](https://github.com/pditommaso), [@Rotholandus](https://github.com/Rotholandus), [@sofiahaglund](https://github.com/sofiahaglund), [@tiagochst](https://github.com/tiagochst) and [@winni2k](https://github.com/winni2k).
Many thanks to others who have helped out and contributed along the way too, including (but not limited to): [@apeltzer](https://github.com/apeltzer), [@bc2zb](https://github.com/bc2zb), [@bjlang](https://github.com/bjlang), [@crickbabs](https://github.com/crickbabs), [@drejom](https://github.com/drejom), [@houghtos](https://github.com/houghtos), [@KevinMenden](https://github.com/KevinMenden), [@mashehu](https://github.com/mashehu), [@pditommaso](https://github.com/pditommaso), [@Rotholandus](https://github.com/Rotholandus), [@sofiahaglund](https://github.com/sofiahaglund), [@tiagochst](https://github.com/tiagochst) and [@winni2k](https://github.com/winni2k).

## Contributions and Support

4 changes: 2 additions & 2 deletions assets/multiqc_config.yml
@@ -1,7 +1,7 @@
report_comment: >
This report has been generated by the <a href="https://github.com/nf-core/chipseq/tree/dev" target="_blank">nf-core/chipseq</a>
This report has been generated by the <a href="https://github.com/nf-core/chipseq/releases/tag/2.1.0" target="_blank">nf-core/chipseq</a>
analysis pipeline. For information about how to interpret these results, please see the
<a href="https://nf-co.re/chipseq/dev/docs/output" target="_blank">documentation</a>.
<a href="https://nf-co.re/chipseq/2.1.0/docs/output" target="_blank">documentation</a>.

data_format: "yaml"

23 changes: 21 additions & 2 deletions conf/modules.config
@@ -354,7 +354,8 @@ process {
publishDir = [
path: { "${params.outdir}/${params.aligner}/merged_library/samtools_stats" },
mode: params.publish_dir_mode,
pattern: '*.{stats,flagstat,idxstats}'
pattern: '*.{stats,flagstat,idxstats}',
enabled: params.save_align_intermeds
]
}

@@ -415,6 +416,24 @@ process {
]
}

withName: '.*:BAM_FILTER_BAMTOOLS:BAM_SORT_STATS_SAMTOOLS:.*' {
ext.prefix = { "${meta.id}.mLb.clN.sorted" }
publishDir = [
path: { "${params.outdir}/${params.aligner}/merged_library" },
mode: params.publish_dir_mode,
pattern: "*.{bam,bai}"
]
}

withName: '.*:BAM_FILTER_BAMTOOLS:BAM_SORT_STATS_SAMTOOLS:BAM_STATS_SAMTOOLS:.*' {
ext.prefix = { "${meta.id}.mLb.clN.sorted.bam" }
publishDir = [
path: { "${params.outdir}/${params.aligner}/merged_library/samtools_stats" },
mode: params.publish_dir_mode,
pattern: "*.{stats,flagstat,idxstats}"
]
}

withName: 'PHANTOMPEAKQUALTOOLS' {
ext.args = { "--max-ppsize=500000" }
ext.args2 = { "-p=$task.cpus" }
@@ -553,7 +572,7 @@ process {
params.save_macs_pileup ? '--bdg --SPMR' : '',
params.macs_fdr ? "--qvalue ${params.macs_fdr}" : '',
params.macs_pvalue ? "--pvalue ${params.macs_pvalue}" : '',
params.aligner == "chromap" ? "--format BAM" : '' //TODO check if not needed anymore with new chromap versions
params.aligner == "chromap" ? "--format BAM" : ''
].join(' ').trim()
publishDir = [
path: { "${params.outdir}/${params.aligner}/merged_library/macs3/${params.narrow_peak ? 'narrow_peak' : 'broad_peak'}" },
10 changes: 4 additions & 6 deletions docs/output.md
@@ -12,7 +12,7 @@ The directories listed below will be created in the output directory after the p

## Pipeline overview

The pipeline is built using [Nextflow](https://www.nextflow.io/). See [`main README.md`](../README.md) for a condensed overview of the steps in the pipeline, and the bioinformatics tools used at each step.
The pipeline is built using [Nextflow](https://www.nextflow.io/). See [`introduction`](../..) for a condensed overview of the steps in the pipeline, and the bioinformatics tools used at each step.

See [Illumina website](https://emea.illumina.com/techniques/sequencing/dna-sequencing/chip-seq.html) for more information regarding the ChIP-seq protocol, and for an extensive list of publications.

@@ -50,7 +50,7 @@ The initial QC and alignments are performed at the library-level e.g. if the sam

</details>

[Trim Galore!](https://www.bioinformatics.babraham.ac.uk/projects/trim_galore/) is a wrapper tool around Cutadapt and FastQC to consistently apply quality and adapter trimming to FastQ files. By default, Trim Galore! will automatically detect and trim the appropriate adapter sequence. See [`usage.md`](usage.md) for more details about the trimming options.
[Trim Galore!](https://www.bioinformatics.babraham.ac.uk/projects/trim_galore/) is a wrapper tool around Cutadapt and FastQC to consistently apply quality and adapter trimming to FastQ files. By default, Trim Galore! will automatically detect and trim the appropriate adapter sequence. See [`parameters`](../parameters/#adapter-trimming-options) for more details about the trimming options.

![MultiQC - Cutadapt trimmed sequence plot](images/mqc_cutadapt_plot.png)

@@ -70,12 +70,10 @@ The pipeline has been written in a way where all the files generated downstream

</details>

Adapter-trimmed reads are mapped to the reference assembly using the aligner set by the `--aligner` parameter. Available aligners are [BWA](http://bio-bwa.sourceforge.net/bwa.shtml) (default), [Bowtie 2](http://bowtie-bio.sourceforge.net/bowtie2/index.shtml), [Chromap](https://github.com/haowenz/chromap) and [STAR](https://github.com/alexdobin/STAR). A genome index is required to run any of this aligners so if this is not provided explicitly using the corresponding parameter (e.g. `--bwa_index`), then it will be created automatically from the genome fasta input. The index creation process can take a while for larger genomes so it is possible to use the `--save_reference` parameter to save the indices for future pipeline runs, reducing processing times.
Adapter-trimmed reads are mapped to the reference assembly using the aligner set by the `--aligner` parameter. Available aligners are [BWA](http://bio-bwa.sourceforge.net/bwa.shtml) (default), [Bowtie 2](http://bowtie-bio.sourceforge.net/bowtie2/index.shtml), [Chromap](https://github.com/haowenz/chromap) and [STAR](https://github.com/alexdobin/STAR). A genome index is required to run any of these aligners so if this is not provided explicitly using the corresponding parameter (e.g. `--bwa_index`), then it will be created automatically from the genome fasta input. The index creation process can be time-consuming for large genomes, so you can use the `--save_reference` parameter to save the indices for future pipeline runs, thereby reducing processing times.
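
The index behaviour described above can be sketched as a small wrapper that reuses a saved BWA index when one exists and otherwise asks the pipeline to save the index it builds. The `reference/bwa` location, the `BWA_INDEX_DIR` variable and the input file names are hypothetical, and the command is only printed, not executed.

```shell
#!/bin/sh
set -eu

# Hypothetical location of a previously saved BWA index.
index_dir="${BWA_INDEX_DIR:-reference/bwa}"

# Placeholder inputs for illustration.
base_args="--input samplesheet.csv --outdir results --fasta genome.fa --gtf genes.gtf --aligner bwa"

if [ -d "$index_dir" ]; then
    # An index is available: pass it explicitly so it is not rebuilt.
    index_args="--bwa_index $index_dir"
else
    # No index yet: let the pipeline build one and keep it for future runs.
    index_args="--save_reference"
fi

launch_cmd="nextflow run nf-core/chipseq $base_args $index_args"
echo "$launch_cmd"
```

On a first run the printed command ends with `--save_reference`. Once the saved index has been copied to the chosen directory, later runs switch to `--bwa_index`.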

![MultiQC - SAMtools stats plot](images/mqc_samtools_stats_plot.png)

> **NB:** Currently, paired-end files produced by `Chromap` are excluded from downstream analysis due to [this](https://github.com/nf-core/chipseq/issues/291) issue. Single-end files are processed normally.

#### Unmapped reads

The `--save_unaligned` parameter makes it possible to obtain FastQ files containing unmapped reads (only available for STAR and Bowtie2).
@@ -202,7 +200,7 @@ The results from deepTools plotProfile gives you a quick visualisation for the g

[MACS3](https://github.com/macs3-project/MACS) is one of the most popular peak-calling algorithms for ChIP-seq data. By default, the peaks are called with the MACS3 `--broad` parameter. If, however, you would like to call narrow peaks then please provide the `--narrow_peak` parameter when running the pipeline. See [MACS3 outputs](https://github.com/macs3-project/MACS/blob/master/docs/callpeak.md#output-files) for a description of the output files generated by MACS3.

![MultiQC - MACS3 total peak count plot](images/mqc_macs2_peak_count_plot.png)
![MultiQC - MACS3 total peak count plot](images/mqc_macs3_peak_count_plot.png)

[HOMER annotatePeaks.pl](http://homer.ucsd.edu/homer/ngs/annotation.html) is used to annotate the peaks relative to known genomic features. HOMER is able to use the `--gtf` annotation file which is provided to the pipeline. Please note that some of the output columns will be blank because the annotation is not provided using HOMER's in-built database format. However, the more important fields required for downstream analysis will be populated i.e. _Annotation_, _Distance to TSS_ and _Nearest Promoter ID_.

6 changes: 3 additions & 3 deletions docs/usage.md
@@ -87,13 +87,13 @@ NAIVE_INPUT,BLA203A49_S1_L006_R1_001.fastq.gz,,3,,,
| `control` | Sample name for control sample. |
| `control_replicate` | Integer representing replicate number for control sample. |

Example design files have bee_n provided with the pipeline for [paired-end](../assets/samplesheet_pe.csv) and [single-end](../assets/samplesheet_se.csv) data.
Example design files have been provided with the pipeline for [paired-end](../assets/samplesheet_pe.csv) and [single-end](../assets/samplesheet_se.csv) data.

> **NB:** The `group` and `replicate` columns were replaced with a single `sample` column as of v2.0 of the pipeline. The `sample` column is essentially a concatenation of the `group` and `replicate` columns. If all values of `sample` have the same number of underscores, fields defined by these underscore-separated names may be used in the PCA plots produced by the pipeline, to regain the ability to represent different groupings.

## Reference genome files

The minimum reference genome requirements are a FASTA and GTF file; all other files required to run the pipeline can be generated from these files. However, it is more storage and compute friendly to re-use reference genome files wherever possible. It is recommended to use the `--save_reference` parameter if you are using the pipeline to build new indices (e.g. those unavailable on [AWS iGenomes](https://nf-co.re/usage/reference_genomes)) so that you can save them somewhere locally. Index building can be quite time-consuming, and saving the indices permits their reuse in future runs of the pipeline, which saves rebuilding time and disk space. You can then provide the appropriate reference genome files either on the command-line via the appropriate parameters (e.g. `--bwa_index '/path/to/bwa/index/'`) or via a custom config file.
The minimum reference genome requirements are a FASTA and a GTF file; all other files required to run the pipeline can be generated from these files. However, it is more storage and compute friendly to re-use reference genome files wherever possible. It is recommended to use the `--save_reference` parameter if you are using the pipeline to build new indices (e.g. those unavailable on [AWS iGenomes](https://nf-co.re/usage/reference_genomes)) so that you can save them somewhere locally. Index building can be quite time-consuming, and saving the indices permits their reuse in future runs of the pipeline, which saves rebuilding time and disk space. You can then provide the appropriate reference genome files either on the command-line via the appropriate parameters (e.g. `--bwa_index '/path/to/bwa/index/'`) or via a [custom config file](https://nf-co.re/usage/configuration#custom-configuration-files).

- If `--genome` is provided then the FASTA and GTF files (and existing indices) will be automatically obtained from AWS-iGenomes unless these have already been downloaded locally in the path specified by `--igenomes_base`.
- If `--gene_bed` is not provided then it will be generated from the GTF file.
@@ -126,7 +126,7 @@ cd v3.0
wget -L https://www.encodeproject.org/files/ENCFF356LFX/@@download/ENCFF356LFX.bed.gz && gunzip ENCFF356LFX.bed.gz && mv ENCFF356LFX.bed hg38-blacklist.v3.bed
```

> **NB:** A detailed description of the different versions of the files can be found [here](https://sites.google.com/site/anshulkundaje/projects/blacklists). Also, to see which blacklist bed files are assigned by default to the respective reference genome, check the [igenomes.config](https://github.com/nf-core/chipseq/blob/master/conf/igenomes.config).
> **NB:** A detailed description of the different versions of the files can be found [here](https://github.com/Boyle-Lab/Blacklist/blob/master/README.md). Also, to see which blacklist bed files are assigned by default to the respective reference genome, check the [igenomes.config](https://github.com/nf-core/chipseq/blob/master/conf/igenomes.config).

## Running the pipeline

2 changes: 1 addition & 1 deletion modules.json
@@ -98,7 +98,7 @@
},
"phantompeakqualtools": {
"branch": "master",
"git_sha": "2dfe9afa90fefc70e320140e5f41287f01f324b0",
"git_sha": "ec48f56f6e1571e23800aaaba41cceda13408e02",
"installed_by": ["modules"]
},
"picard/collectmultiplemetrics": {
2 changes: 1 addition & 1 deletion modules/local/multiqc_custom_phantompeakqualtools.nf
@@ -2,7 +2,7 @@ process MULTIQC_CUSTOM_PHANTOMPEAKQUALTOOLS {
tag "$meta.id"
conda "conda-forge::r-base=4.3.3"
container "${ workflow.containerEngine == 'singularity' && !task.ext.singularity_pull_docker_container ?
'oras://community.wave.seqera.io/library/r-base:4.3.3--452dec8277637366':
'https://community-cr-prod.seqera.io/docker/registry/v2/blobs/sha256/45/4569ff9993578b8402d00230ab9dd75ce6e63529731eb24f21579845e6bd5cdb/data':
'community.wave.seqera.io/library/r-base:4.3.3--14bb33ac537aea22' }"

input:
2 changes: 0 additions & 2 deletions modules/nf-core/phantompeakqualtools/environment.yml

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

2 changes: 1 addition & 1 deletion modules/nf-core/phantompeakqualtools/main.nf

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.
