Add support for fastp length_required #1660

Merged: 6 commits, Sep 23, 2024
1 change: 1 addition & 0 deletions CHANGELOG.md
@@ -11,6 +11,7 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0

- [1640](https://github.com/nf-core/sarek/pull/1620) - Add `lofreq` as a tumor-only variant caller.
- [1642](https://github.com/nf-core/sarek/pull/1642) - Back to dev
- [1660](https://github.com/nf-core/sarek/pull/1660) - Add `--length_required` to set the minimum read length kept by `FASTP`

### Changed

3 changes: 2 additions & 1 deletion conf/modules/trimming.config
@@ -23,7 +23,8 @@ process {
params.three_prime_clip_r1 > 0 ? "--trim_tail1 ${params.three_prime_clip_r1}" : '', // Remove bp from the 3' end of read 1 AFTER adapter/quality trimming has been performed
params.three_prime_clip_r2 > 0 ? "--trim_tail2 ${params.three_prime_clip_r2}" : '', // Remove bp from the 3' end of read 2 AFTER adapter/quality trimming has been performed
params.trim_nextseq ? '--trim_poly_g' : '', // Apply the --nextseq=X option, to trim based on quality after removing poly-G tails
- params.split_fastq > 0 ? "--split_by_lines ${params.split_fastq * 4}" : ''
+ params.split_fastq > 0 ? "--split_by_lines ${params.split_fastq * 4}" : '', // Split the output by limiting the number of lines in each file
+ params.length_required > 0 ? "--length_required ${params.length_required}" : '', // Discard reads shorter than this length
].join(' ').trim()
publishDir = [
[
1 change: 1 addition & 0 deletions conf/test/trimming.config
@@ -14,6 +14,7 @@ params {
clip_r2 = 1
three_prime_clip_r1 = 1
three_prime_clip_r2 = 1
length_required = 50
tools = null
trim_fastq = true
}
7 changes: 6 additions & 1 deletion docs/output.md
@@ -13,6 +13,7 @@ The pipeline is built using [Nextflow](https://www.nextflow.io/) and processes d
- [Directory Structure](#directory-structure)
- [Preprocessing](#preprocessing)
- [Preparation of input files (FastQ or (u)BAM)](#preparation-of-input-files-fastq-or-ubam)
- [Clip and filter read length](#clip-and-filter-read-length)
- [Trim adapters](#trim-adapters)
- [Split FastQ files](#split-fastq-files)
- [UMI consensus](#umi-consensus)
@@ -41,8 +42,8 @@ The pipeline is built using [Nextflow](https://www.nextflow.io/) and processes d
- [Sentieon DNAscope joint germline variant calling](#sentieon-dnascope-joint-germline-variant-calling)
- [Sentieon Haplotyper](#sentieon-haplotyper)
- [Sentieon Haplotyper joint germline variant calling](#sentieon-haplotyper-joint-germline-variant-calling)
-    - [Lofreq](#lofreq)
     - [Strelka](#strelka)
+    - [Lofreq](#lofreq)

Member:
Why change the order of this?

Contributor Author:
In fact I didn't change this line; it was done automatically.
But it does reflect that the Strelka paragraph comes before Lofreq.

Contributor:
I think it's a markdown linting thing.

- [Structural Variants](#structural-variants)
- [Manta](#manta)
- [TIDDIT](#tiddit)
@@ -107,6 +108,10 @@ Sarek pre-processes raw FastQ files or unmapped BAM files, based on [GATK best p

[FastP](https://github.com/OpenGene/fastp) is a tool designed to provide all-in-one preprocessing for FastQ files and as such is used for trimming and splitting. By default, these files are not published. However, if publishing is enabled, please be aware that these files are only published once, meaning if trimming and splitting is enabled, then the resulting files will be sharded FastQ files with trimmed reads. If only one of them is enabled then the files contain either trimmed or split reads, respectively.

#### Clip and filter read length

[FastP](https://github.com/OpenGene/fastp) can clip reads from either the 5' end (`--clip_r1`, `--clip_r2`) or the 3' end (`--three_prime_clip_r1`, `--three_prime_clip_r2`). It can also filter reads by length: reads shorter than the minimum set with the `--length_required` parameter (default: 15 bp) are discarded. Tune these parameters to the specific characteristics of your data.
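
As a usage sketch (not part of this PR's diff), the new parameter could be set in a custom config file passed to Nextflow with `-c`. The file name `custom.config` and the value `50` are illustrative assumptions; `trim_fastq` and `length_required` are the parameter names that appear in this PR, and the combination mirrors `conf/test/trimming.config`.

```groovy
// custom.config (hypothetical file name, passed with `-c custom.config`)
params {
    trim_fastq      = true // enable FastP trimming, as in conf/test/trimming.config
    length_required = 50   // illustrative value: discard reads shorter than 50 bp
}
```

Because `conf/modules/trimming.config` only adds the flag when the value is greater than 0, setting `length_required = 0` would omit the flag rather than switch the filter off, so FastP would presumably fall back to its own default of 15 bp.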

#### Trim adapters

[FastP](https://github.com/OpenGene/fastp) supports global trimming, which means it trims all reads in the front or the tail. This function is useful since sometimes you want to drop some cycles of a sequencing run. In the current implementation in Sarek
1 change: 1 addition & 0 deletions nextflow.config
@@ -38,6 +38,7 @@ params {
three_prime_clip_r1 = 0
three_prime_clip_r2 = 0
trim_nextseq = 0
length_required = 15 // Default in FastP
save_trimmed = false
save_split_fastqs = false

8 changes: 8 additions & 0 deletions nextflow_schema.json
@@ -177,6 +177,14 @@
"help_text": "Detects polyG in read tails and trims them. Corresponds to the FastP flag `--trim_poly_g`.",
"hidden": true
},
"length_required": {
"type": "integer",
"default": 15,
"fa_icon": "fas fa-cut",
"description": "Minimum length of reads to keep",
"help_text": "This is the minimum length of reads to keep after trimming. Corresponds to the FastP flag `--length_required` (default in FastP is 15bp).",
"hidden": true
},
"save_trimmed": {
"type": "boolean",
"fa_icon": "fas fa-save",
2 changes: 1 addition & 1 deletion tests/test_fastp.yml
@@ -29,7 +29,7 @@
- path: results/reports/fastp/test
- path: results/reports/fastqc/test-test_L1
- path: results/reports/markduplicates/test/test.md.cram.metrics
- contains: ["test 16608 1860 160 1046616 12117 256 0 0.621261"]
+ contains: ["test 16608 1860 164 1046488 12097 254 0 0.620081 6174"]
- path: results/reports/mosdepth/test/test.md.mosdepth.global.dist.txt
- path: results/reports/mosdepth/test/test.md.mosdepth.region.dist.txt
- path: results/reports/mosdepth/test/test.md.mosdepth.summary.txt