Fix sorting in overall_summary.tsv #750

d4straub · 2024-06-17T12:32:35Z

In 2.9.0, overall_summary.tsv sometimes contained misleading numbers. The cause was a new sorting method added in #717, which had become necessary because edge-case sample names led to incorrect pairing of forward and reverse reads.

This PR makes sure that all tables that are merged are sorted identically. The tables are combined with cbind rather than merge because their row names differ: each row name contains the sample ID, but embedded in a different file name, e.g. sample1.trimmed_1.trim.fastq.gz, sample1_1.filt.fastq.gz and sample1_2.filt.fastq.gz.
I also considered correcting the row names of each table and then applying merge, but the row names are so diverse that this does not seem like a good option. I do have the feeling that correcting the row names and using merge might be safer, but I couldn't find any example where it would matter. I am open to changing the implementation.
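
To illustrate why identical sorting matters when combining with cbind, here is a minimal sketch in R. The two data frames, their column names and the counts are made up for illustration and are not taken from the pipeline; `sort_rows` is a hypothetical helper, not a function from this PR.

```r
# Hypothetical per-step tables; row names are file names that embed the sample ID.
cutadapt <- data.frame(
  cutadapt_total = c(200, 100),
  row.names = c("sample2.trimmed_1.trim.fastq.gz", "sample1.trimmed_1.trim.fastq.gz")
)
filtered <- data.frame(
  filtered = c(90, 180),
  row.names = c("sample1_1.filt.fastq.gz", "sample2_1.filt.fastq.gz")
)

# cbind pairs rows purely by position, so differently ordered inputs mix up samples:
# here sample2's read count ends up next to sample1's filtered count.
misaligned <- cbind(cutadapt, filtered)

# Sorting every table by its own row names first gives both the same sample order.
sort_rows <- function(df) df[order(rownames(df)), , drop = FALSE]
aligned <- cbind(sort_rows(cutadapt), sort_rows(filtered))

print(misaligned)
print(aligned)
```

In the misaligned table, one row reports more filtered reads than input reads, which is the kind of misleading number this approach is meant to prevent. Note that cbind keeps only the row names of the first table, so the result gives no hint when rows are paired incorrectly.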

Using the example above, the sorting should be fine:

```
> sort( c("sample1.trimmed_1.trim.fastq.gz","sample2.trimmed_1.trim.fastq.gz","sample10.trimmed_1.trim.fastq.gz","sample10_1.trimmed_1.trim.fastq.gz") )
[1] "sample1.trimmed_1.trim.fastq.gz"    "sample10_1.trimmed_1.trim.fastq.gz"
[3] "sample10.trimmed_1.trim.fastq.gz"   "sample2.trimmed_1.trim.fastq.gz"
> sort( c("sample1_1.filt.fastq.gz","sample2_1.filt.fastq.gz","sample10_1.filt.fastq.gz","sample10_1_1.filt.fastq.gz") )
[1] "sample1_1.filt.fastq.gz"    "sample10_1_1.filt.fastq.gz"
[3] "sample10_1.filt.fastq.gz"   "sample2_1.filt.fastq.gz"
```
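
The same comparison could also be done programmatically rather than by eye. The following is only a sketch: `sample_id` and `same_sample_order` are hypothetical helpers, and the regular expression assumes that the only suffixes are the ones shown in the file names above. Since R's sort() collates character vectors according to the current locale, the result may differ between environments, so no expected output is shown.

```r
# Hypothetical helper: strip the step-specific suffix to recover the sample ID.
# Assumes suffixes of the form ".trimmed_1.trim.fastq.gz" or "_1.filt.fastq.gz".
sample_id <- function(x) {
  sub("(\\.trimmed)?_[12]\\.(trim|filt)\\.fastq\\.gz$", "", x)
}

# TRUE if two vectors of file names list the samples in the same order.
same_sample_order <- function(a, b) {
  identical(sample_id(a), sample_id(b))
}

trimmed <- c("sample1.trimmed_1.trim.fastq.gz", "sample2.trimmed_1.trim.fastq.gz",
             "sample10.trimmed_1.trim.fastq.gz", "sample10_1.trimmed_1.trim.fastq.gz")
filtered <- c("sample1_1.filt.fastq.gz", "sample2_1.filt.fastq.gz",
              "sample10_1.filt.fastq.gz", "sample10_1_1.filt.fastq.gz")

# Sort each vector independently, then compare the implied sample order.
print(same_sample_order(sort(trimmed), sort(filtered)))
```

A check like this could run before cbind and stop with an error when it returns FALSE, which would cover part of what merge on corrected row names would offer.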

Closes #742.

PR checklist

github-actions · 2024-06-17T12:36:03Z

`nf-core lint` overall result: Passed ✅ ⚠️

Posted for pipeline commit 6f520a9

+| ✅ 281 tests passed       |+
#| ❔   6 tests were ignored |#
!| ❗   1 tests had warnings |!

❗ Test warnings:

readme - README did not have a Nextflow minimum version badge.

❔ Tests ignored:

files_exist - File is ignored: conf/igenomes.config
nextflow_config - Config default ignored: params.report_template
nextflow_config - Config default ignored: params.report_css
nextflow_config - Config default ignored: params.report_logo
files_unchanged - File ignored due to lint config: .gitattributes
actions_ci - actions_ci

✅ Tests passed:

files_exist - File found: .gitattributes
files_exist - File found: .gitignore
files_exist - File found: .nf-core.yml
files_exist - File found: .editorconfig
files_exist - File found: .prettierignore
files_exist - File found: .prettierrc.yml
files_exist - File found: CHANGELOG.md
files_exist - File found: CITATIONS.md
files_exist - File found: CODE_OF_CONDUCT.md
files_exist - File found: LICENSE or LICENSE.md or LICENCE or LICENCE.md
files_exist - File found: nextflow_schema.json
files_exist - File found: nextflow.config
files_exist - File found: README.md
files_exist - File found: .github/.dockstore.yml
files_exist - File found: .github/CONTRIBUTING.md
files_exist - File found: .github/ISSUE_TEMPLATE/bug_report.yml
files_exist - File found: .github/ISSUE_TEMPLATE/config.yml
files_exist - File found: .github/ISSUE_TEMPLATE/feature_request.yml
files_exist - File found: .github/PULL_REQUEST_TEMPLATE.md
files_exist - File found: .github/workflows/branch.yml
files_exist - File found: .github/workflows/ci.yml
files_exist - File found: .github/workflows/linting_comment.yml
files_exist - File found: .github/workflows/linting.yml
files_exist - File found: assets/email_template.html
files_exist - File found: assets/email_template.txt
files_exist - File found: assets/sendmail_template.txt
files_exist - File found: assets/nf-core-ampliseq_logo_light.png
files_exist - File found: conf/modules.config
files_exist - File found: conf/test.config
files_exist - File found: conf/test_full.config
files_exist - File found: docs/images/nf-core-ampliseq_logo_light.png
files_exist - File found: docs/images/nf-core-ampliseq_logo_dark.png
files_exist - File found: docs/output.md
files_exist - File found: docs/README.md
files_exist - File found: docs/README.md
files_exist - File found: docs/usage.md
files_exist - File found: main.nf
files_exist - File found: assets/multiqc_config.yml
files_exist - File found: conf/base.config
files_exist - File found: .github/workflows/awstest.yml
files_exist - File found: .github/workflows/awsfulltest.yml
files_exist - File found: modules.json
files_exist - File not found check: .github/ISSUE_TEMPLATE/bug_report.md
files_exist - File not found check: .github/ISSUE_TEMPLATE/feature_request.md
files_exist - File not found check: .github/workflows/push_dockerhub.yml
files_exist - File not found check: .markdownlint.yml
files_exist - File not found check: .nf-core.yaml
files_exist - File not found check: .yamllint.yml
files_exist - File not found check: bin/markdown_to_html.r
files_exist - File not found check: conf/aws.config
files_exist - File not found check: docs/images/nf-core-ampliseq_logo.png
files_exist - File not found check: lib/Checks.groovy
files_exist - File not found check: lib/Completion.groovy
files_exist - File not found check: lib/NfcoreTemplate.groovy
files_exist - File not found check: lib/Utils.groovy
files_exist - File not found check: lib/Workflow.groovy
files_exist - File not found check: lib/WorkflowMain.groovy
files_exist - File not found check: lib/WorkflowAmpliseq.groovy
files_exist - File not found check: parameters.settings.json
files_exist - File not found check: pipeline_template.yml
files_exist - File not found check: Singularity
files_exist - File not found check: lib/nfcore_external_java_deps.jar
files_exist - File not found check: .travis.yml
nextflow_config - Config variable found: manifest.name
nextflow_config - Config variable found: manifest.nextflowVersion
nextflow_config - Config variable found: manifest.description
nextflow_config - Config variable found: manifest.version
nextflow_config - Config variable found: manifest.homePage
nextflow_config - Config variable found: timeline.enabled
nextflow_config - Config variable found: trace.enabled
nextflow_config - Config variable found: report.enabled
nextflow_config - Config variable found: dag.enabled
nextflow_config - Config variable found: process.cpus
nextflow_config - Config variable found: process.memory
nextflow_config - Config variable found: process.time
nextflow_config - Config variable found: params.outdir
nextflow_config - Config variable found: params.input
nextflow_config - Config variable found: params.validationShowHiddenParams
nextflow_config - Config variable found: params.validationSchemaIgnoreParams
nextflow_config - Config variable found: manifest.mainScript
nextflow_config - Config variable found: timeline.file
nextflow_config - Config variable found: trace.file
nextflow_config - Config variable found: report.file
nextflow_config - Config variable found: dag.file
nextflow_config - Config variable (correctly) not found: params.nf_required_version
nextflow_config - Config variable (correctly) not found: params.container
nextflow_config - Config variable (correctly) not found: params.singleEnd
nextflow_config - Config variable (correctly) not found: params.igenomesIgnore
nextflow_config - Config variable (correctly) not found: params.name
nextflow_config - Config variable (correctly) not found: params.enable_conda
nextflow_config - Config timeline.enabled had correct value: true
nextflow_config - Config report.enabled had correct value: true
nextflow_config - Config trace.enabled had correct value: true
nextflow_config - Config dag.enabled had correct value: true
nextflow_config - Config manifest.name began with nf-core/
nextflow_config - Config variable manifest.homePage began with https://github.com/nf-core/
nextflow_config - Config dag.file ended with .html
nextflow_config - Config variable manifest.nextflowVersion started with >= or !>=
nextflow_config - Config manifest.version ends in dev: 2.10.0dev
nextflow_config - Config params.custom_config_version is set to master
nextflow_config - Config params.custom_config_base is set to https://raw.githubusercontent.com/nf-core/configs/master
nextflow_config - Lines for loading custom profiles found
nextflow_config - nextflow.config contains configuration profile test
nextflow_config - Config default value correct: params.extension= /*_R{1,2}_001.fastq.gz
nextflow_config - Config default value correct: params.min_read_counts= 1
nextflow_config - Config default value correct: params.cutadapt_min_overlap= 3
nextflow_config - Config default value correct: params.cutadapt_max_error_rate= 0.1
nextflow_config - Config default value correct: params.trunc_qmin= 25
nextflow_config - Config default value correct: params.trunc_rmin= 0.75
nextflow_config - Config default value correct: params.max_ee= 2
nextflow_config - Config default value correct: params.min_len= 50
nextflow_config - Config default value correct: params.sample_inference= independent
nextflow_config - Config default value correct: params.vsearch_cluster_id= 0.97
nextflow_config - Config default value correct: params.orf_start= 1
nextflow_config - Config default value correct: params.stop_codons= TAA,TAG
nextflow_config - Config default value correct: params.dada_ref_taxonomy= silva=138
nextflow_config - Config default value correct: params.pplace_alnmethod= hmmer
nextflow_config - Config default value correct: params.kraken2_confidence= 0.0
nextflow_config - Config default value correct: params.cut_its= none
nextflow_config - Config default value correct: params.its_partial= 0
nextflow_config - Config default value correct: params.exclude_taxa= mitochondria,chloroplast
nextflow_config - Config default value correct: params.min_frequency= 1
nextflow_config - Config default value correct: params.min_samples= 1
nextflow_config - Config default value correct: params.diversity_rarefaction_depth= 500
nextflow_config - Config default value correct: params.ancom_sample_min_count= 1
nextflow_config - Config default value correct: params.tax_agglom_min= 2
nextflow_config - Config default value correct: params.tax_agglom_max= 6
nextflow_config - Config default value correct: params.report_title= Summary of analysis results
nextflow_config - Config default value correct: params.seed= 100
nextflow_config - Config default value correct: params.publish_dir_mode= copy
nextflow_config - Config default value correct: params.max_multiqc_email_size= 25.MB
nextflow_config - Config default value correct: params.validate_params= true
nextflow_config - Config default value correct: params.pipelines_testdata_base_path= https://raw.githubusercontent.com/nf-core/test-datasets/
nextflow_config - Config default value correct: params.max_cpus= 16
nextflow_config - Config default value correct: params.max_memory= 128.GB
nextflow_config - Config default value correct: params.max_time= 240.h
nextflow_config - Config default value correct: params.custom_config_version= master
nextflow_config - Config default value correct: params.custom_config_base= https://raw.githubusercontent.com/nf-core/configs/master
files_unchanged - .prettierrc.yml matches the template
files_unchanged - CODE_OF_CONDUCT.md matches the template
files_unchanged - LICENSE matches the template
files_unchanged - .github/.dockstore.yml matches the template
files_unchanged - .github/CONTRIBUTING.md matches the template
files_unchanged - .github/ISSUE_TEMPLATE/bug_report.yml matches the template
files_unchanged - .github/ISSUE_TEMPLATE/config.yml matches the template
files_unchanged - .github/ISSUE_TEMPLATE/feature_request.yml matches the template
files_unchanged - .github/PULL_REQUEST_TEMPLATE.md matches the template
files_unchanged - .github/workflows/branch.yml matches the template
files_unchanged - .github/workflows/linting_comment.yml matches the template
files_unchanged - .github/workflows/linting.yml matches the template
files_unchanged - assets/email_template.html matches the template
files_unchanged - assets/email_template.txt matches the template
files_unchanged - assets/sendmail_template.txt matches the template
files_unchanged - assets/nf-core-ampliseq_logo_light.png matches the template
files_unchanged - docs/images/nf-core-ampliseq_logo_light.png matches the template
files_unchanged - docs/images/nf-core-ampliseq_logo_dark.png matches the template
files_unchanged - docs/README.md matches the template
files_unchanged - .gitignore matches the template
files_unchanged - .prettierignore matches the template
actions_awstest - '.github/workflows/awstest.yml' is triggered correctly
actions_awsfulltest - .github/workflows/awsfulltest.yml is triggered correctly
actions_awsfulltest - .github/workflows/awsfulltest.yml does not use -profile test
readme - README Zenodo placeholder was replaced with DOI.
pipeline_todos - No TODO strings found
pipeline_name_conventions - Name adheres to nf-core convention
template_strings - Did not find any Jinja template strings (345 files)
schema_lint - Schema lint passed
schema_lint - Schema title + description lint passed
schema_lint - Input mimetype lint passed: 'text/tsv'
schema_params - Schema matched params returned from nextflow config
system_exit - No System.exit calls found
actions_schema_validation - Workflow validation passed: branch.yml
actions_schema_validation - Workflow validation passed: ci.yml
actions_schema_validation - Workflow validation passed: awsfulltest.yml
actions_schema_validation - Workflow validation passed: fix-linting.yml
actions_schema_validation - Workflow validation passed: linting.yml
actions_schema_validation - Workflow validation passed: download_pipeline.yml
actions_schema_validation - Workflow validation passed: release-announcements.yml
actions_schema_validation - Workflow validation passed: clean-up.yml
actions_schema_validation - Workflow validation passed: awstest.yml
actions_schema_validation - Workflow validation passed: linting_comment.yml
merge_markers - No merge markers found in pipeline files
modules_json - Only installed modules found in modules.json
multiqc_config - assets/multiqc_config.yml found and not ignored.
multiqc_config - assets/multiqc_config.yml contains report_section_order
multiqc_config - assets/multiqc_config.yml contains export_plots
multiqc_config - assets/multiqc_config.yml contains report_comment
multiqc_config - assets/multiqc_config.yml follows the ordering scheme of the minimally required plugins.
multiqc_config - assets/multiqc_config.yml contains a matching 'report_comment'.
multiqc_config - assets/multiqc_config.yml contains 'export_plots: true'.
modules_structure - modules directory structure is correct 'modules/nf-core/TOOL/SUBTOOL'
base_config - conf/base.config found and not ignored.
base_config - QIIME2_EXTRACT found in conf/base.config and Nextflow scripts.
modules_config - conf/modules.config found and not ignored.
modules_config - RENAME_RAW_DATA_FILES found in conf/modules.config and Nextflow scripts.
modules_config - FASTQC found in conf/modules.config and Nextflow scripts.
modules_config - CUTADAPT_BASIC found in conf/modules.config and Nextflow scripts.
modules_config - CUTADAPT_READTHROUGH found in conf/modules.config and Nextflow scripts.
modules_config - CUTADAPT_DOUBLEPRIMER found in conf/modules.config and Nextflow scripts.
modules_config - CUTADAPT_TAXONOMY found in conf/modules.config and Nextflow scripts.
modules_config - CUTADAPT_SUMMARY_MERGE found in conf/modules.config and Nextflow scripts.
modules_config - DADA2_QUALITY1 found in conf/modules.config and Nextflow scripts.
modules_config - TRUNCLEN found in conf/modules.config and Nextflow scripts.
modules_config - DADA2_FILTNTRIM found in conf/modules.config and Nextflow scripts.
modules_config - DADA2_QUALITY2 found in conf/modules.config and Nextflow scripts.
modules_config - DADA2_ERR found in conf/modules.config and Nextflow scripts.
modules_config - NOVASEQ_ERR found in conf/modules.config and Nextflow scripts.
modules_config - DADA2_DENOISING found in conf/modules.config and Nextflow scripts.
modules_config - DADA2_RMCHIMERA found in conf/modules.config and Nextflow scripts.
modules_config - DADA2_STATS found in conf/modules.config and Nextflow scripts.
modules_config - DADA2_MERGE found in conf/modules.config and Nextflow scripts.
modules_config - DADA2_SPLITREGIONS found in conf/modules.config and Nextflow scripts.
modules_config - SIDLE_DBFILT found in conf/modules.config and Nextflow scripts.
modules_config - SIDLE_DBEXTRACT found in conf/modules.config and Nextflow scripts.
modules_config - SIDLE_TRIM found in conf/modules.config and Nextflow scripts.
modules_config - SIDLE_ALIGN found in conf/modules.config and Nextflow scripts.
modules_config - SIDLE_DBRECON found in conf/modules.config and Nextflow scripts.
modules_config - SIDLE_TABLERECON found in conf/modules.config and Nextflow scripts.
modules_config - SIDLE_TAXRECON found in conf/modules.config and Nextflow scripts.
modules_config - SIDLE_FILTTAX found in conf/modules.config and Nextflow scripts.
modules_config - SIDLE_SEQRECON found in conf/modules.config and Nextflow scripts.
modules_config - SIDLE_TREERECON found in conf/modules.config and Nextflow scripts.
modules_config - BARRNAP found in conf/modules.config and Nextflow scripts.
modules_config - BARRNAPSUMMARY found in conf/modules.config and Nextflow scripts.
modules_config - FILTER_SSU found in conf/modules.config and Nextflow scripts.
modules_config - FILTER_LEN_ASV found in conf/modules.config and Nextflow scripts.
modules_config - FILTER_CODONS found in conf/modules.config and Nextflow scripts.
modules_config - MERGE_STATS_STD found in conf/modules.config and Nextflow scripts.
modules_config - ITSX_CUTASV found in conf/modules.config and Nextflow scripts.
modules_config - FILTER_LEN_ITSX found in conf/modules.config and Nextflow scripts.
modules_config - FORMAT_TAXONOMY found in conf/modules.config and Nextflow scripts.
modules_config - DADA2_TAXONOMY found in conf/modules.config and Nextflow scripts.
modules_config - DADA2_ADDSPECIES found in conf/modules.config and Nextflow scripts.
modules_config - FORMAT_TAXONOMY_SINTAX found in conf/modules.config and Nextflow scripts.
modules_config - VSEARCH_SINTAX found in conf/modules.config and Nextflow scripts.
modules_config - FORMAT_TAXRESULTS_SINTAX found in conf/modules.config and Nextflow scripts.
modules_config - KRAKEN2_KRAKEN2 found in conf/modules.config and Nextflow scripts.
modules_config - FORMAT_TAXRESULTS_KRAKEN2 found in conf/modules.config and Nextflow scripts.
modules_config - VSEARCH_USEARCHGLOBAL found in conf/modules.config and Nextflow scripts.
modules_config - VSEARCH_CLUSTER found in conf/modules.config and Nextflow scripts.
modules_config - FILTER_CLUSTERS found in conf/modules.config and Nextflow scripts.
modules_config - ASSIGNSH found in conf/modules.config and Nextflow scripts.
modules_config - FORMAT_TAXONOMY_QIIME found in conf/modules.config and Nextflow scripts.
modules_config - QIIME2_EXTRACT found in conf/modules.config and Nextflow scripts.
modules_config - HMMER_HMMBUILD found in conf/modules.config and Nextflow scripts.
modules_config - HMMER_UNALIGNREF found in conf/modules.config and Nextflow scripts.
modules_config - HMMER_HMMALIGNREF found in conf/modules.config and Nextflow scripts.
modules_config - HMMER_HMMALIGNQUERY found in conf/modules.config and Nextflow scripts.
modules_config - HMMER_MASK found in conf/modules.config and Nextflow scripts.
modules_config - HMMER_MASKQUERY found in conf/modules.config and Nextflow scripts.
modules_config - HMMER_MASKREF found in conf/modules.config and Nextflow scripts.
modules_config - HMMER_AFAFORMATQUERY found in conf/modules.config and Nextflow scripts.
modules_config - HMMER_AFAFORMATREF found in conf/modules.config and Nextflow scripts.
modules_config - MAFFT found in conf/modules.config and Nextflow scripts.
modules_config - EPANG_PLACE found in conf/modules.config and Nextflow scripts.
modules_config - GAPPA_GRAFT found in conf/modules.config and Nextflow scripts.
modules_config - GAPPA_ASSIGN found in conf/modules.config and Nextflow scripts.
modules_config - GAPPA_HEATTREE found in conf/modules.config and Nextflow scripts.
modules_config - QIIME2_INASV found in conf/modules.config and Nextflow scripts.
modules_config - FORMAT_PPLACETAX found in conf/modules.config and Nextflow scripts.
modules_config - QIIME2_INASV_BPAVG found in conf/modules.config and Nextflow scripts.
modules_config - QIIME2_TABLEFILTERTAXA found in conf/modules.config and Nextflow scripts.
modules_config - FILTER_STATS found in conf/modules.config and Nextflow scripts.
modules_config - QIIME2_BARPLOT found in conf/modules.config and Nextflow scripts.
modules_config - QIIME2_BPAVG found in conf/modules.config and Nextflow scripts.
modules_config - QIIME2_EXPORT_ABSOLUTE found in conf/modules.config and Nextflow scripts.
modules_config - QIIME2_EXPORT_RELASV found in conf/modules.config and Nextflow scripts.
modules_config - QIIME2_TREE found in conf/modules.config and Nextflow scripts.
modules_config - QIIME2_FILTERSAMPLES_ANCOM found in conf/modules.config and Nextflow scripts.
modules_config - QIIME2_ALPHARAREFACTION found in conf/modules.config and Nextflow scripts.
modules_config - QIIME2_DIVERSITY_CORE found in conf/modules.config and Nextflow scripts.
modules_config - QIIME2_DIVERSITY_ADONIS found in conf/modules.config and Nextflow scripts.
modules_config - QIIME2_ANCOM_TAX found in conf/modules.config and Nextflow scripts.
modules_config - PICRUST found in conf/modules.config and Nextflow scripts.
modules_config - SBDIEXPORT found in conf/modules.config and Nextflow scripts.
modules_config - SBDIEXPORTREANNOTATE found in conf/modules.config and Nextflow scripts.
modules_config - PHYLOSEQ found in conf/modules.config and Nextflow scripts.
modules_config - MULTIQC found in conf/modules.config and Nextflow scripts.
modules_config - SUMMARY_REPORT found in conf/modules.config and Nextflow scripts.
nfcore_yml - Repository type in .nf-core.yml is valid: pipeline
nfcore_yml - nf-core version in .nf-core.yml is set to the latest version: 2.14.1

Run details

nf-core/tools version 2.14.1
Run at 2024-06-17 13:33:35

erikrikarddaniel

👍

CHANGELOG.md

Co-authored-by: Daniel Lundin <erik.rikard.daniel@gmail.com>

Fix sorting in overall_summary.tsv

3dac40a

d4straub mentioned this pull request Jun 17, 2024

overall_summary.tsv sometimes with misleading numbers in 2.9.0 #742

Closed

erikrikarddaniel approved these changes Jun 17, 2024

Review comment on CHANGELOG.md (outdated, resolved)

d4straub and others added 2 commits June 17, 2024 15:26

Update CHANGELOG.md

5c8382d

Co-authored-by: Daniel Lundin <erik.rikard.daniel@gmail.com>

Merge branch 'dev' into fix-overall_summary.tsv-sorting

6f520a9

d4straub merged commit 2c464fd into nf-core:dev Jun 18, 2024
17 checks passed

d4straub deleted the fix-overall_summary.tsv-sorting branch June 18, 2024 11:43
