Fix sorting in overall_summary.tsv #750

Merged
merged 3 commits into nf-core:dev from fix-overall_summary.tsv-sorting
Jun 18, 2024

d4straub
Collaborator

@d4straub d4straub commented Jun 17, 2024

In 2.9.0, overall_summary.tsv sometimes contained misleading numbers. The cause was a new sorting method added in #717, which was needed because edge-case sample names led to bad pairing of forward and reverse reads.

This PR makes sure that all tables that are merged are sorted identically. The tables are combined with cbind rather than merge because their row names differ: each contains the sample ID but with a different suffix, e.g. sample1.trimmed_1.trim.fastq.gz, sample1_1.filt.fastq.gz and sample1_2.filt.fastq.gz.
I also considered correcting the row names of each table and then applying merge, but the row names are so diverse that this did not seem like a good option. I do have the feeling that correcting row names and using merge might be safer, but I couldn't find any example where it would matter. I am open to changing the implementation.
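
The alternative mentioned above, reducing each row name to its sample ID and then joining by key, could look roughly like the following. This is a minimal Python sketch for illustration, not the pipeline's R code: the suffix patterns are inferred only from the file names quoted in this comment, and the helper names (`sample_id`, `merge_by_sample`), column names and counts are all made up.

```python
import re

# Hypothetical suffix patterns, inferred from the example file names above.
SUFFIXES = [
    r"\.trimmed_[12]\.trim\.fastq\.gz$",
    r"_[12]\.filt\.fastq\.gz$",
]

def sample_id(row_name: str) -> str:
    """Strip a known read-file suffix to recover the sample ID."""
    for pattern in SUFFIXES:
        stripped, n = re.subn(pattern, "", row_name)
        if n:
            return stripped
    raise ValueError(f"unrecognised row name: {row_name}")

def merge_by_sample(*tables: dict) -> dict:
    """Join tables (row name -> dict of columns) on the derived sample ID."""
    merged = {}
    for table in tables:
        for row_name, columns in table.items():
            merged.setdefault(sample_id(row_name), {}).update(columns)
    return merged

# Invented counts; the two tables are deliberately in different row order.
trimmed = {
    "sample10.trimmed_1.trim.fastq.gz": {"reads_trimmed": 90},
    "sample1.trimmed_1.trim.fastq.gz": {"reads_trimmed": 100},
}
filtered = {
    "sample1_1.filt.fastq.gz": {"reads_filtered": 80},
    "sample10_1.filt.fastq.gz": {"reads_filtered": 70},
}
summary = merge_by_sample(trimmed, filtered)
```

Because the join is keyed on the sample ID, the result does not depend on row order. The cost is the one named above: every naming scheme needs its own pattern, and an unrecognised row name has to fail loudly rather than be merged under the wrong key.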

Using the example above, the sorting should be fine:

> sort( c("sample1.trimmed_1.trim.fastq.gz","sample2.trimmed_1.trim.fastq.gz","sample10.trimmed_1.trim.fastq.gz","sample10_1.trimmed_1.trim.fastq.gz") )
[1] "sample1.trimmed_1.trim.fastq.gz"    "sample10_1.trimmed_1.trim.fastq.gz"
[3] "sample10.trimmed_1.trim.fastq.gz"   "sample2.trimmed_1.trim.fastq.gz"   
> sort( c("sample1_1.filt.fastq.gz","sample2_1.filt.fastq.gz","sample10_1.filt.fastq.gz","sample10_1_1.filt.fastq.gz") )
[1] "sample1_1.filt.fastq.gz"    "sample10_1_1.filt.fastq.gz"
[3] "sample10_1.filt.fastq.gz"   "sample2_1.filt.fastq.gz"
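
Since binding by position relies on every table ending up in the same sample order, a cheap guard is to compare the sample IDs recovered from each sorted list. The following is a Python sketch under stated assumptions: `sample_id` and `same_sample_order` are hypothetical helpers, and the pattern covers only the two naming schemes shown here. Note that Python's `sorted()` compares strings by code point, whereas R's `sort()` follows the collation of the active locale, so the ordering below is not the one in the console output above.

```python
import re

def sample_id(file_name: str) -> str:
    # Hypothetical pattern covering only the two naming schemes quoted above.
    return re.sub(r"(\.trimmed)?_[12]\.(trim|filt)\.fastq\.gz$", "", file_name)

def same_sample_order(*name_lists) -> bool:
    """True if every list, once sorted, yields the same sequence of sample IDs."""
    orders = [[sample_id(n) for n in sorted(names)] for names in name_lists]
    return all(order == orders[0] for order in orders)

trimmed = [
    "sample1.trimmed_1.trim.fastq.gz",
    "sample2.trimmed_1.trim.fastq.gz",
    "sample10.trimmed_1.trim.fastq.gz",
    "sample10_1.trimmed_1.trim.fastq.gz",
]
filtered = [
    "sample1_1.filt.fastq.gz",
    "sample2_1.filt.fastq.gz",
    "sample10_1.filt.fastq.gz",
    "sample10_1_1.filt.fastq.gz",
]
aligned = same_sample_order(trimmed, filtered)
```

With these four names, a code-point sort puts the two lists in different sample orders, so `aligned` is `False` here, while the locale-aware ordering in the R output above keeps them aligned. A check of this kind would catch a collation-dependent mismatch before the columns are bound together.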

Closes #742.

PR checklist

  • This comment contains a description of changes (with reason).
  • If you've fixed a bug or added code that should be tested, add tests!
  • If you've added a new tool - have you followed the pipeline conventions in the contribution docs?
  • If necessary, also make a PR on the nf-core/ampliseq branch on the nf-core/test-datasets repository.
  • Make sure your code lints (nf-core lint).
  • Ensure the test suite passes (nf-test test main.nf.test -profile test,docker).
  • Check for unexpected warnings in debug mode (nextflow run . -profile debug,test,docker --outdir <OUTDIR>).
  • Usage Documentation in docs/usage.md is updated.
  • Output Documentation in docs/output.md is updated.
  • CHANGELOG.md is updated.
  • README.md is updated (including new tool citations and authors/contributors).

github-actions bot commented Jun 17, 2024

nf-core lint overall result: Passed ✅ ⚠️

Posted for pipeline commit 6f520a9

  • ✅ 281 tests passed
  • ❔ 6 tests were ignored
  • ❗ 1 tests had warnings

❗ Test warnings:

  • readme - README did not have a Nextflow minimum version badge.

Run details

  • nf-core/tools version 2.14.1
  • Run at 2024-06-17 13:33:35

Member

@erikrikarddaniel erikrikarddaniel left a comment

👍

d4straub and others added 2 commits June 17, 2024 15:26
Co-authored-by: Daniel Lundin <erik.rikard.daniel@gmail.com>
@d4straub d4straub merged commit 2c464fd into nf-core:dev Jun 18, 2024
17 checks passed
@d4straub d4straub deleted the fix-overall_summary.tsv-sorting branch June 18, 2024 11:43