Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add support for generating mag samplesheet #544

Open
wants to merge 27 commits into
base: dev
Choose a base branch
from

Conversation

sofstam
Copy link
Collaborator

@sofstam sofstam commented Oct 10, 2024

Closes #542

PR checklist

  • This comment contains a description of changes (with reason).
  • If you've fixed a bug or added code that should be tested, add tests!
  • If you've added a new tool - have you followed the pipeline conventions in the contribution docs
  • If necessary, also make a PR on the nf-core/taxprofiler branch on the nf-core/test-datasets repository.
  • Make sure your code lints (nf-core lint).
  • Ensure the test suite passes (nextflow run . -profile test,docker --outdir <OUTDIR>).
  • Check for unexpected warnings in debug mode (nextflow run . -profile debug,test,docker --outdir <OUTDIR>).
  • Usage Documentation in docs/usage.md is updated.
  • Output Documentation in docs/output.md is updated.
  • CHANGELOG.md is updated.
  • README.md is updated (including new tool citations and authors/contributors).

@sofstam sofstam linked an issue Oct 10, 2024 that may be closed by this pull request
nextflow.config Outdated Show resolved Hide resolved
docs/output.md Outdated Show resolved Hide resolved
docs/output.md Outdated Show resolved Hide resolved
nextflow_schema.json Outdated Show resolved Hide resolved
}
.tap { ch_colnames }

channelToSamplesheet(ch_colnames, ch_list_for_samplesheet, 'downstream_samplesheets', format, format_sep)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Joon has updated the function to also include the pipeline name as an additional argument, please check detaxier for the example (this will be helpful when @LilyAnderssonLee 's differentialabundance PR comes in :)

Comment on lines 21 to 22
def short_reads_1 = file(params.outdir).toString() + '/' + meta.id + '/' + fastq_1.getName()
def short_reads_2 = file(params.outdir).toString() + '/' + meta.id + '/' + fastq_2.getName()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you not also make this support single_end using the meta.single_end information?

Or have we lost that information at this stage?

Suggested change
def short_reads_1 = file(params.outdir).toString() + '/' + meta.id + '/' + fastq_1.getName()
def short_reads_2 = file(params.outdir).toString() + '/' + meta.id + '/' + fastq_2.getName()
def short_reads_1 = file(params.outdir).toString() + '/' + meta.id + '/' + fastq_1.getName()
def short_reads_2 = !meta.single_end ? file(params.outdir).toString() + '/' + meta.id + '/' + fastq_2.getName() : ""

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I do not follow here. We said to filter out the single ends and fasta. Or do I misunderstand something?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah yeah, good point I forgot and there wasn't a comment why 😅

Please leave a comment next to the line saying only PE supported now, and I'll look into fixing that later :)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@sofstam did you see what @Joon-Klaps did in detaxizer for SE/PE reads? You can copy the same concept here too (basically to branch SE/PE separately, and call the channelToSamplesheet function twice ;)

workflows/taxprofiler.nf Outdated Show resolved Hide resolved
@jfy133
Copy link
Member

jfy133 commented Oct 15, 2024

@nf-core-bot fix linting

Copy link

github-actions bot commented Oct 15, 2024

nf-core pipelines lint overall result: Failed ❌

Posted for pipeline commit 6eeb982

+| ✅ 260 tests passed       |+
!| ❗   4 tests had warnings |!
-| ❌  10 tests failed       |-

❌ Test failures:

  • nextflow_config - Config variable (incorrectly) found: params.max_cpus
  • nextflow_config - Config variable (incorrectly) found: params.max_memory
  • nextflow_config - Config variable (incorrectly) found: params.max_time
  • nextflow_config - Old lines for loading custom profiles found. File should contain: ```groovy
    // Load nf-core custom profiles from different Institutions
    includeConfig !System.getenv('NXF_OFFLINE') && params.custom_config_base ? "${params.custom_config_base}/nfcore_custom.config" : "/dev/null"
  • files_unchanged - .github/CONTRIBUTING.md does not match the template
  • files_unchanged - .github/PULL_REQUEST_TEMPLATE.md does not match the template
  • files_unchanged - .github/workflows/linting_comment.yml does not match the template
  • files_unchanged - .github/workflows/linting.yml does not match the template
  • files_unchanged - .gitignore does not match the template
  • actions_awsfulltest - .github/workflows/awsfulltest.yml is not triggered correctly

❗ Test warnings:

✅ Tests passed:

Run details

  • nf-core/tools version 3.0.2
  • Run at 2024-10-24 10:44:44

docs/output.md Outdated Show resolved Hide resolved
docs/output.md Outdated Show resolved Hide resolved
docs/output.md Outdated
<summary>Output files</summary>

- `downstream_samplesheets/`
- `mag.csv`: input sheet for nf-core/mag with paths to nf-core/taxprofiler preprocessed (corresponding to what is saved with `--save_analysis_ready_fastqs`)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Missing differential abundance? @LilyAnderssonLee do you thin kyou could merge your PR into this one at this stage?

Comment on lines 21 to 22
def short_reads_1 = file(params.outdir).toString() + '/' + meta.id + '/' + fastq_1.getName()
def short_reads_2 = file(params.outdir).toString() + '/' + meta.id + '/' + fastq_2.getName()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@sofstam did you see what @Joon-Klaps did in detaxizer for SE/PE reads? You can copy the same concept here too (basically to branch SE/PE separately, and call the channelToSamplesheet function twice ;)

sofstam and others added 6 commits October 16, 2024 13:48
Co-authored-by: James A. Fellows Yates <jfy133@gmail.com>
Co-authored-by: James A. Fellows Yates <jfy133@gmail.com>
@jfy133
Copy link
Member

jfy133 commented Oct 22, 2024

@nf-core-bot fix linting

docs/output.md Outdated
@@ -758,7 +758,9 @@ The pipeline can also generate input files for the following downstream pipeline
<summary>Output files</summary>

- `downstream_samplesheets/`
- `mag.csv`: input sheet for that contains paths to preprocessed FASTQs (corresponding to what is saved with `--save_analysis_ready_fastqs`) that can be used to skip read preprocessing steps in nf-core/mag
- `mag-{pe,se}.csv`: input sheet for single-end and paired-end reads that contains paths to preprocessed short-read FASTQs (corresponding to what is saved with `--save_analysis_ready_fastqs`) that can be used to skip read preprocessing steps in nf-core/mag.
- Note: if you merge reads, these will be listed in teh `mag-se.csv`.
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cannot make a suggestion but:

teh mag-se.csv. -> the mag-se.csv.

pe: true
}
pe: it.short_reads_2 != ""
unknown: true
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is the unknown: true here?

@jfy133
Copy link
Member

jfy133 commented Oct 24, 2024

I think we are almost there, for some reason we get extra files in analysis_ready_reads folder, but the samplesheets get generated and point to the right files, except the mag-se.csv when we don't merge pairs, whichappears to be msising the file name

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Create a samplesheet for mag
3 participants