Testing of pipeline output #605
I've had a very brief look around and can't find much existing stuff for testing output files (please post suggestions if you know any!), so I'm getting increasingly keen on writing something to do this. To expand on my Slack message above, I could see this working as follows:
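As a hedged illustration of the general idea being proposed (checking pipeline output files against recorded checksums), here is a minimal pytest-style sketch; every path and checksum value below is a hypothetical placeholder, not the author's actual plan:

```python
# Hedged sketch: compare pipeline output files against recorded md5
# checksums. All paths and checksum values here are hypothetical.
import hashlib
from pathlib import Path

# Map of expected output files; None means "check existence only",
# which suits outputs containing timestamps or other run-specific noise.
EXPECTED = {
    "results/fastqc/sample1_fastqc.html": None,
    "results/counts/merged_counts.tsv": "0123456789abcdef0123456789abcdef",
}

def md5sum(path: Path) -> str:
    """Return the hex md5 digest of a file, read in chunks."""
    digest = hashlib.md5()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(8192), b""):
            digest.update(chunk)
    return digest.hexdigest()

def test_pipeline_outputs():
    for rel_path, expected in EXPECTED.items():
        path = Path(rel_path)
        assert path.is_file(), f"missing output file: {path}"
        if expected is not None:
            assert md5sum(path) == expected, f"checksum mismatch: {path}"
```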
I agree that something which integrates as cleanly as possible with the current Nextflow infrastructure would probably be best.
It seemed to me that one of the challenges was just in writing the code to check the output of each file type. That's why in my example I did a wrapper around samtools flagstat and then stopped there, because the time investment grew rather large to wrap each tool's files in a class just to more easily access its output. In that regard MultiQC could be useful.
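For illustration of that wrapper idea (a sketch, not the actual example referred to above): a tiny class that parses `samtools flagstat` text output so tests can assert on fields by name. Only two fields are parsed here, which already hints at why wrapping every tool's format becomes a large time investment; the file path and threshold are hypothetical.

```python
# Hedged sketch of a samtools flagstat wrapper; field markers follow the
# standard flagstat text format, the test file path is hypothetical.
class Flagstat:
    """Expose QC-passed read counts from `samtools flagstat` output."""

    def __init__(self, text: str):
        self.total = self._qc_passed(text, "in total")
        self.mapped = self._qc_passed(text, "mapped (")

    @staticmethod
    def _qc_passed(text: str, marker: str) -> int:
        # Lines look like "1234 + 0 in total (...)"; the first number
        # is the QC-passed count.
        for line in text.splitlines():
            if marker in line:
                return int(line.split(" + ")[0])
        raise ValueError(f"no flagstat line containing {marker!r}")

def test_mapping_rate():
    with open("results/aln/sample1.flagstat") as handle:  # hypothetical path
        stats = Flagstat(handle.read())
    assert stats.total > 0
    assert stats.mapped / stats.total > 0.95  # example threshold only
```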
For reference, the method I hacked together with Python: https://github.com/stevekm/nextflow-ci/blob/master/test_nextflow.py

As another reference, I have been working on basic methods for similar unit testing of CWL workflows here: https://github.com/mskcc/pluto-cwl/blob/master/tests/test_generate_cBioPortal_file_cwl.py

This seems pretty similar to the issue of unit testing Nextflow outputs, if Python were chosen for it. Now that the new DSL2 + module system is coming out soon, maybe it would be good to wait until the formal release of those? The Nextflow modules especially seem like something that would help a lot to streamline unit testing for Nextflow.

In my CWL examples there, I am taking advantage of the default output description. @pditommaso @evanfloden maybe this is a feature we could replicate in Nextflow in order to aid testing like this? Having a method to get a description of the task/workflow outputs like this would help a lot, I think. Not sure if this is accessible via Groovy either?
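To make the "description of the task/workflow outputs" idea concrete, here is a hedged sketch of what consuming such a description could look like from Python. The `expected_outputs.json` schema is invented for illustration; it loosely mimics the output object that cwltool prints after a run.

```python
# Hedged sketch: drive output tests from a machine-readable description
# of the workflow outputs. The expected_outputs.json schema is invented
# here, e.g. [{"path": "results/maf/sample1.maf", "size": 10234}]
import json
from pathlib import Path

import pytest

EXPECTED = json.loads(Path("expected_outputs.json").read_text())

@pytest.mark.parametrize("entry", EXPECTED, ids=lambda e: e["path"])
def test_output_matches_description(entry):
    path = Path(entry["path"])
    assert path.is_file(), f"missing output: {path}"
    if "size" in entry:
        assert path.stat().st_size == entry["size"]
```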
The nextflow … A few pipelines (e.g. sarek) already use this for debugging / introspection. It's not exactly standardised though.
@emiller88 is working on this, currently in the rnaseq pipeline: nf-core/rnaseq#546
@ewels This is lovely for context! @maxulysse has also got started on sarek: nf-core/sarek#370. I think my plans as of now are to flesh this out on rnaseq, in this order:
I'd like to do those in 3 PRs to keep the scope small. Then possibly in the future add tests for … The functionality that would be nice here in tools is just a … I'd love any feedback!
I'm following along. I'd love to see an example of how to incorporate unit testing and/or regression testing into my Nextflow DSL2 projects. I'm looking for guidance: I'm happy to follow your recommendations, and especially happy to learn by looking at some example projects. Where's the best place for me to get started? As an FYI, I built a similar testing framework for the GenePattern Server platform, called GpUnit. Good ideas:
Not-so-good ideas:
Challenges:
I think we should add this to the pipeline template. |
I think the new …
Now being handled with nf-test. |
(Moving discussion from Slack)
tl;dr: Currently, nf-core pipelines merely check that all the processes run without errors; they do not ensure that outputs exactly match some expected result, as unit tests would. This thread is to address the question: how can we change that?
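As a hedged illustration of that gap (the command, profile, and file paths are hypothetical examples), the first test below is roughly what current CI verifies, while the second is the kind of content-level assertion this issue is asking for:

```python
# Hedged sketch contrasting the two levels of checking; the command,
# profile, and file paths are hypothetical examples.
import subprocess
from pathlib import Path

def test_pipeline_runs():
    # Roughly what nf-core CI checks today: the run exits cleanly.
    proc = subprocess.run(
        ["nextflow", "run", "main.nf", "-profile", "test,docker"],
        capture_output=True,
        text=True,
    )
    assert proc.returncode == 0, proc.stderr

def test_pipeline_output_content():
    # What this issue asks for: output content matches an expectation.
    observed = Path("results/summary.tsv").read_text()
    expected = Path("tests/expected/summary.tsv").read_text()
    assert observed == expected
```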
Related:
Full discussion:
@stevekm:
@drpatelh:
@ewels
@stevekm:
... a few days later ...
@olgabot
@ewels
@olgabot
@ewels
From @ewels:
From @olgabot: