
Pipeline consistency & Unit testing in nf-core #209

Closed
apeltzer opened this issue Nov 23, 2018 · 5 comments

@apeltzer (Member) commented Nov 23, 2018

We had quite a few discussions involving @zichner, @drpatelh and a bunch of others whose GitHub handles I don't have at the moment.

Current setup:

  • Linting of NXF code
  • Pipeline tests on small test data with some params.

Future perspectives

There were some ideas on how we could test and develop pipelines a bit better in the future to meet certain requirements, such as:

  • new pipeline releases don't break things we already had at some point
  • provide feedback for performance tuning, e.g. more sensible defaults for individual processes in each pipeline's base.config (see the sketch after this list)
  • ... ?
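
As a rough illustration of the base.config point, per-process defaults might look something like this (a minimal sketch only; the process selector and resource values are hypothetical, not actual nf-core defaults):

```groovy
// conf/base.config (sketch) – generic defaults plus a per-process override
process {
  cpus   = 2
  memory = 8.GB
  time   = 4.h

  // hypothetical resource-hungry process that needs larger defaults
  withName: bwa_mem {
    cpus   = 8
    memory = 32.GB
    time   = 12.h
  }
}
```

Feedback from real-scale test runs (see below) could then be used to tune these per-process values.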

"Real-scale" Testing

This part would probably be the easiest, as it "only" requires storing larger datasets somewhere (S3/GC/HTTP/GIT-LFS/...) and running them through the pipeline. This will probably include:

  • Figuring out whether we can still do this with Travis CI or whether we need longer runtimes/more memory etc.
  • Ideas involved: CircleCI (@tdelhomme - what are your experiences here?)
  • Other ideas: Set up a separate server that does "real-scale" testing specifically for releases

Consistency Testing

  • Right before a stable release, we could test pipeline consistency, e.g.
    • Compare the MultiQC JSON output of V1.0 with V1.1dev to make sure updated tools don't change important metrics substantially
    • For this, an idea was to have three types of change checks (a sketch of such a check follows below):
      • "Strict" = needs to be 1:1 identical (e.g. I expect all samples to run through / 105 called variants)
      • "Lenient" = needs to be in a pre-defined range (e.g. I expect 95-100% mapped reads here)
      • "Silent" = I don't expect this to be the same (e.g. reads mapped per second, which might differ but isn't important for consistency purposes)

Unit Testing on process-level

There was a talk at #nfhack18 about doing testing in a "unit test" style using the storeDir directive. Another group also discussed options for implementing this in NXF directly (nextflow-io/nf-hack18#7), which we should probably join and/or contribute to. The idea is to test individual processes, similar to a function in typical software-development unit tests. Currently this can only be mocked by using the storeDir directive and faking existing results to force Nextflow to continue running from a specific part of the pipeline; a minimal sketch follows below.
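
A minimal sketch of what that mocking could look like (DSL1-style syntax of the time; the process, channel and file names are hypothetical):

```groovy
// If the declared output already exists under storeDir, Nextflow skips the task
// and reuses the stored result, so downstream processes can be tested in isolation.
process bwa_index {
    storeDir 'results/bwa_index'

    input:
    file fasta from ch_fasta

    output:
    file "${fasta}.bwt" into ch_bwa_index

    script:
    """
    bwa index $fasta
    """
}
```

Dropping pre-computed files into results/bwa_index would then let a test run start directly from the processes that consume ch_bwa_index.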

Discussion

  • Whether to test for identical output or just defining thresholds / acceptable ranges
  • Maybe having different flags for testing (strict/lenient/silent)

The biggest issue will be to find an efficient way of running tests for consistency detection.

Idea discussed:

  • Compare the YAML/JSON output (used by MultiQC for most of the pipelines anyway), define which entries are required / not necessary etc. using a separate TSV or similar, and then use that information to compare between pipeline releases (a possible layout is sketched below)
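
Such a spec could be as simple as a small TSV, e.g. (column names and values purely illustrative, shown here with aligned columns for readability):

```
metric                 check    min   max
called_variants        strict   -     -
percent_mapped_reads   lenient  95    100
reads_per_second       silent   -     -
```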

This is open for discussion of course, please contribute!

@sven1103 (Member) commented:

Dear all,

my contribution to the discussion:

On release / right before a release, we should run pipeline consistency tests, e.g. compare

  • Results / metrics between the previous and the new release
  • How do we achieve this? Ideas were to parse multiqc.json files and define thresholds for testing

I found it rather hard to define a metric / threshold that tells us: hey, this new version of the pipeline is inconsistent with the former one, don't bring it to production. E.g. for variant calling, if we switch to a completely different VC version, or a different tool, of course the set of predicted variants will be different.

So I would like to see a plausible example of a threshold for consistency determination :)

> Switch to have a system for releases that allows running "realistic data"
> Maybe investigate CircleCI for this before thinking about self-hosting options?

Yes, this makes a lot of sense. Where would we host the bigger test datasets? We could have git LFS-based storage, so we don't lose the version-control advantage. However, this is not free on GitHub: https://help.github.com/articles/about-storage-and-bandwidth-usage. We could set this up easily with a GitLab server and git LFS as the filesystem. Or have storage buckets on deNBI / ELIXIR, if we don't want to go to commercial cloud storage solutions?

> Unit Testing: Consider using the storeDir directive to also enable unit testing in pipelines, e.g. testing individual steps instead of running the entire pipeline every time

Could you give an example of how this would work in a process test-case scenario?

> Maybe having different flags for testing (strict/lenient/silent)

What would be the use case / advantage?

Hit me now :) Best, Sven

@apeltzer (Member, Author) commented:

> I found it rather hard to define a metric / threshold that tells us: hey, this new version of the pipeline is inconsistent with the former one, don't bring it to production. E.g. for variant calling, if we switch to a completely different VC version, or a different tool, of course the set of predicted variants will be different.

It might be quite important to know whether something broke your pipeline. Even if it's just informative ("hey, we just found 10% more variants because method X was updated!"), this is really important! ATM, most people don't collect that sort of information, but it would be nice to actually create it!

> Switch to have a system for releases that allows running "realistic data"
> Maybe investigate CircleCI for this before thinking about self-hosting options?

> Yes, this makes a lot of sense. Where would we host the bigger test datasets? We could have git LFS-based storage, so we don't lose the version-control advantage. However, this is not free on GitHub: https://help.github.com/articles/about-storage-and-bandwidth-usage. We could set this up easily with a GitLab server and git LFS as the filesystem. Or have storage buckets on deNBI / ELIXIR, if we don't want to go to commercial cloud storage solutions?

Agreed, I updated the first post accordingly.

> Unit Testing: Consider using the storeDir directive to also enable unit testing in pipelines, e.g. testing individual steps instead of running the entire pipeline every time

> Could you give an example of how this would work in a process test-case scenario?

One would test individual processes: input A -> output B; if this doesn't hold, raise an error. This is more of a functional-testing approach than running entire pipelines, and thus ideally faster as well (see the sketch below).
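
For illustration, a standalone test for a single process might look roughly like this (DSL1-style sketch; the process, test file and expected value are all hypothetical):

```groovy
// Run one process on a known small input and assert on its output
Channel.fromPath('test-data/tiny_R1.fastq.gz').set { ch_test_reads }

process count_reads {
    input:
    file reads from ch_test_reads

    output:
    file 'read_count.txt' into ch_count

    script:
    """
    zcat $reads | wc -l | awk '{print \$1/4}' > read_count.txt
    """
}

// "input A -> output B": fail if the known input does not give the expected result
ch_count.subscribe { f ->
    assert f.text.trim() == '100' : "Unexpected read count: ${f.text.trim()}"
}
```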

> Maybe having different flags for testing (strict/lenient/silent)

If I don't care whether a certain metric changes, I can silence it. If I care about the metric staying within a comparable range, I use "lenient" checking and define the range. If I care about a metric being 1:1 identical, I use "strict" checking.

Overall, the idea was to have more realistic tests, plus optional consistency tests on top of that.

@tdelhomme (Member) commented Nov 23, 2018

Dear all,

Concerning our way of testing: we store a small NGS dataset here, and we have configured CircleCI to run tests (described e.g. here) at each commit. Once the tests have passed, CircleCI runs our deployment script. As discussed this afternoon with Phil, when the tests are OK we build the Docker container on Docker Hub like that, and we also have a somewhat tricky way to build the Singularity container (not possible via triggering), see the discussion here and the code here; basically we force Singularity to build on commit by adding a line to the Singularity file when the tests are OK.

Tell me what you think about this!

@ewels (Member) commented Feb 6, 2020

I think unit testing will now be implemented as part of https://github.com/nf-core/modules for each tool. Additionally, we have now been granted money to start doing full-size workflow testing on AWS with real data.

Together, these two things should address basically everything suggested here, I think, so I will close the issue. Feel free to reopen if I missed something.

@apeltzer (Member, Author) commented:

You didn't close it though ;-)
