
Pipeline consistency & Unit testing in nf-core #209

Closed
apeltzer opened this issue Nov 23, 2018 · 5 comments

@apeltzer (Member) commented Nov 23, 2018

We had quite a few discussions involving @zichner, @drpatelh and a bunch of others whose GitHub handles I don't have at the moment.

Current setup:

  • Linting of NXF code
  • Pipeline tests on small test data with some params.

Future perspectives

There were some ideas on how we could test and develop pipelines a bit better in the future to meet certain requirements, such as:

  • new pipeline releases don't break things we already had at some point
  • provide feedback for performance tuning, e.g. more sensible defaults for individual processes in each pipeline's base.config (see the sketch after this list)
  • ... ?
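
As a rough illustration of the base.config point, per-process defaults might look something like this (a minimal sketch only; the process selector and resource values are hypothetical, not actual nf-core defaults):

```groovy
// conf/base.config (sketch) – generic defaults plus a per-process override
process {
  cpus   = 2
  memory = 8.GB
  time   = 4.h

  // hypothetical resource-hungry process that needs larger defaults
  withName: bwa_mem {
    cpus   = 8
    memory = 32.GB
    time   = 12.h
  }
}
```

Feedback from real-scale test runs (see below) could then be used to tune these per-process values.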

"Real-scale" Testing

This part would probably be the easiest, as it "only" requires storing larger datasets somewhere (S3/GC/HTTP/GIT-LFS/...) and running them through the pipeline. This will probably include:

  • Figuring out whether we can still do this with Travis CI or whether we need longer runtimes/more memory etc.
  • Ideas involved: CircleCI (@tdelhomme - what are your experiences here?)
  • Other ideas: Set up a separate server that does "real-scale" testing specifically for releases

Consistency Testing

  • Right before a stable release, we could test pipeline consistency, e.g.
    • Compare the MultiQC JSON output of V1.0 with V1.1dev to make sure updated tools don't change important metrics substantially
    • For this, an idea was to have three types of change checks (a sketch of such a check follows below):
      • "Strict" = needs to be 1:1 identical (e.g. I expect all samples to run through / 105 called variants)
      • "Lenient" = needs to be in a pre-defined range (e.g. I expect 95-100% mapped reads here)
      • "Silent" = I don't expect this to be the same (e.g. reads mapped per second, which might differ but isn't important for consistency purposes)

Unit Testing on process-level

There was a talk at #nfhack18 about doing testing in a "unit test" style using the storeDir directive. Another group also discussed options for implementing this in NXF directly (nextflow-io/nf-hack18#7), which we should probably join and/or contribute to. The idea is to test individual processes, similar to a function in typical software-development unit tests. Currently this can only be mocked by using the storeDir directive and faking existing results to force Nextflow to continue running from a specific part of the pipeline; a minimal sketch follows below.
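
A minimal sketch of what that mocking could look like (DSL1-style syntax of the time; the process, channel and file names are hypothetical):

```groovy
// If the declared output already exists under storeDir, Nextflow skips the task
// and reuses the stored result, so downstream processes can be tested in isolation.
process bwa_index {
    storeDir 'results/bwa_index'

    input:
    file fasta from ch_fasta

    output:
    file "${fasta}.bwt" into ch_bwa_index

    script:
    """
    bwa index $fasta
    """
}
```

Dropping pre-computed files into results/bwa_index would then let a test run start directly from the processes that consume ch_bwa_index.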

Discussion

  • Whether to test for identical output or just defining thresholds / acceptable ranges
  • Maybe having different flags for testing (strict/lenient/silent)

The biggest issue will be to find an efficient way of running tests for consistency detection.

Idea discussed:

  • Compare the YAML/JSON output (used by MultiQC for most of the pipelines anyway), define which entries are required / not necessary etc. using a separate TSV or similar, and then use that information to compare between pipeline releases (a possible layout is sketched below)
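
Such a spec could be as simple as a small TSV, e.g. (column names and values purely illustrative, shown here with aligned columns for readability):

```
metric                 check    min   max
called_variants        strict   -     -
percent_mapped_reads   lenient  95    100
reads_per_second       silent   -     -
```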

This is open for discussion of course, please contribute!

@sven1103 (Member) commented:

Dear all,

my contribution to the discussion:

On release / right before a release, we should run pipeline consistency tests, e.g. compare

  • Results / metrics between the previous and the new release
  • How do we achieve this? Ideas were to parse multiqc.json files and define thresholds for testing

I found it rather hard to define a metric / threshold that tells us: hey, this new version of the pipeline is inconsistent with the former one, don't bring it to production. E.g. for variant calling, if we switch to a completely different VC version, or a different tool, of course the set of predicted variants will be different.

So I would like to see a plausible example of a threshold for consistency determination :)

> Switch to have a system for releases that allows running "realistic data"
> Maybe investigate CircleCI for this before thinking about self-hosting options?

Yes, this makes a lot of sense. Where would we host the bigger test datasets? We could have git LFS-based storage, so we don't lose the version-control advantage. However, this is not free on GitHub: https://help.github.com/articles/about-storage-and-bandwidth-usage. We could set this up easily with a GitLab server and git LFS as the filesystem. Or have storage buckets on deNBI / ELIXIR, if we don't want to go to commercial cloud storage solutions?

> Unit Testing: Consider using the storeDir directive to also enable unit testing in pipelines, e.g. testing individual steps instead of running the entire pipeline every time

Could you give an example of how this would work in a process test-case scenario?

> Maybe having different flags for testing (strict/lenient/silent)

What would be the use case / advantage?

Hit me now :) Best, Sven

@apeltzer (Member, Author) commented:

> I found it rather hard to define a metric / threshold that tells us: hey, this new version of the pipeline is inconsistent with the former one, don't bring it to production. E.g. for variant calling, if we switch to a completely different VC version, or a different tool, of course the set of predicted variants will be different.

It might be quite important to know whether something broke your pipeline. Even if it's just informative ("hey, we just found 10% more variants because method X was updated!"), this is really important! ATM, most people don't collect that sort of information, but it would be nice to actually create it!

> Switch to have a system for releases that allows running "realistic data"
> Maybe investigate CircleCI for this before thinking about self-hosting options?

> Yes, this makes a lot of sense. Where would we host the bigger test datasets? We could have git LFS-based storage, so we don't lose the version-control advantage. However, this is not free on GitHub: https://help.github.com/articles/about-storage-and-bandwidth-usage. We could set this up easily with a GitLab server and git LFS as the filesystem. Or have storage buckets on deNBI / ELIXIR, if we don't want to go to commercial cloud storage solutions?

Agreed, I updated the first post accordingly.

> Unit Testing: Consider using the storeDir directive to also enable unit testing in pipelines, e.g. testing individual steps instead of running the entire pipeline every time

> Could you give an example of how this would work in a process test-case scenario?

One would test individual processes: input A -> output B; if this doesn't hold, raise an error. This is more of a functional-testing approach than running entire pipelines, and thus ideally faster as well (see the sketch below).
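
For illustration, a standalone test for a single process might look roughly like this (DSL1-style sketch; the process, test file and expected value are all hypothetical):

```groovy
// Run one process on a known small input and assert on its output
Channel.fromPath('test-data/tiny_R1.fastq.gz').set { ch_test_reads }

process count_reads {
    input:
    file reads from ch_test_reads

    output:
    file 'read_count.txt' into ch_count

    script:
    """
    zcat $reads | wc -l | awk '{print \$1/4}' > read_count.txt
    """
}

// "input A -> output B": fail if the known input does not give the expected result
ch_count.subscribe { f ->
    assert f.text.trim() == '100' : "Unexpected read count: ${f.text.trim()}"
}
```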

> Maybe having different flags for testing (strict/lenient/silent)

If I don't care whether a certain metric changes, I can silence it. If I care about the metric staying within a comparable range, I use "lenient" checking and define the range. If I care about a metric being 1:1 identical, I use "strict" checking.

Overall, the idea was to have more realistic tests, plus optional consistency tests on top of that.

@tdelhomme (Member) commented Nov 23, 2018

Dear all,

Concerning our way of testing: we store a small NGS dataset here, and we have configured CircleCI to run tests (described e.g. here) at each commit. Once the tests have passed, CircleCI runs our deployment script. As discussed this afternoon with Phil, when the tests are OK we build the Docker container on Docker Hub like that, and we also have a somewhat tricky way to build the Singularity container (not possible via triggering), see the discussion here and the code here; basically we force Singularity to build on commit by adding a line to the Singularity file when the tests are OK.

Tell me what you think about this!

@ewels (Member) commented Feb 6, 2020

I think unit testing will now be implemented as part of https://github.com/nf-core/modules for each tool. Additionally, we have now been granted money to start doing full-size workflow testing on AWS with real data.

Together, these two things should address basically everything suggested here, I think, so I will close the issue. Feel free to reopen if I missed something.

@apeltzer (Member, Author) commented:

You didn't close it though ;-)
