Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Unit tests for boolean columns (and more) #386

Open
pflanze opened this issue Apr 11, 2024 · 3 comments
Open

Unit tests for boolean columns (and more) #386

pflanze opened this issue Apr 11, 2024 · 3 comments
Assignees

Comments

@pflanze
Copy link
Contributor

pflanze commented Apr 11, 2024

PR #384 does not contain any unit tests, only e2e tests.

The open question is what exactly should be tested: most changes just add a case to a pattern match-alike statement, or copy a file and modify it for the type in question. Possibly the code should be changed to not require file copies, and types can just follow a protocol; then tests can just focus on the implementation of the protocol, and it would be clear what file(s) to add unit tests for.

Probably match-alike statements should ~always only dispatch to a method call.

So this will go into some potential rework of the code structure, which may also be useful for predicate based testing.

@pflanze pflanze self-assigned this Apr 11, 2024
@fengelniederhammer
Copy link
Contributor

@pflanze
Copy link
Contributor Author

pflanze commented Apr 12, 2024

II had looked at the other tests, but my reservations are these:

  • only about one commit out of the PR adds something really new, which is the OptionalBool type; but that is so straightforward that I'm not sure it warrants any test (and it wouldn't use the json based testing)

  • the other commits are copy-paste-and-adapt coding; copying whole files and changing them to suit the few changes seems OK to me as a get-running quickly, but writing separate tests for each of those copies does strike me as going overboard--it's solidifying the hack. These copies should probably either be instantiations of templates, that are calling methods on other classes, or code that calls virtual methods (depending on whether the first approach really yields a performance benefit); that's what I meant with protocol (protocols in the sense of the more universal meaning for interfaces / abstract base classes).

  • I have thoughts on introducing predicate based testing; predicates describe what we actually test (they literally describe the invariants we actually care for), and allow for randomized tests / fuzzing to reach edge cases we may not be thinking about with just a handful of tests, especially when using a profile guided fuzzer. It will be good to start using the latter for detecting security issues, too, anyway (and even though silo may not have sensitive data, if it's running on the same machine (even via docker) it may be used via privilege escalation kernel bugs to reach the containers that do).

This is why I think it's good to hold off work on unit tests here; I didn't see an obvious need while coding (still don't see it) and see having a better perspective in the future.

(If you want the unit tests for debuggability while coding, maybe we could still repurpose the e2e tests to access the binary directly? Maybe let's discuss in video next week. Maybe I'll also see reasons myself as I work on the next issues.)

@fengelniederhammer
Copy link
Contributor

Ok, maybe let's call the tests that I have in mind "component tests".

  • most of the query engine in SILO is not testable with unit tests (in the traditional sense - testing exactly a single class/function, ideally with not other dependencies).
  • testing the query engine as a whole is more valuable IMIO anyway.
  • The query engine has a lot of special logic and edge cases that should actually be verified
    • The boolean columns are a very simple case with few edge cases.

IMO the responsibilities of the tests could be:

  • end-to-end tests:
    • documentation/demonstration (easy to understand for users)
    • testing the HTTP interface
    • rough verification of the logic
      • The scope that I have in mind: sending every sort of query once or twice to cover the most important happy paths.
  • component tests:
    • fast feedback
    • verification of the logic (as complete as reasonably possible)
      • trying to cover every major execution path in the queries (ranging from essentially a single path for boolean columns to many)
      • testing error cases (not possible yet)
    • full control over the data
      • Reducing the test data to the absolute minimum that is necessary for the individual test: Stripping all unnecessary columns, using a minimal genome, very few rows.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants