chore(regtest): Optimize tests a little #5809
Ways to optimize regression tests:
Don't use absolute time for relative ordering
Tests are mostly a series of dependent actions of different kinds. Their order is relative, so the synchronization should be relative as well, yet we often use sleeps. The simplest and most robust option is to busy-loop on the condition, so it is detected as early as possible. Lots of utilities were written for this: `wait_for_replicas_state`, `check_all_replicas_finished`, etc. Additionally, the test becomes more adaptive to different configurations. Values tuned on a faster local machine often turn out to be too low, so they fail on CI and have to be increased. If we switch to faster runners, we'll have to review them all over again.
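A minimal sketch of the busy-loop pattern (the helper below is invented for illustration; in the real suite, helpers like `wait_for_replicas_state` and `check_all_replicas_finished` play this role):

```python
import asyncio
import time


async def wait_until(predicate, timeout=30.0, step=0.05):
    """Poll an async predicate until it returns truthy or the timeout expires."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if await predicate():
            return
        await asyncio.sleep(step)
    raise TimeoutError("condition was not reached in time")


# Instead of a fixed sleep, poll for the state the next step actually depends on:
#
#   async def replica_caught_up():                      # hypothetical check
#       return await c_replica.get("key") == b"value"
#
#   await wait_until(replica_caught_up)                 # instead of asyncio.sleep(2)
```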
When designing functions, try to make them reliable and easy to use with relative timings. For example, `df_server.start()` already waits for the server to become online. This is an obvious error taken from a real test:
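(The original snippet isn't reproduced here; the sketch below only illustrates the pattern. The test body and the `stop()` call are assumptions; the redundant sleep after `start()` is the point.)

```python
import time


def test_restart(df_server):
    df_server.stop()
    df_server.start()
    # Redundant: start() already blocks until the server is online,
    # so this sleep only adds five seconds to every single run.
    time.sleep(5)
```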
Try to extend existing tests
There are a few dozen replication tests, many of them specific ones akin to `test_this_when_this_and_also_this`, sometimes duplicating each other under different conditions. I assume that in many cases the functionality can either be checked within a generic test, or multiple tests can be united into a single one. For example, if just dbsize, memory usage, stats, etc. were handled incorrectly, we can assert it within one of the generic tests. Some simple scenarios already happen there as well, like `test_replicaof_does_not_flush_if_it_fails_to_connect` - if I'm not mistaken, any of the rotating replicas or masters tests has failing connections. For example, `test_replicaof_flag_disconnect` is literally the same as `test_replicaof_flag` except for two lines at the very end. All that was needed was to add those two lines at the end of the base test (see the sketch below).
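A hypothetical sketch of that kind of merge; the fixture and assertions are invented stand-ins, not the suite's real API:

```python
import pytest


@pytest.fixture
def replica_state():
    # Stand-in for the expensive master/replica setup shared by both tests.
    return {"synced": True, "data": {"key": "value"}}


def test_replicaof_flag(replica_state):
    # Base assertions (stand-ins for the real ones).
    assert replica_state["synced"]
    assert replica_state["data"]["key"] == "value"

    # Formerly a separate test (test_replicaof_flag_disconnect) that repeated
    # all of the setup above just to append these two lines at the very end:
    replica_state["synced"] = False
    assert replica_state["data"]["key"] == "value"
```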
Prefer to reuse as many resources as possible
Classes can be used to create separate per-class fixtures and share them between all nested tests. Additionally, the default `df_server` fixture has per-class scope, so the instance is not recreated every time.
For example, all memcached tests were grouped into a single class. Now they re-use the same instance.
Starting Dragonfly, with all the management code around it, takes around 100 ms, so around a second for every 10 tests. It's not much really, but grouping will also simplify running tests in parallel, as it is more explicit about dependencies.
A more advanced example is grouping simple 1-to-1 replication tests into a single class. They all share the same master-replica pair.
Similarly, classes can be defined for tests with the same set of predefined data, etc...
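A minimal sketch of the class-grouping idea with a hypothetical class-scoped fixture (the pair setup is a stand-in; as noted above, the real `df_server` fixture already has per-class scope):

```python
import pytest


@pytest.fixture(scope="class")
def shared_pair():
    # Hypothetical stand-in for starting a master/replica pair once per class.
    pair = {"master": "master-instance", "replica": "replica-instance"}
    yield pair
    # Teardown runs once, after the last test in the class.


class TestSimpleReplication:
    # Every test in this class reuses the same pair instead of paying the
    # instance startup and setup cost for each test separately.
    def test_set_get(self, shared_pair):
        assert shared_pair["master"]

    def test_expiry(self, shared_pair):
        assert shared_pair["replica"]
```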
Remove tests that could've been a unit test
There are some tests that just assert simple logical invariants on a single instance. They could have been unit tests! Sometimes they merely duplicate existing unit tests, so the regression test is not needed at all.
What should be a Python test?
This won't shave off much time, but it will help to decrease the overall number of tests / LOC and make them easier to oversee.