chore(regtest): Optimize tests a little #5809
Ways to optimize regression tests:
Don't use absolute time for relative ordering
Tests are mostly a series of dependent actions of different kinds. Their order is relative, so the synchronization should be relative as well, yet we often use sleeps. The simplest and most robust option is to busy-loop on the condition, so it is detected as early as possible. Lots of utilities were written for this: `wait_for_replicas_state`, `check_all_replicas_finished`, etc. Additionally, the test becomes more adaptive to different configurations. Values tuned on a faster local machine often turn out to be too low, so they fail on CI and have to be increased. If we switch to faster runners, we'll have to review them all over again.
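A minimal sketch of the busy-loop pattern (the helper below is invented for illustration; in the real suite, helpers like `wait_for_replicas_state` and `check_all_replicas_finished` play this role):

```python
import asyncio
import time


async def wait_until(predicate, timeout=30.0, step=0.05):
    """Poll an async predicate until it returns truthy or the timeout expires."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if await predicate():
            return
        await asyncio.sleep(step)
    raise TimeoutError("condition was not reached in time")


# Instead of a fixed sleep, poll for the state the next step actually depends on:
#
#   async def replica_caught_up():                      # hypothetical check
#       return await c_replica.get("key") == b"value"
#
#   await wait_until(replica_caught_up)                 # instead of asyncio.sleep(2)
```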
When designing functions, try to make them reliable and easy to use with relative timings. For example, `df_server.start()` already waits for the server to become online. This is an obvious error taken from a real test:
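(The original snippet isn't reproduced here; the sketch below only illustrates the pattern. The test body and the `stop()` call are assumptions; the redundant sleep after `start()` is the point.)

```python
import time


def test_restart(df_server):
    df_server.stop()
    df_server.start()
    # Redundant: start() already blocks until the server is online,
    # so this sleep only adds five seconds to every single run.
    time.sleep(5)
```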
Try to extend existing tests
There are a few dozen replication tests, many of them specific ones akin to `test_this_when_this_and_also_this`, sometimes duplicating each other under different conditions. I assume that in many cases the functionality can either be checked within a generic test, or multiple tests can be united into a single one. For example, if just dbsize, memory usage, stats, etc. were handled incorrectly, we can assert it within one of the generic tests. Some simple scenarios already happen there as well, like `test_replicaof_does_not_flush_if_it_fails_to_connect` - if I'm not mistaken, any of the rotating replicas or masters tests has failing connections. For example, `test_replicaof_flag_disconnect` is literally the same as `test_replicaof_flag` except for two lines at the very end. All that was needed was to add those two lines at the end of the base test (see the sketch below).
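A hypothetical sketch of that kind of merge; the fixture and assertions are invented stand-ins, not the suite's real API:

```python
import pytest


@pytest.fixture
def replica_state():
    # Stand-in for the expensive master/replica setup shared by both tests.
    return {"synced": True, "data": {"key": "value"}}


def test_replicaof_flag(replica_state):
    # Base assertions (stand-ins for the real ones).
    assert replica_state["synced"]
    assert replica_state["data"]["key"] == "value"

    # Formerly a separate test (test_replicaof_flag_disconnect) that repeated
    # all of the setup above just to append these two lines at the very end:
    replica_state["synced"] = False
    assert replica_state["data"]["key"] == "value"
```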
Prefer to reuse as many resources as possible
Classes can be used to create separate per-class fixtures and share them between all nested tests. Additionally, the default `df_server` fixture has per-class scope, so the instance is not recreated every time.
For example, all memcached tests were grouped into a single class. Now they re-use the same instance.
Starting Dragonfly, with all the management code around it, takes around 100 ms, so around a second for every 10 tests. It's not much really, but grouping will also simplify running tests in parallel, as it is more explicit about dependencies.
A more advanced example is grouping simple 1-to-1 replication tests into a single class. They all share the same master-replica pair.
Similarly, classes can be defined for tests with the same set of predefined data, etc...
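A minimal sketch of the class-grouping idea with a hypothetical class-scoped fixture (the pair setup is a stand-in; as noted above, the real `df_server` fixture already has per-class scope):

```python
import pytest


@pytest.fixture(scope="class")
def shared_pair():
    # Hypothetical stand-in for starting a master/replica pair once per class.
    pair = {"master": "master-instance", "replica": "replica-instance"}
    yield pair
    # Teardown runs once, after the last test in the class.


class TestSimpleReplication:
    # Every test in this class reuses the same pair instead of paying the
    # instance startup and setup cost for each test separately.
    def test_set_get(self, shared_pair):
        assert shared_pair["master"]

    def test_expiry(self, shared_pair):
        assert shared_pair["replica"]
```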
Remove tests that could've been a unit test
There are some tests that just assert simple logical invariants on a single instance. They could have been unit tests! Sometimes they merely duplicate existing unit tests, so the regression test is not needed at all.
What should be a Python test?
This won't shave off much time, but it will help to decrease the overall number of tests / LOC and make them easier to oversee.