Waku Network Testing #142

jm-clius · 2022-10-18T15:31:31Z

Background

Waku network simulation and testing is done in collaboration with Kurtosis and uses their distributed network manipulation tools. The main aim of network testing is twofold:

benchmark Waku performance as it scales
test Waku network robustness against adverse and difficult network conditions

The side effect is the creation of an integrated, multi-client simulation environment that can be used for regular integration testing, regression checks, network experiments, etc.

This issue tracks the main simulation scenarios we want to create and desirable outputs/metrics.

NOTE: This issue is WIP and will be adapted as more requirements from platforms materialise and as we familiarise ourselves with Kurtosis features.

First network test: `relay` scalability and familiarising ourselves with Kurtosis tools

The rough idea for a first test comes from this forum thread. It asks at least two questions about the simplest possible network setup, running only the main routing protocol (relay) and using only the nwaku client:

at what messaging load do performance, latency and reliability start to degrade or fall below an acceptable threshold?
at what network size do performance, latency and reliability start to degrade or fall below an acceptable threshold?

Other metrics (such as node resource consumption - CPU, bandwidth and memory usage) should form part of the experimental output data.

Setup

Run a relay network of x nwaku nodes, using discv5 to establish a well-connected mesh (afaik this approximates a typical Communities setup on Desktop clients).
Publish (from a random subset of nodes) messages of size s at a rate r.
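
As a rough illustration of the publishing step, here is a minimal sketch of how a test driver could plan its publishes before handing them to whatever actually injects messages. Everything in it is hypothetical: the `PublishEvent` record, the `build_publish_schedule` helper and the `publisher_fraction` parameter are not part of nwaku or Kurtosis, and r is assumed to be the aggregate rate across all publishers (the setup above does not say whether r is per node or network-wide).

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class PublishEvent:
    time_s: float      # offset from the start of the run, in seconds
    node_index: int    # which of the x nodes publishes this message
    seq: int           # run-wide sequence number
    payload_size: int  # message size s, in bytes


def build_publish_schedule(x, s, r, duration_s, publisher_fraction=0.1, seed=0):
    """Plan publishes at an aggregate rate of r msg/s from a random subset of x nodes.

    publisher_fraction is an assumed knob: the share of nodes allowed to publish.
    """
    rng = random.Random(seed)  # seeded so that a run can be reproduced
    publisher_count = max(1, int(x * publisher_fraction))
    publishers = rng.sample(range(x), publisher_count)
    total = int(r * duration_s)
    return [
        PublishEvent(
            time_s=i / r,
            node_index=rng.choice(publishers),
            seq=i,
            payload_size=s,
        )
        for i in range(total)
    ]


if __name__ == "__main__":
    for event in build_publish_schedule(x=50, s=1024, r=10, duration_s=1):
        print(event)
```

Fixing the seed keeps the publisher subset stable between runs, which should make results easier to compare when only one of x, s or r changes.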

Measure

Varying x, s and r, measure:

message delivery reliability
message delivery latency

To measure reliability we can add a simple seq counter to the message payload. Latency can already be measured using the sender timestamp field.
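
The seq counter and sender timestamp approach could be evaluated per receiving node roughly as follows. This is a sketch under stated assumptions: `delivery_stats` and the tuple layout are made up for illustration, timestamps are assumed to be in nanoseconds, and sender and receiver clocks are assumed to be synchronised (which holds trivially if all nodes run on one simulation host).

```python
from statistics import mean


def delivery_stats(sent_seqs, received):
    """Compute reliability and latency for one receiving node.

    sent_seqs: iterable of every seq number that was published.
    received:  iterable of (seq, sender_timestamp_ns, receive_timestamp_ns).
    """
    sent = set(sent_seqs)
    first_seen_ms = {}
    for seq, sent_ns, recv_ns in received:
        # Count only the first copy of each known message; relay can deliver duplicates.
        if seq in sent and seq not in first_seen_ms:
            first_seen_ms[seq] = (recv_ns - sent_ns) / 1e6
    latencies = list(first_seen_ms.values())
    return {
        "reliability": len(first_seen_ms) / len(sent) if sent else 1.0,
        "missing": sorted(sent - set(first_seen_ms)),
        "mean_latency_ms": mean(latencies) if latencies else None,
        "max_latency_ms": max(latencies) if latencies else None,
    }


if __name__ == "__main__":
    base = 1_000_000_000
    log = [
        (0, base, base + 5_000_000),
        (1, base, base + 15_000_000),
        (1, base, base + 40_000_000),  # duplicate delivery, ignored
        (3, base, base + 10_000_000),
    ]
    print(delivery_stats(range(5), log))
```

Gaps in the seq numbers show up directly in `missing`, so a lost message can be traced back to the publish that produced it.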

Laundry list of scenarios we want to build

The scenario above proposes a way to get started with Kurtosis for network testing. The list below captures more clearly the types of output that we want from network testing in the longer term:

on top of relay, also testing store, filter, lightpush at scale
integrated network consisting of go-waku, js-waku and nwaku nodes
on top of discv5, alternative discovery methods such as waku peer exchange
simulate network latency
simulate % of node churn (nodes going online/offline)
include browser, iOS, Android nodes in simulation
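
For the churn item, one possible way to drive "x % of nodes going online/offline" is to precompute, per round, which nodes leave and which rejoin. The sketch below is only an illustration: `churn_rounds` is a hypothetical helper, and how nodes are actually stopped and restarted (for example through the simulation tooling) is deliberately left out.

```python
import random


def churn_rounds(node_ids, churn_fraction, rounds, seed=0):
    """Yield (leaving, rejoining, online) sets for each churn round.

    Each round, churn_fraction of all nodes go offline and up to the same
    number of previously offline nodes come back online.
    """
    rng = random.Random(seed)
    all_nodes = list(node_ids)
    online, offline = set(all_nodes), set()
    k = int(len(all_nodes) * churn_fraction)
    for _ in range(rounds):
        # Sort before sampling so results are reproducible for a given seed.
        leaving = set(rng.sample(sorted(online), min(k, len(online))))
        rejoining = set(rng.sample(sorted(offline), min(k, len(offline))))
        online = (online - leaving) | rejoining
        offline = (offline - rejoining) | leaving
        yield leaving, rejoining, frozenset(online)


if __name__ == "__main__":
    for leaving, rejoining, online in churn_rounds(range(20), 0.25, 3):
        print(f"leave={sorted(leaving)} rejoin={sorted(rejoining)} online={len(online)}")
```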

It would be useful to come up with a set of automated e2e tests that can run regularly against an integrated simulated environment to detect regressions and performance issues early.

Laundry list of useful metrics

Metrics that will be useful to get from every scenario:

message delivery reliability
message propagation latency
CPU usage
bandwidth usage
memory usage
node response times (for request-response protocols, such as filter, lightpush and store)


fryorcraken · 2022-10-26T00:17:16Z

> The side effect is the creation of an integrated, multi-client simulation environment that can be used for regular integration testing, regression checks, network experiments, etc.

I would call it a phase 2 more than a side effect.

I agree that priority is on multi-client scalability of Waku, followed by extracting performance metrics.

Phase 2 tests

What we would also like to have mid-term.

Non-regression testing: delta analysis

We also need various non-regression tests before releasing new software, where we do a run and compare it with a previous run:

Metrics: Extract response times, CPU usage, etc. (see the metrics laundry list above). Any performance degradation? If so, investigate. If performance improved, note and highlight it.
Errors: Check logs across all clients, any errors returned? Any new errors returned?
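
A delta analysis along these lines might look like the sketch below. The function names, the 5 % tolerance and the metric names are all assumptions for illustration; the only idea taken from the discussion is "compare a run with a previous run, flag degradations and new errors".

```python
def compare_runs(baseline, candidate, tolerance=0.05, higher_is_better=("reliability",)):
    """Classify each shared metric as regression, improvement or unchanged.

    baseline, candidate: dicts mapping metric name to a numeric value.
    tolerance: relative change treated as noise (assumed default of 5 %).
    higher_is_better: metrics where an increase is good; all others are
    treated as "lower is better" (latency, CPU, memory, bandwidth).
    """
    report = {}
    for name in sorted(baseline.keys() & candidate.keys()):
        old, new = baseline[name], candidate[name]
        if old:
            change = (new - old) / old
        else:
            change = float("inf") if new > 0 else 0.0
        gain = change if name in higher_is_better else -change
        if gain < -tolerance:
            verdict = "regression"
        elif gain > tolerance:
            verdict = "improvement"
        else:
            verdict = "unchanged"
        report[name] = (old, new, change, verdict)
    return report


def new_errors(baseline_errors, candidate_errors):
    """Return error messages seen in the candidate run but not in the baseline."""
    return sorted(set(candidate_errors) - set(baseline_errors))


if __name__ == "__main__":
    previous = {"reliability": 0.99, "latency_ms": 100.0, "cpu_pct": 40.0}
    current = {"reliability": 0.99, "latency_ms": 120.0, "cpu_pct": 30.0}
    for metric, row in compare_runs(previous, current).items():
        print(metric, row)
    print(new_errors(["timeout"], ["timeout", "decode failed"]))
```

Error messages would likely need normalising (stripping timestamps, peer IDs and so on) before the set difference is meaningful across clients.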

Multi-client functional testing

We also need multi-client, non-regression functional testing, i.e. a suite of functional tests that demonstrate cross-client compatibility of several protocols, similar to what is done in js-waku CI: [1] [2].

Does this fall under the Kurtosis umbrella?

fryorcraken · 2022-10-26T00:39:01Z

Regarding phase 1 (proving scalability), I suggest we focus on the Status client use case at first, which lets us cover a good number of scenarios.
This would also help us focus on answering questions relevant to a specific use case and give us some idea of what the scales of x, s and r should be.

We can structure the tests around answering this question: Can the Waku network onboard 10 million users?

We can scale up progressively by increasing n: 1,000, 10,000, 1 million, 10 million users.
We can also identify a cap for a simulated environment and extrapolate from there (e.g. maybe 10,000 is the max we can do with Kurtosis?).

We can start with topology similar to the expected one from Status clients:

go-waku nodes (mobile using filter/lightpush/store over TCP, and desktop using relay and providing store)
js-waku nodes (web using filter/lightpush/store over WebSockets)
nwaku as service node (including store).

Where # of go-waku nodes + # of js-waku nodes = n

From this simulation, we can extract/extrapolate:

How many nwaku service nodes do we need to run to serve n users?
What performance (connection response times, message propagation, etc.) are we getting?
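
Once a simulation yields a measured capacity per service node, the extrapolation to larger n could be as simple as the arithmetic below. The numbers are placeholders, not results: the 80 % share of light clients (mobile and web nodes that depend on a service node) and the 500 clients per nwaku node are invented purely to show the calculation, and the linear scaling is itself an assumption that the tests would need to confirm.

```python
def service_nodes_needed(light_clients, clients_per_service_node):
    """Smallest number of service nodes that can serve all light clients."""
    if clients_per_service_node <= 0:
        raise ValueError("clients_per_service_node must be positive")
    # Integer ceiling division avoids floating-point rounding at large n.
    return -(-light_clients // clients_per_service_node)


if __name__ == "__main__":
    MEASURED_CAPACITY = 500  # placeholder: would come from the simulation
    for n in (1_000, 10_000, 1_000_000, 10_000_000):
        light_clients = n * 8 // 10  # placeholder: assumed 80 % light clients
        nodes = service_nodes_needed(light_clients, MEASURED_CAPACITY)
        print(f"n={n}: {light_clients} light clients, {nodes} service nodes")
```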

jm-clius · 2022-10-26T08:03:35Z

Issue moved to waku-org/pm#2

jm-clius mentioned this issue Oct 26, 2022

Waku Network Testing waku-org/pm#2


jm-clius closed this as completed Oct 26, 2022
