Test Versi Scaling Limit #5962
How will you test this? With zombienet? Are you just testing the network size or also performance (transactions per second)?
On Versi, our test net. The most important performance metric we will be monitoring is the block rate of parachains. Especially for the second part, we might also look into transactions per second.
I think we can easily deploy polkadot-introspector for block time monitoring in the CI test, but it would only work with cumulus collators, as it requires us to connect to collators via RPC and get inherent timestamps (which doesn't work for cumulus). However, adding a Prometheus mode for parachain commander would solve that, as we would compute the para block times from the relay-chain data.
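A minimal sketch of the idea of deriving para block times from relay-chain data alone (hypothetical types and inputs, not polkadot-introspector's actual implementation): record the relay-chain timestamp at which each new para head shows up, and take the difference between consecutive head changes.

```rust
// Sketch only: assumes we already have (relay_block_timestamp_ms, para_head)
// pairs extracted from relay-chain blocks; all names here are hypothetical.
fn para_block_times_ms(observations: &[(u64, [u8; 32])]) -> Vec<u64> {
    let mut times = Vec::new();
    let mut last: Option<(u64, [u8; 32])> = None;
    for &(ts_ms, head) in observations {
        if let Some((prev_ts, prev_head)) = last {
            // The para produced a new block only when its head hash changed.
            if head != prev_head {
                times.push(ts_ms - prev_ts);
                last = Some((ts_ms, head));
            }
        } else {
            last = Some((ts_ms, head));
        }
    }
    times
}

fn main() {
    let obs = vec![
        (0, [1u8; 32]),
        (6_000, [1u8; 32]),  // same head: para did not advance
        (12_000, [2u8; 32]), // new head after 12s
        (18_000, [3u8; 32]), // new head after 6s
    ];
    println!("{:?}", para_block_times_ms(&obs)); // [12000, 6000]
}
```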
This also overlaps with paritytech/polkadot-sdk#874.
@tdimitrov could you have a look into this one as well, as soon as Versi is operational again, please?
How does this relate to https://github.com/paritytech/polkadot-stps/?
In the context of this issue we want to identify the performance bottlenecks, so that we have a high-level idea of where to dig further and can iterate on optimizing node performance with respect to parachain consensus.
Thanks for the pointer to
Nice, thank you Tsveto! So this looks like we are already pushing limits here. I would be very interested in how this behaves when extending this to more validators. I have seen in the past that approval-distribution does not seem to like an increased validator set; it would be good to confirm or disprove this. So in particular: increase the validator set even further, let's say by 100 more nodes, and see how ToFs (message time-of-flight in the subsystem queues) and channel sizes behave.
I find the apparent bimodal distribution of approval-distribution ToFs quite curious; that is, the gap between messages that arrive instantly and those which are delayed. It makes me wonder if it's blocking somewhere it shouldn't be, for instance on approval-voting DB writes.

The heatmap is also clearly on an exponential scale, so the orange/red message volumes are much higher than the blue/green. It still looks as though a majority of messages are handled almost instantaneously. With the exponential scale in mind, it doesn't seem like approval-voting or bitfield-distribution is bottlenecking at all: while there are a few outliers that take a long time, for the most part all messages are processed instantaneously. paritytech/polkadot-sdk#841 was a hypothesis in the past, but @sandreim investigated back then and didn't see convincing evidence that it was a cause.

If it is true that approval-distribution is not waiting on the approval-voting subsystem, then it may just be waiting on the network-bridge. Network-bridge ToFs would be interesting to look at as well. Reputation handling clogging up the queue is a potential concern.
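As a side note on the exponential scale: queue time is typically recorded as a Prometheus histogram with exponentially growing buckets, which is why the heatmap compresses the high end. Below is a small, stand-alone sketch of recording such a time-of-flight metric using the generic `prometheus` crate; the metric name and bucket layout are illustrative, not the ones used by the polkadot node.

```rust
use prometheus::{exponential_buckets, Histogram, HistogramOpts, Registry};
use std::time::Instant;

fn main() -> prometheus::Result<()> {
    let registry = Registry::new();
    // 10 buckets starting at 1ms, doubling each time: 1ms, 2ms, 4ms, ... ~0.5s.
    let opts = HistogramOpts::new(
        "subsystem_message_time_of_flight_seconds",
        "Time a message spends in the subsystem queue before being processed",
    )
    .buckets(exponential_buckets(0.001, 2.0, 10)?);
    let tof = Histogram::with_opts(opts)?;
    registry.register(Box::new(tof.clone()))?;

    // On enqueue, remember the instant; on dequeue, observe the elapsed time.
    let enqueued_at = Instant::now();
    // ... message sits in the queue, then gets processed ...
    tof.observe(enqueued_at.elapsed().as_secs_f64());
    Ok(())
}
```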
Looks like https://grafana.parity-mgmt.parity.io/goto/73iMs9N4z?orgId=1
Questions on my mind: where does approval-voting actually block on approval-distribution? I only see it sending messages via unbounded to approval-distribution.
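For readers following along, here is a minimal, self-contained illustration of the distinction being discussed, using plain `futures` channels rather than the actual subsystem plumbing: an unbounded send can never block the sender, while a bounded send exerts backpressure once the receiver falls behind.

```rust
use futures::channel::mpsc;
use futures::{executor::block_on, SinkExt, StreamExt};

#[derive(Debug)]
struct Msg(u32);

fn main() {
    block_on(async {
        // Bounded channel: `send` is async and waits once the buffer is full,
        // so a slow consumer propagates backpressure to the producer.
        let (mut bounded_tx, mut bounded_rx) = mpsc::channel::<Msg>(1024);

        // Unbounded channel: `unbounded_send` is synchronous and never waits;
        // the queue simply grows, so sends in this direction cannot block.
        let (unbounded_tx, mut unbounded_rx) = mpsc::unbounded::<Msg>();

        bounded_tx.send(Msg(1)).await.unwrap();
        unbounded_tx.unbounded_send(Msg(2)).unwrap();

        println!("{:?}", bounded_rx.next().await);
        println!("{:?}", unbounded_rx.next().await);
    });
}
```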
We identified a bottleneck. I would expect fixing that to bump up performance at least threefold.
Things look similar across machines.
The issue we suspect is causing the most pain is the importing of assignments and approvals, which is serialized; both wait for the approval-voting subsystem checks before doing bookkeeping. I'm working on some changes to wait for the approval-voting checks in parallel, while still serializing work per peer so as not to break deduplication (see the sketch below). We would expect approval-voting to have more work to do in this scenario and for it not to block when sending messages to
Currently we are blocked by a deployment/networking issue. We can't even get to 200 validators because of low connectivity (authority discovery failures).
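A minimal sketch of the per-peer parallelization idea described above (hypothetical types, not the actual approval-distribution code): run the approval-voting checks for different peers concurrently, while keeping a strict order per peer so the per-peer deduplication assumptions still hold.

```rust
use std::collections::HashMap;

use futures::executor::block_on;
use futures::future::join_all;

type PeerId = u64; // placeholder

#[derive(Clone, Debug)]
struct Message(&'static str); // an assignment or an approval, placeholder

// Stand-in for the round trip to the approval-voting subsystem.
async fn check_with_approval_voting(_m: &Message) -> bool {
    true
}

fn main() {
    // Flattened incoming (peer, message) stream for the example.
    let incoming = vec![
        (1, Message("assignment")),
        (2, Message("assignment")),
        (1, Message("approval")),
    ];

    // Group by peer: within one peer we must preserve order, because
    // deduplication relies on seeing a peer's messages sequentially.
    let mut per_peer: HashMap<PeerId, Vec<Message>> = HashMap::new();
    for (peer, msg) in incoming {
        per_peer.entry(peer).or_default().push(msg);
    }

    // One future per peer, each awaiting its checks strictly in order;
    // `join_all` drives them concurrently, so peers no longer wait on
    // each other's approval-voting checks.
    let work = per_peer.into_iter().map(|(_peer, msgs)| async move {
        for msg in msgs {
            if check_with_approval_voting(&msg).await {
                // bookkeeping and gossip to other peers would happen here
            }
        }
    });

    block_on(join_all(work));
}
```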
polkadot/node/core/approval-voting/src/lib.rs, line 1008 (at commit 0bc8fdd)
Darn 🥇 ... yep, I did not see that one 🙈
It's worth noting that the
Ok, I hope it is indeed the bottleneck. My understanding was that we only send assignments or approvals over to approval-voting the first time we receive them, so only a minority of incoming messages actually trigger that code path. It would explain the bimodal distribution of ToFs we see, but my expectation was that anything waiting on approval-voting would be bottlenecked on the DB write, and we disproved that in the past, didn't we? Otherwise, the only work that approval-voting does is verify a signature (not insignificant, but would be a surprising bottleneck at these message volumes), update some in-memory state, and do some DB reads (which should be cached by RocksDB, no?)
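A small sketch of the "only the first copy goes to approval-voting" bookkeeping described above (hypothetical types, not the real approval-distribution state):

```rust
use std::collections::HashSet;

type CandidateHash = [u8; 32]; // placeholder
type ValidatorIndex = u32;

#[derive(Hash, PartialEq, Eq, Clone, Copy)]
enum Kind {
    Assignment,
    Approval,
}

#[derive(Default)]
struct KnownMessages {
    seen: HashSet<(CandidateHash, ValidatorIndex, Kind)>,
}

impl KnownMessages {
    /// Returns true only the first time this (candidate, validator, kind)
    /// triple is observed; only that first copy needs the comparatively
    /// expensive check round trip to approval-voting.
    fn note(&mut self, c: CandidateHash, v: ValidatorIndex, k: Kind) -> bool {
        self.seen.insert((c, v, k))
    }
}

fn main() {
    let mut known = KnownMessages::default();
    let candidate = [0u8; 32];
    assert!(known.note(candidate, 7, Kind::Assignment)); // first copy: forward for checking
    assert!(known.note(candidate, 7, Kind::Approval));   // first copy: forward for checking
    assert!(!known.note(candidate, 7, Kind::Approval));  // duplicate: bookkeeping only
}
```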
Yes, we disproved the DB as being the bottleneck in the past.
We concluded the last round of testing at 350 para validators and 60 parachains with PR #6530. Board https://github.com/orgs/paritytech/projects/63 is used for tracking.
To get an idea of what could be feasible on Kusama.
Test:
See where block times start to suffer, and also check what difference it makes when the validator set is scaled up to Kusama size as well. If block times for 250 para validators are still good, but not for 300, check whether 250 is also good if there are 900 authorities in total.
Depending on the result, other experiments might be worthwhile as well:
300 para validators with only 50 parachains, for example: this makes sense to test with non-trivial PoVs, as a higher number of para validators with the same number of parachains reduces the load on approval checkers; this will only be noticeable if there actually is any load (candidates perform some computation).
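A back-of-the-envelope sketch of the load argument in that last point (the `needed_approvals` value used here is purely illustrative; the real figure is a runtime parameter): with the number of parachains held constant, each additional para validator lowers the expected number of candidates any single validator has to approval-check per relay-chain block.

```rust
// Rough expected approval checks per validator per relay-chain block,
// assuming one candidate per parachain per block.
fn checks_per_validator(n_validators: u32, n_parachains: u32, needed_approvals: u32) -> f64 {
    (n_parachains as f64 * needed_approvals as f64) / n_validators as f64
}

fn main() {
    for n_validators in [250u32, 300] {
        println!(
            "{} para validators, 50 parachains -> ~{:.1} approval checks per validator per block",
            n_validators,
            checks_per_validator(n_validators, 50, 30), // 30 needed approvals is an assumption
        );
    }
}
```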