Add possibility to have several connections to core from shard #42

Closed
i1i1 wants to merge 2 commits from the several-aggregators branch

Conversation

@i1i1 i1i1 commented Sep 29, 2022

The only potential processing bottleneck in a shard is sending data via the shard message aggregator. This PR wraps several WS connections (each now an AggregatorInternal) in an Aggregator structure. A minimal sketch of the idea is shown below.
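A minimal sketch of the shape this takes, assuming a round-robin dispatch and a channel-fed send task per connection; the names and channel types here are illustrative, not the PR's actual code:

```rust
// Hypothetical sketch only: `AggregatorInternal`, its channel type, and the
// round-robin dispatch are assumptions for illustration, not the PR's code.
use std::sync::atomic::{AtomicUsize, Ordering};
use tokio::sync::mpsc;

/// Stand-in for the existing per-connection aggregator: one WS connection to
/// core, fed through a channel by its own send task.
struct AggregatorInternal {
    tx: mpsc::UnboundedSender<Vec<u8>>,
}

impl AggregatorInternal {
    fn send(&self, msg: Vec<u8>) {
        // Real code would handle a closed connection (reconnect, log, or drop).
        let _ = self.tx.send(msg);
    }
}

/// Wraps several internal aggregators so the shard is no longer limited to a
/// single core connection; outgoing messages are spread round-robin.
struct Aggregator {
    internals: Vec<AggregatorInternal>,
    next: AtomicUsize,
}

impl Aggregator {
    fn send(&self, msg: Vec<u8>) {
        let idx = self.next.fetch_add(1, Ordering::Relaxed) % self.internals.len();
        self.internals[idx].send(msg);
    }
}
```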

@i1i1 i1i1 requested review from nazar-pc and isSerge September 29, 2022 17:21
@nazar-pc (Member) left a comment

This is an interesting hack, but I don't think it is worth it; we could just as well run multiple instances with no code changes.

backend/telemetry_shard/Cargo.toml (outdated, resolved)
@i1i1 (Author) commented Sep 30, 2022

> This is an interesting hack, but I don't think it is worth it; we could just as well run multiple instances with no code changes.

Let's try that and see if it helps. We deployed 2 local shards and that didn't help much (apart from the memory issues). If this change doesn't help, we can just revert it.

Currently the shard spawns roughly 2 tasks per WS connection, while the aggregator has only 1 task. So I suspect the work is not fairly scheduled and that is why we leak memory, but that might not be the case.
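To make that concern concrete, here is a standalone sketch (assumed shape, not the shard's actual code) of many per-connection producer tasks feeding a single aggregator task over an unbounded channel; if the lone consumer processes more slowly than all producers combined, the queued messages accumulate and look like a memory leak:

```rust
// Illustration only: many producers, one consumer, unbounded channel.
use std::time::Duration;
use tokio::sync::mpsc;

#[tokio::main]
async fn main() {
    let (tx, mut rx) = mpsc::unbounded_channel::<Vec<u8>>();

    // Roughly "2 tasks per WS connection" in the real shard; here just many
    // producer tasks pushing telemetry messages.
    for _ in 0..10_000 {
        let tx = tx.clone();
        tokio::spawn(async move {
            loop {
                let _ = tx.send(vec![0u8; 256]);
                tokio::time::sleep(Duration::from_millis(10)).await;
            }
        });
    }
    drop(tx);

    // The single aggregator task: if forwarding to core is slower than the
    // combined inflow, the channel backlog grows without bound.
    while let Some(msg) = rx.recv().await {
        forward_to_core(msg).await;
    }
}

// Stand-in for sending a message over the single WS connection to core.
async fn forward_to_core(_msg: Vec<u8>) {
    tokio::time::sleep(Duration::from_micros(500)).await;
}
```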

@i1i1 i1i1 force-pushed the several-aggregators branch from 15cd1f9 to 205392d on September 30, 2022 12:36
@nazar-pc (Member)

> We deployed 2 local shards and that didn't help much (apart from the memory issues).

Didn't help with what? I thought that memory usage was the last issue we had. There were some Nginx errors that I tweaked by replacing localhost with 127.0.0.1.

@i1i1 i1i1 requested a review from nazar-pc September 30, 2022 12:37
@i1i1 (Author) commented Sep 30, 2022

> Didn't help with what? I thought that memory usage was the last issue we had. There were some Nginx errors that I tweaked by replacing localhost with 127.0.0.1.

I just thought that the actual node count is larger than 15k, so I assumed the issue was with the shard deployment.

@nazar-pc (Member)

I have not seen any errors sending data to telemetry on my side, and there are no errors in the logs, so I assume that is not the case. We have 3 shards right now, so if there was an issue it should have shown up in the logs somewhere, unless it happens before the data reaches the shard, in which case this PR would make no difference either.

@i1i1 (Author) commented Sep 30, 2022

Okay, so can I close #38 then?

@nazar-pc (Member)

Well, I think we can close it and open an upstream issue to remove the bottleneck there in whichever way they prefer. You can also link this PR as an example of what can be done.

@nazar-pc (Member)

Long-term we need a completely different telemetry implementation anyway.
