Add possibility to have several connections to core from shard #42

Closed
i1i1 wants to merge 2 commits from the several-aggregators branch

Conversation

@i1i1 i1i1 commented Sep 29, 2022

The only potential processing bottleneck in a shard is sending data via the shard message aggregator. This PR wraps several WS connections (each now an AggregatorInternal) in an Aggregator structure. A minimal sketch of the idea is shown below.
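A minimal sketch of the shape this takes, assuming a round-robin dispatch and a channel-fed send task per connection; the names and channel types here are illustrative, not the PR's actual code:

```rust
// Hypothetical sketch only: `AggregatorInternal`, its channel type, and the
// round-robin dispatch are assumptions for illustration, not the PR's code.
use std::sync::atomic::{AtomicUsize, Ordering};
use tokio::sync::mpsc;

/// Stand-in for the existing per-connection aggregator: one WS connection to
/// core, fed through a channel by its own send task.
struct AggregatorInternal {
    tx: mpsc::UnboundedSender<Vec<u8>>,
}

impl AggregatorInternal {
    fn send(&self, msg: Vec<u8>) {
        // Real code would handle a closed connection (reconnect, log, or drop).
        let _ = self.tx.send(msg);
    }
}

/// Wraps several internal aggregators so the shard is no longer limited to a
/// single core connection; outgoing messages are spread round-robin.
struct Aggregator {
    internals: Vec<AggregatorInternal>,
    next: AtomicUsize,
}

impl Aggregator {
    fn send(&self, msg: Vec<u8>) {
        let idx = self.next.fetch_add(1, Ordering::Relaxed) % self.internals.len();
        self.internals[idx].send(msg);
    }
}
```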

@i1i1 i1i1 requested review from nazar-pc and isSerge September 29, 2022 17:21
@nazar-pc (Member) left a comment

This is an interesting hack, but I don't think it is worth it; we could just as well run multiple instances with no code changes.

backend/telemetry_shard/Cargo.toml (outdated, resolved)
@i1i1 (Author) commented Sep 30, 2022

> This is an interesting hack, but I don't think it is worth it; we could just as well run multiple instances with no code changes.

Let's try that and see if it helps. We deployed 2 local shards and that didn't help much (apart from the memory issues). If this change doesn't help, we can just revert it.

Currently the shard spawns roughly 2 tasks per WS connection, while the aggregator has only 1 task. So I suspect the work is not fairly scheduled and that is why we leak memory, but that might not be the case.
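To make that concern concrete, here is a standalone sketch (assumed shape, not the shard's actual code) of many per-connection producer tasks feeding a single aggregator task over an unbounded channel; if the lone consumer processes more slowly than all producers combined, the queued messages accumulate and look like a memory leak:

```rust
// Illustration only: many producers, one consumer, unbounded channel.
use std::time::Duration;
use tokio::sync::mpsc;

#[tokio::main]
async fn main() {
    let (tx, mut rx) = mpsc::unbounded_channel::<Vec<u8>>();

    // Roughly "2 tasks per WS connection" in the real shard; here just many
    // producer tasks pushing telemetry messages.
    for _ in 0..10_000 {
        let tx = tx.clone();
        tokio::spawn(async move {
            loop {
                let _ = tx.send(vec![0u8; 256]);
                tokio::time::sleep(Duration::from_millis(10)).await;
            }
        });
    }
    drop(tx);

    // The single aggregator task: if forwarding to core is slower than the
    // combined inflow, the channel backlog grows without bound.
    while let Some(msg) = rx.recv().await {
        forward_to_core(msg).await;
    }
}

// Stand-in for sending a message over the single WS connection to core.
async fn forward_to_core(_msg: Vec<u8>) {
    tokio::time::sleep(Duration::from_micros(500)).await;
}
```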

@i1i1 i1i1 force-pushed the several-aggregators branch from 15cd1f9 to 205392d on September 30, 2022 12:36
@nazar-pc (Member)

> We deployed 2 local shards and that didn't help much (apart from the memory issues).

Didn't help with what? I thought that memory usage was the last issue we had. There were some Nginx errors that I tweaked by replacing localhost with 127.0.0.1.

@i1i1 i1i1 requested a review from nazar-pc September 30, 2022 12:37
@i1i1 (Author) commented Sep 30, 2022

> Didn't help with what? I thought that memory usage was the last issue we had. There were some Nginx errors that I tweaked by replacing localhost with 127.0.0.1.

I just thought that the actual node count is larger than 15k, so I assumed the issue was with the shard deployment.

@nazar-pc (Member)

I have not seen any errors sending data to telemetry on my side, and there are no errors in the logs, so I assume that is not the case. We have 3 shards right now, so if there was an issue it should have shown up in the logs somewhere, unless it happens before the data reaches the shard, in which case this PR would make no difference either.

@i1i1 (Author) commented Sep 30, 2022

Okay, so can I close #38 then?

@nazar-pc (Member)

Well, I think we can close it and open an upstream issue to remove the bottleneck there in whichever way they prefer. You can also link this PR as an example of what can be done.

@nazar-pc (Member)

Long-term we need a completely different telemetry implementation anyway.
