Skip to content
This repository has been archived by the owner on Nov 15, 2023. It is now read-only.

Get rid of Peerset compatibility layer #14337

Merged
merged 44 commits into from
Aug 2, 2023

Conversation

dmitry-markin
Copy link
Contributor

@dmitry-markin dmitry-markin commented Jun 9, 2023

Eliminate Peerset completely, giving handles to PeerStore & ProtocolController to the components that require them instead. Resolves #14170.

This PR also fixes a bug in Protocol with peer counting: now it made clear that self.peers contains only peers on the default (sync) protocol, and the entries are not removed when peers are disconnected on non-default protocols.

Added B1-note_worthy because Peerset is completely replaced by PeerStore & ProtocolController.

polkadot companion: paritytech/polkadot#7355
cumulus companion: paritytech/cumulus#2739

@dmitry-markin dmitry-markin added A0-please_review Pull request needs code review. C1-low PR touches the given topic and has a low impact on builders. D3-trivial 🧸 PR contains trivial changes in a runtime directory that do not require an audit B1-note_worthy Changes should be noted in the release notes T0-node This PR/Issue is related to the topic “node”. labels Jun 9, 2023
alindima added a commit to paritytech/polkadot that referenced this pull request Jul 7, 2023
This substrate branch is a rebase of paritytech/substrate#14337
over master.

Signed-off-by: alindima <alin@parity.io>
@alindima
Copy link
Contributor

@dmitry-markin I integrated this in polkadot for testing purposes. Here's the branch I used, since I needed to rebase it: https://github.com/paritytech/substrate/tree/alindima-get-rid-of-peerset-stub

I tested it in Versi and ran into a weird issue, where the node would print out many errors like:
Protocol controllers receiver stream has returned None

and then it looks like the node restarted. Here are the full logs: https://grafana.parity-mgmt.parity.io/goto/2NEzJ7j4g?orgId=1.

Are these errors expected under normal usage?

@dmitry-markin
Copy link
Contributor Author

@dmitry-markin I integrated this in polkadot for testing purposes. Here's the branch I used, since I needed to rebase it: https://github.com/paritytech/substrate/tree/alindima-get-rid-of-peerset-stub

I tested it in Versi and ran into a weird issue, where the node would print out many errors like: Protocol controllers receiver stream has returned None

and then it looks like the node restarted. Here are the full logs: https://grafana.parity-mgmt.parity.io/goto/2NEzJ7j4g?orgId=1.

Are these errors expected under normal usage?

Thanks for testing this! There are currently issues with the slot allocation and connection management mechanism that need to be addressed first, so this PR is blocked. I'll look into the errors reported once I get back to it.

@dmitry-markin
Copy link
Contributor Author

Protocol controllers receiver stream has returned None

@alindima Is it possible that the node was externally restarted? These errors might happen if the node components were already shutting down, but I found no indication in the logs for why would the node shut down.

@alindima
Copy link
Contributor

@alindima Is it possible that the node was externally restarted? These errors might happen if the node components were already shutting down, but I found no indication in the logs for why would the node shut down.

This could be it, I started noticing random restarts in other versi nodes as well. I'll ask around to make sure that others noticed this as well

@dmitry-markin
Copy link
Contributor Author

@alindima Could you try burning in again? Here is the burn-in PR for polkadot debug docker image: paritytech/polkadot#7533.

@dmitry-markin
Copy link
Contributor Author

I've converted the PR into draft, so we remember to fix the flaky test before merging. The test in question is syncs_header_only_forks and it gets stuck once in a while on CI. As discovered by @altonen, it gets stuck here. For some reason it doesn't get stuck if the test runtime is converted to single-threaded.

I've created an issue paritytech/polkadot-sdk#497 to fix this later.

@dmitry-markin
Copy link
Contributor Author

Versi burn-in showed no impact on block height, peer count, CPU usage. So, IMO, this PR is ready for merging. CC @altonen

@dmitry-markin dmitry-markin marked this pull request as ready for review July 26, 2023 08:30
@dmitry-markin
Copy link
Contributor Author

@altonen can we merge this?

@altonen
Copy link
Contributor

altonen commented Aug 2, 2023

@altonen can we merge this?

yes we can merge it

@dmitry-markin
Copy link
Contributor Author

bot merge

@paritytech-processbot
Copy link

Error: Github API says paritytech/polkadot#7355 is not mergeable

@dmitry-markin
Copy link
Contributor Author

bot merge

@paritytech-processbot
Copy link

Error: Github API says paritytech/cumulus#2739 is not mergeable

@dmitry-markin
Copy link
Contributor Author

bot merge

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
A0-please_review Pull request needs code review. B1-note_worthy Changes should be noted in the release notes C1-low PR touches the given topic and has a low impact on builders. D2-breaksapi D3-trivial 🧸 PR contains trivial changes in a runtime directory that do not require an audit T0-node This PR/Issue is related to the topic “node”.
Projects
Status: Done
Development

Successfully merging this pull request may close these issues.

Get rid of Peerset stub and move ProtocolController to protocols
5 participants