Skip to content
This repository has been archived by the owner on Nov 15, 2023. It is now read-only.

Remove legacy single substream tracking issue #5670

Closed
5 of 6 tasks
tomaka opened this issue Apr 16, 2020 · 4 comments · Fixed by #8296
Closed
5 of 6 tasks

Remove legacy single substream tracking issue #5670

tomaka opened this issue Apr 16, 2020 · 4 comments · Fixed by #8296
Assignees
Labels
J0-enhancement An additional feature request.

Comments

@tomaka
Copy link
Contributor

tomaka commented Apr 16, 2020

Here's a tracking issue for things to do before removing the legacy single substream:

@tomaka
Copy link
Contributor Author

tomaka commented Apr 21, 2020

It would be nice to get this in before the Polkadot launch, especially because we're very close to doing this.

I'm going to try do the first four items before Friday evening.

Then we can do the (2) during the beta phase, as soon as a Polkadot version has been released. This requires code changes that are non-trivial and that could introduce issues, but not more than any other code change.

And then we can remove the legacy substream any time afterwards, even after the Polkadot launch. As soon as the legacy substream is no longer mandatory, we can consider it as a "hidden thing" that is not supposed to be used.

@tomaka
Copy link
Contributor Author

tomaka commented Apr 21, 2020

To be clear on the idea: the objective is ultimately for the PSM to drive the notification substreams. When a block announce substream is open, we report that peer to the sync state machine. The syncing then performs block requests with block_requests.rs.

The next steps right now are:

@tomaka
Copy link
Contributor Author

tomaka commented Jun 15, 2020

The situation now regarding the legacy substream is:

  • Whenever two nodes connect to each other for the first time, the PSM of each node decides whether the remote peer should be used for Substrate-related purposes (syncing, gossiping, etc., basically everything but Kademlia).

  • In order to avoid race conditions when both sides open a connection to each other at the same time, we allow a second connection between the same two peers to exist. Note however that there is a caveat: the PSM has "in" and "out" slots, and whether an "in" or "out" slot is used depends on who opened a connection first.

  • The PSM only knows about peers, not connections. If two connections with the same peer exist, the PSM decides for both at once.

  • If a PSM decides to use a peer for Substrate-related purposes, it tries to open a "legacy substream" on the connection (if any) with that peer for which it is the dialer.

  • A node never tries to initiate the opening of a "legacy substream" on connections for which it is the listener side.

  • In the past, we would only allow a single legacy substream per connection. The code has since then been changed to accept multiple ones.

  • If a node receives an incoming "legacy substream" on a connection that hasn't been accepted by the PSM, it kills the entire connection.

  • If the PSM decides to stop using a certain peer, all the connections to that peer are entirely closed. This was done because reopening a legacy substream on a connection that already had one could lead to race conditions.

  • When a legacy substream is open, a message is reported to the network worker, which then sends a handshake (a "Status" message) to the remote. Importantly, if multiple legacy substreams are open, we only send the handshake on one of them.

  • While we've been transitionning to more fine-grained substreams, at the moment whether the legacy substream is open and has had a handshake is the main authority on whether a connection is used. If it is open, then we consider all the fine-grained substreams open as well (syncing, grandpa, etc.)

@tomaka
Copy link
Contributor Author

tomaka commented Sep 10, 2020

Quick legacy substream status update:
Nodes still pro-actively open a legacy substream on each connection, but do not send anything on it anymore except for the handshake. Since that same handshake is sent over the block announces substream, and that block announces substream is always opened alongside with the legacy substream, the only thing that remains to do is no longer open the legacy substream and tweak the internal machinery to use that other handshake.

The end result that we want to achieve is #7074 and #7072.
However, given how previous pull requests have proven to be trickier to write than expected and contained some bugs, I'd prefer to go for a progressive approach rather than one giant pull request. In particular, the removal of the legacy substream might lead to some hard-to-predict emergent effects.

As such, the path forward I propose is:

  • Change group.rs to no longer strictly require a legacy substream to be opened by the remote. Instead, special-case the block announces protocol as "the new legacy substream" for handshake-related purposes. Contrary to the actual legacy substream, this only concerns the internal machinery of sc_network. The remote could open either a legacy substream or a block announces substream first.
  • After the previous bullet has been released, change group.rs to no longer pro-actively open a legacy substream. After this bullet, the wire protocol would now be correct.
  • Change group.rs to internally decode the messages received on the legacy substream and convert them to notifications, rather than sending them back undecoded to the NetworkBehaviour. This would hide the legacy substream as an internal detail of group.rs and remove all traces of it from the rest of the code.
  • Do Change sc_network::protocol::generic_proto::behaviour to occupy inbound peerset slots for substreams and not connections #7074, but still open or close all notification substreams together as an undivisible set of protocols.
  • Do Give control over the peers of custom notification protocols #7072, opening and closing each protocol individually.

This issue was closed.
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
J0-enhancement An additional feature request.
Projects
None yet
Development

Successfully merging a pull request may close this issue.

1 participant