Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(relay): don't close connections upon errors in relay server #4718

Merged
merged 17 commits into from
Nov 1, 2023

Conversation

thomaseizinger
Copy link
Contributor

@thomaseizinger thomaseizinger commented Oct 24, 2023

Description

To remove the usages of ConnectionHandlerEvent::Close from the relay-server, we unify what used to be called CircuitFailedReason and FatalUpgradeError. Whilst the errors may be fatal for the particular circuit, they are not necessarily fatal for the entire connection.

Related: #3591.
Resolves: #4716.

Notes & open questions

  • Should we do some kind of "smart" connection management upon failures on the streams further up? At the moment, we don't expose the details of which connection a stream failed on. I am leaning towards saying "no" here and instead relying more on fix(swarm): keep connections alive while active streams exist #4595. Once we do more automated keep-alive tracking, bad connections will close automatically much more aggressively. That is because any error on a stream will lead to the user dropping the stream which means we will automatically return KeepAlive::No.

Change checklist

  • I have performed a self-review of my own code
  • I have made corresponding changes to the documentation
  • I have added tests that prove my fix is effective or that my feature works
  • A changelog entry has been made in the appropriate crates

@mergify

This comment was marked as resolved.

@mergify

This comment was marked as resolved.

Copy link
Member

@mxinden mxinden left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for the work.

Can you add a test that ensures #4752 is fixed with this pull request, i.e. that a connection is not closed even though the remote does not support the relay protocol as one expected?

protocols/relay/src/behaviour.rs Show resolved Hide resolved
protocols/relay/src/behaviour/handler.rs Show resolved Hide resolved
@thomaseizinger
Copy link
Contributor Author

Can you add a test that ensures #4752 is fixed with this pull request, i.e. that a connection is not closed even though the remote does not support the relay protocol as one expected?

To properly test this, I think we need to also merge the fix for the client side (#4745) otherwise, the client will simply close the connection. Happy to add a test once both PRs are landed.

@thomaseizinger thomaseizinger added the internal-change Pull requests that make internal changes to crates and thus don't need to include a changelog entry. label Oct 29, 2023
@thomaseizinger thomaseizinger force-pushed the feat/remove-connection-close-relayed branch from 14ee70d to 34c8f71 Compare October 29, 2023 23:12
@mergify
Copy link
Contributor

mergify bot commented Oct 30, 2023

This pull request has merge conflicts. Could you please resolve them @thomaseizinger? 🙏

mergify bot pushed a commit that referenced this pull request Oct 31, 2023
To make a reservation with a relay, a user calls `Swarm::listen_on` with an address of the relay, suffixed with a `/p2pcircuit` protocol. Similarly, to establish a circuit to another peer, a user needs to call `Swarm::dial` with such an address. Upon success, the `Swarm` then issues a `SwarmEvent::NewListenAddr` event in case of a successful reservation or a `SwarmEvent::ConnectionEstablished` in case of a successful connect.

The story is different for errors. Somewhat counterintuitively, the actual reason of an error during these operations are only reported as `relay::Event`s without a direct correlation to the user's `Swarm::listen_on` or `Swarm::dial` calls.

With this PR, we send these errors back "into" the `Transport` and report them as `SwarmEvent::ListenerClosed` or `SwarmEvent::OutgoingConnectionError`. This is conceptually more correct. Additionally, by sending these errors back to the transport, we no longer use `ConnectionHandlerEvent::Close` which entirely closes the underlying relay connection. In case the connection is not used for something else, it will be closed by the keep-alive algorithm.

Resolves: #4717.
Related: #3591.
Related: #4718.

Pull-Request: #4745.
@mergify mergify bot merged commit 823d0b2 into master Nov 1, 2023
71 checks passed
@mergify mergify bot deleted the feat/remove-connection-close-relayed branch November 1, 2023 01:51
mergify bot pushed a commit that referenced this pull request Nov 2, 2023
This PR implements the long-awaited design of disallowing `ConnectionHandler`s to close entire connections. Instead, users should close connections via `ToSwarm::CloseConnection` from a `NetworkBehaviour` or - even better - from the `Swarm` via `close_connection`. A `NetworkBehaviour` also does not have a "full" view onto how a connection is used but at least it can correlate whether it created the connection via the `ConnectionId`. In general, the more modular and friendly approach is to stop "using" a connection if a particular protocol no longer needs it. As a result of the keep-alive algorithm, such a connection is then closed automatically.

Depends-on: #4745.
Depends-on: #4718.
Depends-on: #4749.
Related: #3353.
Related: #4714.
Resolves: #3591.

Pull-Request: #4755.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
internal-change Pull requests that make internal changes to crates and thus don't need to include a changelog entry. send-it
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Remove usage of ConnectionHandlerEvent::Close from libp2p_relay
2 participants