Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(swarm): replace address scoring with explicit candidates #3954

Merged
merged 20 commits into from
May 24, 2023

Conversation

thomaseizinger
Copy link
Contributor

@thomaseizinger thomaseizinger commented May 16, 2023

Description

Previously, a NetworkBehaviour could report an AddressScore for an external address. This score was a u32 and addresses would be ranked amongst those.

In reality, an address is either confirmed to be publicly reachable (via a protocol such as AutoNAT) or merely represents a candidate that might be an external address. In a way, addresses are guilty (private) until proven innocent (publicly reachable).

When a NetworkBehaviour reports an address candidate, we perform address translation on it to potentially correct for ephemeral ports of TCP. These candidates are then injected back into the NetworkBehaviour. Protocols such as AutoNAT can use these addresses as a source for probing their NAT status. Once confirmed, they can emit a ToSwarm::ExternalAddrConfirmed event which again will be passed to all NetworkBehaviours.

This simplified approach will allow us implement Kademlia's client-mode (#2032) without additional configuration options: As soon as an address is reported as publicly reachable, we can activate server-mode for that connection.

Related: #3877.
Related: #3953.
Related: #2032.
Related: libp2p/go-libp2p#2229.

Co-authored-by: Max Inden mail@max-inden.de

Notes & open questions

Change checklist

  • I have performed a self-review of my own code
  • I have made corresponding changes to the documentation
  • I have added tests that prove my fix is effective or that my feature works
  • A changelog entry has been made in the appropriate crates

@thomaseizinger
Copy link
Contributor Author

For kademlia to be useful, we only want publicly reachable addresses in our routing table. Thus, as a general rule, we will only ever add a remote's address to the routing table if:

  1. we successfully dialed them
  2. the remote advertises the kademlia protocol on this connection:

KademliaHandlerEvent::ProtocolConfirmed { endpoint } => {
debug_assert!(self.connected_peers.contains(&source));
// The remote's address can only be put into the routing table,
// and thus shared with other nodes, if the local node is the dialer,
// since the remote address on an inbound connection may be specific
// to that connection (e.g. typically the TCP port numbers).
let address = match endpoint {
ConnectedPoint::Dialer { address, .. } => Some(address),
ConnectedPoint::Listener { .. } => None,
};
self.connection_updated(source, address, NodeStatus::Connected);
}

(2) is where the client/server mode implementation comes in. A node SHOULD automatically enable/disable advertising the kademlia protocol based on whether it is publicly reachable on an address. For example, the inbound connection could have been hole-punched, meaning the address isn't generally publicly reachable and thus we should not advertise the kademlia protocol there.

How do we figure that out? With protocol like AutoNAT and identify.

To start with, a node only knows about its listen addresses. We inject them into the NetworkBehaviour via FromSwarm::NewListenAddr.

As soon as a connection is established (regardless of inbound or outbound), we run the identify protocol. Identify reports us the address that the other peer observed for us. This is essentially STUN for libp2p.

For example, if we are listening on 192.168.0.13 and are sitting behind a router with the public IP 91.123.03.58 then the other node will report that as the address it is seeing the traffic from.

Using this PR, we can now emit this address via ToSwarm::NewExternalAddrCandidate and have protocols such as AutoNAT test whether we are reachable via this address.

Let's assume for a moment that we are and AutoNAT reports that we are indeed reachable via 91.123.03.58.

Here comes the tricky question: What local interface does this correspond to? If the user specifies a wildcard address for listening, we will start listening on all interfaces available, including loopback. Filtering for private and loopback addresses is not enough though, a machine could easily have multiple NICs and thus multiple different IP addresses.

Currently, we do not expose which local interface an outbound connection corresponds to:

/// Callback that is invoked for every established outbound connection.
///
/// This is invoked once we have successfully dialed a peer.
/// At this point, we have verified their [`PeerId`] and we know, which particular [`Multiaddr`] succeeded in the dial.
/// In order to actually use this connection, this function must return a [`ConnectionHandler`](crate::ConnectionHandler).
/// Returning an error will immediately close the connection.
fn handle_established_outbound_connection(
&mut self,
_connection_id: ConnectionId,
peer: PeerId,
addr: &Multiaddr,
role_override: Endpoint,
) -> Result<THandler<Self>, ConnectionDenied>;

To go back to kademlia's client mode: The only thing we know there is which local interface a connection arrived on. To say with confidence that this interface is publicly reachable, we need a mapping of an external reachable address back to the local interface.

I don't think our AutoNAT implementation currently does this. I think what we'd need to implement in this PR is the following:

  • ToSwarm::NewExternalAddrCandidate as well as ToSwarm::ExternalAddrConfirmed need to carry a tuple of addresses: The address of the local interface and the candidate / confirmed external address.
  • To satisfy this interfaces, identify will need access to the local address of outbound connections (for inbound ones, we already have access to it). Initial testing with TCP showed that we can extract this information from the socket returned from accept.
  • For AutoNAT, we will also have to record the local addr of incoming connections and correlate it with the probe of an external address.

@mxinden Please let me know what you think and whether you see any hole in this reasoning. I believe this is the most correct way to identify external addresses but it will also require some effort in implementing. A shortcut we could take in the meantime is to assume we are publicly reachable on all interfaces if we have at least one externally reachable address.

@mxinden
Copy link
Member

mxinden commented May 17, 2023

Here comes the tricky question: What local interface does this correspond to?

Why do we need to know what the corresponding local interface is?

@thomaseizinger
Copy link
Contributor Author

Here comes the tricky question: What local interface does this correspond to?

Why do we need to know what the corresponding local interface is?

If the machine has multiple NICs and thus multiple local interfaces then the external address only corresponds to one of them, correct?

Say I have two local addresses:

  • An Ethernet connection that has: 10.0.0.15
  • A WiFi connection that has: 192.168.0.17

The Ethernet connection is only connected to a LAN whereas the WiFi connection is routable to the public internet with a forwarded port.

Say I am also running multiple libp2p nodes in my local network and thus have incoming connections on both interfaces.

It would be wrong to advertise kademlia on the Ethernet connection because it would cause those other nodes in my network to include this address in the routing table despite not being globally reachable.

I know this is a constructed example but people might have all sorts of weird network topologies. Fundamentally, I just think it might not be a good idea to assume any incoming connection is publicly routable just because we identified an external address for one of our connections?

@mxinden
Copy link
Member

mxinden commented May 18, 2023

Thanks for expanding, that makes sense.

I suggest treating the multiple-interfaces use-case as an additional feature. I prefer to solve the single-interface use-case first. As far as I can tell the single-interface use-case is a step forward, even when the long term goal would be the multiple-interface use-case.

I am afraid that the multiple-interface use-case is a rabbit-hole which we are unable to solve properly without having a concrete use-case that we are optimizing for.

I just think it might not be a good idea to assume any incoming connection is publicly routable just because we identified an external address for one of our connections?

Meta level, I think what you are referring to here is "any listen address of an incoming connection is publicly routable". A connection itself is never routable, neither publicly nor privately within a subnet.

Copy link
Member

@mxinden mxinden left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall looks good to me. Next to making client-mode a lot easier, I find this to be a nice clean-up.

swarm/src/behaviour.rs Show resolved Hide resolved
/// - A protocol such as identify obtained it from a remote.
/// - The user provided it based on configuration.
/// - We made an educated guess based on one of our listen addresses.
/// - We established a new relay connection.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we have any mechanism to confirm a relayed connection? I am not aware of a libp2p-autonat implementation that would probe relayed addresses. That said, emitting them as NewExternalAddrCandidate makes sense, as we can not confirm them. How would libp2p-identify make sure to include relayed addresses when advertising the local node's listenAddrs (i.e. external addresses)? Would libp2p-identify ignore all FromSwarm::NewExternalAddr addresses except for the relayed ones?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think so but the relay could (should?) emit an event that the relay address is now an external address of ours.

How would libp2p-identify make sure to include relayed addresses when advertising the local node's listenAddrs (i.e. external addresses)? Would libp2p-identify ignore all FromSwarm::NewExternalAddr addresses except for the relayed ones?

Why ignore them? A relayed address is just one more external address, isn't it? It is feasible to use relays to translate between IPv4 and IPv6 for example if node you want to connect to is only on IPv6 and you are only on IPv4.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would libp2p-identify ignore all FromSwarm::NewExternalAddr addresses except for the relayed ones?

Why ignore them?

There seems to be a misunderstanding. I suggest libp2p-identify to track all addresses confirmed via ExternalAddrConfirmed. That would not include relayed addresses. Though one still needs to advertise ones relayed addresses via libp2p-identify. I don't think one should advertise all NewExternalAddrCandidate, but instead only the relayed NewExternalAddrCandidates.

Does that make sense?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we can treat a relayed address as externally reachable without verifying it. We did connect to the relay after all so it is kind of implicit that others will be able to connect to the relay too. Thus, I think the relay behaviour could just issue a ToSwarm::NewExternalAddr with the relayed address.

swarm/src/behaviour.rs Show resolved Hide resolved
swarm/src/behaviour.rs Outdated Show resolved Hide resolved
swarm/src/behaviour.rs Outdated Show resolved Hide resolved
swarm/src/lib.rs Show resolved Hide resolved
@thomaseizinger
Copy link
Contributor Author

Thanks for expanding, that makes sense.

I suggest treating the multiple-interfaces use-case as an additional feature. I prefer to solve the single-interface use-case first. As far as I can tell the single-interface use-case is a step forward, even when the long term goal would be the multiple-interface use-case.

Sounds good to me. I'd like to add a log statement to kademlia then where we are making this assumption. Should make debugging easier in case someone runs into that issue.

I just think it might not be a good idea to assume any incoming connection is publicly routable just because we identified an external address for one of our connections?

Meta level, I think what you are referring to here is "any listen address of an incoming connection is publicly routable". A connection itself is never routable, neither publicly nor privately within a subnet.

Yep that makes more sense :D

Co-authored-by: Max Inden <mail@max-inden.de>
Copy link
Member

@mxinden mxinden left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me apart from the above small suggestion.

We should thoroughly test this in combination with #3877. Easiest step I would guess is to deploy it to kademlia-exporter.max-inden.de/.

@thomaseizinger
Copy link
Contributor Author

@mxinden Let me know what you think of 3c38b3f. I didn't see any value in keeping that part of the test because we now have explicit candidates that AutoNAT operates on. I did consider directly flipping to NatStatus::Public on any "confirmed external address" but I am not sure that is a useful feature. I would expect users that use Swarm::add_external_address to not use AutoNAT.

@mxinden
Copy link
Member

mxinden commented May 22, 2023

Additional commits look good to me. Only open discussion is #3954 (comment).

@thomaseizinger
Copy link
Contributor Author

I had to delete two more AutoNAT tests that aren't easily fixable with how we treat external addresses now. If you have an idea on how to fix them, feel free to push a commit.

This reverts commit 62ec482.
@thomaseizinger
Copy link
Contributor Author

@mxinden I restored the tests with the idea we discussed. The AutoNAT behaviour now has a function probe_address that can be used to directly probe an address for external reachability.

@thomaseizinger thomaseizinger added this to the v0.52.0 milestone May 23, 2023
@thomaseizinger
Copy link
Contributor Author

Additional commits look good to me. Only open discussion is #3954 (comment).

The comment is addressed and the deleted tests are restored, thus I am merging this based on the above approval.

@mergify mergify bot merged commit 5e8f2e8 into master May 24, 2023
@mergify mergify bot deleted the 2032/external-addr branch May 24, 2023 07:52
mxinden added a commit to mxinden/rust-libp2p that referenced this pull request Jun 9, 2023
> Observed addresses (aka. external address candidates) of the local node, reported by a remote node
> via `libp2p-identify`, are no longer automatically considered confirmed external addresses, in
> other words they are no longer trusted by default. Instead users need to confirm the reported
> observed address either manually, or by using `libp2p-autonat`. In trusted environments users can
> simply extract observed addresses from a `libp2p-identify::Event::Received { info:
> libp2p_identify::Info { observed_addr }}` and confirm them via `Swarm::add_external_address`.

Follow-up to libp2p#3954.
mergify bot pushed a commit that referenced this pull request Jun 9, 2023
> Observed addresses (aka. external address candidates) of the local node, reported by a remote node
> via `libp2p-identify`, are no longer automatically considered confirmed external addresses, in
> other words they are no longer trusted by default. Instead users need to confirm the reported
> observed address either manually, or by using `libp2p-autonat`. In trusted environments users can
> simply extract observed addresses from a `libp2p-identify::Event::Received { info:
> libp2p_identify::Info { observed_addr }}` and confirm them via `Swarm::add_external_address`.

Follow-up to #3954.

Pull-Request: #4052.
Copy link
Contributor

@divagant-martian divagant-martian left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think docs were missed here, let me know if it's me who is misunderstanding something

ExpiredExternalAddr(ExpiredExternalAddr<'a>),
/// Informs the behaviour that we have discovered a new candidate for an external address for us.
NewExternalAddrCandidate(NewExternalAddrCandidate<'a>),
/// Informs the behaviour that an external address of the local node was removed.
Copy link
Contributor

@divagant-martian divagant-martian Jun 30, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

these docs looks wrong to me, or I'm missing something

Comment on lines 558 to +562
/// [`FromSwarm`] variant that informs the behaviour that an external address was removed.
#[derive(Clone, Copy)]
pub struct ExpiredExternalAddr<'a> {
pub struct ExternalAddrConfirmed<'a> {
pub addr: &'a Multiaddr,
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are these docs correct? They seem to belong to the type below?
I ended up in the pr because docs don't seem to relate to the type name

@thomaseizinger
Copy link
Contributor Author

I think docs were missed here, let me know if it's me who is misunderstanding something

You are right, thanks for spotting! I'll see to fix it up later :)

/// Informs the behaviour that an external address was removed.
ExpiredExternalAddr(ExpiredExternalAddr<'a>),
/// Informs the behaviour that we have discovered a new candidate for an external address for us.
NewExternalAddrCandidate(NewExternalAddrCandidate<'a>),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This and the events below miss their corresponding copies in SwarmEvent, correct @thomaseizinger? Or was it a deliberate choice to not report these events to the user as well?

Related to #4688.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It wasn't a deliberate choice to omit them but I have to admit I also didn't know that we were activel trying to keep these in sync.

umgefahren pushed a commit to umgefahren/rust-libp2p that referenced this pull request Mar 8, 2024
Previously, a `NetworkBehaviour` could report an `AddressScore` for an external address. This score was a `u32` and addresses would be ranked amongst those.

In reality, an address is either confirmed to be publicly reachable (via a protocol such as AutoNAT) or merely represents a candidate that might be an external address. In a way, addresses are guilty (private) until proven innocent (publicly reachable).

When a `NetworkBehaviour` reports an address candidate, we perform address translation on it to potentially correct for ephemeral ports of TCP. These candidates are then injected back into the `NetworkBehaviour`. Protocols such as AutoNAT can use these addresses as a source for probing their NAT status. Once confirmed, they can emit a `ToSwarm::ExternalAddrConfirmed` event which again will be passed to all `NetworkBehaviour`s.

This simplified approach will allow us implement Kademlia's client-mode (libp2p#2032) without additional configuration options: As soon as an address is reported as publicly reachable, we can activate server-mode for that connection.

Related: libp2p#3877.
Related: libp2p#3953.
Related: libp2p#2032.
Related: libp2p/go-libp2p#2229.

Co-authored-by: Max Inden <mail@max-inden.de>

Pull-Request: libp2p#3954.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants