-
Notifications
You must be signed in to change notification settings - Fork 11
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
address_store: Improve address tracking and add eviction algorithm #250
Conversation
Signed-off-by: Alexandru Vasile <alexandru.vasile@parity.io>
Signed-off-by: Alexandru Vasile <alexandru.vasile@parity.io>
Signed-off-by: Alexandru Vasile <alexandru.vasile@parity.io>
Signed-off-by: Alexandru Vasile <alexandru.vasile@parity.io>
Signed-off-by: Alexandru Vasile <alexandru.vasile@parity.io>
Signed-off-by: Alexandru Vasile <alexandru.vasile@parity.io>
Signed-off-by: Alexandru Vasile <alexandru.vasile@parity.io>
Signed-off-by: Alexandru Vasile <alexandru.vasile@parity.io>
Signed-off-by: Alexandru Vasile <alexandru.vasile@parity.io>
Signed-off-by: Alexandru Vasile <alexandru.vasile@parity.io>
Signed-off-by: Alexandru Vasile <alexandru.vasile@parity.io>
Signed-off-by: Alexandru Vasile <alexandru.vasile@parity.io>
src/transport/manager/address.rs
Outdated
/// Score for a connection with a peer using a different ID than expected. | ||
pub const DIFFERENT_PEER_ID: i32 = 50i32; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Realistically we can't establish a connection to this peer, so I would make the score the same as CONNECTION_FAILURE
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yep makes sense! Have simplified the code kept around 1 connection failure and 1 address failure which punishes more, thanks for the suggestion! 🙏
src/transport/manager/handle.rs
Outdated
continue; | ||
}; | ||
|
||
// This can correspond to the provided peerID or to a different one. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would generate an error if we try to add a peer
's address with a different peer_id
, because likely what is happening is not what the user / client code wants.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Have added an warning here, good point 🙏
src/transport/manager/handle.rs
Outdated
|
||
return 0usize; | ||
// Add the provided peer ID to the address. | ||
let address = address.with(Protocol::P2p(multihash::Multihash::from(peer.clone()))); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Won't this create an invalid address if the address already contains the peer id?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Have refactored the continue
with an if else
branch, looked a bit confusing indeed.
I think we check that the last element is protocol::p2p, let me know if this is sounds right
Signed-off-by: Alexandru Vasile <alexandru.vasile@parity.io>
Co-authored-by: Dmitry Markin <dmitry@markin.tech>
Signed-off-by: Alexandru Vasile <alexandru.vasile@parity.io>
Signed-off-by: Alexandru Vasile <alexandru.vasile@parity.io>
Signed-off-by: Alexandru Vasile <alexandru.vasile@parity.io>
Signed-off-by: Alexandru Vasile <alexandru.vasile@parity.io>
Signed-off-by: Alexandru Vasile <alexandru.vasile@parity.io>
Signed-off-by: Alexandru Vasile <alexandru.vasile@parity.io>
Signed-off-by: Alexandru Vasile <alexandru.vasile@parity.io>
Signed-off-by: Alexandru Vasile <alexandru.vasile@parity.io>
src/transport/manager/address.rs
Outdated
// that already failed (with negative score). | ||
if num_addresses >= self.max_capacity { | ||
let Some(min_record) = self.addresses.values().min().cloned() else { | ||
return; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Won't we lose the address due to returning if the values()
are empty?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The num_addresses
is initialized to self.addresses.len()
🤔
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Have removed the num_addreses
variable
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there a chance of getting None
here? May be expect()
with a comment then?
src/transport/manager/handle.rs
Outdated
return 0usize; | ||
// It is important to keep track of all addresses to have a healthy | ||
// address store to dial from. | ||
peer_addresses.entry(peer_id).or_insert_with(HashSet::new).insert(address); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why not discard the address with invalid peer ID? If the address with invalid peer ID ended up here, something bad has already happened. In theory, we could rewrite the peer ID part, but ideally we should have returned an error to the user long before coming here with invalid peer ID.
This would also simplify the code below by inserting only one peer in the map.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oki doki that makes sense. I think I'm convinced that tracking all peer addresses is not the best thing here, since the user explicitly provided addresses of a single "peer". I've discarded addresses that belong to a different peer,this should make the code a bit easier to follow, thanks for the suggestion 🙏
Co-authored-by: Dmitry Markin <dmitry@markin.tech>
Signed-off-by: Alexandru Vasile <alexandru.vasile@parity.io>
Signed-off-by: Alexandru Vasile <alexandru.vasile@parity.io>
…o lexnv/enhanve-address-store
src/transport/manager/address.rs
Outdated
// that already failed (with negative score). | ||
if num_addresses >= self.max_capacity { | ||
let Some(min_record) = self.addresses.values().min().cloned() else { | ||
return; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there a chance of getting None
here? May be expect()
with a comment then?
Co-authored-by: Dmitry Markin <dmitry@markin.tech>
Signed-off-by: Alexandru Vasile <alexandru.vasile@parity.io>
Signed-off-by: Alexandru Vasile <alexandru.vasile@parity.io>
This PR refactors the peer state. Previously the `AddressStore` and `PeerState` where intertwined. The `AddressStore` contained an optional `ConnectionID`, while the `PeerState` states contained a mandatory score. This PR separates the address store and the peer state, while moving the logic of the peer state transitioning to a separate module. While at it, have added documentation and testing for state machine transitions. Changes include: - Secondary connection from the PeerContext is merged into the `PeerState` - Connection Ids are removed from the Address Store - Transport manager is refactored to use the new state machine, this keeps a minimal code around in the manager ### Testing Done - added extra tests to the peer state - tested with subp2p-explorer for discovering kusama This builds upon #250 for a bigger refactor effort to track addresses in a healthier Aims at improving: - #239 - #238 - #237 --------- Signed-off-by: Alexandru Vasile <alexandru.vasile@parity.io> Co-authored-by: Dmitry Markin <dmitry@markin.tech>
## [0.8.0] - 2024-11-01 This release adds support for content provider advertisement and discovery to Kademlia protocol implementation (see libp2p [spec](https://github.com/libp2p/specs/blob/master/kad-dht/README.md#content-provider-advertisement-and-discovery)). Additionally, the release includes several improvements and memory leak fixes to enhance the stability and performance of the litep2p library. ### Added - kad: Providers part 8: unit, e2e, and `libp2p` conformance tests ([#258](#258)) - kad: Providers part 7: better types and public API, public addresses & known providers ([#246](#246)) - kad: Providers part 6: stop providing ([#245](#245)) - kad: Providers part 5: `GET_PROVIDERS` query ([#236](#236)) - kad: Providers part 4: refresh local providers ([#235](#235)) - kad: Providers part 3: publish provider records (start providing) ([#234](#234)) ### Changed - transport_service: Improve connection stability by downgrading connections on substream inactivity ([#260](#260)) - transport: Abort canceled dial attempts for TCP, WebSocket and Quic ([#255](#255)) - kad/executor: Add timeout for writting frames ([#277](#277)) - kad: Avoid cloning the `KademliaMessage` and use reference for `RoutingTable::closest` ([#233](#233)) - peer_state: Robust state machine transitions ([#251](#251)) - address_store: Improve address tracking and add eviction algorithm ([#250](#250)) - kad: Remove unused serde cfg ([#262](#262)) - req-resp: Refactor to move functionality to dedicated methods ([#244](#244)) - transport_service: Improve logs and move code from tokio::select macro ([#254](#254)) ### Fixed - tcp/websocket/quic: Fix cancel memory leak ([#272](#272)) - transport: Fix pending dials memory leak ([#271](#271)) - ping: Fix memory leak of unremoved `pending_opens` ([#274](#274)) - identify: Fix memory leak of unused `pending_opens` ([#273](#273)) - kad: Fix not retrieving local records ([#221](#221)) --------- Signed-off-by: Alexandru Vasile <alexandru.vasile@parity.io> Co-authored-by: Dmitry Markin <dmitry@markin.tech>
This PR updates litep2p to the latest release. - `KademliaEvent::PutRecordSucess` is renamed to fix word typo - `KademliaEvent::GetProvidersSuccess` and `KademliaEvent::IncomingProvider` are needed for bootnodes on DHT work and will be utilized later ### Added - kad: Providers part 8: unit, e2e, and `libp2p` conformance tests ([#258](paritytech/litep2p#258)) - kad: Providers part 7: better types and public API, public addresses & known providers ([#246](paritytech/litep2p#246)) - kad: Providers part 6: stop providing ([#245](paritytech/litep2p#245)) - kad: Providers part 5: `GET_PROVIDERS` query ([#236](paritytech/litep2p#236)) - kad: Providers part 4: refresh local providers ([#235](paritytech/litep2p#235)) - kad: Providers part 3: publish provider records (start providing) ([#234](paritytech/litep2p#234)) ### Changed - transport_service: Improve connection stability by downgrading connections on substream inactivity ([#260](paritytech/litep2p#260)) - transport: Abort canceled dial attempts for TCP, WebSocket and Quic ([#255](paritytech/litep2p#255)) - kad/executor: Add timeout for writting frames ([#277](paritytech/litep2p#277)) - kad: Avoid cloning the `KademliaMessage` and use reference for `RoutingTable::closest` ([#233](paritytech/litep2p#233)) - peer_state: Robust state machine transitions ([#251](paritytech/litep2p#251)) - address_store: Improve address tracking and add eviction algorithm ([#250](paritytech/litep2p#250)) - kad: Remove unused serde cfg ([#262](paritytech/litep2p#262)) - req-resp: Refactor to move functionality to dedicated methods ([#244](paritytech/litep2p#244)) - transport_service: Improve logs and move code from tokio::select macro ([#254](paritytech/litep2p#254)) ### Fixed - tcp/websocket/quic: Fix cancel memory leak ([#272](paritytech/litep2p#272)) - transport: Fix pending dials memory leak ([#271](paritytech/litep2p#271)) - ping: Fix memory leak of unremoved `pending_opens` ([#274](paritytech/litep2p#274)) - identify: Fix memory leak of unused `pending_opens` ([#273](paritytech/litep2p#273)) - kad: Fix not retrieving local records ([#221](paritytech/litep2p#221)) See release changelog for more details: https://github.com/paritytech/litep2p/releases/tag/v0.8.0 cc @paritytech/networking --------- Signed-off-by: Alexandru Vasile <alexandru.vasile@parity.io> Co-authored-by: Dmitry Markin <dmitry@markin.tech>
This PR updates litep2p to the latest release. - `KademliaEvent::PutRecordSucess` is renamed to fix word typo - `KademliaEvent::GetProvidersSuccess` and `KademliaEvent::IncomingProvider` are needed for bootnodes on DHT work and will be utilized later - kad: Providers part 8: unit, e2e, and `libp2p` conformance tests ([#258](paritytech/litep2p#258)) - kad: Providers part 7: better types and public API, public addresses & known providers ([#246](paritytech/litep2p#246)) - kad: Providers part 6: stop providing ([#245](paritytech/litep2p#245)) - kad: Providers part 5: `GET_PROVIDERS` query ([#236](paritytech/litep2p#236)) - kad: Providers part 4: refresh local providers ([#235](paritytech/litep2p#235)) - kad: Providers part 3: publish provider records (start providing) ([#234](paritytech/litep2p#234)) - transport_service: Improve connection stability by downgrading connections on substream inactivity ([#260](paritytech/litep2p#260)) - transport: Abort canceled dial attempts for TCP, WebSocket and Quic ([#255](paritytech/litep2p#255)) - kad/executor: Add timeout for writting frames ([#277](paritytech/litep2p#277)) - kad: Avoid cloning the `KademliaMessage` and use reference for `RoutingTable::closest` ([#233](paritytech/litep2p#233)) - peer_state: Robust state machine transitions ([#251](paritytech/litep2p#251)) - address_store: Improve address tracking and add eviction algorithm ([#250](paritytech/litep2p#250)) - kad: Remove unused serde cfg ([#262](paritytech/litep2p#262)) - req-resp: Refactor to move functionality to dedicated methods ([#244](paritytech/litep2p#244)) - transport_service: Improve logs and move code from tokio::select macro ([#254](paritytech/litep2p#254)) - tcp/websocket/quic: Fix cancel memory leak ([#272](paritytech/litep2p#272)) - transport: Fix pending dials memory leak ([#271](paritytech/litep2p#271)) - ping: Fix memory leak of unremoved `pending_opens` ([#274](paritytech/litep2p#274)) - identify: Fix memory leak of unused `pending_opens` ([#273](paritytech/litep2p#273)) - kad: Fix not retrieving local records ([#221](paritytech/litep2p#221)) See release changelog for more details: https://github.com/paritytech/litep2p/releases/tag/v0.8.0 cc @paritytech/networking --------- Signed-off-by: Alexandru Vasile <alexandru.vasile@parity.io> Co-authored-by: Dmitry Markin <dmitry@markin.tech> Signed-off-by: Alexandru Vasile <alexandru.vasile@parity.io>
This PR improves the transport manager's address tracking to keep a healthier view of addresses.
General changes:
listener
addresses were tracked instead ofdialing
addresses (incoming connections may be established with ephemeral ports and there's no guarantee the remote is listening on them)PeerIdMismatch
error coming from the noise handshake redirects the address to the appropriate peer for healthier addressesAddress Store changes:
Testing Done
This PR is part of a bigger refactor to keep track of healthier addresses.
It is essentially a breakdown of #248 for an easier review process.