feat: add max number of elements to non-prio queue #1077
Conversation
@arnetheduck how about replacing https://github.com/vacp2p/nim-libp2p/pull/1077/files#diff-abbf987f19a24c9940cd01bde79a1b1e0819f1c3b6d7ba2efa04ad680d022c78R182-R186 by

@@ -181,6 +182,24 @@ proc handle*(p: PubSubPeer, conn: Connection) {.async.} =
  debug "exiting pubsub read loop",
    conn, peer = p, closed = conn.closed

proc disconnect(p: PubSubPeer) {.async.} =
this doesn't disconnect the peer - it just closes the send connection
I can't disconnect the peer from PubSubPeer: that's only available in GossipSub, and I don't have a reference to it here because of the circular dependency limitation.
It seems we emit a Disconnected event that doesn't mean a disconnection.
Let me know if I got it right:

- We reset the stream and close the underlying connection here (nim-libp2p/libp2p/muxers/mplex/lpchannel.nim, line 220 in 2860959: await s.conn.close()), but don't raise any error, so no error happens at this point.
- The next time we try to write to the stream, this error happens: debug "Exception occurred in PubSubPeer.handle",
- Then this is executed: await p.disconnect()

So the only difference now is that the underlying connection hasn't been closed?
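For reference, a condensed sketch of the flow being described (simplified and partly hypothetical; the real PubSubPeer.handle and the disconnect helper added in this PR differ in detail, and this assumes the module's existing imports such as chronos and chronicles):

```nim
# Simplified, partly hypothetical sketch of the flow above.
proc handle(p: PubSubPeer, conn: Connection) {.async.} =
  try:
    while true:
      # once the underlying connection was closed in lpchannel.nim,
      # the next read/write on this stream raises
      let data = await conn.readLp(64 * 1024)
      # ... decode `data` and process the RPC message ...
  except CatchableError as exc:
    debug "Exception occurred in PubSubPeer.handle", conn, peer = p, exc = exc.msg
    # added in this PR: close the send connection
    await p.disconnect()
```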
> So the only difference now is that the underlying connection hasn't been closed?

Yes. Closing the underlying connection ensures that all ongoing concurrent reads, writes etc. are aborted and we get rid of the peer. This is important because we don't know what state gossipsub is in if we just abort here: they might have received control messages or not, so closing the connection entirely is one of the safest and least bad things you can do.
I could rename PubSubPeerEventKind.Disconnected to PubSubPeerEventKind.StreamClosed and create a PubSubPeerEventKind.Disconnect. Then GossipSub would disconnect the peer when receiving the latter. This would be the closest to the current behavior, I guess. Another option would be to disconnect the peer here:

for topic, peers in p.mesh.mpairs():
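Roughly, the first option above would amount to an event split like this (a sketch using the names proposed in this comment, not the current nim-libp2p API):

```nim
# Hypothetical event kinds as proposed above: the old "Disconnected"
# becomes StreamClosed, and a new Disconnect kind asks GossipSub to
# actually drop the peer.
type
  PubSubPeerEventKind {.pure.} = enum
    StreamOpened   # send stream established
    StreamClosed   # send stream closed (previously "Disconnected")
    Disconnect     # request that GossipSub disconnects the peer entirely

  PubSubPeerEvent = object
    kind: PubSubPeerEventKind
```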
But how can I close the underlying connection here?
see suggested send-all-non-prio-messages trick - it should work because if there's at least 1 in-flight future, it will cause mplex to do the disconnection.
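For context, the trick referred to here is roughly the following (a sketch with hypothetical helper and send-call names, based on the queue fields this PR introduces):

```nim
# Hypothetical sketch of the "send all remaining non-prio messages" trick:
# keep writing queued messages so at least one write is in flight; a write
# failing on the closed channel then makes mplex tear the connection down.
proc drainNonPriorityQueue(p: PubSubPeer) {.async.} =
  while len(p.rpcmessagequeue.nonPriorityQueue) > 0:
    let msg = await p.rpcmessagequeue.nonPriorityQueue.popFirst()
    await p.sendEncoded(msg)   # illustrative send call; raises once the conn is gone
```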
Wouldn't emitting an event that tells GossipSub to disconnect the peer be simpler and easier to understand?
I don't know; that might introduce new async ordering problems. I'm not necessarily against it, just that it is a larger and more invasive change.
But I think just writing the whole queue might not trigger the disconnection if the futures complete fast enough.

Let's start with peer disconnection (to maintain current behavior). When we reach a stable point with that working, we can consider other strategies (so that we can get a release out).
Codecov Report
Attention: Patch coverage is

Additional details and impacted files:

@@            Coverage Diff             @@
##           unstable    #1077    +/-   ##
============================================
- Coverage     84.83%   84.79%   -0.04%
============================================
  Files            91       91
  Lines         15405    15420      +15
============================================
+ Hits          13069    13076       +7
- Misses         2336     2344       +8
Force-pushed from b5db38e to 484c25d.
libp2p_gossipsub_non_priority_queue_size.inc(labelValues = [$p.peerId])
f
if len(p.rpcmessagequeue.nonPriorityQueue) == p.maxNumElementInNonPriorityQueue:
  if not p.disconnected:
if this flag is set, it seems prudent to check in connectOnce that we don't re-attempt connection, i.e. DisconnectionRequested should lead to the send conn never being opened again
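Illustratively, the guard being asked for could look like this (a sketch; placement and field names follow this PR's work-in-progress diff and may differ from the final code):

```nim
# Sketch: once disconnection has been requested for this peer, never
# (re)open the send connection again.
proc connectOnce(p: PubSubPeer) {.async.} =
  if p.disconnected:   # set when the non-priority queue overflowed
    return             # or raise, so callers know the peer is going away
  # ... existing logic that dials the peer and opens the send stream ...
```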
Thanks, I saw that peers kept being disconnected multiple times and that was probably the reason. But I had to add the check in connectImpl.
Well, the non-prio queue should also be cleared (i.e. all sends in it should be failed) at this point, since it will no longer be sent; and if there are any new sends after the disconnection, they should fail.
Also, it maybe makes sense to balance the streamOpened event with a streamClosed in this case; such events typically benefit from being neutralised this way.
Canceling the futures will cause this problem again #1075, won't it?
> maybe makes sense to balance the streamOpened event with a streamclosed in this case

Could you please elaborate more on what you mean by that? Should I also emit a StreamClosed?
> Canceling the futures will cause this problem again #1075, won't it?

Not sure if you meant canceling the futures. stopSendNonPriorityTask clears the queue, and it's called in unsubscribePeer, which should be called when the peer is disconnected.
> Not sure if you meant canceling the futures.

I meant failing them, not cancelling, but now I realise we always return a completed future instead of storing pending futures in the queue, so there's nothing to fail:
https://github.com/vacp2p/nim-libp2p/blob/unstable/libp2p/protocols/pubsub/pubsubpeer.nim#L341

addLast always returns a completed future when the async queue is unbounded (as it is in this case).
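A small self-contained illustration of that point, assuming chronos' AsyncQueue behaves as described (the queue is unbounded, so addLast never has to wait):

```nim
import chronos

proc demo() {.async.} =
  let q = newAsyncQueue[int]()   # maxsize = 0, i.e. unbounded
  let fut = q.addLast(42)        # never blocks on an unbounded queue
  doAssert fut.finished()        # the returned future is already completed

waitFor demo()
```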
> Should I also emit a StreamClosed?

This is an idea for a future improvement; it's maybe not needed right now, however.
There's basically a lifecycle here:

created -> [stream opened -> stream closed]* -> disconnected

where the middle two states repeat, so when designing an event API around it, these are the relevant events.
Force-pushed from 59a72cc to 16a9de8.
Force-pushed from 16a9de8 to 5c63dad.
This PR adds a default limit to the number of elements in the non-priority queue. When this limit has been reached, the peer will be disconnected. The default value is 1024 but can also be configured through a new GossipSub param maxNumElementsInNonPriorityQueue.
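For example, configuring the new parameter might look roughly like this (a usage sketch; the exact GossipSub/GossipSubParams construction API may differ from what is shown here):

```nim
# Hypothetical usage sketch: raising the non-priority queue limit above
# the default of 1024 when constructing GossipSub.
import libp2p, libp2p/protocols/pubsub/gossipsub

proc newGossip(switch: Switch): GossipSub =
  var params = GossipSubParams.init()
  params.maxNumElementsInNonPriorityQueue = 2048
  GossipSub.init(switch = switch, parameters = params)
```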