-
Notifications
You must be signed in to change notification settings - Fork 693
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Send MEET packet to node if there is no inbound link to fix inconsistency when handshake timedout #1307
Conversation
1920952
to
d65423e
Compare
Codecov ReportAll modified and coverable lines are covered by tests ✅
Additional details and impacted files@@ Coverage Diff @@
## unstable #1307 +/- ##
============================================
+ Coverage 70.62% 70.90% +0.28%
============================================
Files 117 119 +2
Lines 63324 64631 +1307
============================================
+ Hits 44722 45828 +1106
- Misses 18602 18803 +201
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What if we disconnect the outbound link if inbound link is not available? I think it will lead to the same reconnection flow. Would it help with having simpler code and one unified flow. I'm not sure if it will perform the MEET
operation though.
In this case we would just re-open an outbound connection, which the other node will accept, but it won't force the other node to recognize us as being part of the cluster if it doesn't trust us yet. The only way to force the other node to add us to its cluster view is for us to send a MEET packet. |
CLUSTER MEET is an admin operation but I guess we are fine with the case of reinitiating it if the operation wasn't successful in first place and retry it. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can we do something like #461? only clear the CLUSTER_NODE_MEET flag when myself receive a "ack" (not the plain PONG but something with a strong ack, ack that sender has already meet myself?) I haven't thought about it carefully, but i feel it is more reliable?
Do you mean by like adding a new flag? I think the concern is we could still end up in the inverse state, where the the node that received the "strong" ack will put the other node online but then might go offline. My original thought was that as long as one node believes the other is part of the cluster, is should try to have the other node join. It's sort of like an "enhanced" version of how we built up the mesh when two disjoin clusters meet each other. |
We could do a 3-way handshake to strengthen the handshake reliability, instead of the current -> MEET/PONG, <- PING/PONG. We could add an extra back and forth between the two nodes before clearing the flags. But we can always end up in a situation where one node thinks the handshake is done and the other node times out the handshake because the last packet got delayed or dropped. With this solution, if the handshake has succeeded on one side and not the other, we ensure both sides will eventually know each other. |
With this fix, we can sometime get in a (potentially) infinite loop where a node keeps sending a MEET packet to the other node, but both nodes know each other. This sometimes (although rarely) happens in the second test The following sequence of events can trigger this issue:
Node 0 should eventually send a PING packet to node 1, but there is no guarantee as to when that can happen. When I reproduce the issue, node 0 never gets a chance to send a PING to node 1 because node 0 overrides Line 5045 in 33f42d7
I think there are various ways to mitigate this issue:
I think option 3 should work best without making too many changes to the current logic. But I'm open to suggestions. |
68324ec
to
f6eae88
Compare
|
#1368 the expire test is fixed |
In some cases, when meeting a new node, if the handshake times out, we can end up with an inconsistent view of the cluster where the new node knows about all the nodes in the cluster, but the cluster does not know about this new node (or vice versa). To detect this inconsistency, we now check if a node has an outbound link but no inbound link, in this case it probably means this node does not know us. In this case we (re-)send a MEET packet to this node to do a new handshake with it. Signed-off-by: Pierre Turin <pieturin@amazon.com>
Signed-off-by: Pierre Turin <pieturin@amazon.com>
Update test to check node IDs instead of relying on number of words. Rename nodeIsMeeting() to nodeInMeetState(). Introduce nodeInNormalState() macro. Signed-off-by: Pierre Turin <pieturin@amazon.com>
If we receive a MEET packet from a known node, we disconnect the outbound link to force a reconnect and sending of a PING packet so that the other node recognizes the link as belonging to us. Also deflaked one of the tests. And improved testing code following PR comment. Signed-off-by: Pierre Turin <pieturin@amazon.com>
Signed-off-by: Pierre Turin <pieturin@amazon.com>
2e8c6f1
to
7ae3b05
Compare
Signed-off-by: Pierre Turin <pieturin@amazon.com>
Sometimes the outbound link from node 0 to node 1 can be disconnected. Assert that node 0 know node 1 without expecting the node to be marked as connected. Signed-off-by: Pierre Turin <pieturin@amazon.com>
3b5aa41
to
6a017b0
Compare
@pieturin Could you update the top comment as well about the exact behavior change? |
Signed-off-by: Pierre Turin <pieturin@amazon.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. But I would like someone else as well to look at it before merging this in.
Updated the PR description. |
@PingXie, @madolson, @enjoy-binbin, any comments on this PR? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Top comment and code LGTM, did not review the tests.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I suppose the comment isn't really a blocker for me, it's just about documentation.
Signed-off-by: Pierre Turin <pieturin@amazon.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Forgot to send out my (partial) review :)
Also I don't think I fully understand the problem that we are trying to solve and why the fix works.
Do I understand correctly that we are trying to solve two problems here?
More specifically, a new node that hasn't gone through the full handshake process can still get gossiped around the cluster and then picked up by existing nodes that are not involved in the handshake process?
If
The fix for the first problem makes sense to me: resending I am not sure how/if the second problem is addressed by this PR? if both |
If the handshake times out on both side of the meet, then, this fix won't do anything. This fix is only meant to make sure we never end up with an inconsistent view of the cluster (where one side knows the other nodes, but not the other side). This makes the meet handshake more reliable but not fool proof. The inconsistent view of the cluster can happen in two different ways (that I could see):
We don't gossip a node if it's in HANDSHAKE state. See: Lines 4039 to 4048 in 5f7fe9e
I'll prepare a follow-up PR to address your comments @PingXie, thanks for the review! |
After valkey-io#1307 got merged, we notice there is a assert happen in setClusterNodeToInboundClusterLink: ``` === ASSERTION FAILED === ==> '!link->node' is not true ``` In valkey-io#778, we will call setClusterNodeToInboundClusterLink to attach the node to the link during the MEET processing, so if we receive a another MEET packet in a short time, the node is still in handshake state, we will meet this assert and crash the server. If the link is bound to a node and the node is in the handshake state, and we receive a MEET packet, it may be that the sender sent multiple MEET packets when reconnecting, and in here we are dropping the MEET. Note that in getNodeFromLinkAndMsg, the node in the handshake state has a random name and not truly "known", so we don't know the sender. Dropping the MEET packet can prevent us from creating a random node, avoid incorrect link binding, and avoid duplicate MEET packet eliminate the handshake state. Signed-off-by: Binbin <binloveplay1314@qq.com>
…ency when handshake timedout (valkey-io#1307) In some cases, when meeting a new node, if the handshake times out, we can end up with an inconsistent view of the cluster where the new node knows about all the nodes in the cluster, but the cluster does not know about this new node (or vice versa). To detect this inconsistency, we now check if a node has an outbound link but no inbound link, in this case it probably means this node does not know us. In this case we (re-)send a MEET packet to this node to do a new handshake with it. If we receive a MEET packet from a known node, we disconnect the outbound link to force a reconnect and sending of a PING packet so that the other node recognizes the link as belonging to us. This prevents cases where a node could send MEET packets in a loop because it thinks the other node does not have an inbound link. This fixes the bug described in valkey-io#1251. --------- Signed-off-by: Pierre Turin <pieturin@amazon.com>
After #1307 got merged, we notice there is a assert happen in setClusterNodeToInboundClusterLink: ``` === ASSERTION FAILED === ==> '!link->node' is not true ``` In #778, we will call setClusterNodeToInboundClusterLink to attach the node to the link during the MEET processing, so if we receive a another MEET packet in a short time, the node is still in handshake state, we will meet this assert and crash the server. If the link is bound to a node and the node is in the handshake state, and we receive a MEET packet, it may be that the sender sent multiple MEET packets so in here we are dropping the MEET to avoid the assert in setClusterNodeToInboundClusterLink. The assert will happen if the other sends a MEET packet because it detects that there is no inbound link, this node creates a new node in HANDSHAKE state (with a random node name), and respond with a PONG. The other node receives the PONG and removes the CLUSTER_NODE_MEET flag. This node is supposed to open an outbound connection to the other node in the next cron cycle, but before this happens, the other node re-sends a MEET on the same link because it still detects no inbound connection. Note that in getNodeFromLinkAndMsg, the node in the handshake state has a random name and not truly "known", so we don't know the sender. Dropping the MEET packet can prevent us from creating a random node, avoid incorrect link binding, and avoid duplicate MEET packet eliminate the handshake state. Signed-off-by: Binbin <binloveplay1314@qq.com>
make sense
Right but in this case
Thanks. Will TAL. |
Ah yes, you're right, this is what should happen:
This is correct that |
In some cases, when meeting a new node, if the handshake times out, we can end up with an inconsistent view of the cluster where the new node knows about all the nodes in the cluster, but the cluster does not know about this new node (or vice versa).
To detect this inconsistency, we now check if a node has an outbound link but no inbound link, in this case it probably means this node does not know us. In this case we (re-)send a MEET packet to this node to do a new handshake with it.
If we receive a MEET packet from a known node, we disconnect the outbound link to force a reconnect and sending of a PING packet so that the other node recognizes the link as belonging to us. This prevents cases where a node could send MEET packets in a loop because it thinks the other node does not have an inbound link.
This fixes the bug described in #1251.