Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

For MEETs, save the extensions support flag immediately during MEET processing #778

Merged
merged 7 commits into from
Sep 10, 2024

Conversation

bentotten
Copy link
Contributor

@bentotten bentotten commented Jul 12, 2024

For backwards compatibility reasons, a node will wait until it receives a cluster message with the extensions flag before sending its own extensions. This leads to a delay in shard ID propagation that can corrupt nodes.conf with inaccurate shard IDs if a node is restarted before this can stabilize.

This fixes much of that delay by immediately triggering the extensions-supported flag during the MEET processing and attaching the node to the link, allowing the PONG reply to contain OSS extensions.

Partially fixes #774

… MEET processing

    For backwards compatibility reasons, a node will wait until it receives
    a cluster message with the open source extensions flag before sending
    its own open source extensions. This leads to a delay in shard ID propogation
    that can corrupt nodes.conf with inaccurate shard IDs if a node is restarted
    before this can stabilize.
    This fixes most of that delay by immediately triggering the extensions-supported flag
    during the MEET processing, allowing the PONG reply to contain OSS extensions and
    to be marked as compatible by the sender.

Signed-off-by: Ben Totten <btotten@amazon.com>
@bentotten
Copy link
Contributor Author

(force pushed with DCO)

Copy link

codecov bot commented Jul 12, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 70.33%. Comparing base (9948f07) to head (6699506).
Report is 127 commits behind head on unstable.

Additional details and impacted files
@@             Coverage Diff              @@
##           unstable     #778      +/-   ##
============================================
+ Coverage     70.23%   70.33%   +0.09%     
============================================
  Files           112      112              
  Lines         60602    61278     +676     
============================================
+ Hits          42566    43099     +533     
- Misses        18036    18179     +143     
Files with missing lines Coverage Δ
src/cluster_legacy.c 85.84% <100.00%> (+0.05%) ⬆️

... and 28 files with indirect coverage changes

Ben Totten added 3 commits July 12, 2024 13:13
Signed-off-by: Ben Totten <btotten@amazon.com>
Signed-off-by: Ben Totten <btotten@amazon.com>
Signed-off-by: Ben Totten <btotten@amazon.com>
@enjoy-binbin enjoy-binbin changed the title For MEETs, save the extensions support flag immediately during… For MEETs, save the extensions support flag immediately during MEET processing Jul 12, 2024
Signed-off-by: Ben Totten <btotten@amazon.com>
Copy link
Collaborator

@hpatro hpatro left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @bentotten. LGTM.

@hpatro hpatro requested review from PingXie and madolson July 12, 2024 17:40
Copy link
Member

@enjoy-binbin enjoy-binbin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM.

src/cluster_legacy.c Show resolved Hide resolved
Signed-off-by: Ben Totten <btotten@amazon.com>
src/cluster_legacy.c Outdated Show resolved Hide resolved
Signed-off-by: Ben Totten <btotten@amazon.com>
Copy link
Contributor Author

@bentotten bentotten left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Interesting, if I am understanding this correctly, adding the node name here did seem to cause a problem for the next packet exchanges - it looks like adding the name here during the MEET causes sender to be valid during the next packet received, which hits this logic and frees the node/link before the handshake flag can be removed

                if (sender) {
                    serverLog(LL_VERBOSE,
                        "Handshake: we already know node %.40s (%s), "
                        "updating the address if needed.", sender->name, sender->human_nodename);
                    if (nodeUpdateAddressIfNeeded(sender,link,hdr))
                    {
                        clusterDoBeforeSleep(CLUSTER_TODO_SAVE_CONFIG|
                                             CLUSTER_TODO_UPDATE_STATE);
                    }
                    /* Free this node as we already have it. This will
                     * cause the link to be freed as well. */
                    clusterDelNode(link->node);
                    return 0;

Follow up question:
Do we want to finish the handshake during MEET processing (maybe after sending the PONG) or leave the name as-is and keep the handshake completion for the ping/pong after the MEET?

src/cluster_legacy.c Show resolved Hide resolved
@madolson madolson added the run-extra-tests Run extra tests on this PR (Runs all tests from daily except valgrind and RESP) label Jul 24, 2024
@PingXie
Copy link
Member

PingXie commented Aug 12, 2024

Sorry for dropping the ball on this one, @bentotten. Let me find some time next and get back to this thread.

@bentotten
Copy link
Contributor Author

any update on this?

@PingXie
Copy link
Member

PingXie commented Sep 2, 2024

Thanks for your patience, @bentotten! I think this change makes sense overall. I think we should also consider updating the extension_capable flag right after completing the handshake as below:

clusterRenameNode(link->node, hdr->sender);

otherwise, I think we will have to wait for the next PONG message to fix the flag at:

if (sender && (hdr->mflags[0] & CLUSTERMSG_FLAG0_EXT_DATA)) {

it looks like adding the name here during the MEET causes sender to be valid during the next packet received, which hits this logic and frees the node/link before the handshake flag can be removed

Yeah, you are right. Setting the name during MEET alters the handshake sequence. I agree we should not make this change.

Along the same line, I'm now wondering whether we should connect the link and node during the MEET phase (via setClusterNodeToInboundClusterLink). Currently, we establish these "links" (between link and node) after the handshake is completed. Based on the existing implementation, I don't anticipate any issues with moving this logic earlier into the MEET phase. However, I would be a lot more comfortable if we could find a way to instruct clusterSendPing to explicitly set this extension-capable flag without relying on link->node.

@madolson
Copy link
Member

Based on some offline discussion, we're going to merge this to see if it helps with another issue and I will create an issue with Ping's comment to followup with.

@madolson madolson merged commit affbea5 into valkey-io:unstable Sep 10, 2024
20 checks passed
PingXie pushed a commit to PingXie/valkey that referenced this pull request Sep 14, 2024
…rocessing (valkey-io#778)

For backwards compatibility reasons, a node will wait until it receives
a cluster message with the extensions flag before sending its own
extensions. This leads to a delay in shard ID propagation that can
corrupt nodes.conf with inaccurate shard IDs if a node is restarted
before this can stabilize.

This fixes much of that delay by immediately triggering the
extensions-supported flag during the MEET processing and attaching the
node to the link, allowing the PONG reply to contain OSS extensions.

Partially fixes valkey-io#774

---------

Signed-off-by: Ben Totten <btotten@amazon.com>
Co-authored-by: Ben Totten <btotten@amazon.com>
Signed-off-by: Ping Xie <pingxie@google.com>
PingXie pushed a commit to PingXie/valkey that referenced this pull request Sep 14, 2024
…rocessing (valkey-io#778)

For backwards compatibility reasons, a node will wait until it receives
a cluster message with the extensions flag before sending its own
extensions. This leads to a delay in shard ID propagation that can
corrupt nodes.conf with inaccurate shard IDs if a node is restarted
before this can stabilize.

This fixes much of that delay by immediately triggering the
extensions-supported flag during the MEET processing and attaching the
node to the link, allowing the PONG reply to contain OSS extensions.

Partially fixes valkey-io#774

---------

Signed-off-by: Ben Totten <btotten@amazon.com>
Co-authored-by: Ben Totten <btotten@amazon.com>
Signed-off-by: Ping Xie <pingxie@google.com>
PingXie pushed a commit to PingXie/valkey that referenced this pull request Sep 14, 2024
…rocessing (valkey-io#778)

For backwards compatibility reasons, a node will wait until it receives
a cluster message with the extensions flag before sending its own
extensions. This leads to a delay in shard ID propagation that can
corrupt nodes.conf with inaccurate shard IDs if a node is restarted
before this can stabilize.

This fixes much of that delay by immediately triggering the
extensions-supported flag during the MEET processing and attaching the
node to the link, allowing the PONG reply to contain OSS extensions.

Partially fixes valkey-io#774

---------

Signed-off-by: Ben Totten <btotten@amazon.com>
Co-authored-by: Ben Totten <btotten@amazon.com>
Signed-off-by: Ping Xie <pingxie@google.com>
enjoy-binbin pushed a commit to enjoy-binbin/valkey that referenced this pull request Sep 14, 2024
…rocessing (valkey-io#778)

For backwards compatibility reasons, a node will wait until it receives
a cluster message with the extensions flag before sending its own
extensions. This leads to a delay in shard ID propagation that can
corrupt nodes.conf with inaccurate shard IDs if a node is restarted
before this can stabilize.

This fixes much of that delay by immediately triggering the
extensions-supported flag during the MEET processing and attaching the
node to the link, allowing the PONG reply to contain OSS extensions.

Partially fixes valkey-io#774

---------

Signed-off-by: Ben Totten <btotten@amazon.com>
Co-authored-by: Ben Totten <btotten@amazon.com>
PingXie pushed a commit to PingXie/valkey that referenced this pull request Sep 14, 2024
…rocessing (valkey-io#778)

For backwards compatibility reasons, a node will wait until it receives
a cluster message with the extensions flag before sending its own
extensions. This leads to a delay in shard ID propagation that can
corrupt nodes.conf with inaccurate shard IDs if a node is restarted
before this can stabilize.

This fixes much of that delay by immediately triggering the
extensions-supported flag during the MEET processing and attaching the
node to the link, allowing the PONG reply to contain OSS extensions.

Partially fixes valkey-io#774

---------

Signed-off-by: Ben Totten <btotten@amazon.com>
Co-authored-by: Ben Totten <btotten@amazon.com>
Signed-off-by: Ping Xie <pingxie@google.com>
PingXie pushed a commit to PingXie/valkey that referenced this pull request Sep 14, 2024
…rocessing (valkey-io#778)

For backwards compatibility reasons, a node will wait until it receives
a cluster message with the extensions flag before sending its own
extensions. This leads to a delay in shard ID propagation that can
corrupt nodes.conf with inaccurate shard IDs if a node is restarted
before this can stabilize.

This fixes much of that delay by immediately triggering the
extensions-supported flag during the MEET processing and attaching the
node to the link, allowing the PONG reply to contain OSS extensions.

Partially fixes valkey-io#774

---------

Signed-off-by: Ben Totten <btotten@amazon.com>
Co-authored-by: Ben Totten <btotten@amazon.com>
Signed-off-by: Ping Xie <pingxie@google.com>
PingXie pushed a commit that referenced this pull request Sep 15, 2024
…rocessing (#778)

For backwards compatibility reasons, a node will wait until it receives
a cluster message with the extensions flag before sending its own
extensions. This leads to a delay in shard ID propagation that can
corrupt nodes.conf with inaccurate shard IDs if a node is restarted
before this can stabilize.

This fixes much of that delay by immediately triggering the
extensions-supported flag during the MEET processing and attaching the
node to the link, allowing the PONG reply to contain OSS extensions.

Partially fixes #774

---------

Signed-off-by: Ben Totten <btotten@amazon.com>
Co-authored-by: Ben Totten <btotten@amazon.com>
Signed-off-by: Ping Xie <pingxie@google.com>
naglera pushed a commit to naglera/placeholderkv that referenced this pull request Oct 10, 2024
…rocessing (valkey-io#778)

For backwards compatibility reasons, a node will wait until it receives
a cluster message with the extensions flag before sending its own
extensions. This leads to a delay in shard ID propagation that can
corrupt nodes.conf with inaccurate shard IDs if a node is restarted
before this can stabilize.

This fixes much of that delay by immediately triggering the
extensions-supported flag during the MEET processing and attaching the
node to the link, allowing the PONG reply to contain OSS extensions.

Partially fixes valkey-io#774

---------

Signed-off-by: Ben Totten <btotten@amazon.com>
Co-authored-by: Ben Totten <btotten@amazon.com>
Signed-off-by: naglera <anagler123@gmail.com>
enjoy-binbin added a commit to enjoy-binbin/valkey that referenced this pull request Dec 13, 2024
After valkey-io#1307 got merged, we notice there is a assert happen in setClusterNodeToInboundClusterLink:
```
=== ASSERTION FAILED ===
==> '!link->node' is not true
```

In valkey-io#778, we will call setClusterNodeToInboundClusterLink to attach the node to the link
during the MEET processing, so if we receive a another MEET packet in a short time, the
node is still in handshake state, we will meet this assert and crash the server.

If the link is bound to a node and the node is in the handshake state, and we receive
a MEET packet, it may be that the sender sent multiple MEET packets when reconnecting,
and in here we are dropping the MEET. Note that in getNodeFromLinkAndMsg, the node in
the handshake state has a random name and not truly "known", so we don't know the sender.
Dropping the MEET packet can prevent us from creating a random node, avoid incorrect
link binding, and avoid duplicate MEET packet eliminate the handshake state.

Signed-off-by: Binbin <binloveplay1314@qq.com>
enjoy-binbin added a commit that referenced this pull request Dec 16, 2024
After #1307 got merged, we notice there is a assert happen in setClusterNodeToInboundClusterLink:
```
=== ASSERTION FAILED ===
==> '!link->node' is not true
```

In #778, we will call setClusterNodeToInboundClusterLink to attach the node to the link
during the MEET processing, so if we receive a another MEET packet in a short time, the
node is still in handshake state, we will meet this assert and crash the server.

If the link is bound to a node and the node is in the handshake state, and we receive
a MEET packet, it may be that the sender sent multiple MEET packets so in here we are
dropping the MEET to avoid the assert in setClusterNodeToInboundClusterLink. The assert
will happen if the other sends a MEET packet because it detects that there is no inbound
link, this node creates a new node in HANDSHAKE state (with a random node name), and
respond with a PONG. The other node receives the PONG and removes the CLUSTER_NODE_MEET
flag. This node is supposed to open an outbound connection to the other node in the next
cron cycle, but before this happens, the other node re-sends a MEET on the same link
because it still detects no inbound connection.

Note that in getNodeFromLinkAndMsg, the node in the handshake state has a random name
and not truly "known", so we don't know the sender. Dropping the MEET packet can prevent
us from creating a random node, avoid incorrect link binding, and avoid duplicate MEET
packet eliminate the handshake state.

Signed-off-by: Binbin <binloveplay1314@qq.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
run-extra-tests Run extra tests on this PR (Runs all tests from daily except valgrind and RESP)
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[BUG] nodes.conf can be corrupted when node is restarted before cluster shard ID stabilizes (for 7.2)
5 participants