-
Notifications
You must be signed in to change notification settings - Fork 734
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[BUG] CLIENT CAPA redirect
may result in bouncing redirects during failover
#821
Comments
thanks for the report, @soloestoy please also take a look. |
This is an interesting scenario, and the initial solution seems to be able to maintain the "client pause write" state until |
The redirect for CME is based on whether the node owns the slot, not whether the node is a primary. So, there shouldn't be a loop since one of the node always owns the slot.
This generally sounds about right. We should be entering a pause situation on the primary instead of redirecting duing the failover. |
Just to avoid a misunderstanding (I may not have made this clear enough above): During standalone FAILOVER, writing clients will be paused during the entire failover and paused clients are unpaused when the failover finishes (successfully or not). However, if a client issues the first write command since the beginning of the failover during the time window described above (or a read command in the case of the stale-replica condition), the order of the conditions in So, we might want to check which of those error replies we don't want to send during a failover. Or we could re-order conditions (or do a combination of the two). However, re-ordering looks too scary to me to even give it a try... (I am not familiar with clustered setups, so I can't say anything about these. I looked into failover handling in more detail because of the sentinel improvement issue redis/redis#13118) |
I have checked
It can be seen that under normal circumstances, the primary does not unpause until the slot is taken over by its replica and it becomes the new primary. Therefore, in standalone mode, the unpause operation should be performed after receiving the reply to the |
if (server.failover_state == FAILOVER_IN_PROGRESS) {
if (psync_result == PSYNC_CONTINUE || psync_result == PSYNC_FULLRESYNC) {
clearFailoverState();
} else {
abortFailover("Failover target rejected psync request");
return;
}
} We are already doing it this way now, @gmbnomis can you provide more details about your issue? |
@soloestoy Yes, the locations of pausing and unpausing clients are correct. However, "pausing clients" means that there is a flag indicating that clients issuing new commands (after the pause started) should be blocked. Actually blocking clients while paused happens here in Line 4061 in 59aa008
And that check happens later in Line 3903 in 59aa008
As you said, clients are paused during the
|
I found a way to force gmbnomis@9a26fae is a test that:
|
Hi @gmbnomis , I think I understand your point. The main issue is during the I have seen your PR, but it seems too complicated and does not completely solve the problem (please correct me if I am wrong). I believe a simple and effective approach would be to, when the primary becomes a replica, i.e., during the |
Hi @soloestoy,
my PR solves a different problem (handling of clients that are already blocked (not postponed) before the failover begins). So yes, it does not solve the problem described here. |
Fix valkey-io#821 During the `FAILOVER` process, when conditions are met (such as when the force time is reached or the primary and replica offsets are consistent), the primary actively becomes the replica and transitions to the `FAILOVER_IN_PROGRESS` state. After the primary becomes the replica, and after handshaking and other operations, it will eventually send the `PSYNC FAILOVER` command to the replica, after which the replica will become the primary. This means that the upgrade of the replica to the primary is an asynchronous operation, which implies that during the `FAILOVER_IN_PROGRESS` state, there may be a period of time where both nodes are replicas. In this scenario, if a `-REDIRECT` is returned, the request will be redirected to the replica and then redirected back, causing back and forth redirection. To avoid this situation, during the `FAILOVER_IN_PROGRESS state`, we temporarily suspend the clients that need to be redirected until the replica truly becomes the primary, and then resume the execution. --------- Signed-off-by: zhaozhao.zz <zhaozhao.zz@alibaba-inc.com> Signed-off-by: mwish <maplewish117@gmail.com>
Fix valkey-io#821 During the `FAILOVER` process, when conditions are met (such as when the force time is reached or the primary and replica offsets are consistent), the primary actively becomes the replica and transitions to the `FAILOVER_IN_PROGRESS` state. After the primary becomes the replica, and after handshaking and other operations, it will eventually send the `PSYNC FAILOVER` command to the replica, after which the replica will become the primary. This means that the upgrade of the replica to the primary is an asynchronous operation, which implies that during the `FAILOVER_IN_PROGRESS` state, there may be a period of time where both nodes are replicas. In this scenario, if a `-REDIRECT` is returned, the request will be redirected to the replica and then redirected back, causing back and forth redirection. To avoid this situation, during the `FAILOVER_IN_PROGRESS state`, we temporarily suspend the clients that need to be redirected until the replica truly becomes the primary, and then resume the execution. --------- Signed-off-by: zhaozhao.zz <zhaozhao.zz@alibaba-inc.com> Signed-off-by: mwish <maplewish117@gmail.com>
Fix #821 During the `FAILOVER` process, when conditions are met (such as when the force time is reached or the primary and replica offsets are consistent), the primary actively becomes the replica and transitions to the `FAILOVER_IN_PROGRESS` state. After the primary becomes the replica, and after handshaking and other operations, it will eventually send the `PSYNC FAILOVER` command to the replica, after which the replica will become the primary. This means that the upgrade of the replica to the primary is an asynchronous operation, which implies that during the `FAILOVER_IN_PROGRESS` state, there may be a period of time where both nodes are replicas. In this scenario, if a `-REDIRECT` is returned, the request will be redirected to the replica and then redirected back, causing back and forth redirection. To avoid this situation, during the `FAILOVER_IN_PROGRESS state`, we temporarily suspend the clients that need to be redirected until the replica truly becomes the primary, and then resume the execution. --------- Signed-off-by: zhaozhao.zz <zhaozhao.zz@alibaba-inc.com>
Fix #821 During the `FAILOVER` process, when conditions are met (such as when the force time is reached or the primary and replica offsets are consistent), the primary actively becomes the replica and transitions to the `FAILOVER_IN_PROGRESS` state. After the primary becomes the replica, and after handshaking and other operations, it will eventually send the `PSYNC FAILOVER` command to the replica, after which the replica will become the primary. This means that the upgrade of the replica to the primary is an asynchronous operation, which implies that during the `FAILOVER_IN_PROGRESS` state, there may be a period of time where both nodes are replicas. In this scenario, if a `-REDIRECT` is returned, the request will be redirected to the replica and then redirected back, causing back and forth redirection. To avoid this situation, during the `FAILOVER_IN_PROGRESS state`, we temporarily suspend the clients that need to be redirected until the replica truly becomes the primary, and then resume the execution. --------- Signed-off-by: zhaozhao.zz <zhaozhao.zz@alibaba-inc.com>
Fix valkey-io#821 During the `FAILOVER` process, when conditions are met (such as when the force time is reached or the primary and replica offsets are consistent), the primary actively becomes the replica and transitions to the `FAILOVER_IN_PROGRESS` state. After the primary becomes the replica, and after handshaking and other operations, it will eventually send the `PSYNC FAILOVER` command to the replica, after which the replica will become the primary. This means that the upgrade of the replica to the primary is an asynchronous operation, which implies that during the `FAILOVER_IN_PROGRESS` state, there may be a period of time where both nodes are replicas. In this scenario, if a `-REDIRECT` is returned, the request will be redirected to the replica and then redirected back, causing back and forth redirection. To avoid this situation, during the `FAILOVER_IN_PROGRESS state`, we temporarily suspend the clients that need to be redirected until the replica truly becomes the primary, and then resume the execution. --------- Signed-off-by: zhaozhao.zz <zhaozhao.zz@alibaba-inc.com> Signed-off-by: Ping Xie <pingxie@google.com>
Fix valkey-io#821 During the `FAILOVER` process, when conditions are met (such as when the force time is reached or the primary and replica offsets are consistent), the primary actively becomes the replica and transitions to the `FAILOVER_IN_PROGRESS` state. After the primary becomes the replica, and after handshaking and other operations, it will eventually send the `PSYNC FAILOVER` command to the replica, after which the replica will become the primary. This means that the upgrade of the replica to the primary is an asynchronous operation, which implies that during the `FAILOVER_IN_PROGRESS` state, there may be a period of time where both nodes are replicas. In this scenario, if a `-REDIRECT` is returned, the request will be redirected to the replica and then redirected back, causing back and forth redirection. To avoid this situation, during the `FAILOVER_IN_PROGRESS state`, we temporarily suspend the clients that need to be redirected until the replica truly becomes the primary, and then resume the execution. --------- Signed-off-by: zhaozhao.zz <zhaozhao.zz@alibaba-inc.com> Signed-off-by: Ping Xie <pingxie@google.com>
Describe the bug
A client using the new
CLIENT CAPA redirect
feature may see bouncingREDIRECT
responsesduring a failover initiated by the
FAILOVER
command. To be more precise, during theFAILOVER_IN_PROGRESS
failover state.While the current primary is in
FAILOVER_IN_PROGRESS
state,server.primary_host
is alreadyset to the designated primary. However, the designated primary still acts as a replica until the
PSYNC FAILOVER
command is processed.The primary will already reply with a
REDIRECT
(to the designated primary) in this situation. But the latter is still a replica and willREDIRECT
the client back to the old primary.To reproduce
This is hard to reproduce since, under normal circumstances, the duration of the
FAILOVER_IN_PROGRESS
state is very short: During this phase, a connection from the primaryto the designated primary is opened to send, among other commands, the
PSYNC FAILOVER
command.Nevertheless, in case of e.g. a network glitch, this may take much longer (actually, there
seems to be no timeout for this phase(?))
I produced this behavior by blocking the client when receiving the
PSYNC FAILOVER
insyncCommand
.Update: The behavior can also be reproduced by letting the replica sleep, see gmbnomis@ca708a4 for a demonstration.
Expected behavior
During the first phase of a failover (
FAILOVER_WAIT_FOR_SYNC
), a writing client will be pausedwhile reads can still proceed. During the second phase (
FAILOVER_IN_PROGRESS
) the same behavioris expected.
Only after finishing the failover procedure, i.e. when the clients are unpaused again,
REDIRECT
repliesshould be sent (since the new primary is ready to accept writes now.)
Additional information
Note that it isn't sufficient to just ensure that we aren't in a "failover" when sending redirect at
valkey/src/server.c
Line 3903 in 59aa008
Instead of
REDIRECT
, the client will seeREADONLY
errors duringFAILOVER_IN_PROGRESS
(caused byvalkey/src/server.c
Line 3987 in 59aa008
We could add a corresponding
server.failover_state == NO_FAILOVER
clause to the respectiveif
. However, it looks likeMASTERDOWN
could be triggered duringFAILOVER_IN_PROGRESS
as well ifreplica-serve-stale-data
isno
(I did not test this).It may also be worth considering whether we should change the order of the if clauses (i.e. pause clients "earlier").
The text was updated successfully, but these errors were encountered: