chore: new ReplicaOf algorithm #5774
Signed-off-by: kostas <kostas@dragonflydb.io>
}

void ServerFamily::ReplicaOf(CmdArgList args, const CommandContext& cmd_cntx) {
  ReplicaOfInternal(args, cmd_cntx.tx, cmd_cntx.rb, ActionOnConnectionFail::kReturnOnError);
It seems that the changes are grouped together nicely. Let's introduce the flag experimental_replicaof_v2, set to true, so that we can fall back to the old variant.
src/server/server_family.cc

auto new_replica = make_shared<Replica>(replicaof_args->host, replicaof_args->port, &service_,
                                        master_replid(), replicaof_args->slot_range);
GenericError ec{};
nit:
- GenericError ec{};
+ GenericError ec;

if (!ss->is_master) {
  CHECK(replica_);
  // flip flag before clearing replica_
style: please keep an empty line before comment lines, here and everywhere.
What is the functional change now? How does it differ from the previous version?
This PR implements the new ReplicaOf algorithm that does not use two phase locking.
How ReplicaOf worked previously:
There are multiple issues with this:
How it works now:
The new algorithm is very simple and has only two steps:
(a) Create a Replica object, initiate a connection to the new master, greet it, and exchange the info needed to set up replication (internally it's just REPLCONF setting connection members and info). So far there is no update to the state. We merely check that we can establish a connection with the new master and that everything is ok.
(b) If there are no errors, lock the mutex, update the replica_ object and start the MainReplicationFiber. Do not enter LOADING state prematurely. First check in the main replication fiber whether partial sync is available, and only if it is not, enter LOADING state and do a full sync.

Now all ReplicaOfInternal commands are serialized. Cancellation is not affected because the connection that enters the critical section of (b) will cancel/stop the previous one. This happens one after the other in a deterministic manner.
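
For readers who want to see the shape of the two steps, here is a minimal, self-contained sketch. It is not the code in this PR: SketchReplica, SketchServer and all of their members are hypothetical names, the connection attempt is reduced to a boolean, and the main replication fiber is replaced by a synchronous check. It only illustrates that the first step touches no shared state, and that the second step swaps the replica under a mutex, stops the previous one, and enters LOADING only when partial sync is unavailable.

```cpp
#include <cassert>
#include <memory>
#include <mutex>
#include <string>
#include <utility>

// Hypothetical stand-in for a replica object; not Dragonfly's Replica class.
struct SketchReplica {
  SketchReplica(std::string h, int p, bool reachable, bool partial)
      : host(std::move(h)), port(p), reachable_(reachable), partial_sync_ok_(partial) {}

  // First step: connect and greet. Reads only this object's own fields.
  bool ConnectAndGreet() const { return reachable_; }
  bool PartialSyncAvailable() const { return partial_sync_ok_; }
  void Stop() { stopped = true; }

  std::string host;
  int port;
  bool stopped = false;

 private:
  bool reachable_;        // assumed outcome of the connection attempt
  bool partial_sync_ok_;  // assumed outcome of the partial sync check
};

// Hypothetical server holding the current replica behind one mutex.
class SketchServer {
 public:
  // Returns true if the new replica was installed.
  bool ReplicaOf(std::string host, int port, bool reachable, bool partial_sync_ok) {
    auto candidate =
        std::make_shared<SketchReplica>(std::move(host), port, reachable, partial_sync_ok);

    // First step, outside the lock: on failure the server state is untouched.
    if (!candidate->ConnectAndGreet()) return false;

    // Second step, the critical section: concurrent calls are serialized here.
    std::lock_guard<std::mutex> lk(mu_);
    if (replica_) replica_->Stop();  // the newcomer cancels the previous replica
    replica_ = std::move(candidate);
    is_master_ = false;
    assert(replica_ != nullptr);

    // Stand-in for the main replication fiber: enter LOADING only when a
    // full sync is actually required.
    entered_loading_ = !replica_->PartialSyncAvailable();
    return true;
  }

  bool is_master() const {
    std::lock_guard<std::mutex> lk(mu_);
    return is_master_;
  }

  bool entered_loading() const {
    std::lock_guard<std::mutex> lk(mu_);
    return entered_loading_;
  }

  std::shared_ptr<SketchReplica> current() const {
    std::lock_guard<std::mutex> lk(mu_);
    return replica_;
  }

 private:
  mutable std::mutex mu_;
  std::shared_ptr<SketchReplica> replica_;
  bool is_master_ = true;
  bool entered_loading_ = false;
};
```

In this sketch a failed connection attempt returns before the mutex is taken, so a bad REPLICAOF target cannot disturb an existing replication link, which mirrors the "no update to the state" property of the first step.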