Cold Bootstrapping
In designing our warm up feature, we considered the following principles:
- Do not introduce inconsistencies.
- Do not aim for a fully warmed up node, but get as much data as possible.
- Do not create any major issues for another node (i.e., the node used as the source of truth).
Based on these principles, a failure of the warm up process should not cause an issue in the cluster, and it should not stop the newly brought up node.
The cold bootstrapping feature is currently designed around Redis. It leverages the diskless master-slave replication of Redis. Dynomite-manager sets the target node that needs to be warmed up as a “slave” and finds another peer with the same token in the local region. That peer is designated as the master, which forces Redis to transfer an RDB file to the target node, where it is loaded into Redis. Once the warm up is complete, Redis on the target node is switched back to a normal master and serves traffic.
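The peer lookup described above can be sketched as follows. This is only an illustration and not Dynomite-manager's actual code: the `Peer` record, its field names, and `pick_warmup_source` are hypothetical, and the sketch assumes the node already knows the token, region, and host of every cluster member.

```python
import random
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Peer:
    """Hypothetical view of a cluster member; field names are illustrative."""
    host: str
    token: str
    region: str


def pick_warmup_source(me: Peer, peers: Sequence[Peer],
                       rng: Optional[random.Random] = None) -> Optional[Peer]:
    """Return a random local-region peer that owns the same token, or None."""
    rng = rng or random.Random()
    candidates = [
        p for p in peers
        # Same token, same region, and not the node being warmed up itself.
        if p.token == me.token and p.region == me.region and p.host != me.host
    ]
    return rng.choice(candidates) if candidates else None
```

Returning `None` when no candidate exists keeps the caller free to skip the warm up rather than fail, which matches the principle that a failed warm up should not stop the new node.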
Warm up, or bootstrapping, is triggered by the termination of a node. This in turn causes a new token to be generated, which initiates the warm up process. The sequence of operations is as follows:
- Searches for the correct token, e.g. "Warming up node's own token(s) : 1383429731".
- Determines which peer nodes have the same token. A random peer node within the local region is selected as the source for the warm up, which avoids cross-region communication issues.
- The target Redis instance is issued a SLAVEOF command pointing at that peer, which effectively makes it a slave of that node. In addition, Dynomite-manager sets Dynomite to standby mode so that traffic is not received and the node remains out of discovery.
- A Dynomite node is fully warmed up if it has received all the data from the remote at the time the warm up process started. To determine if a node is warmed up we use the difference between the Redis master and the Redis slave offset. Both offsets are calculated from the Redis master node that was selected as the source of warm up. This gives us the correct view of how much data the remote Redis master node has streamed and how much data it believes the Redis slave node has received.
- Once master and slave are in sync, Dynomite is set to allow writes only.
- Peer syncing is stopped with the Redis “SLAVEOF NO ONE” command.
- Dynomite is set back to its normal state. The process then checks the health of Dynomite; if there is an issue, Dynomite is restarted.
- Done!
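The offset comparison in the steps above can be sketched as below. The sketch parses the text that the Redis `INFO replication` command returns on the master, where `master_repl_offset` is the master's own offset and each `slaveN:` line carries the offset the master believes that replica has reached. The function names are hypothetical, and treating "in sync" as "gap at or below the threshold" is an assumption about how the bytes difference is applied.

```python
from typing import Dict, Optional


def parse_info_replication(text: str) -> Dict[str, str]:
    """Turn the 'key:value' lines of INFO replication into a dict."""
    fields = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip blanks and section headers such as "# Replication".
        if not line or line.startswith("#") or ":" not in line:
            continue
        key, value = line.split(":", 1)
        fields[key] = value
    return fields


def replication_gap(master_info: str, slave_host: str) -> Optional[int]:
    """Bytes the master has streamed beyond what it believes slave_host received.

    Returns None if the offsets cannot be determined from master_info.
    """
    fields = parse_info_replication(master_info)
    if "master_repl_offset" not in fields:
        return None
    master_offset = int(fields["master_repl_offset"])
    for key, value in fields.items():
        # Replica entries look like:
        # slave0:ip=...,port=...,state=...,offset=...,lag=...
        if key.startswith("slave") and key[5:].isdigit():
            attrs = dict(item.split("=", 1) for item in value.split(","))
            if attrs.get("ip") == slave_host and "offset" in attrs:
                return master_offset - int(attrs["offset"])
    return None


def is_warmed_up(master_info: str, slave_host: str,
                 max_diff_bytes: int = 100000) -> bool:
    """True when the master lists slave_host and the gap is within the limit."""
    gap = replication_gap(master_info, slave_host)
    return gap is not None and gap <= max_diff_bytes
```

Reading both offsets from the master, as the text describes, means one `INFO replication` call is enough to see how far the replica still lags.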
Configuring Redis to perform master/slave replication is not straightforward. For that reason, we provide some of the parameters we use in our Redis Configuration.
You might also need to add the Redis port (by default we use 22122) to the Security Group ingress rules on AWS.
There are 3 properties that control the warm up:
- Enabling and disabling warm up:
dynomitemanager.dyno.warm.bootstrap true (boolean)
- The bytes difference after which we consider the master/slave in sync:
dynomitemanager.dyno.warm.bytes.sync.diff 100000 (integer)
- The amount of time that warm up takes:
dynomitemanager.dyno.warm.msec.bootstraptime 900000 (integer)
The first property allows disabling warm up for caching use cases in which data loss might not be an issue. The second property sets the bytes difference; this value can be tuned based on the Writes Per Second (WPS) throughput of the master node. Lastly, the amount of time that warm up takes should be configured based on the amount of RAM Redis uses. For example, in production we use 900000 msec (15 min) for r3.2xlarge (61 GiB RAM), 1800000 msec (30 min) for r3.4xlarge (122 GiB RAM), and so forth.
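One way the three properties could interact is sketched below. This is not the Dynomite-manager implementation: `wait_for_warmup` and its parameter names are hypothetical, and the sketch assumes the time property acts as an upper bound on how long to wait for the sync before the node moves on. The `get_gap_bytes` callback stands in for whatever reports the current master/slave offset difference.

```python
import time
from typing import Callable, Optional


def wait_for_warmup(
    get_gap_bytes: Callable[[], Optional[int]],
    enabled: bool = True,           # dynomitemanager.dyno.warm.bootstrap
    max_diff_bytes: int = 100000,   # dynomitemanager.dyno.warm.bytes.sync.diff
    max_wait_msec: int = 900000,    # dynomitemanager.dyno.warm.msec.bootstraptime
    poll_msec: int = 1000,          # hypothetical polling interval
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Return 'skipped', 'in_sync' or 'timed_out'; a failed probe never raises."""
    if not enabled:
        return "skipped"
    deadline = clock() + max_wait_msec / 1000.0
    while True:
        try:
            gap = get_gap_bytes()
        except Exception:
            gap = None  # treat a failed probe as "offset not yet known"
        if gap is not None and gap <= max_diff_bytes:
            return "in_sync"
        if clock() >= deadline:
            return "timed_out"
        sleep(poll_msec / 1000.0)
```

Returning a status instead of raising reflects the design principle that a failed warm up should not stop the newly brought up node; the caller can proceed to "SLAVEOF NO ONE" in either case.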