Implemented the Raft Consensus Algorithm with a leader lease mechanism to achieve enhanced fault tolerance and data consistency in distributed systems. It showcases seamless coordination between nodes and state recovery mechanisms to restore system functionality following failures.
The solution was successfully deployed and tested on Google Cloud, achieving 99% uptime.
-
Election Functionality
- Follower nodes maintain a randomized election timer (5-10 seconds) and listen for heartbeat or vote request events.
- If no event is received within the timeout, a follower becomes a candidate, increments its term, and sends
voteRPC
requests to other nodes. - Each node votes for only one candidate per term based on specific conditions.
- If a candidate receives a majority of votes in response to its
voteRPC
requests, it transitions to the Leader state, waiting for the old leader's lease timer to expire before starting its own lease. - The new leader appends a NO-OP entry to the log and sends heartbeats to all nodes.
-
Log Replication Functionality
- The leader sends periodic heartbeats (~1 second) to all nodes using the
appendEntriesRPC
, reacquiring its lease at each heartbeat by restarting the lease timer. - The lease duration is a fixed value between 2 and 10 seconds, decided before execution, and remains constant throughout.
- If the leader fails to reacquire the lease (i.e., does not receive successful acknowledgments from the majority of followers within the lease duration), it must step down.
- Followers monitor and track the duration of the leader's lease during each heartbeat.
- When the leader receives a client SET request, it uses
appendEntriesRPC
to replicate the log to all nodes. For GET requests, the value is returned immediately if the lease is acquired. - The leader identifies the latest matching log entry in the follower's log and removes any entries beyond that point before transmitting subsequent leader entries.
- Once a majority of nodes have replicated the log, the leader sends a SUCCESS reply to the client; otherwise, it sends a FAIL message.
- The leader sends periodic heartbeats (~1 second) to all nodes using the
-
Committing Entries
- The leader commits an entry only after a majority of nodes acknowledge appending it and ensures that the latest entry to be committed belongs to the same term as the leader.
- Follower nodes commit entries using the
LeaderCommit
field in theappendEntriesRPC
received during each heartbeat.
-
Client Interaction
- Followers, candidates, and the leader form a Raft cluster serving Raft clients.
- The client stores the IP addresses and ports of all nodes and maintains the current leader ID, which might become outdated.
- The client sends a GET/SET request to the leader node and updates its leader ID in case of a failure, resending the request to the updated leader.
- If there is no leader, the node returns NULL for the current leader and a failure message.
- The client continues sending requests until it receives a SUCCESS reply from any node.
- Supported operations for the client are:
- SET K V: Maps the key K to value V (WRITE OPERATION).
- GET K: Returns the latest committed value of key K; if K doesn’t exist, an empty string is returned (READ OPERATION).
-
Print Statements & Log Generation
- The implementation includes print statements to indicate the current state of each node and the operations being performed (e.g., starting elections, sending heartbeats, committing entries).
- Running the
nodeX.py
files generates alogs_node_X
folder for each node (where X is the node ID). This folder contains:- logs.txt: Stores the data of all the logs and the current term.
- metadata.txt: Contains information such as
commitLength
,appliedLength
,current_term
, andvotedFor
. - dump.txt: Captures all the required print statements and logs the significant events and state transitions that occur during execution, providing valuable insights into the operation of the Raft algorithm.
- Clone the repo
git clone https://github.com/aryanGupta-09/Raft-Algorithm.git
- Go to the project directory
cd Raft-Algorithm
- Generate the Python code for gRPC
python -m grpc_tools.protoc -I. --python_out=. --grpc_python_out=. node.proto
-
Run the
node.py
files in different instances after updating the node IDs and the IP addresses. -
Run the
client.py
file in a separate instance to perform SET or GET operations.