Signed-off-by: Neil Shen <overvenus@gmail.com>
# Summary

Currently, read requests are handled by a single thread named raftstore, which
also handles write requests and other tasks. This RFC proposes introducing a new
thread named local reader to separate read requests from the raftstore.

## Motivation

There are three major workloads for the raftstore:

- Handle read requests: CPU intensive.
- Handle write requests: I/O intensive.
- Drive Raft state machines: CPU intensive.

For read requests, TiKV takes a snapshot of the underlying RocksDB when a Raft
leader is in its lease. Read requests are lightweight and the raftstore can
handle them quickly. However, because the raftstore is single-threaded, read
requests may be blocked by other workloads. For example:

- Read QPS drops as the number of write requests increases, because the
  raftstore spends more time writing Raft logs.
- Read QPS drops as the number of Regions grows, because the raftstore spends
  more time driving Raft state machines.

By having a dedicated thread for read requests, we can separate read requests
from other workloads and address the above issues. TiKV can then get better
performance and lower latency on read-mostly workloads.

## Detailed design

The local reader offloads read requests from the raftstore and guarantees
linearizability.

### Local reader

The local reader uses `ReadDelegate`s (delegates) to handle requests. Every
delegate is owned by a Raft peer, which belongs to the raftstore. The delegate
and the peer communicate via a channel, and each pair shares an atomic
`LeaderLease`.

A peer can read locally as long as it satisfies the following conditions (only
the most important ones are listed):

- It is a Raft leader;
- The term of its applied index matches its current term;
- It has a valid leader lease.

Before reading, a delegate checks the above conditions too. After reading is
done, it performs an extra lease check, because read operations have to be done
within a valid leader lease to guarantee linearizability. If a delegate fails
any of these checks, it redirects the read request to the raftstore.
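
The check-read-recheck flow described above can be sketched roughly as follows.
This is a minimal illustration, not the actual TiKV implementation: the field
names, the `ReadOutcome` enum, and the use of a plain `AtomicBool` for the lease
are all assumptions made for the example, and the `read` closure stands in for
taking a RocksDB snapshot and reading from it.

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::Arc;

/// Where a read request ends up (hypothetical type for this sketch).
#[derive(Debug, PartialEq)]
enum ReadOutcome {
    /// Served locally by the delegate.
    Local(Vec<u8>),
    /// Handed back to the raftstore.
    Redirect,
}

/// Simplified, hypothetical view of the peer state a delegate needs.
struct ReadDelegate {
    is_leader: bool,
    term: u64,
    applied_index_term: u64,
    /// Shared with the owning peer; `true` while the leader lease is valid.
    lease_valid: Arc<AtomicBool>,
}

impl ReadDelegate {
    /// The three conditions listed above.
    fn can_read_locally(&self) -> bool {
        self.is_leader
            && self.applied_index_term == self.term
            && self.lease_valid.load(Ordering::Acquire)
    }

    /// Check the conditions, read, then check the lease once more.
    fn handle_read<F>(&self, read: F) -> ReadOutcome
    where
        F: FnOnce() -> Vec<u8>,
    {
        if !self.can_read_locally() {
            return ReadOutcome::Redirect;
        }
        let value = read();
        // Extra lease check: the result is only usable if the lease was
        // still valid after the read finished.
        if !self.lease_valid.load(Ordering::Acquire) {
            return ReadOutcome::Redirect;
        }
        ReadOutcome::Local(value)
    }
}
```

In this sketch a read that starts under a valid lease but finishes after the
lease has been expired is discarded and redirected, which is the behavior the
extra lease check is meant to provide.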

### LeaderLease

`LeaderLease` is state shared by the local reader and the raftstore. It is
critical to implementing the local reader correctly. When a leader peer steps
down, its delegate is no longer allowed to read; otherwise it may read stale
data, which violates linearizability. The shared `LeaderLease` is implemented
with an atomic variable, so a delegate observes a leadership change immediately
and stops handling read requests.
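
One possible shape for such a shared lease is sketched below. It assumes the
lease is stored as an expiry timestamp in an `AtomicU64`, with `0` meaning
"expired"; the method names (`renew`, `expire`, `is_valid`) and the
millisecond clock passed in by the caller are illustrative assumptions, not the
API proposed by this RFC.

```rust
use std::sync::atomic::{AtomicU64, Ordering};
use std::sync::Arc;

/// Hypothetical lease handle. Cloning it yields another handle to the same
/// atomic, so the peer and its delegate can each hold one.
#[derive(Clone)]
struct LeaderLease {
    /// Expiry time in milliseconds; 0 means the lease is expired.
    expires_at_ms: Arc<AtomicU64>,
}

impl LeaderLease {
    fn new() -> Self {
        LeaderLease {
            expires_at_ms: Arc::new(AtomicU64::new(0)),
        }
    }

    /// Called by the peer (raftstore side) when the lease is extended.
    fn renew(&self, until_ms: u64) {
        self.expires_at_ms.store(until_ms, Ordering::Release);
    }

    /// Called by the peer when it steps down or must otherwise stop
    /// local reads.
    fn expire(&self) {
        self.expires_at_ms.store(0, Ordering::Release);
    }

    /// Called by the delegate (local reader side) before and after a read.
    fn is_valid(&self, now_ms: u64) -> bool {
        now_ms < self.expires_at_ms.load(Ordering::Acquire)
    }
}
```

Because both sides operate on the same atomic, a call to `expire` on the peer's
handle is visible to the delegate's next `is_valid` call without any message
passing over the channel.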

### What requests can it handle

This section specifies the requests that the local reader can handle. There are
three types of messages defined in [raft_cmdpb.proto]:

- Request
- AdminRequest
- StatusRequest

The local reader can only handle a subset of `Request`, namely:

- GetRequest
- SnapRequest

For all other requests, it either redirects them or panics.
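
The filtering rule above amounts to a small predicate. The sketch below uses a
hypothetical, heavily simplified `Cmd` enum in place of the real protobuf
messages; only the `Get` and `Snap` cases come from this RFC, and the other
variants are placeholders for "everything else".

```rust
/// Hypothetical stand-in for the message kinds in raft_cmdpb.proto.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Cmd {
    Get,
    Snap,
    /// Placeholder for any other kind of `Request`, such as a write.
    OtherRequest,
    Admin,
    Status,
}

/// Returns true only for the requests the local reader may serve itself.
fn local_reader_can_handle(cmd: Cmd) -> bool {
    matches!(cmd, Cmd::Get | Cmd::Snap)
}
```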

## Corner cases

There are some corner cases; the typical ones are listed below. Most of them
can be resolved by expiring the atomic `LeaderLease`.

### Case 1

The local reader is blocked before taking the snapshot, and the leadership
changes in the meantime.

This is addressed by expiring the leader lease. After reading, the local reader
checks whether the lease is outdated. If it is, the read results are not
returned.

### Case 2

The local reader is blocked before taking the snapshot, and the target Region
splits into two in the meantime. The original leader remains unchanged, while
the leader of the new Region is elected on another TiKV.

This is addressed by expiring the original leader lease before the split is
done.

## Drawbacks

Keeping state synced correctly between peers and delegates is difficult. We must
pay close attention to leader lease expiration if we want to make changes to
Raft.

## Alternatives

None that are immediately obvious. There must be some execution entity to handle
requests if we want to separate read requests from the raftstore.

## Unresolved questions

None.

[raft_cmdpb.proto]: https://github.com/pingcap/kvproto/blob/5e6e69a5ed381bd4a8afe7cb96cc47f955f6d160/proto/raft_cmdpb.proto