The hot cache cannot be cleared, when the interval is less than 60 #4390
Signed-off-by: lhy1024 <admin@liudos.us>
* move file Signed-off-by: lhy1024 <admin@liudos.us> * ref #4390 Signed-off-by: lhy1024 <admin@liudos.us> Co-authored-by: Ti Chi Robot <ti-community-prow-bot@tidb.io>
* move file Signed-off-by: lhy1024 <admin@liudos.us> * ref tikv#4390 Signed-off-by: lhy1024 <admin@liudos.us> Co-authored-by: Ti Chi Robot <ti-community-prow-bot@tidb.io>
Our current heartbeat process works as follows. If a reported interval is shorter than the default heartbeat interval (60 seconds), we put the report into the cache temporarily and wait until 60 seconds of data have been collected before deciding whether the peer is hot enough. For a peer that has just been reported, if the region is also in the hot cache, there are three cases.
The problem occurs in the third branch: if the old peer is used directly without being cloned, the old item and the new item are written at the same time. When the new peer's interval is less than 60 seconds, the peer is put into the cache temporarily. If the old peer is due to be cooled down at that point, the peer stays in the hot cache for a long time and cannot exit.
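The accumulate-then-decide step and the role of the clone can be sketched roughly as below. This is a simplified illustration, not PD's actual code: `peerStat`, `update`, `hotThreshold`, and the hot-degree arithmetic are all hypothetical names and assumptions made for the example.

```go
package main

import "fmt"

// reportInterval is the default heartbeat interval (in seconds) mentioned above.
const reportInterval = 60

// peerStat is a hypothetical, simplified stand-in for a hot-cache item.
type peerStat struct {
	accInterval int // seconds collected since the last hotness decision
	accBytes    int // bytes collected over accInterval
	hotDegree   int // rises while the peer stays hot, falls while it cools down
}

// clone returns an independent copy, so writes to the new item
// never touch the old item that is still stored in the cache.
func (p *peerStat) clone() *peerStat {
	c := *p
	return &c
}

// update folds one report into a copy of the old item. Until 60 seconds
// of data are collected, the item is only kept temporarily and no
// hotness decision is made.
func update(old *peerStat, interval, bytes, hotThreshold int) *peerStat {
	item := old.clone() // assumption: cloning is what keeps old and new apart
	item.accInterval += interval
	item.accBytes += bytes
	if item.accInterval < reportInterval {
		return item // not enough data yet
	}
	if item.accBytes/item.accInterval >= hotThreshold {
		item.hotDegree++
	} else {
		item.hotDegree-- // cooling down
	}
	item.accInterval, item.accBytes = 0, 0
	return item
}

func main() {
	old := &peerStat{hotDegree: 2}
	item := update(old, 5, 500, 10)
	fmt.Println(old.accInterval, item.accInterval) // 0 5: the old item is untouched

	// Eleven more 5-second reports with no traffic complete the 60-second window.
	for i := 0; i < 11; i++ {
		item = update(item, 5, 0, 10)
	}
	fmt.Println(item.hotDegree, item.accInterval) // 1 0: one cool-down step applied
}
```

If `update` mutated `old` in place instead of a clone, the pending 5-second data would also be written into the item that is about to be cooled down, which is the double write described above.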
… the interval is less than 60 (#4396) * fix cache in 5.2/5.3 ref #4390 Signed-off-by: lhy1024 <admin@liudos.us> * fix test Signed-off-by: lhy1024 <admin@liudos.us> * fix ci Signed-off-by: lhy1024 <admin@liudos.us> * address comment Signed-off-by: lhy1024 <admin@liudos.us> * address comment Signed-off-by: lhy1024 <admin@liudos.us> * fix ci Signed-off-by: lhy1024 <admin@liudos.us> * fix test Signed-off-by: lhy1024 <admin@liudos.us>
Suppose there is a cluster with a region heartbeat interval of 5s. At t1, a peer in store1 turns cold while the other peers are still hot, so it adopts the item from store2 as its own. This is expected.
But when the other peers also turn cold, the problem occurs: a peer in store2 or store3 adopts the item from store1 at t2, but that item was itself adopted from store2 at t1.
So we should avoid adopting from an adopted item before it is confirmed whether it is hot.
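A minimal sketch of that rule, under stated assumptions: the `hotItem` type, its `adopted` flag, and `pickAdoptionSource` are hypothetical names invented for illustration and are not taken from PD's code.

```go
package main

import "fmt"

// hotItem is a hypothetical cache entry for one peer of a region.
type hotItem struct {
	storeID uint64
	adopted bool // copied from another store and not yet confirmed hot
}

// pickAdoptionSource returns an item from another store that a peer on
// store `self` may adopt. Items that were themselves adopted and are
// still unconfirmed are skipped, so adoption cannot chain.
func pickAdoptionSource(self uint64, items []*hotItem) *hotItem {
	for _, it := range items {
		if it.storeID == self || it.adopted {
			continue
		}
		return it
	}
	return nil
}

func main() {
	// t1: store1 has turned cold, store2 and store3 still hold their own items.
	items := []*hotItem{{storeID: 2}, {storeID: 3}}
	fmt.Println(pickAdoptionSource(1, items).storeID) // 2

	// t2: store2 and store3 have turned cold; only store1's adopted item remains.
	items = []*hotItem{{storeID: 1, adopted: true}}
	fmt.Println(pickAdoptionSource(2, items) == nil) // true: nothing to adopt
}
```

Without the `adopted` check, store2 at t2 would pick up the item that store1 took from store2 at t1, and the peers could keep each other in the hot cache indefinitely.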
/assign @lhy1024
/severity major
* fix hot peer cache Signed-off-by: lhy1024 <admin@liudos.us> * fix Signed-off-by: lhy1024 <admin@liudos.us> * fix Signed-off-by: lhy1024 <admin@liudos.us> * add more test Signed-off-by: lhy1024 <admin@liudos.us> * fix ci Signed-off-by: lhy1024 <admin@liudos.us> * address comment Signed-off-by: lhy1024 <admin@liudos.us> * ref #4390 Signed-off-by: lhy1024 <admin@liudos.us> * add comment and test Signed-off-by: lhy1024 <admin@liudos.us> * address comments Signed-off-by: lhy1024 <admin@liudos.us> * fix Signed-off-by: lhy1024 <admin@liudos.us> * add more test Signed-off-by: lhy1024 <admin@liudos.us> * add comment Signed-off-by: lhy1024 <admin@liudos.us> Co-authored-by: ShuNing <nolouch@gmail.com>
* fix hot peer cache Signed-off-by: lhy1024 <admin@liudos.us> * fix Signed-off-by: lhy1024 <admin@liudos.us> * fix Signed-off-by: lhy1024 <admin@liudos.us> * add more test Signed-off-by: lhy1024 <admin@liudos.us> * fix ci Signed-off-by: lhy1024 <admin@liudos.us> * address comment Signed-off-by: lhy1024 <admin@liudos.us> * ref tikv#4390 Signed-off-by: lhy1024 <admin@liudos.us> * add comment and test Signed-off-by: lhy1024 <admin@liudos.us> * address comments Signed-off-by: lhy1024 <admin@liudos.us> * fix Signed-off-by: lhy1024 <admin@liudos.us> * add more test Signed-off-by: lhy1024 <admin@liudos.us> * add comment Signed-off-by: lhy1024 <admin@liudos.us> Co-authored-by: ShuNing <nolouch@gmail.com> Signed-off-by: lhy1024 <admin@liudos.us>
… the interval is less than 60 (#4396) (#4432) * This is an automated cherry-pick of #4396 Signed-off-by: ti-chi-bot <ti-community-prow-bot@tidb.io> * fix conflict without sync test Signed-off-by: lhy1024 <admin@liudos.us> * fix lint Signed-off-by: lhy1024 <admin@liudos.us> * ref #4390 Signed-off-by: lhy1024 <admin@liudos.us> * statistics: fix hot peer cache (#4446) * fix hot peer cache Signed-off-by: lhy1024 <admin@liudos.us> * fix Signed-off-by: lhy1024 <admin@liudos.us> * fix Signed-off-by: lhy1024 <admin@liudos.us> * add more test Signed-off-by: lhy1024 <admin@liudos.us> * fix ci Signed-off-by: lhy1024 <admin@liudos.us> * address comment Signed-off-by: lhy1024 <admin@liudos.us> * ref #4390 Signed-off-by: lhy1024 <admin@liudos.us> * add comment and test Signed-off-by: lhy1024 <admin@liudos.us> * address comments Signed-off-by: lhy1024 <admin@liudos.us> * fix Signed-off-by: lhy1024 <admin@liudos.us> * add more test Signed-off-by: lhy1024 <admin@liudos.us> * add comment Signed-off-by: lhy1024 <admin@liudos.us> Co-authored-by: ShuNing <nolouch@gmail.com> Signed-off-by: lhy1024 <admin@liudos.us> * fix test Signed-off-by: lhy1024 <admin@liudos.us> * revert log Signed-off-by: lhy1024 <admin@liudos.us> * remove todo Signed-off-by: lhy1024 <admin@liudos.us> Co-authored-by: lhy1024 <admin@liudos.us> Co-authored-by: ShuNing <nolouch@gmail.com>
* fix hot peer cache Signed-off-by: lhy1024 <admin@liudos.us> * fix Signed-off-by: lhy1024 <admin@liudos.us> * fix Signed-off-by: lhy1024 <admin@liudos.us> * add more test Signed-off-by: lhy1024 <admin@liudos.us> * fix ci Signed-off-by: lhy1024 <admin@liudos.us> * address comment Signed-off-by: lhy1024 <admin@liudos.us> * ref tikv#4390 Signed-off-by: lhy1024 <admin@liudos.us> * add comment and test Signed-off-by: lhy1024 <admin@liudos.us> * address comments Signed-off-by: lhy1024 <admin@liudos.us> * fix Signed-off-by: lhy1024 <admin@liudos.us> * add more test Signed-off-by: lhy1024 <admin@liudos.us> * add comment Signed-off-by: lhy1024 <admin@liudos.us> Co-authored-by: ShuNing <nolouch@gmail.com> Signed-off-by: Cabinfever_B <cabinfeveroier@gmail.com>
… the interval is less than 60 (#4396) (#4433) * This is an automated cherry-pick of #4396 Signed-off-by: ti-chi-bot <ti-community-prow-bot@tidb.io> * fix test Signed-off-by: lhy1024 <admin@liudos.us> * ref #4390 Signed-off-by: lhy1024 <admin@liudos.us> * fix ci Signed-off-by: lhy1024 <admin@liudos.us> * pick for #4446, #4512 Signed-off-by: lhy1024 <admin@liudos.us> * fix test Signed-off-by: lhy1024 <admin@liudos.us> * update Signed-off-by: lhy1024 <admin@liudos.us> Co-authored-by: lhy1024 <admin@liudos.us>
Bug Report
What did you do?
stop bench
What did you expect to see?
hot cache is cleared
What did you see instead?
there are still many hot peers
What version of PD are you using (pd-server -V)?
v5.0.4