online recovery: fix online recovery timeout mechanism #6108
Signed-off-by: Connor1996 <zbk602423539@gmail.com>
PTAL @v01dstar
Can you please explain a little more about what the bug is? I can tell that with this change, the whole process will exit faster when a timeout happens (the old/existing code also exits after a timeout, I believe?). Besides, I think that with this change we may leave some regions in exit force leader state when a timeout happens?
Suppose one TiKV always returns store heartbeats but, for some reason, never includes a store report. Then, in the existing implementation, the timeout would never trigger and the process would stay in the collecting stage forever.
I think the existing code still exits? Just with a longer wait time.
Please check
// blocks reads and writes.
u.storePlanExpires = make(map[uint64]time.Time)
u.storeRecoveryPlans = make(map[uint64]*pdpb.RecoveryPlan)
u.timeout = time.Now().Add(storeRequestInterval)
Isn't one heartbeat interval too aggressive? Maybe *2 to make it more stable?
@v01dstar: Thanks for your review. The bot only counts approvals from reviewers and higher roles in the list, but you're still welcome to leave your comments. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the ti-community-infra/tichi repository.
Codecov Report
Patch coverage:
Additional details and impacted files
@@ Coverage Diff @@
## master #6108 +/- ##
==========================================
+ Coverage 74.03% 74.12% +0.08%
==========================================
Files 385 385
Lines 37952 37952
==========================================
+ Hits 28099 28131 +32
+ Misses 7377 7353 -24
+ Partials 2476 2468 -8
Signed-off-by: Connor1996 <zbk602423539@gmail.com>
/merge |
@nolouch: It seems you want to merge this PR, I will help you trigger all the tests: /run-all-tests
This pull request has been accepted and is ready to merge. Commit hash: 22eec8a
@Connor1996: Your PR was out of date, I have automatically updated it for you. If the CI test fails, you just re-trigger the test that failed and the bot will merge the PR for you after the CI passes.
In response to a cherrypick label: new pull request created to branch |
close tikv#6107 Signed-off-by: ti-chi-bot <ti-community-prow-bot@tidb.io>
In response to a cherrypick label: new pull request created to branch |
close tikv#6107 fix online recovery timeout mechanism Signed-off-by: Connor1996 <zbk602423539@gmail.com> Co-authored-by: Ti Chi Robot <ti-community-prow-bot@tidb.io> Signed-off-by: Yang Zhang <yang.zhang@pingcap.com>
What problem does this PR solve?
Issue Number: Close #6107
What is changed and how does it work?
Check List
Tests
Release note