Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(region_cache): sync leader store epoch when switchWorkLeaderToPeer #573

Merged
merged 2 commits into from
Sep 7, 2022

Conversation

Ryan-Git
Copy link
Contributor

If leader transfers to a store that disconnected for a while and comes back now, request should not be blocked by store epoch.

Signed-off-by: renhongdi <ryan.hd.ren@gmail.com>
@sticnarf
Copy link
Collaborator

I'm not confident enough about this change.

The epoch in Store can be updated concurrently. When calling switchWorkLeaderToPeer, the epoch may have been updated again due to failed requests. Then, it may be not reasonable to always synchronize storeEpochs[leaderIdx] to the latest epoch in Store.

@Ryan-Git
Copy link
Contributor Author

Ryan-Git commented Aug 26, 2022

Yes it may cause a few more concurrent requests send to disconnected KV.

But since raft think the target store is still leader(likely just transferred leadership to it) and the response shouldn't be delayed for too long, it should be rare.

The good part is it avoids randomized retries afterward. In normal deployments, the restarted store will be leader balanced quickly, which causes quite a few fake epoch not match errors.

I'm testing this anyway... will post some numbers for reference later.

@disksing
Copy link
Collaborator

maybe we can read + CAS to update epoch and make sure it always increases.

@Ryan-Git
Copy link
Contributor Author

a few result.

workload: medium length transaction (0-500 statements per txn, uniform distribution)
chaos: force kill one kv and immediately restarts every hour on the hour

updated version starts from 16:00

kv:20002 is eliminated

image

total error(mostly statement timeout) not increased and no unexpected error after restart
image

Copy link
Collaborator

@sticnarf sticnarf left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@disksing disksing merged commit 8c1802b into tikv:master Sep 7, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants