store: the peer id mismatch is not handled #45297

Closed
cfzjywxk opened this issue Jul 11, 2023 · 2 comments
Labels
affects-6.5 This bug affects the 6.5.x(LTS) versions. affects-7.1 This bug affects the 7.1.x(LTS) versions. severity/major sig/transaction SIG:Transaction type/bug The issue is confirmed as a bug.

Comments

cfzjywxk commented Jul 11, 2023

Bug Report

Please answer these questions before submitting your issue. Thanks!

1. Minimal reproduce step (Required)

Consider the following case:

  1. There's a region (region_id = 0) with initial peers "1, 2, 3", and the region configuration changes later, for example:
1, 2, 3                    conf change => 1 2 3(Removed) 4(Added)
1, 2, 3(Removed), 4(Added) conf change => 1 2 5(Added) 4(Removed)
  2. The region cache is not refreshed in time, so the peer information whose store_id is 3, region_id is 0 and peer_id is 3 still exists.
  3. AccessFollower is used by the replica selector to decide the target peer, for example when stale read is used.
  4. The leader is transferred to peer 5 and a stale read retry happens.
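
The conf-change sequence above can be modeled with a minimal Go sketch. The `Peer` type and both helper functions are hypothetical and are not taken from the client code. The store IDs for peers 4 and 5 are placeholders, since the report only gives the store of peer 3.

```go
package main

import "fmt"

// Peer identifies one replica of a region. Illustrative type, not the real one.
type Peer struct {
	PeerID  uint64
	StoreID uint64
}

// applyConfChange returns a new peer list with the peer `removed` dropped
// and `added` appended. The input slice is left untouched.
func applyConfChange(peers []Peer, removed uint64, added Peer) []Peer {
	next := make([]Peer, 0, len(peers))
	for _, p := range peers {
		if p.PeerID != removed {
			next = append(next, p)
		}
	}
	return append(next, added)
}

// stalePeers lists cached peers whose peer id no longer exists in the region.
func stalePeers(cached, current []Peer) []Peer {
	live := make(map[uint64]bool, len(current))
	for _, p := range current {
		live[p.PeerID] = true
	}
	var stale []Peer
	for _, p := range cached {
		if !live[p.PeerID] {
			stale = append(stale, p)
		}
	}
	return stale
}

func main() {
	initial := []Peer{{1, 1}, {2, 2}, {3, 3}}
	cached := append([]Peer(nil), initial...) // the client's view is never refreshed

	// Store IDs 4 and 5 below are assumptions made for illustration only.
	current := applyConfChange(initial, 3, Peer{PeerID: 4, StoreID: 4})
	current = applyConfChange(current, 4, Peer{PeerID: 5, StoreID: 5})

	fmt.Println("current peers:", current)
	fmt.Println("stale cached peers:", stalePeers(cached, current))
}
```

Any request built from the stale entry still carries peer_id 3, which the region no longer contains.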

2. What did you expect to see? (Required)

The region cache is refreshed as expected.

3. What did you see instead (Required)

After the kv-client receives a RegionError, the retry at the bottom layer keeps happening and there's no chance to invalidate and refresh the region cache, so there are a large number of retries that keep encountering peer mismatch region errors.

The cause is possibly that:

  1. The peer id mismatch is not processed, so the replica selector always retries.
  2. The scope-fixing PR makes the leader-only selection take effect.
  3. Once the leaderOnly flag is set, all the checks are bypassed, so the const maxReplicaAttempt = 10 does not work either.
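
The suspected interaction between the leaderOnly flag and the attempt limit can be shown with a toy model. This is only a sketch of the described behavior: the `selector` type, its `next` method, and `countAttempts` are hypothetical and do not mirror the real replica selector. Only the name and value of `maxReplicaAttempt` come from the report.

```go
package main

import "fmt"

const maxReplicaAttempt = 10 // value quoted in the report

// selector is a hypothetical stand-in for the replica selector state.
type selector struct {
	leaderOnly bool
	attempts   int
}

// next reports whether another attempt is allowed.
// The early return models the suspected bug: once leaderOnly is set,
// the attempt limit below is never consulted.
func (s *selector) next() bool {
	if s.leaderOnly {
		s.attempts++
		return true
	}
	if s.attempts >= maxReplicaAttempt {
		return false
	}
	s.attempts++
	return true
}

// countAttempts drives the selector against a target that always fails,
// stopping at safetyCap so that the demonstration terminates.
func countAttempts(leaderOnly bool, safetyCap int) int {
	s := &selector{leaderOnly: leaderOnly}
	for s.attempts < safetyCap && s.next() {
		// Every attempt is assumed to return a region error, so keep retrying.
	}
	return s.attempts
}

func main() {
	fmt.Println("attempts without leaderOnly:", countAttempts(false, 1000))
	fmt.Println("attempts with leaderOnly:   ", countAttempts(true, 1000))
}
```

In this model the run without leaderOnly stops at the limit, while the run with leaderOnly only stops at the artificial safety cap.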

To fix this issue, it's needed to:

  1. Process the peer id mismatch error properly and ensure the region cache is invalidated.
  2. When the leaderOnly flag is set, ensure the necessary limits still take effect, for example the const maxReplicaAttempt = 10. Actually, this part of the code is complex and needs careful handling.

4. What is your TiDB version? (Required)

v6.5.3

@cfzjywxk cfzjywxk added type/bug The issue is confirmed as a bug. sig/transaction SIG:Transaction severity/major labels Jul 11, 2023
@ti-chi-bot ti-chi-bot bot added may-affects-5.2 This bug maybe affects 5.2.x versions. may-affects-5.3 This bug maybe affects 5.3.x versions. may-affects-5.4 This bug maybe affects 5.4.x versions. may-affects-6.1 may-affects-6.5 may-affects-7.1 labels Jul 11, 2023
@cfzjywxk cfzjywxk added affects-6.5 This bug affects the 6.5.x(LTS) versions. affects-7.1 This bug affects the 7.1.x(LTS) versions. and removed may-affects-5.2 This bug maybe affects 5.2.x versions. may-affects-5.3 This bug maybe affects 5.3.x versions. may-affects-5.4 This bug maybe affects 5.4.x versions. may-affects-6.1 may-affects-6.5 may-affects-7.1 labels Jul 11, 2023
cfzjywxk commented Oct 7, 2023
