pd cluster become unhealthy when node fail happened #236

Closed
xiaojingchen opened this issue Dec 17, 2018 · 4 comments
Labels
test/stability stability tests type/bug Something isn't working

Comments

@xiaojingchen
Contributor

The PD cluster has 3 members. The cluster becomes unhealthy when the member example-pd-2 goes down.

kubectl get po -n example
NAME                               READY     STATUS             RESTARTS   AGE
example-monitor-7bc8cdb97b-f7472   2/2       Running            0          4h
example-monitor-7bc8cdb97b-jjv6g   2/2       Terminating        0          4h
example-pd-0                       1/1       Running            0          4h
example-pd-1                       1/1       Running            1          4h
example-pd-2                       1/1       Terminating        1          4h
example-tidb-0                     0/1       CrashLoopBackOff   52         4h
example-tidb-1                     1/1       Running            0          4h
example-tikv-0                     2/2       Running            0          4h
example-tikv-1                     2/2       Running            0          4h
example-tikv-2                     2/2       Terminating        0          4h

example-pd-0's log

2018/12/17 10:32:18.473 raft.go:1135: [info] 1fbe35312aff5d1d no leader at term 3; dropping proposal
2018/12/17 10:32:18.473 raft.go:1135: [info] 1fbe35312aff5d1d no leader at term 3; dropping proposal
2018/12/17 10:32:18.677 raft.go:1135: [info] 1fbe35312aff5d1d no leader at term 3; dropping proposal
2018/12/17 10:32:18.677 raft.go:1135: [info] 1fbe35312aff5d1d no leader at term 3; dropping proposal
2018/12/17 10:32:18.678 raft.go:1135: [info] 1fbe35312aff5d1d no leader at term 3; dropping proposal
2018/12/17 10:32:18.678 raft.go:1135: [info] 1fbe35312aff5d1d no leader at term 3; dropping proposal
2018/12/17 10:32:18.879 raft.go:1135: [info] 1fbe35312aff5d1d no leader at term 3; dropping proposal
2018/12/17 10:32:18.879 raft.go:1135: [info] 1fbe35312aff5d1d no leader at term 3; dropping proposal
2018/12/17 10:32:18.879 raft.go:1135: [info] 1fbe35312aff5d1d no leader at term 3; dropping proposal
2018/12/17 10:32:18.879 raft.go:1135: [info] 1fbe35312aff5d1d no leader at term 3; dropping proposal
2018/12/17 10:32:19.080 raft.go:1135: [info] 1fbe35312aff5d1d no leader at term 3; dropping proposal
2018/12/17 10:32:19.080 raft.go:1135: [info] 1fbe35312aff5d1d no leader at term 3; dropping proposal
2018/12/17 10:32:19.080 raft.go:1135: [info] 1fbe35312aff5d1d no leader at term 3; dropping proposal
2018/12/17 10:32:19.080 raft.go:1135: [info] 1fbe35312aff5d1d no leader at term 3; dropping proposal
2018/12/17 10:32:19.244 raft.go:857: [info] 1fbe35312aff5d1d is starting a new election at term 3
2018/12/17 10:32:19.244 raft.go:684: [info] 1fbe35312aff5d1d became pre-candidate at term 3
2018/12/17 10:32:19.244 raft.go:755: [info] 1fbe35312aff5d1d received MsgPreVoteResp from 1fbe35312aff5d1d at term 3
2018/12/17 10:32:19.244 raft.go:742: [info] 1fbe35312aff5d1d [logterm: 3, index: 593] sent MsgPreVote request to 4da3dda092b483e2 at term 3
2018/12/17 10:32:19.244 raft.go:742: [info] 1fbe35312aff5d1d [logterm: 3, index: 593] sent MsgPreVote request to b4087811ad981723 at term 3
2018/12/17 10:32:19.255 raft.go:782: [info] 1fbe35312aff5d1d [logterm: 3, index: 593, vote: b4087811ad981723] ignored MsgPreVote from 4da3dda092b483e2 [logterm: 3, index: 593] at term 3: lease is not expired (remaining ticks: 6)
2018/12/17 10:32:19.282 raft.go:1135: [info] 1fbe35312aff5d1d no leader at term 3; dropping proposal
2018/12/17 10:32:19.282 raft.go:1135: [info] 1fbe35312aff5d1d no leader at term 3; dropping proposal
2018/12/17 10:32:19.282 raft.go:1135: [info] 1fbe35312aff5d1d no leader at term 3; dropping proposal
2018/12/17 10:32:19.282 raft.go:1135: [info] 1fbe35312aff5d1d no leader at term 3; dropping proposal
2018/12/17 10:32:19.483 raft.go:1135: [info] 1fbe35312aff5d1d no leader at term 3; dropping proposal
2018/12/17 10:32:19.483 raft.go:1135: [info] 1fbe35312aff5d1d no leader at term 3; dropping proposal
2018/12/17 10:32:19.483 raft.go:1135: [info] 1fbe35312aff5d1d no leader at term 3; dropping proposal
2018/12/17 10:32:19.483 raft.go:1135: [info] 1fbe35312aff5d1d no leader at term 3; dropping proposal
2018/12/17 10:32:19.684 raft.go:1135: [info] 1fbe35312aff5d1d no leader at term 3; dropping proposal
2018/12/17 10:32:19.684 raft.go:1135: [info] 1fbe35312aff5d1d no leader at term 3; dropping proposal

example-pd-1's log

2018/12/17 12:07:19.339 raft.go:857: [info] 4da3dda092b483e2 is starting a new election at term 3
2018/12/17 12:07:19.339 raft.go:684: [info] 4da3dda092b483e2 became pre-candidate at term 3
2018/12/17 12:07:19.339 raft.go:755: [info] 4da3dda092b483e2 received MsgPreVoteResp from 4da3dda092b483e2 at term 3
2018/12/17 12:07:19.339 raft.go:742: [info] 4da3dda092b483e2 [logterm: 3, index: 593] sent MsgPreVote request to b4087811ad981723 at term 3
2018/12/17 12:07:19.339 raft.go:742: [info] 4da3dda092b483e2 [logterm: 3, index: 593] sent MsgPreVote request to 1fbe35312aff5d1d at term 3
2018/12/17 12:07:19.758 log.go:84: [warning] rafthttp: [health check for peer b4087811ad981723 could not connect: dial tcp: lookup example-pd-2.example-pd-peer.example.svc on 10.192.0.3:53: no such host]
2018/12/17 12:07:21.023 log.go:84: [warning] etcdserver: [read-only range request "key:\"/tidb/store/gcworker/saved_safe_point\" " took too long (5.000060385s) to execute]
2018/12/17 12:07:22.321 raft.go:782: [info] 4da3dda092b483e2 [logterm: 3, index: 593, vote: b4087811ad981723] ignored MsgPreVote from 1fbe35312aff5d1d [logterm: 3, index: 593] at term 3: lease is not expired (remaining ticks: 1)
2018/12/17 12:07:22.339 raft.go:857: [info] 4da3dda092b483e2 is starting a new election at term 3
2018/12/17 12:07:22.339 raft.go:684: [info] 4da3dda092b483e2 became pre-candidate at term 3
2018/12/17 12:07:22.339 raft.go:755: [info] 4da3dda092b483e2 received MsgPreVoteResp from 4da3dda092b483e2 at term 3
2018/12/17 12:07:22.339 raft.go:742: [info] 4da3dda092b483e2 [logterm: 3, index: 593] sent MsgPreVote request to 1fbe35312aff5d1d at term 3
2018/12/17 12:07:22.339 raft.go:742: [info] 4da3dda092b483e2 [logterm: 3, index: 593] sent MsgPreVote request to b4087811ad981723 at term 3
2018/12/17 12:07:24.011 log.go:84: [warning] etcdserver: [timed out waiting for read index response]
2018/12/17 12:07:24.758 log.go:84: [warning] rafthttp: [health check for peer b4087811ad981723 could not connect: dial tcp: lookup example-pd-2.example-pd-peer.example.svc on 10.192.0.3:53: no such host]
2018/12/17 12:07:25.321 raft.go:782: [info] 4da3dda092b483e2 [logterm: 3, index: 593, vote: b4087811ad981723] ignored MsgPreVote from 1fbe35312aff5d1d [logterm: 3, index: 593] at term 3: lease is not expired (remaining ticks: 1)
2018/12/17 12:07:25.339 raft.go:857: [info] 4da3dda092b483e2 is starting a new election at term 3
2018/12/17 12:07:25.339 raft.go:684: [info] 4da3dda092b483e2 became pre-candidate at term 3
2018/12/17 12:07:25.339 raft.go:755: [info] 4da3dda092b483e2 received MsgPreVoteResp from 4da3dda092b483e2 at term 3
2018/12/17 12:07:25.339 raft.go:742: [info] 4da3dda092b483e2 [logterm: 3, index: 593] sent MsgPreVote request to 1fbe35312aff5d1d at term 3
2018/12/17 12:07:25.339 raft.go:742: [info] 4da3dda092b483e2 [logterm: 3, index: 593] sent MsgPreVote request to b4087811ad981723 at term 3
2018/12/17 12:07:27.025 log.go:84: [warning] etcdserver: [read-only range request "key:\"/tidb/store/gcworker/saved_safe_point\" " took too long (5.000522266s) to execute]
2018/12/17 12:07:28.321 raft.go:782: [info] 4da3dda092b483e2 [logterm: 3, index: 593, vote: b4087811ad981723] ignored MsgPreVote from 1fbe35312aff5d1d [logterm: 3, index: 593] at term 3: lease is not expired (remaining ticks: 1)
2018/12/17 12:07:28.339 raft.go:857: [info] 4da3dda092b483e2 is starting a new election at term 3
2018/12/17 12:07:28.339 raft.go:684: [info] 4da3dda092b483e2 became pre-candidate at term 3
2018/12/17 12:07:28.339 raft.go:755: [info] 4da3dda092b483e2 received MsgPreVoteResp from 4da3dda092b483e2 at term 3
2018/12/17 12:07:28.339 raft.go:742: [info] 4da3dda092b483e2 [logterm: 3, index: 593] sent MsgPreVote request to 1fbe35312aff5d1d at term 3
2018/12/17 12:07:28.339 raft.go:742: [info] 4da3dda092b483e2 [logterm: 3, index: 593] sent MsgPreVote request to b4087811ad981723 at term 3
2018/12/17 12:07:29.758 log.go:84: [warning] rafthttp: [health check for peer b4087811ad981723 could not connect: dial tcp: lookup example-pd-2.example-pd-peer.example.svc on 10.192.0.3:53: no such host]
2018/12/17 12:07:31.321 raft.go:782: [info] 4da3dda092b483e2 [logterm: 3, index: 593, vote: b4087811ad981723] ignored MsgPreVote from 1fbe35312aff5d1d [logterm: 3, index: 593] at term 3: lease is not expired (remaining ticks: 1)
2018/12/17 12:07:31.339 raft.go:857: [info] 4da3dda092b483e2 is starting a new election at term 3
2018/12/17 12:07:31.339 raft.go:684: [info] 4da3dda092b483e2 became pre-candidate at term 3
2018/12/17 12:07:31.339 raft.go:755: [info] 4da3dda092b483e2 received MsgPreVoteResp from 4da3dda092b483e2 at term 3
2018/12/17 12:07:31.339 raft.go:742: [info] 4da3dda092b483e2 [logterm: 3, index: 593] sent MsgPreVote request to 1fbe35312aff5d1d at term 3
2018/12/17 12:07:31.339 raft.go:742: [info] 4da3dda092b483e2 [logterm: 3, index: 593] sent MsgPreVote request to b4087811ad981723 at term 3
@xiaojingchen
Contributor Author

@nolouch PTAL

@weekface weekface added the test/stability stability tests label Dec 18, 2018
@nolouch
Member

nolouch commented Dec 19, 2018

This seems to be a bug in prevote, related to etcd-io/etcd#8334.
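As a reading aid for the `lease is not expired (remaining ticks: N)` lines in the logs above, here is a minimal Go sketch of how a leader-lease check can cause a member to ignore pre-votes. The `peer` type, its fields, and both methods are hypothetical simplifications made up for illustration; this is not the etcd raft code. It also assumes that a member can still remember a leader that has already failed, which is an inference and not something confirmed in this thread.

```go
package main

import "fmt"

// none marks "no known leader" in this simplified model.
const none uint64 = 0

// peer is a hypothetical, stripped-down stand-in for one Raft member.
// Only the fields needed for the lease check are modeled.
type peer struct {
	checkQuorum     bool
	lead            uint64 // leader this peer last knew about, or none
	electionElapsed int    // ticks since the election timer was last reset
	electionTimeout int    // ticks before the peer may start an election
}

// inLease reports whether the peer still treats its leader lease as valid.
func (p peer) inLease() bool {
	return p.checkQuorum && p.lead != none && p.electionElapsed < p.electionTimeout
}

// handlePreVote reports whether an incoming pre-vote is considered.
// When it is ignored, the remaining lease ticks are returned as well.
func (p peer) handlePreVote() (considered bool, remainingTicks int) {
	if p.inLease() {
		return false, p.electionTimeout - p.electionElapsed
	}
	return true, 0
}

func main() {
	// ID of the unreachable member, taken from the logs above.
	const deadLeader uint64 = 0xb4087811ad981723

	// Assumption: the peer still remembers the failed leader.
	stale := peer{checkQuorum: true, lead: deadLeader, electionElapsed: 4, electionTimeout: 10}
	ok, left := stale.handlePreVote()
	fmt.Printf("stale leader: considered=%v remaining=%d\n", ok, left)

	// With the remembered leader cleared, the same pre-vote is considered.
	cleared := stale
	cleared.lead = none
	ok, left = cleared.handlePreVote()
	fmt.Printf("leader cleared: considered=%v remaining=%d\n", ok, left)
}
```

In this model, two surviving members that each keep restarting their own election timer while holding a stale leader would keep ignoring each other's pre-votes, which resembles the loop visible in the logs. Whether that is the exact mechanism behind etcd-io/etcd#8334 is not established here.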

@tennix tennix added the type/bug Something isn't working label Jan 29, 2019
@weekface
Contributor

@nolouch has this issue been fixed in PD?

@aylei
Contributor

aylei commented Nov 15, 2019

PD has adopted the upstream fix, as confirmed by @nolouch.
Closing.

@aylei aylei closed this as completed Nov 15, 2019
yahonda pushed a commit that referenced this issue Dec 27, 2021