Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[docdb] RaftConsensusITest.TestAtomicAddRemoveServer flaky in master #5912

Closed
bmatican opened this issue Oct 1, 2020 · 0 comments
Closed
Assignees
Labels
area/docdb YugabyteDB core features kind/failing-test Tests and testing infra

Comments

@bmatican
Copy link
Contributor

bmatican commented Oct 1, 2020

https://detective-gcp.dev.yugabyte.com/stability/test?class=RaftConsensusITest&name=TestAtomicAddRemoveServer

Splitting up from #4548

@bmatican bmatican added kind/failing-test Tests and testing infra area/docdb YugabyteDB core features labels Oct 1, 2020
@bmatican bmatican self-assigned this Oct 1, 2020
bmatican added a commit that referenced this issue Oct 2, 2020
Summary:
When sending a new request for a peer in raft, we added some custom code to force changing
the role of a peer, in case it is a PRE_VOTER / PRE_OBSERVER. However, this new code path is
essentially a deviation from the normal handling of request data, which leads to the request_
object still having ops, for the next thread coming in, either to prepare a new request, or process
a peer response. We had one `DCHECK(request->ops().empty())` which was failing across tests,
signaling this is likely a broader issue.

I noticed we actually have a couple of different return paths from this function, so I added an
exit handler, but special cased the peer promotion path, as it was dropping locks early.

Test Plan:
`ybd --cxx-test integration-tests_raft_consensus-itest --gtest_filter RaftConsensusITest.TestAtomicAddRemoveServer -n 500`
Failing 0/500!

`ybd --cxx-test integration-tests_raft_consensus-itest --gtest_filter RaftConsensusITest.TestLaggingFollowerRestart -n 500 --tp 3`
Failing 12/500
- all with `Check failed: header_.IsInitialized()`

`ybd --cxx-test integration-tests_raft_consensus-itest --gtest_filter RaftConsensusITest.TestConfigChangeUnderLoad -n 500 --tp 3`
Failing 19/500
- 4 with a test timeout
- 5 failing in the Messenger destructor with `closing_ Should have already shut down`
- rest with `Resource temporarily unavailable` -- probably some local test parallelism issue

Reviewers: rsami, hector, sergei, zyu

Reviewed By: sergei, zyu

Subscribers: ybase

Differential Revision: https://phabricator.dev.yugabyte.com/D9529
@bmatican bmatican closed this as completed Oct 2, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/docdb YugabyteDB core features kind/failing-test Tests and testing infra
Projects
None yet
Development

No branches or pull requests

1 participant