IpVersions/EchoIntegrationTest.AddRemoveListener/IPv6 is flaky #3997
Comments
I've also seen this on my own machine.
I poked around this a bit yesterday evening, and it seems to be a race in the AddRemoveListenerTest between the RawConnectionDriver's connect attempt and the closing of the actual listener socket. Sometimes the listener socket is closed concurrently with the RawConnectionDriver's connect attempt. The RawConnectionDriver's ConnectionImpl then sees a successful connection (the write event triggers onWriteReady, and getsockopt returns no error) and writes the initial data. After that no further events occur, so the RawConnectionDriver waits in Dispatcher::run until the test times out. When the test passes, the connect happens either before or after the socket close, which leads to either an immediate or a deferred connect failure; in both cases the test terminates successfully.
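The check described above (a write event, then `getsockopt` reporting no error) is the usual POSIX way to finish a non-blocking `connect`. A minimal Python sketch of that pattern follows, assuming a loopback address; `nonblocking_connect` is a hypothetical helper for illustration, not Envoy's `ConnectionImpl`. It shows that "writable with `SO_ERROR == 0`" is all the client learns, so a connection the kernel completed just before the listener closed can look successful.

```python
import errno
import select
import socket


def nonblocking_connect(host: str, port: int, timeout: float = 2.0) -> int:
    """Hypothetical helper: start a non-blocking connect, return final SO_ERROR.

    0 means the kernel reports the connection as established; any other
    value is an errno such as ECONNREFUSED.
    """
    family = socket.AF_INET6 if ":" in host else socket.AF_INET
    sock = socket.socket(family, socket.SOCK_STREAM)
    sock.setblocking(False)
    try:
        rc = sock.connect_ex((host, port))
        if rc == 0:
            return 0  # Connected immediately (possible on loopback).
        if rc not in (errno.EINPROGRESS, errno.EWOULDBLOCK):
            return rc  # The connect failed synchronously.
        # Writability means the connect attempt has finished, successfully
        # or not; SO_ERROR says which.
        _, writable, _ = select.select([], [sock], [], timeout)
        if not writable:
            return errno.ETIMEDOUT
        return sock.getsockopt(socket.SOL_SOCKET, socket.SO_ERROR)
    finally:
        sock.close()


if __name__ == "__main__":
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))  # Port 0: let the kernel choose.
    listener.listen()
    port = listener.getsockname()[1]
    print("while listening:", nonblocking_connect("127.0.0.1", port))
    listener.close()
    err = nonblocking_connect("127.0.0.1", port)
    print("after close:", errno.errorcode.get(err, err))
```

The sketch only covers the client side; it does not model how the driver's dispatcher reacts to events after the initial write.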
This was failing often enough today that I'd back disabling it first and debugging later, if anyone is willing to own the debugging.
At least one failure mode is that when the listener was released, some other test would yoink the released port, and the "make sure we cannot connect to a removed listener" check would unexpectedly result in a connection. Running the test as exclusive should fix that particular failure mode and let us see whether others exist. I believe the test flaked more often when run in parallel with -l trace because the test ran more slowly: the lag between the listener releasing the port and the raw connection driver connecting increased, so the likelihood that another test would snag the port also increased.
Risk Level: Low (test only)
Testing: 1000 runs with "exclusive"
Docs Changes: n/a
Release Notes: n/a
Fixes #3997
Signed-off-by: Alyssa Wilk <alyssar@chromium.org>
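The port-reuse failure mode can be reproduced in isolation. The Python sketch below assumes IPv4 loopback, and the second listener is a hypothetical stand-in for an unrelated test that happens to bind the freed port (in the real suite that happens by chance through kernel port assignment, not by an explicit bind). `make_listener`, `can_connect`, and `demo_port_reuse` are illustrative names, not Envoy test utilities.

```python
import socket


def make_listener(port: int = 0) -> socket.socket:
    """Open a loopback TCP listener; port 0 lets the kernel pick one."""
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    s.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    s.bind(("127.0.0.1", port))
    s.listen()
    return s


def can_connect(port: int) -> bool:
    """Return True if a TCP connect to 127.0.0.1:port succeeds."""
    try:
        with socket.create_connection(("127.0.0.1", port), timeout=1.0):
            return True
    except OSError:
        return False


def demo_port_reuse() -> tuple[bool, bool]:
    # Stand-in for the listener the test adds and then removes.
    ours = make_listener()
    port = ours.getsockname()[1]
    ours.close()
    refused_after_close = not can_connect(port)

    # Hypothetical concurrent test that grabs the freed port.
    theirs = make_listener(port)
    try:
        # The "removed listener must refuse connections" check now
        # reaches someone else's listener and connects.
        spurious_connect = can_connect(port)
    finally:
        theirs.close()
    return refused_after_close, spurious_connect


if __name__ == "__main__":
    print(demo_port_reuse())
```

Marking the test as exclusive, as the change above does, means no other test runs concurrently to take the second listener's role.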
Description:
On master 028387a, running:
bazel test --runs_per_test=100 //test/integration:echo_integration_test
results in 2 out of 100 runs failing with TIMEOUT; with "-l trace", 8 out of 100 time out.