FLAKE: TestKVDelete #14275

Closed
serathius opened this issue Jul 26, 2022 · 7 comments

@serathius
Member

https://github.com/etcd-io/etcd/runs/7499725943?check_suite_focus=true

Interesting log:

/home/runner/work/etcd/etcd/bin/etcd-25280: {"level":"error","ts":"2022-07-25T12:53:51.450Z","caller":"embed/etcd.go:778","msg":"setting up serving from embedded etcd failed.","error":"accept tcp 127.0.0.1:20000: use of closed network connection","stacktrace":"go.etcd.io/etcd/server/v3/embed.(*Etcd).errHandler\n\t/home/runner/work/etcd/etcd/server/embed/etcd.go:778\ngo.etcd.io/etcd/server/v3/embed.(*Etcd).serveClients.func1\n\t/home/runner/work/etcd/etcd/server/embed/etcd.go:735"}
@tjungblu
Contributor

Just took a brief look: even on my more powerful box that test takes nearly 12s and comes very close to the defined timeout. After adding a print for the time left:

time left on deadline 701.153701ms

Shall we increase the timeout for those tests? Alternatively there could be two timeouts: one for spawning the cluster (5s?) and another for the test execution, roughly as sketched below.
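
A minimal sketch of how that two-budget idea could look, assuming the test gets its deadline from a plain context; the durations, the test name, and the phase bodies are placeholders, not the actual etcd integration harness:

```go
package integration_test

import (
	"context"
	"testing"
	"time"
)

// Sketch only: split one overall test deadline into a setup budget and an
// execution budget, so a slow cluster spawn can't eat the whole timeout.
func TestKVDeleteTwoBudgets(t *testing.T) {
	// Phase 1: the cluster spawn gets its own budget (the 5s suggested above).
	setupCtx, cancelSetup := context.WithTimeout(context.Background(), 5*time.Second)
	defer cancelSetup()
	_ = setupCtx // the real cluster setup helper would take this context

	// Phase 2: a separate budget for the actual KV Delete assertions.
	execCtx, cancelExec := context.WithTimeout(context.Background(), 10*time.Second)
	defer cancelExec()

	// The "time left on deadline" print mentioned above can be reproduced like this:
	if deadline, ok := execCtx.Deadline(); ok {
		t.Logf("time left on deadline %v", time.Until(deadline))
	}
	// ... run the Delete/Get test cases against execCtx here ...
}
```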

setting up serving from embedded etcd failed.

This message appears 695 times in the log, so it's probably not related to this test flaking, but it's still interesting that it happens so frequently.

@serathius
Member Author

Hmm, that's weird. I would look into why this test takes so long, and possibly into how long it took when the test was introduced.

@tjungblu
Contributor

Good point, the test from #13753 takes only 3s instead of 12s. I'm going to bisect the changes since then and see what made it take so much longer.

@tjungblu
Contributor

tjungblu commented Jul 27, 2022

The last commit where it still takes 3s is e4d34f21bc92b6e4c3255a9c893a2afea07a05e0.

The runtime then doubled starting with #13864 and commit 63346bfead7dd9ad071485d7cc7d6c8bd9fc36b7, or more probably e5bf230.

I'll dig deeper tomorrow into which of the changes in there causes this.

@tjungblu
Contributor

I think I found it. It's actually a bit surprising and not what I expected:
e5bf230#diff-014dc5cc5559dd7777391eeb579c4f6b374e15e1c6ecb1ea199606a417d5eafcR117

The 100ms sleep is causing the delay; if you lower it to 10ms, the runtime goes back to where it was before.
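
To illustrate why that matters (not the actual code from the linked diff, just a made-up waitFor poll loop): when the awaited condition typically becomes true within a few milliseconds, every wait still pays up to one full sleep interval, and with many such waits per test the interval dominates the runtime.

```go
package main

import (
	"fmt"
	"time"
)

// waitFor polls cond every interval until it returns true or the timeout expires.
// A coarse interval (100ms) means each call overshoots by up to one full interval,
// which adds up quickly when a test performs many such waits.
func waitFor(cond func() bool, interval, timeout time.Duration) bool {
	deadline := time.Now().Add(timeout)
	for time.Now().Before(deadline) {
		if cond() {
			return true
		}
		time.Sleep(interval)
	}
	return cond()
}

func main() {
	// Simulate a condition that becomes true ~5ms after each wait starts.
	newCond := func() func() bool {
		start := time.Now()
		return func() bool { return time.Since(start) > 5*time.Millisecond }
	}

	for _, interval := range []time.Duration{100 * time.Millisecond, 10 * time.Millisecond} {
		start := time.Now()
		for i := 0; i < 20; i++ { // 20 waits, roughly what a chatty test might do
			waitFor(newCond(), interval, time.Second)
		}
		fmt.Printf("interval=%v total=%v\n", interval, time.Since(start))
	}
}
```

With these made-up numbers, 20 waits cost roughly 2s at a 100ms interval versus roughly 0.2s at 10ms, which matches the order of magnitude of the slowdown seen in the test.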

@serathius
Member Author

serathius commented Jul 28, 2022

Awesome find! Surprising that I didn't notice it when running it locally.
Feel free to send a PR.

@serathius
Member Author

Let's close it and reopen if the issue resurfaces.
