build: the end-2-end tests have become extremely flaky #4606
Comments
Papered over in #4605; we still need to investigate.
The patch helps only somewhat. I added a 10 s delay after wait-connectivity. That's not quite enough, so I added a triple retry on pings. That is enough for the test to succeed most of the time, but the retry only applies to the end2end_integration test. There's another test that keeps failing: scion_integration. That one doesn't have a retry option. In the end we just need to figure out why it takes so long for path segments to become available.
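To illustrate the kind of bounded retry described above, here is a minimal sketch in Go. It is not the code from the patch: `retry` and `flakyPing` are hypothetical names, and the simulated ping simply fails a fixed number of times before succeeding.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// retry calls op up to attempts times, sleeping delay between failed calls.
// It returns the number of calls made and the last error (nil on success).
func retry(attempts int, delay time.Duration, op func() error) (int, error) {
	if attempts < 1 {
		attempts = 1 // always try at least once
	}
	var err error
	for i := 1; i <= attempts; i++ {
		if err = op(); err == nil {
			return i, nil
		}
		if i < attempts {
			time.Sleep(delay)
		}
	}
	return attempts, err
}

// flakyPing is a hypothetical stand-in for a ping: the returned function
// fails for the first `failures` calls and succeeds afterwards.
func flakyPing(failures int) func() error {
	calls := 0
	return func() error {
		calls++
		if calls <= failures {
			return errors.New("ping: no path available yet")
		}
		return nil
	}
}

func main() {
	// Triple retry: the third attempt succeeds.
	n, err := retry(3, 10*time.Millisecond, flakyPing(2))
	fmt.Println(n, err)
}
```

A retry like this only hides the symptom, which is why the underlying delay still needs to be explained.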
So, it appears that the segments are available after all (give or take a small fix in await-connectivity). What makes the tests fail is "deadline exceeded" errors when trying to fetch the segments. Following the breadcrumbs, I ended up seeing one CS making an RPC to another, and both of them (if memory serves) disappearing for several seconds in the middle of processing the request. So, of course, the 10 s client timeout eventually expires and the whole chain of RPCs fails. Increasing the timeout doesn't fix it, so it seems that the hangs can last indefinitely, until the timeout expires.
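The failure mode can be sketched as follows, assuming the client deadline is carried through the chain via a `context.Context`. All names here (`remoteCS`, `localCS`, `fetchSegments`, and the durations) are hypothetical and scaled down to milliseconds; this is not the SCION control service code.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"time"
)

// Hypothetical, scaled-down durations for the demonstration.
const (
	clientTimeout = 50 * time.Millisecond
	shortStall    = 1 * time.Millisecond
	longStall     = 500 * time.Millisecond
)

// remoteCS stands in for the second control service. It stalls before
// answering, unless the caller's deadline expires first.
func remoteCS(ctx context.Context, stall time.Duration) error {
	select {
	case <-time.After(stall):
		return nil
	case <-ctx.Done():
		return ctx.Err()
	}
}

// localCS forwards the request and passes the caller's context along,
// so a stall in remoteCS surfaces here as a wrapped deadline error.
func localCS(ctx context.Context, stall time.Duration) error {
	if err := remoteCS(ctx, stall); err != nil {
		return fmt.Errorf("fetching segments from remote CS: %w", err)
	}
	return nil
}

// fetchSegments is the client call with its own timeout.
func fetchSegments(timeout, stall time.Duration) error {
	ctx, cancel := context.WithTimeout(context.Background(), timeout)
	defer cancel()
	return localCS(ctx, stall)
}

// timedOut reports whether err wraps context.DeadlineExceeded.
func timedOut(err error) bool {
	return errors.Is(err, context.DeadlineExceeded)
}

func main() {
	err := fetchSegments(clientTimeout, longStall)
	fmt.Println(err, timedOut(err))
}
```

If the stall has no upper bound, no choice of `clientTimeout` makes the call succeed, which matches the observation that raising the timeout did not help.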
Reading the release notes carefully, this is the only thing that stands out: https://tip.golang.org/doc/go1.23#timer-changes And indeed, there is something in the Go issue tracker: golang/go#69312, and the offending library: quic-go/quic-go#4659. In the meantime, we can downgrade, or use [...]. Downgrading the Go version shows very high reliability: https://buildkite.com/scionproto/scion/builds/4751
Until golang/go#69312 is resolved, force the old timer behavior by specifying an older Go version in the go.mod file. Fixes #4606
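As a rough illustration of that workaround: according to the Go 1.23 release notes, the new timer behavior is only enabled when the main module's `go` line in go.mod is 1.23.0 or later. A go.mod along the following lines would therefore keep the old behavior. The module path and the exact version number are assumptions for the example, not the values used in the actual fix.

```
// Hypothetical module path, for illustration only.
module example.com/demo

// Assumption: any version below 1.23.0 keeps the pre-1.23 timer behavior.
go 1.22
```

The same release notes also document a `GODEBUG` setting, `asynctimerchan=1`, that restores the old timer channel behavior without lowering the `go` line.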
Found this in the wake of #4606. I believe that await-connectivity could mistake core segments for up segments (i.e. it assumed that only up segments could be found). It still makes the optimistic assumption that down segments are registered immediately after up segments are obtained. We have to be content with that because in the hidden paths test cases the down segments cannot all be found via a simple REST API query.
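The distinction can be sketched as below: instead of treating any returned segment as proof of connectivity, the check counts segments per type and requires at least one up segment. The JSON shape, the field names, and the type values `"up"`, `"core"`, and `"down"` are assumptions made for this example; they are not taken from the actual REST API or from the await-connectivity script.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// segment is a hypothetical, simplified shape for one entry of a
// segments listing. The field names are assumptions.
type segment struct {
	ID   string `json:"id"`
	Type string `json:"type"` // assumed values: "up", "core", "down"
}

// countByType decodes a JSON array of segments and counts them per type.
func countByType(body []byte) (map[string]int, error) {
	var segs []segment
	if err := json.Unmarshal(body, &segs); err != nil {
		return nil, fmt.Errorf("decoding segments: %w", err)
	}
	counts := make(map[string]int)
	for _, s := range segs {
		counts[s.Type]++
	}
	return counts, nil
}

// hasUpSegment reports whether at least one up segment was found.
// A non-empty result that contains only core segments does not count.
func hasUpSegment(counts map[string]int) bool {
	return counts["up"] > 0
}

// Hypothetical sample responses.
const (
	onlyCore = `[{"id":"a","type":"core"},{"id":"b","type":"core"}]`
	withUp   = `[{"id":"a","type":"core"},{"id":"c","type":"up"}]`
)

func main() {
	for _, body := range []string{onlyCore, withUp} {
		counts, err := countByType([]byte(body))
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Println(counts, hasUpSegment(counts))
	}
}
```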
The end-to-end tests currently have a success rate below 20%.