sync: Pool tests flaky on arm builders #31422
CC @aclements
I just got this once in 1,045 runs of all.bash on my linux/amd64 workstation.
This is certainly a theoretically possible failure, but when I wrote this test I thought the chance of hitting the bad schedule was infinitesimal. Maybe there's a more likely schedule that can cause this.
Lots of instances of this on arm and arm64 builders:
When I ran the sync Pool test cases about 2000 times on an arm64 device, they all passed. When the all.bash script runs, the '-test.short' flag is set, which makes the run faster. In this case, '-test.short' controls the value of "N". As the comment in the code says, "In theory it's possible in a valid schedule for popHead to never succeed", so my guess is that N is too small for the test to pass reliably.
@aclements, is this still on the radar for 1.13? Is this more likely a bug in the test, or in the |
Given that the long test doesn't flake, this is almost certainly a bug in the test. In the short test, there are only 100 expected PopHeads. On my linux/amd64 laptop, in 1000 runs, it gets as low as 50 successful PopHeads, but that seems to be a hard floor. It does give me pause that the failure rate is that high, since I would expect these schedules to be quite rare.
I added some logging. It looks like the time between the PushHead committing and the PopHead committing is just long enough that the racing PopTail loop can regularly succeed and drain the queue. This means it's just the test. I'm not sure why it's so flaky on arm64 specifically, but it may be that that window is just larger because of architectural details. I'm still thinking about how to make the test less flaky. We could of course just add retries, but it would be nice to do something better.
Or we just remove the nPopHead check.
Change https://golang.org/cl/183981 mentions this issue: |
Possibly related to #24640.
Samples:
https://build.golang.org/log/10c155a9635967f5b3006b6a04b6d5442ff9713a:
https://build.golang.org/log/3fbb17b4083eca9629c97f7b71879c804ecf5d0d and
https://build.golang.org/log/9f98720ace008f8c98f74c0d14049cb67b3c56f5: