create WAL error: fileutil: file already locked on quick restart #1421
Comments
It's really strange that Do you think it's possible that two goroutines are calling
@aaronlehmann https://gist.github.com/tonistiigi/f3e7496e1b8523174d0729080705ea38#file-daemon-log-L166-L184 Added logs for manager starting and error. Both seem to appear only twice, and the second one only after the first has finished.
@aaronlehmann Added logs around the file locks: https://gist.github.com/tonistiigi/607c9647f8c50264410ea6bc25ab6c3c. Can't say that the result makes much sense.
I finally have an explanation for this that at least makes sense. A file is locked if any process has it locked. These locks follow forks. Go's
The only solution I can think of is adding some retry logic. This logic probably belongs in
Another fix would be to patch upstream
@tonistiigi: Can you try this patch? aaronlehmann/etcd@5a39edf
@aaronlehmann Yes, that patch seems to fix it for me.
Fixed by #1448
TestSwarmInit sometimes blocks after the latest swarmkit vendoring moby/moby#25833 (comment). The test creates a single-node cluster, stops it, removes its state (docker swarm leave) and creates a new one. In the second run it seems that the manager fails with an error and then blocks because of a related issue, #1283 (comment). I added debugging and it seems that the manager is failing with this error:
time="2016-08-19T23:46:02.628112210Z" level=debug msg="m.Run=can't initialize raft node: create WAL error: fileutil: file already locked"
I bisected it to #1376, but that PR just seems to change the timing between stop and start. I can also make the test pass by adding a small sleep to the test (reversing the speedup from the key change). Even with the key change I can't reproduce it with v1.12.1, so either the timing is very important or something changed in the last week (#1369 maybe).
cc @aaronlehmann