Corruption of persistent database file caused by sudden loss of power #189
Comments
Thanks for the report, I've added code to call fsync() on the file and on its directory. Could you please confirm whether this fixes the problem for you?
I recommend calling fflush() before fsync() to ensure that application buffers are completely flushed to the kernel buffers before those are flushed to disk. This is awfully hard to reproduce, but thanhvtruong and I have started a long-term test series to verify it mechanically. I do not believe the directory sync is required; the rename logic should work as-is. I have opened a pull request for these changes and will update as the long-term tests progress.
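For readers following along, the ordering recommended above (fflush(), then fsync(), then rename) might look roughly like the sketch below. This is not the code from the pull request: the helper name `save_atomically`, the `.new` suffix, and the fixed-size path buffer are all assumptions made for illustration.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <unistd.h>

/* Hypothetical helper, not Mosquitto's actual code: write `len` bytes to a
 * temporary file, push them to the storage device, then rename the
 * temporary file over `path`. Returns 0 on success, -1 on failure. */
static int save_atomically(const char *path, const void *data, size_t len)
{
    char tmp[4096];
    FILE *fp;
    int n;

    assert(path != NULL && data != NULL);

    /* The ".new" suffix is an assumption chosen for this sketch. */
    n = snprintf(tmp, sizeof(tmp), "%s.new", path);
    if (n < 0 || (size_t)n >= sizeof(tmp)) {
        return -1;
    }

    fp = fopen(tmp, "wb");
    if (fp == NULL) {
        return -1;
    }

    if (fwrite(data, 1, len, fp) != len   /* data -> stdio buffer */
        || fflush(fp) != 0                /* stdio buffer -> kernel */
        || fsync(fileno(fp)) != 0) {      /* kernel -> storage device */
        fclose(fp);
        remove(tmp);
        return -1;
    }
    if (fclose(fp) != 0) {
        remove(tmp);
        return -1;
    }

    /* Replace the old file only after the new one has been synced. */
    if (rename(tmp, path) != 0) {
        remove(tmp);
        return -1;
    }
    return 0;
}
```

A call such as `save_atomically("mosquitto.db", buf, buf_len)` would leave either the old file or the fully written new one in place, provided the filesystem makes rename() atomic. The directory fsync() discussed in this thread is deliberately left out of the sketch; whether it is needed is the open question here.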
Mosquitto database writes are not atomic and if power is lost during a write the file will be permanently lost. This commit makes writes as atomic as possible. Signed-off-by: Keegan Callin <kc@kcallin.net> Bug: eclipse-mosquitto#189
Thanks very much for your work on this, I'm closing this now based on your pull request.
It looks like there is a tiny window while the persistent database file is being saved during which a sudden power loss will corrupt the file.
We performed thousands of sudden power losses on our system and noticed the following:
We had a similar problem with another application that we built. Reading the Linux documentation, we found that flushing or closing a file is not enough to guarantee that its content is written to disk; an fsync() must be performed to confirm that the content has been written to disk.
You can see the documentation in the man page (man close, second paragraph of the Notes section) on Fedora 23.
"A successful close does not guarantee that the data has been successfully saved to disk, as the kernel defers writes. It is not common for a filesystem to flush the buffers when the stream is closed. If you need to be sure that the data is physically stored, use fsync(2). (It will depend on the disk hardware at this point.)"
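To illustrate the point in the man page excerpt, here is a minimal sketch of a write that only reports success after fsync() has returned successfully. The helper name `write_durably` and the file mode are assumptions for this example; it is not taken from Mosquitto or from our other application.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical helper: write `len` bytes to `path` and return 0 only once
 * fsync() has reported success. Returns -1 on any failure. */
static int write_durably(const char *path, const void *data, size_t len)
{
    const char *p = data;
    int fd;

    assert(path != NULL && data != NULL);

    fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0600);
    if (fd < 0) {
        return -1;
    }

    /* write() may accept fewer bytes than requested, so loop until done. */
    while (len > 0) {
        ssize_t n = write(fd, p, len);
        if (n < 0) {
            if (errno == EINTR) {
                continue;   /* interrupted by a signal: retry */
            }
            close(fd);
            return -1;
        }
        p += n;
        len -= (size_t)n;
    }

    /* close() alone does not force the data out of the kernel's cache. */
    if (fsync(fd) != 0) {
        close(fd);
        return -1;
    }
    return close(fd);
}
```

As the quoted note says, what happens after fsync() returns still depends on the disk hardware, so this narrows the window rather than removing every risk.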