Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: call fsync before rename #1134

Merged
merged 8 commits into from
Aug 3, 2024

Conversation

LaurentvdBos
Copy link
Contributor

@LaurentvdBos LaurentvdBos commented Jul 27, 2024

When writing database.db to disk, we should call fsync on the file descriptor used for writing. See

https://pubs.opengroup.org/onlinepubs/9699919799/functions/fsync.html

Which explicitly states that "The fsync() function shall request that all data for the open file descriptor ..." (emphasis mine)

I also enforced that database.db.tmp is not created if it already exists, in which case an Exception is thrown. That should be an indication that something went wrong in the past, but maybe this may be annoying. Should this behavior be properly handled in a try/catch and a nice logging message?

According to the POSIX spec, one needs to call fsync on the file
descriptor used for writing, see:

https://pubs.opengroup.org/onlinepubs/9699919799/functions/fsync.html

Moreover, fail if database.db.tmp exists, since this is an indication
something related to database.db writing failed in the past and we did
not notice.
@LaurentvdBos LaurentvdBos changed the title Fix/fsync before rename fix: call fsync before rename Jul 27, 2024
@LaurentvdBos
Copy link
Contributor Author

LaurentvdBos commented Jul 27, 2024

NB, this should address issue koenk/zigbee2mqtt#11759, or well, at least improve the situation.

@Koenkk
Copy link
Owner

Koenkk commented Jul 27, 2024

If I understand correctly, the current code doesn't do anything because a fsync is called on a file descriptor where nothing was written to?

Should this behavior be properly handled in a try/catch and a nice logging message?

Yes, otherwise we cannot get out of the situation without manual intervention. I propose to, before opening the fd, do a check whether the tmp db file exists, if it does, postfix it with the unix timestamp and log a warning message.

@LaurentvdBos
Copy link
Contributor Author

If I understand correctly, the current code doesn't do anything because a fsync is called on a file descriptor where nothing was written to?

Indeed, that can happen in theory. File systems add many checks to make sure the data does hit the disk in time, but especially when running on SD cards stuff may get cached aggressively before it appears on the disk (Raspberry PI users, for example).

Yes, otherwise we cannot get out of the situation without manual intervention. I propose to, before opening the fd, do a check whether the tmp db file exists, if it does, postfix it with the unix timestamp and log a warning message.

Good point, so there is nothing else that catches the existence of a database.db.tmp. I added a slightly adapted behavior: if database.tmp already exists, rename it to database.tmp. and warn. In that case, only one warning is emitted.

I broke CI since this adds an untested branch. Let me know whether you like the new behavior, because then I will add a test for it.

@Koenkk
Copy link
Owner

Koenkk commented Jul 28, 2024

Let me know whether you like the new behavior, because then I will add a test for it.

Looks good!

@Koenkk Koenkk merged commit 1c9190a into Koenkk:master Aug 3, 2024
1 check passed
@Koenkk
Copy link
Owner

Koenkk commented Aug 3, 2024

💯

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants