The thinner might delete snapshots that will be used as the source snapshot #84
This is indeed a serious issue and should not happen: zfs-autobackup should never destroy source snapshots that the target still needs, even with --keep-source=1. Which version are you using?
version
I've confirmed the bug: it only happens if you have a bunch of unsent snapshots that still need to be sent (e.g. after using --no-send or snapshot-only mode). I'll figure out why this one slipped by and fix it immediately.
Ah, it's a bit more obscure to trigger; it only happens when:
So that explains why I've never seen it until now.
It seems I fixed this issue:
I'm not sure about the first problem you had. Did you create the snapshots another way, e.g. by running in snapshot-only mode?
Note: it could also have been triggered by destroying the target dataset and letting zfs-autobackup do a full resync.
I just released rc5 for you to try. Perhaps you can run some tests to make sure there isn't a second issue here. I've also added a regression test to prevent this from ever happening again.
Thanks, I'll update to rc5 and revert --keep-source back to 5 so that it's back in the "problematic situation", and see if it resolves itself. I'll report back tomorrow on that. About the "first problem": I did not create the snapshots any other way. To be more precise about what happened:
Unfortunately I no longer have the logs of the first occurrence, so I don't have the exact details of what was thinned, when, and where. It's entirely possible it was another very obscure edge case; perhaps the five snapshots kept by --keep-source happened to fall just between two weekly snapshots in the default thinner schedule, or something like that.
Then I think that issue was triggered by the same bug: if there is no common snapshot yet (or anymore), zfs-autobackup made a wrong assumption and cleaned up too much. Thanks!
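The fix described above implies a guard in the thinner: whatever the keep schedule says, the newest snapshot the target also has (and everything after it) must survive, and with no common snapshot at all nothing on the source is safe to destroy. This is a hypothetical sketch of that invariant, not zfs-autobackup's actual code; all names here are illustrative:

```python
def thin_source(source_snaps, target_snaps, keep_source):
    """Return the source snapshots that may be destroyed.

    source_snaps / target_snaps: snapshot names, oldest first.
    keep_source: minimum number of recent snapshots to always keep.
    """
    common = set(source_snaps) & set(target_snaps)
    if common:
        # Protect the newest snapshot the target also has, plus
        # everything newer: they are still needed as the send source.
        last_common = max(source_snaps.index(s) for s in common)
        protected = set(source_snaps[last_common:])
    else:
        # No common snapshot yet/anymore (e.g. a --no-send backlog or a
        # fresh target): a full send still needs the oldest snapshot,
        # so nothing may be destroyed.
        protected = set(source_snaps)
    candidates = source_snaps[:-keep_source] if keep_source else source_snaps
    return [s for s in candidates if s not in protected]
```

With this guard, an unsent backlog returns an empty destroy list even with keep_source=1, which is exactly the case the bug report describes.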
I can confirm the fix works; rc5 doesn't destroy the snapshots too early:
I do still get the "cannot hold snapshot" error in the output, but I believe that's expected in this case, and it doesn't trigger a "failed" status, as I do get:
at the end. So it's all good now, if that's the expected behavior.
Thanks! Yes, the "cannot hold" message is correct: there is already a hold on that snapshot. The reason we don't check whether a hold already exists is performance: we don't want the extra checks and zfs calls for this edge case.
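The pattern described here is "ask forgiveness, not permission": attempt the hold and treat the already-held error as success, instead of spending an extra zfs call per snapshot to check first. A minimal sketch of that idea, with illustrative names that are not zfs-autobackup's real internals:

```python
class HoldExistsError(Exception):
    """Stands in for zfs's 'cannot hold snapshot: tag already exists'."""

def ensure_hold(snapshot, tag, holds):
    """Place a hold; ignore the error if the hold is already there.

    `holds` is a set of (snapshot, tag) pairs simulating the holds that
    `zfs hold` would track on a real pool.
    """
    try:
        if (snapshot, tag) in holds:      # simulates `zfs hold` failing
            raise HoldExistsError(snapshot)
        holds.add((snapshot, tag))
    except HoldExistsError:
        pass  # expected edge case: warn in the log, but not fatal
    return (snapshot, tag) in holds
```

Calling it twice for the same snapshot succeeds both times while only one hold exists, which matches the non-fatal "cannot hold" warning seen in the output.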
In the output I get:
Note how the first of the two destroyed snapshots is the one it tries to use in zfs send.
In my nightly backup script I have the following:
This is not the first time I've encountered this. The first time was when the backup for a dataset had failed for many days, leading to new snapshots (and thinning) on the source with no successful transfers. (The cause was something modifying one specific dataset on the target, unrelated to this issue; I fixed it by mounting the target datasets read-only.) The thinner ended up thinning away all mutual snapshots on this dataset, which was bad.
To fix that I had to destroy/rename the target dataset to force a full send, which then failed for the same reason: the thinner deleted the oldest snapshot, and zfs-autobackup tried to do a full send from the just-deleted snapshot. To resolve this I had to run zfs-autobackup once with a bigger --keep-source.
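The underlying condition in both incidents is that source and target no longer share any snapshot, at which point an incremental send is impossible and a full send needs the oldest source snapshot to survive. A hedged pre-flight check for that condition might look like this (hypothetical helper, not part of zfs-autobackup):

```python
def newest_common_snapshot(source_snaps, target_snaps):
    """Return the newest snapshot both sides share, or None.

    Both lists are snapshot names ordered oldest first, as
    `zfs list -t snapshot -s creation` would produce them.
    """
    target = set(target_snaps)
    for snap in reversed(source_snaps):  # walk newest to oldest
        if snap in target:
            return snap
    return None
```

If this returns None, only a full resync can recover, so the thinner must not touch anything on the source until that full send has completed.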
The second time running into this was now, when I had swapped the backup storage medium for a new one, expecting zfs-autobackup to do full sends (which it tried to do, from the just-deleted snapshots); essentially the same scenario as before. Once again the workaround should be running it once with a bigger --keep-source, which I've changed in my backup script; it runs again next night. I'll update tomorrow if that fails for some reason.
(Before someone comments on it: the first time I ran into this and didn't notice the failing backups for a week, zfs-autobackup was still being evaluated as a replacement for my previous backup solution and thus had no monitoring, which is why the issue went unnoticed for so long. With zfs-autobackup now in production and proper monitoring in place this wouldn't go unnoticed, but that's not a fix for the issue.)