borg info not ignoring part files? #3522
Comments
Additional note 1: There was a compaction run in the middle of the process. |
Suspicion: count/space accounting seems incorrect because part files are not ignored. I need some more info:
|
Sorry about the missing information, I was in a hurry and wrote it too fast.
borg 1.1.4, Arch Linux
Can't since the archive was recreated
253821
Can't since the archive was recreated
253823
It does indeed seem to be the part files. But why weren't they reported for the old archive? Is it because it was created with an older borg version? |
The part files get created by the checkpointing mechanism to support in-file checkpoints. |
But it is the same archive, but after recreation. Can a recreate generate new checkpoints? In the original archive (the one before pruning the files, which I still have as this one is just for testing) there are no part files. But after the "recreate" processes, this is what I have:
There I have the full file and two part files, which I guess should have been removed, shouldn't they? |
The new archive is created by running the same code as for [...]. So yes, it will create new checkpoints (new part files) according to [...]. If the original archive has part files, they will get dropped as usual (when not using [...]). So the part files you are seeing are NEW part files, made when checkpointing the new archive. |
But, shouldn't part files be removed if the archive is successfully created? |
The archive's item metadata stream is append-only. Once an item metadata entry is written to it, it cannot easily be removed from it again (except with [...]). We need to create the part file to contain the chunks of the partial file at checkpoint time. After the checkpoint, more part file(s) will be created and finally the full file. After the full file item is written, the part files are not needed anymore, but we can't undo them, thus they are just ignored. The exception is borg info: it looks like it is not ignoring the part files when computing count and size, and this is likely the bug here. |
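To make the mechanism above concrete, here is a minimal Python sketch of an append-only item stream. It is a toy model, not borg's actual code: the dict layout, the `part` key, the path suffix and all function names are assumptions made purely for illustration.

```python
# Toy model of an append-only item metadata stream (NOT borg's real structures).
# Assumption: part-file items carry a "part" key, full-file items do not.
items = []  # append-only: entries are never removed once written


def write_checkpoint_part(path, part_no, size):
    """Record the data backed up so far for a file that is still being read."""
    # The ".part_N" suffix is illustrative only.
    items.append({"path": f"{path}.part_{part_no}", "part": part_no, "size": size})


def write_full_file(path, size):
    """Record the complete file; earlier part items stay in the stream."""
    items.append({"path": path, "size": size})


def iter_items(consider_part_files=False):
    """Yield items, skipping part files unless explicitly requested."""
    for item in items:
        if "part" in item and not consider_part_files:
            continue
        yield item


def stats(consider_part_files=False):
    """Return (file count, total original size) for the selected items."""
    selected = list(iter_items(consider_part_files))
    return len(selected), sum(item["size"] for item in selected)


# One big file, interrupted by two checkpoints before it completes.
write_checkpoint_part("big.img", 1, 1000)
write_checkpoint_part("big.img", 2, 1500)
write_full_file("big.img", 4000)

print(stats())                          # part files ignored: (1, 4000)
print(stats(consider_part_files=True))  # part files counted: (3, 6500)
```

The second call shows the kind of inflated count and size seen in the report: the part items are still in the stream, so any code path that does not filter them will count them.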
It is clear now. Thanks! |
Hence a workaround for me here would be setting a really long checkpoint interval (e.g. days) during recreation, to make sure no checkpoint is created. |
@enkore looks like an issue in cache_sync. It does not ignore part files. |
I don't think it ever ignored them. If it did, wouldn't that cause the reference counts to be too low? |
The usual Python code to iterate over archive items ignores part files by default, except when [...]. |
But wouldn't the stats be wrong if you ignored the part files? The references and the data are there. |
Well, there are 2 views:
The latter is important to not confuse users with stats inconsistencies caused by such implementation details. |
I think either way is inconsistent, but in different ways:
In a way these files made the metadata somewhat self-contradictory; in a perfect world they would not have been necessary, but I really wouldn't want to see how messy it would be to achieve this in the borg code base. |
Isn't "how much space the archive uses" the same for both? The part files completely dedup with the final full file (assuming that there actually is a final full file). There's also the file count, which obviously shows unexpectedly higher numbers, see the bug report. |
Ah right. So it's only a question of original/compressed size and nfiles then? If one wanted to implement this, then it shouldn't be too difficult. One would want to introduce a second state in expect_chunks_map_key to look for the is_partfile flag (or whatever it's called exactly). A complication is of course that msgpack is unordered, so the chunks could come before that flag. If one only wants to modify the counting of nfiles, no big deal, but if all stats should be changed, that may require bigger changes. On Python 3.6 this may appear to work when it's not (due to implicit dict ordering and msgpack-python implementation details). Looking at e189a4d I don't think anything was missed when making the transition, since I don't see any partfile filtering in the old code. |
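The ordering complication can be sketched in a few lines of Python. This does not mirror the real cache_sync implementation; the key names `part` and `chunks`, the `(chunk_id, size)` layout and the function name are assumptions for illustration. The idea is to buffer an item's chunk sizes and only add them to the totals once the whole item has been seen, so it does not matter whether the part flag arrives before or after the chunks.

```python
def account_items(items, ignore_part_files=True):
    """Compute (nfiles, original_size) from items given as (key, value) pairs.

    Keys may arrive in any order, so the size of an item is buffered and
    only added to the totals after all of its keys have been processed.
    """
    nfiles = 0
    original_size = 0
    for pairs in items:
        is_part = False
        has_chunks = False
        item_size = 0
        for key, value in pairs:
            if key == "part":
                is_part = True
            elif key == "chunks":
                has_chunks = True
                item_size = sum(size for _chunk_id, size in value)
        # Decide only now, when the complete item is known.
        if not has_chunks or (is_part and ignore_part_files):
            continue
        nfiles += 1
        original_size += item_size
    return nfiles, original_size


example_items = [
    # Full file: two chunks.
    [("path", "a"), ("chunks", [("c1", 10), ("c2", 20)])],
    # Part file whose chunks come BEFORE the part flag.
    [("chunks", [("c1", 10)]), ("path", "a.part_1"), ("part", 1)],
    # Directory: no chunks, not counted as a file.
    [("path", "dir")],
]

print(account_items(example_items))                           # (1, 30)
print(account_items(example_items, ignore_part_files=False))  # (2, 40)
```

A streaming parser that adds each chunk to the totals as soon as it is read cannot do this without either buffering per item or subtracting again afterwards, which is why changing all stats is more work than changing only nfiles.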
The "This Archive" stats are fixed by PR #4286. The "All Archives" stats can not easily be fixed to not consider part files, due to the way they are computed. |
Hmm, I just noticed that by implementing #3241, the "All archives" stats values for original and compressed columns could be computed by summing up the stats values from all archive headers, if we have them in all archives (not: old archives, but maybe these could get updated by borg recreate). Only the deduplicated size would come from hashindex stats. |
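As a rough Python sketch of that idea: the header field names, the chunk index layout and the function name below are hypothetical, not borg's API. Original and compressed sizes are summed from per-archive header stats, the deduplicated size comes from the unique chunks, and the function gives up if any archive lacks stored stats (e.g. an old archive).

```python
def all_archives_stats(archive_headers, chunk_index):
    """Return (original, compressed, deduplicated) sizes, or None if unavailable.

    archive_headers: list of dicts, each expected to hold "original_size" and
                     "compressed_size" (hypothetical per-archive stats).
    chunk_index:     dict mapping chunk id -> stored size of each unique chunk.
    """
    required = ("original_size", "compressed_size")
    if any(key not in header for header in archive_headers for key in required):
        return None  # at least one archive has no stored stats
    original = sum(header["original_size"] for header in archive_headers)
    compressed = sum(header["compressed_size"] for header in archive_headers)
    deduplicated = sum(chunk_index.values())
    return original, compressed, deduplicated


headers = [
    {"original_size": 100, "compressed_size": 60},
    {"original_size": 50, "compressed_size": 30},
]
unique_chunks = {"c1": 40, "c2": 25}

print(all_archives_stats(headers, unique_chunks))            # (150, 90, 65)
print(all_archives_stats(headers + [{}], unique_chunks))     # None
```

Since the per-archive header stats would already exclude part files, the summed original and compressed columns would exclude them too.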
Reopening it as the "All archives" fix is still needed. |
Closing this to claim the bounty. It only fixes "This archive" as noted above, but considering the bounty isn't much either, I consider it to be ok. I'll reopen a new issue for the remaining fix needed for "All archives". #4329 |
Yes, I know recreate is EXPERIMENTAL. But you need feedback, don't you? :)
Since zstd is now available, I wanted to run some tests on a copy of my repository. In particular, I wanted to see the impact on 1) size and 2) extraction speed.
Hence, I decided to recompress the oldest archive of my repository:
borg info .::2017-03-31_09:01:06
Now, recompress.
borg recreate .::2017-03-31_09:01:06 --recompress -C zstd -p -v
And then check the result.
borg info .::2017-03-31_09:01:06
I fail to understand how there can be 2 more files and 5 more GB. Note that the total deduplicated size decreased by 2 GB as well.
bounty: https://www.bountysource.com/issues/53581808-borg-info-not-ignoring-part-files