size, csize, dsize and dcsize appear to return incorrect values #3736
Oh ya, I am on CentOS 7 and borg 1.1.4.
When mounting the backup:
See that other ticket about …
Looking around, seeing a few things with "borg info" but nothing that looks to be this. Do you have a case number?

I did a restore on the archive and it wrote 100 GB of data. I think this is a case of the stats screen showing a different size than the actual size. So there are two issues here. One, the list and info screens do not match (info and progress don't match what is being backed up, so I am assuming the list data is correct). And two, dedup data is not being listed correctly. The list shows 0 bytes used for this particular disk (the main data drive). This VM has never been backed up before, so it is the only copy in the repo. I highly doubt I got a 100% perfect dedup.

I found this because I was writing a script that would produce a detailed report of usage in the archive, and the data it spat out was WAY off for dedup. Here is a copy, you may need to tweak it, but try running it on one of yours:
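The poster's script was not captured in this thread. As a rough sketch of what such a usage report might look like (not the original script; the `--format` placeholders `{size}`, `{csize}`, `{dsize}` and `{dcsize}` are the ones named in this issue's title, but the exact invocation here is an assumption):

```python
#!/usr/bin/env python3
"""Sketch of a per-archive usage report for borg 1.1 (hypothetical, not
the original poster's lost script). Sums the four size columns printed
by `borg list --format`."""
import subprocess


def archive_totals(lines):
    """Sum the size/csize/dsize/dcsize columns from `borg list` output."""
    totals = [0, 0, 0, 0]
    for line in lines:
        fields = line.split(None, 4)  # size csize dsize dcsize path
        if len(fields) < 4:
            continue  # skip blank or malformed lines
        for i in range(4):
            totals[i] += int(fields[i])
    return dict(zip(("size", "csize", "dsize", "dcsize"), totals))


def report(repo_archive):
    """Run borg and print the archive's totals in decimal GB."""
    out = subprocess.run(
        ["borg", "list", repo_archive,
         "--format", "{size} {csize} {dsize} {dcsize} {path}{NL}"],
        check=True, capture_output=True, text=True).stdout
    for key, val in archive_totals(out.splitlines()).items():
        print(f"{key:>6}: {val / 1000**3:8.2f} GB")
```

Comparing such summed list totals against `borg info` is exactly the mismatch this issue describes.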
The dedup size likely looks strange because the chunks are not only referenced by the whole file, but also by the part files created by checkpointing; see also #3522.
Using --consider-part-files definitely helped! Now the original and compressed numbers line up perfectly if I use 1000 rather than 1024 for my math. No joy on the dedup numbers, though.
Items still showing zero blocks:
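The "1000 rather than 1024" remark above suggests borg's human-readable output uses decimal (SI) prefixes while the poster was originally dividing by binary ones; the resulting ~7% gap is easy to check (the 214 GB figure is the one mentioned elsewhere in this thread):

```python
# Sketch: decimal (SI) vs binary unit prefixes, which explains why
# dividing by 1000 instead of 1024 made the numbers line up.
def to_gb(n_bytes):
    """Decimal gigabytes (10**9 bytes)."""
    return n_bytes / 1000**3

def to_gib(n_bytes):
    """Binary gibibytes (2**30 bytes)."""
    return n_bytes / 1024**3

n = 214 * 1000**3               # ~214 GB, as reported in this thread
print(round(to_gb(n), 1))       # 214.0
print(round(to_gib(n), 1))      # 199.3 -- about 7% smaller
```

Note this only explains small percentage-level differences; the ~50% gap in the original report is attributed to sparse files, and the dedup discrepancy to part files.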
So... I am noticing that borg extract with stdout is also impacted by --consider-part-files! That seems odd; was that intended?
About the dedup numbers: we only count the unique chunks (chunks with refcount == 1), but if checkpointing happens within a file, there will be a bunch of .part files adding second references to all file chunks. So the chunks are not unique any more and do not count toward the dedup size. This is kind of wrong, but the current stats do not differentiate between in-same-backup references and not-in-same-backup references.

About stdout / part files: borg does not prevent the user from doing potential nonsense. So e.g. if you call
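The refcount-based accounting described in the comment above can be sketched as a simplified model (this is an illustration of the described behavior, not borg's actual code): the "deduplicated size" only counts chunks whose repository refcount is 1, so a .part file re-referencing a file's chunks pushes every refcount to 2 and drives the reported unique size to zero.

```python
from collections import Counter


def deduplicated_size(archive_chunks, repo_refcounts, chunk_sizes):
    """Simplified model: sum the sizes of chunks that are unique in the
    whole repository (refcount == 1), as described in this thread."""
    return sum(chunk_sizes[c] for c in set(archive_chunks)
               if repo_refcounts[c] == 1)


sizes = {"a": 100, "b": 100, "c": 100}

# File backed up in one go: every chunk referenced exactly once.
refs = Counter(["a", "b", "c"])
print(deduplicated_size(["a", "b", "c"], refs, sizes))  # 300

# Checkpointing left a .part file re-referencing the same chunks:
# refcounts become 2, so no chunk counts as "unique" any more.
refs = Counter(["a", "b", "c", "a", "b", "c"])
print(deduplicated_size(["a", "b", "c"], refs, sizes))  # 0
```

This is why the list output in this issue shows 0 deduplicated bytes for a VM that was only ever backed up once.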
Alright, I think that clears everything up! I think it makes the list function less viable, but… So, is that why the borg info command and backup status screen report more data than is actually being backed up? It is getting double-counted due to the part files?

Thanks for your help on this! It is greatly appreciated. I think for my purposes I will use the dedup value from the info screen, and the original size/compressed values from the list screen. That SHOULD give me an accurate representation of the archive. I will post the final version here if anyone is interested.
Shouldn't part files be left out of continuations?
@RonnyPfannschmidt not sure what you mean.
That would be much cleaner, but even the current part files code is a big mess (and it even creates a user-visible mess). Clean continuations would mean having to track and selectively roll back txn state.
I guess this is a dupe of #3522, and that one was closed, so closing this one also.
There appears to be an issue with the values returned from borg list, possibly for large files.
Let's pick a backup (names have been removed):
Alright, from about 214 GB down to 10 GB.
If we list the items for that backup we get:
That doesn't look right! And if we do some math:
The numbers being about 50% less is... alright (something to do with it being a sparse file, maybe), but the deduplicated size isn't even close.
Is this a bug, or am I misunderstanding the results?