borg info: "This archive Original size" is incorrect #5408
Can you reproduce with a current borg 1.1.x release (like 1.1.14)? You can use either the binary from GitHub releases or get it from Debian backports. |
The archive stats include the metadata stored by Borg, not just the file contents. du only accounts for the disk space used by the contents. |
I thought it might be a metadata issue, but from what I understand of those previous tickets, the goal was to make "This archive original size" equal to the size of the files being backed up, perhaps to avoid questions like mine :). I believe the previous issue was due to part files, so it would be interesting if metadata isn't excluded too (also, 10GB of metadata on 200GB of files seems like a lot). That said, my du math might be incorrect, so if there's a way I should measure the actual size of the files as borg sees them, let me know. Judging by how the compressed size is more in line with what I expect, I can't help but think something sparse is being picked up. Here's the output with the latest release:
|
Do you see a difference if you add |
BTW, I remember doing some fixes for stats stuff like this. IIRC, some were too big for 1.1.x and just went into master. Not sure if this issue was also fixed, though. |
Unfortunately What I thought was the relevant fix appears in the changelog for 1.1.9 (info: consider part files for "This archive" stats, #3522). What about deleted files - would they appear in the count for Original size? |
How many files are in this archive? |
Issue #3522 (about "this archive") was fixed in master branch by #4286 and in 1.1-maint branch by #4326. Issue #4329 (about "all archives") was fixed only in master branch by #4515. No 1.1-maint backport as that depends on new borg 1.2 archive metadata. master branch will get released at some point in the future as borg 1.2. |
du counts allocated disk space. For example, it reports a 4-byte file as using 4K bytes.
In contrast, most backup programs add up the file sizes shown by ls, from the stat() syscall. If you are backing up a large number of small files, this extra padding to the next 4K boundary will make du usage much higher, but it also reflects the reality of 4K allocations. So you are comparing a disk-block-padded size from du with a borg size that I'm guessing is not padded. |
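The two numbers being compared here can be read from a single stat() call. The following is a minimal sketch, assuming a POSIX system where `st_blocks` is available and counted in 512-byte units; the helper name `apparent_and_allocated` is made up for illustration and is not part of borg or du.

```python
import os
import tempfile


def apparent_and_allocated(path):
    """Return (apparent size, allocated bytes) for one file."""
    st = os.stat(path)
    # st_size is the byte length that ls shows.
    # st_blocks counts allocated 512-byte units, which is what du adds up by default.
    return st.st_size, st.st_blocks * 512


if __name__ == "__main__":
    # Write a 4-byte file and show both views of its size.
    fd, name = tempfile.mkstemp()
    os.write(fd, b"abcd")
    os.fsync(fd)
    os.close(fd)
    print(apparent_and_allocated(name))
    os.unlink(name)
```

The apparent size is always 4 here. The allocated figure depends on the filesystem, so it is deliberately not shown as an expected output.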
@enkore 219284 files @hashbackup but then shouldn't the borg backup report a /smaller/ size? @ThomasWaldmann does that mean the changelog for 1.1.9 is incorrect? If so we can close this issue. |
Tried with borg 1.1.14+ - the observed inconsistency could be due to a sparse file, look at this:
Notable:
|
Notable:
|
@sshaikh see above. changelog is correct, see also my updated comment above. |
@sshaikh sparse files can be anywhere where bigger runs of zeros are efficiently written to a file. Often seen with (VM) disk images. |
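One rough way to look for such files is to compare each file's allocated blocks with its apparent size. This is only a heuristic sketch under the assumption of a POSIX `st_blocks` in 512-byte units: filesystems with transparent compression or inline data can also allocate less than the apparent size. The names `looks_sparse` and `find_sparse` are hypothetical helpers.

```python
import os
import stat


def looks_sparse(apparent_size, blocks_512):
    """True if fewer bytes are allocated than the file's apparent size."""
    return blocks_512 * 512 < apparent_size


def find_sparse(root):
    """Yield regular files under root that look sparse."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                st = os.lstat(path)  # lstat: do not follow symlinks
            except OSError:
                continue  # file vanished or is unreadable; skip it
            if stat.S_ISREG(st.st_mode) and looks_sparse(st.st_size, st.st_blocks):
                yield path
```

Calling `list(find_sparse("/path/to/backup/source"))` (path is a placeholder) would return the candidates to inspect by hand.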
@sshaikh I think it depends on your actual file sizes, how compressible they are, and how much metadata is stored. For example, if you have 1M files containing 4K of incompressible data each, du will report 4GB, but borg will report a higher number if metadata is included. If you are backing up larger files, the padding factor in du will be less important because it is limited to 4K per file, or 2K on average. So with larger files, the number du reports will be closer to the sum of ls, which I'm guessing is what borg uses. But then you have to add the metadata size, which would make the borg size larger. |
IIRC borg creates something like 150-200 bytes of metadata per "small file" (including the full path - long paths need more space), so for just 200k files that overhead would not explain a difference this large. I think Thomas is right here, there's probably a sparse file around. du(1) can consider either actual file size or disk usage. By default it is in "disk blocks mode", but it has an --apparent-size switch (which is triggered by the --bytes option as well). |
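The metadata estimate can be checked with quick arithmetic. This sketch takes the upper figure of 200 bytes per file from the comment above, the file count reported earlier in the thread (219284), and the two sizes from the original report (204.33GB from borg, roughly 195GB from du); the function name is made up for illustration.

```python
def metadata_overhead_bytes(file_count, bytes_per_file):
    """Rough metadata overhead: a fixed per-file cost times the number of files."""
    return file_count * bytes_per_file


files = 219_284                       # file count reported in the thread
overhead = metadata_overhead_bytes(files, 200)
gap = 204_330_000_000 - 195_000_000_000  # borg "original size" minus the du figure

print(overhead)        # 43856800 bytes, about 44 MB
print(gap)             # 9330000000 bytes, about 9.3 GB
print(gap // overhead)  # how many times larger the gap is than the estimate
```

At about 44 MB, the estimated metadata is more than two orders of magnitude too small to account for a gap of several GB.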
I tried the approach here: https://www.thegeekdiary.com/how-to-find-all-the-sparse-file-in-linux/ and found no sparse files in the directories being backed up.
It's curious. I may try a new backup just to see what happens. |
You could also use Then use some script or spreadsheet program to sum up the sizes and compare to original size for "this archive". |
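The command suggested above did not survive in this copy of the thread, so the following is only a sketch of the "script" half of the tip. It assumes the per-file listing was produced in JSON-lines form (for instance with `borg list --json-lines`, which is an assumption about how the listing was made) and that each line is an object carrying a `size` field in bytes. `total_size` is a hypothetical helper.

```python
import json


def total_size(lines):
    """Sum the 'size' field over an iterable of JSON-lines strings."""
    total = 0
    for line in lines:
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        item = json.loads(line)
        total += int(item.get("size", 0))  # items without a size count as 0
    return total


if __name__ == "__main__":
    # Made-up sample records, not real borg output.
    sample = [
        '{"type": "d", "path": "home", "size": 0}',
        '{"type": "-", "path": "home/a.txt", "size": 4}',
        '{"type": "-", "path": "home/disk.img", "size": 1048576}',
    ]
    print(total_size(sample))
```

Because the function accepts any iterable of lines, passing `sys.stdin` or an open file would work the same way, and the result is in plain bytes for comparison with "This archive" original size.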
That was a useful tip, thanks! After an afternoon of analysis I can confirm that It turns out that, embarrassingly, this is the age-old gibi vs giga issue. Sticking to bytes, my total is 2.04361E+11. According to Google this equates to 204 gigabytes and... 190 gibibytes. So the "issue" is with the units du -h presents by default. Using The moral of the story: always resort to bytes when comparing aggregate sizes. Ironically, this issue was distracting me from moving to JSON output, which I believe uses bytes by default, so I would have gotten there if I had just let it go sooner ;). |
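The conversion behind that conclusion is easy to reproduce. A small sketch, using the rounded byte total quoted above (2.04361E+11); the helper names are made up for illustration.

```python
def to_gb(n_bytes):
    """Decimal gigabytes: 1 GB = 10**9 bytes."""
    return n_bytes / 10**9


def to_gib(n_bytes):
    """Binary gibibytes: 1 GiB = 2**30 bytes."""
    return n_bytes / 2**30


total = 204_361_000_000  # 2.04361E+11 bytes, as quoted in the comment above

print(f"{to_gb(total):.2f} GB")    # 204.36 GB
print(f"{to_gib(total):.2f} GiB")  # 190.33 GiB
```

The same byte count reads as roughly 204 in decimal units and roughly 190 in binary units, which matches the two figures that looked inconsistent at the start of the thread.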
Hehe, sometimes it is easier than one thinks. :-) |
Have you checked borgbackup docs, FAQ, and open GitHub issues?
Yes, including those in #4654 (comment), I believe I'm running a version of borg that has the fixes referred to in those.
Is this a BUG / ISSUE report or a QUESTION?
A bug
System information. For client/server mode post info for both machines.
Your borg version (borg -V).
borg 1.1.11
Operating system (distribution) and version.
Debian 5.7.10
Hardware / network configuration, and filesystems used.
Running on OMV, ext4 on LUKS on LVM
How much data is handled by borg?
~195GB according to du (and duplicacy)
Full borg command line that led to the problem (leave away excludes and passwords)
borg info repo::version
Describe the problem you're observing.
"This archive Original size" is 204.33GB. "This archive Compressed size" is 194.98GB, which is closer to what I expect (although I would have hoped it was smaller than the real size). The total deduplicated archive size (for around 120 backups) is 190GB, which is about right.
Can you reproduce the problem? If so, describe how. If not, describe troubleshooting steps you took before opening the issue.
Yes, by running info again.