try / document borg + ntfsclone #81

Closed
ThomasWaldmann opened this issue Jul 5, 2015 · 15 comments · Fixed by #3292

Comments

@ThomasWaldmann
Member

Could be useful to clone Windows systems (with NTFS filesystems)? Thanks to ntfsclone, this would only save allocated blocks.

Try it and document your results.

@ThomasWaldmann
Member Author

I tried it with system-rescuecd (which includes borg, ntfsclone and other useful tools), an MBR-partitioned disk, Windows on 2 partitions:

Please verify the commands before you use them, especially the disk identifiers!

# backup
sfdisk -d /dev/sdx > sfdisk.txt       # dump the partition table
dd if=/dev/sdx of=mbr_gap count=2048  # all sectors until the 1st partition (MBR + gap), see sfdisk output
borg create --compression lz4 repo::hostname-partinfo-mbr-gap sfdisk.txt mbr_gap
ntfsclone -s -o - /dev/sdx1 | borg create --compression lz4 repo::hostname-part1 -  # -s: save-image format, -o -: write to stdout
ntfsclone -s -o - /dev/sdx2 | borg create --compression lz4 repo::hostname-part2 -
# restore
borg extract repo::hostname-partinfo-mbr-gap
sfdisk /dev/sdx < sfdisk.txt      # a bit redundant (the table is also in mbr_gap), but notifies the OS
dd if=mbr_gap of=/dev/sdx bs=1M
borg extract --stdout repo::hostname-part1 | ntfsclone -r -O /dev/sdx1 -  # -r: restore image, -O: overwrite the target device
borg extract --stdout repo::hostname-part2 | ntfsclone -r -O /dev/sdx2 -

Note: mbr_gap contains the MBR (1 sector) plus the gap after it (which may or may not be used).
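
For repeated use, the backup half could be wrapped in a small script. This is only a rough sketch based on the commands above; DISK, REPO and HOST are placeholders that must be adapted (and double-checked) for your system, and it assumes the same two-partition layout:

#!/bin/sh
# Rough sketch only -- DISK, REPO and HOST are placeholders; verify before running.
set -e
DISK=/dev/sdx          # whole disk to back up (placeholder)
REPO=/path/to/repo     # borg repository (placeholder)
HOST=hostname          # used as archive name prefix (placeholder)

sfdisk -d "$DISK" > sfdisk.txt
dd if="$DISK" of=mbr_gap count=2048
borg create --compression lz4 "$REPO::$HOST-partinfo-mbr-gap" sfdisk.txt mbr_gap
for n in 1 2; do
    ntfsclone -s -o - "${DISK}${n}" | borg create --compression lz4 "$REPO::$HOST-part$n" -
done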

@ThomasWaldmann
Member Author

TODO: verify it, check dedup, maybe optimize chunker params, create nice rst docs.
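
As an illustration of the chunker-params idea: with the borg 1.0/1.1 syntax (CHUNK_MIN_EXP,CHUNK_MAX_EXP,HASH_MASK_BITS,HASH_WINDOW_SIZE; default 19,23,21,4095), one could try a larger target chunk size to reduce the number of chunks for a huge image file. The values below are an untested guess, not a recommendation:

# untested example -- raises the average chunk size above the 19,23,21,4095 default
ntfsclone -s -o - /dev/sdx1 | borg create --compression lz4 --chunker-params 19,23,22,4095 repo::hostname-part1 -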

@HomerSlated

HomerSlated commented Nov 20, 2016

I just tried this and it works great.

585GB ntfsclone image, piped to borg, LZ4 compressed down to 475GB, then deduplicated down to 445GB.

Took about 12 hours.

@ThomasWaldmann
Member Author

@HomerSlated I had the impression that at least the first run is quite a lot slower than just writing the image to disk. Maybe it gets better if one saves another version of that image to the same repo.

@HomerSlated

I'll test this when I make my first incremental, in a few days' time.

However, I believe this is expected behaviour from any deduplication system, and Borg is very far from being the slowest. I'm currently doing an initial CrashPlan backup on exactly the same dataset, except half is excluded, and CrashPlan is predicting -> 3 DAYS <- for completion. That's 3 days to back up 250GB, vs Borg backing up the full 500GB in only 12 hours. I haven't tested OpenDedup yet, but I've heard that it's even slower than CrashPlan.

Also, CrashPlan's size estimate would suggest, bizarrely, that its final archive will actually be bigger than the input data, whereas Borg's was markedly smaller even on the initial run.

I'll let you know the final results when they're ready.

@HomerSlated

HomerSlated commented Dec 9, 2016

Here are the results.

In summary, a full backup of ~550GB took ~10 hours, and the first incremental (three weeks later) took ~4 hours, deduplicating down to ~20GB.

# On 2016-11-20
Name: Winders-20161119-230121.nfc
Fingerprint: 371062a4f5f0399485889d08bec4ab7ae037989a7ea565f92cdef98cfcfbe7f8
Hostname: lexy
Username: root
Time (start): Sat, 2016-11-19 23:01:22
Time (end):   Sun, 2016-11-20 08:44:01
Command line: /usr/lib/python-exec/python3.4/borg create --progress --stats --compression lz4 ::Winders-20161119-230121.nfc -
Number of files: 1

                       Original size      Compressed size    Deduplicated size
This archive:              534.74 GB            473.29 GB            451.56 GB
All archives:              546.89 GB            482.92 GB            460.50 GB

                       Unique chunks         Total chunks
Chunk index:                  538671               591406

# On 2016-12-09
Name: Winders-20161209-085926.nfc
Fingerprint: c8b91c276ca6834788b04205d4b37872cd0a8f9feb08614d25774b2154db5afc
Hostname: lexy
Username: root
Time (start): Fri, 2016-12-09 08:59:26
Time (end):   Fri, 2016-12-09 12:58:08
Command line: /usr/lib/python-exec/python3.4/borg create --progress --stats --compression lz4 ::Winders-20161209-085926.nfc -
Number of files: 1

                       Original size      Compressed size    Deduplicated size
This archive:              549.57 GB            475.01 GB             19.63 GB
All archives:                1.10 TB            957.93 GB            480.13 GB

                       Unique chunks         Total chunks
Chunk index:                  551119               798954

# On 2016-12-09
Name: Winders-20161119-230121.nfc
Fingerprint: 371062a4f5f0399485889d08bec4ab7ae037989a7ea565f92cdef98cfcfbe7f8
Hostname: lexy
Username: root
Time (start): Sat, 2016-11-19 23:01:22
Time (end):   Sun, 2016-11-20 08:44:01
Command line: /usr/lib/python-exec/python3.4/borg create --progress --stats --compression lz4 ::Winders-20161119-230121.nfc -
Number of files: 1

                       Original size      Compressed size    Deduplicated size
This archive:              534.74 GB            473.29 GB             16.51 GB
All archives:                1.10 TB            957.93 GB            480.13 GB

                       Unique chunks         Total chunks
Chunk index:                  551119               798954

I'm also testing UrBackup with its CBT (Changed Block Tracker) on the same dataset. So far it's predicting ~7 hours to back up (which seems wrong, unless the CBT isn't working properly), although I don't yet know the final size of the differential.

I'm also planning to test burp2 at some point, for comparison.

@ThomasWaldmann
Member Author

ThomasWaldmann commented Dec 9, 2016

Hint: using a more recent Python (e.g. 3.5, if possible) may give better speed.

The borg binary bundles the latest Python release (3.5.2 currently).

@ThomasWaldmann
Member Author

@HomerSlated keep us updated about the performance and other results of your comparison (I wanted to create and set up something to compare backup tool performance, but haven't gotten around to it yet).

@HomerSlated

Yes, a defined testing environment for controlled results would be useful. The problem is that at least one of those tests would require running from a different OS (Windows), thus breaking the control conditions (UrBackup imaging and CBT currently only work under Windows).

Nonetheless, a "real-world" speed and size comparison is useful.

@HomerSlated

HomerSlated commented Dec 9, 2016

Also, Python 3.5 is currently masked as unstable on Gentoo and will cause all kinds of dependency issues if I emerge it, so I'll leave it for now.

[I] dev-lang/python
     Available versions:
     (2.7)  2.7.10-r1 2.7.12
     (3.4)  3.4.3-r1 3.4.5(3.4/3.4m)
     (3.5)  ~3.5.2(3.5/3.5m)
       {-berkdb build doc examples gdbm hardened ipv6 libressl +ncurses +readline sqlite +ssl +threads tk +wide-unicode wininst +xml ELIBC="uclibc"}
     Installed versions:  2.7.12(2.7)(13:54:00 04/12/16)(gdbm ipv6 ncurses readline ssl threads wide-unicode xml -berkdb -build -doc -examples -hardened -libressl -sqlite -tk -wininst ELIBC="-uclibc") 3.4.5(3.4)(14:02:59 04/12/16)(gdbm ipv6 ncurses readline ssl threads xml -build -examples -hardened -libressl -sqlite -tk -wininst ELIBC="-uclibc")
     Homepage:            http://www.python.org/
     Description:         An interpreted, interactive, object-oriented programming language

* app-backup/borgbackup
     Available versions:  ~1.0.7 ~1.0.8 **9999 {+fuse libressl PYTHON_TARGETS="python3_4 python3_5"}
     Homepage:            https://borgbackup.readthedocs.io/
     Description:         Deduplicating backup program with compression and authenticated encryption.

@HomerSlated

Well, it turned out that UrBackup took 222 minutes (3 hours 42 minutes) to complete, with an incremental size of just 4.94GB, and that's without CBT functioning correctly (which would have reduced the time to maybe 5 minutes).

@enkore
Contributor

enkore commented Dec 9, 2016

15 MB/s is indeed rather slow (even for Borg 1.0). UrBackup's 40 MB/s is a lot better, but still somewhat slow. I'm not sure how UrBackup works; it seems to be a classic full+delta backup system, so it probably uses much less CPU - likely the limiting factor in your case?

CBT is indeed an interesting technology. It seems to enable behaviour similar to Borg's file metadata cache (which allows Borg to back up an unchanged file instantaneously, regardless of size), just for images rather than file systems -- quite interesting!

@HomerSlated

HomerSlated commented Dec 10, 2016

I should add that the backup storage is on USB 3.0, which on my hardware benches at ~200 MB/s seq r/w, although clearly that's more than 40 MB/s, so that's probably not the bottleneck.

Actually I'm not really bothered about the speed (except when the CBT client I paid for doesn't seem to work). I'm more interested in saving storage space, and both borg and UrBackup work well in that regard, although the latter seems to have the edge right now (in my tests so far).

I just tried burp 2.x, but can't figure out how to get it to work. It seems I need to RTFM on OpenSSL certificates before I can even get started.

@prnrrgxf

@ThomasWaldmann
You said in the two-hour borg video from 2017 that users should comment at the end of an issue in the bug tracker when they need something, so I'm doing that now.
I need exactly this. I have to back up a Windows machine with a 2TB HDD of which only 200GB are used. At the moment I boot a live Linux, connect a second 2TB HDD to the computer and run dd to copy everything (MBR, simply all of it).
This is a huge waste of time and resources: every time, a full 2TB backup that is ~90% empty.
Could you please make a production-ready borg solution for this? That would be awesome!

@dragetd
Contributor

dragetd commented Oct 10, 2017

Please note: if your deleted sectors contain random old data, deduplication will not be able to deduplicate everything, and the backup will still be slow and large. 'dd' is not the best choice here.

ntfsclone is designed to copy only the used sectors. Even without borg, you should benefit from using ntfsclone. Using borg+ntfsclone also seems to work without issues, giving even smaller backups (only the performance issues discussed above remain to be investigated).
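
For the 2TB/200GB case described above, the recipe from earlier in this thread should already apply. A rough sketch for a single NTFS partition, run from a live Linux; the device (/dev/sdX) and repository (/path/to/repo) names are placeholders that must be adapted and verified:

# backup -- only the allocated NTFS blocks (~200GB) are read and stored, not the full 2TB
sfdisk -d /dev/sdX > sfdisk.txt
dd if=/dev/sdX of=mbr_gap count=2048
borg create --compression lz4 /path/to/repo::win-partinfo-mbr-gap sfdisk.txt mbr_gap
ntfsclone -s -o - /dev/sdX1 | borg create --compression lz4 /path/to/repo::win-part1 -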
