borg2: it's coming! #6602

Open
ThomasWaldmann opened this issue Apr 15, 2022 · 43 comments

@ThomasWaldmann
Member

ThomasWaldmann commented Apr 15, 2022

update: as there was no negative feedback from alpha testing, the borg2 branch was merged into master, so this big change in the form of a major / breaking borg 2.0 release is coming.

read below about what's planned and what's already done.


what could be done if we decide to make a breaking release (2.0) that:

  • does not try to be compatible with old repos
  • does not try to be 100% compatible with old cli syntax (but 90%)
  • only uses new repos / keys it created itself
  • only gets old archives via borg import-tar or borg transfer

putting all the breaking stuff into 1 release is good for users (1 time effort), but will take quite some time to test and release.

After borg 2.0, we'll make an N+1 release (2.1? 3.0?) that drops all the legacy stuff from the codebase, including the converter for borg < 2.0 repos.

borg 2.0 general comments

DONE: offer a borg transfer command, #6663, that transforms old stuff only to stuff that will still be supported by borg N+1.

N+1 general comments

much of the stuff described here has its own tickets, see "breaking" label / add issue links here.

2.0 crypto

  • DONE repo-create: do not create old keys (pbkdf2, legacy AES class, encrypt-and-mac)
  • DONE repo-create: do not create AES-CTR based repos, only new AEAD ciphers with session keys
  • DONE: remove all docs talking about potential nonce reuse, counter management and related
  • DONE: remove key algorithm change (pbkdf2<->argon2), just use argon2 for new repos/key
  • DONE: nonce management code for aes-ctr is not needed any more with session keys: remove nonces module (remove nonce management, related repo methods #7556)
  • keep old crypto code, we need it to read / decrypt old repos

N+1 crypto

  • remove pbkdf2 + pbkdf2/sha256 keys + docs - we have argon2 now
  • remove low_level.AES class which is only used for pbkdf2 key encryption
  • remove aes-ctr mode
  • remove support for super-legacy passphrase-key type (not supported since long)
  • we used hmac-sha256 and blake2b as id-hashes in the past, thus we need to keep them because we need an efficient borg transfer (not needing to re-hash)

2.0 repo

  • DONE: implement new repository based on borgstore, use borgstore and other big changes #8332
  • DONE: implement sftp: borgstore backend (a remote backend that does not need borg serve on the remote).
  • DONE keep support for reading borg 1.x repos
  • DONE read-compatibility with old local repos for borg transfer, and/or
  • DONE read-compatibility with old RPC (ssh: repos) for borg transfer (in that case the old repo would be served by an old borg version)
  • DONE borg check only checks new repos, no support for old repos
  • DONE only generate latest hints format (this is done since long)
  • DONE remove detection of attic repos, remove remainders of attic legacy #6859
  • DONE remove free nonce / nonce reservation api

N+1 repo

  • remove support for reading borg 1.x repos
  • if we reduce MAX_OBJECT_SIZE from ~20MiB to 16,000,000 bytes, 24 bits are enough for the entry length
    • better alignment for segment entries together with the 1 type byte.
    • in-memory indexes have 8 free bits without using more memory
    • max archive size goes down by 20%!
    • do not allow 16MiB objects, we need some room for potential header size increase in future
    • nope, we cannot do that: we need an efficient borg transfer, not needing to re-chunk content!
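Editor's note: the arithmetic behind the (rejected) 24-bit idea above can be checked with a short sketch. The packed header layout here (type tag in the top byte, length in the low 24 bits) and the function names are assumptions made for illustration only, not borg's actual segment entry format.

```python
import struct

MAX_OBJECT_SIZE_PROPOSED = 16_000_000  # value floated in the list above
LENGTH_BITS = 24

# Largest value a 24-bit length field can hold.
max_len_24bit = 2**LENGTH_BITS - 1
# Room left above the proposed limit, e.g. for future header growth.
headroom = max_len_24bit - MAX_OBJECT_SIZE_PROPOSED


def pack_entry_header(entry_type: int, length: int) -> bytes:
    """Pack a 1-byte type and a 24-bit length into one 32-bit word (hypothetical layout)."""
    if not 0 <= entry_type <= 0xFF:
        raise ValueError("entry type must fit into 8 bits")
    if not 0 <= length <= max_len_24bit:
        raise ValueError("length must fit into 24 bits")
    return struct.pack("<I", (entry_type << LENGTH_BITS) | length)


def unpack_entry_header(raw: bytes) -> tuple[int, int]:
    """Inverse of pack_entry_header."""
    (word,) = struct.unpack("<I", raw)
    return word >> LENGTH_BITS, word & max_len_24bit
```

A 16,000,000 byte object fits into the 24-bit field, while a 16MiB object (2**24 bytes) would not, which matches the "do not allow 16MiB objects" point.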

2.0 indexes / cache

  • DONE remove legacy_cleanup function
  • DONE no repo index needed anymore, objects are stored separately and directly accessed via their id
  • DONE no chunks index persisted/synchronized anymore, existing chunks are queried from the repo.

N+1 indexes / cache

  • remove legacy indexes / caches

2.0 msgpack

N+1 msgpack

2.0 archive / item

N+1 archive / item

  • Item.get_size: remove support for items with chunks, but without precomputed size

2.0 or N+1 checksums

  • DONE we could even consider removing libdeflate in 2.0. the only major user will be "borg transfer" and that will be a one-time per repo usage.
  • DONE remove libdeflate again and use zlib.crc32 from stdlib, PUT2 format only uses crc32 for header data, not much data getting crc'ed
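Editor's note: a minimal sketch of protecting a small header with zlib.crc32 from the Python stdlib. The layout used here (4-byte little-endian CRC followed by the header bytes) and the function names are assumptions for illustration, not the real PUT2 format.

```python
import struct
import zlib


def header_with_crc(header: bytes) -> bytes:
    """Prefix the header bytes with their CRC32 (4 bytes, little-endian)."""
    crc = zlib.crc32(header) & 0xFFFFFFFF
    return struct.pack("<I", crc) + header


def verify_header(blob: bytes) -> bytes:
    """Return the header bytes if the stored CRC32 matches, else raise ValueError."""
    (stored,) = struct.unpack_from("<I", blob)
    header = blob[4:]
    if zlib.crc32(header) & 0xFFFFFFFF != stored:
        raise ValueError("header CRC mismatch")
    return header
```

Since only a few header bytes get checksummed per entry, the speed difference between crc32 implementations matters little in this sketch.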

2.0 compression

N+1 compression

  • drop support (dispatching / handler) for the zlib dirty type bytes hack (ZLIB_legacy)
  • we need to keep all other compression algorithms, because borg transfer did not recompress

2.0 upgrade

  • DONE remove borg upgrade, doing upgrades from attic / old borg (they need to first upgrade to 1.2 and then use borg transfer)

N+1 archiver

  • remove unneeded stuff from benchmark cpu

2.0 remote

2.0 cli

2.0 locking

  • DONE implement borgstore based locking for borgstore based repos.
  • DONE stale lock removal of old locks that did not get refreshed.
  • DONE stale lock removal of locks of dead processes.
  • DONE most commands now use a shared lock, except borg compact and borg check.

y2038 and requiring 64bit

  • if we set SUPPORT_32BIT_PLATFORMS = False, the y2038 issue will be solved (AFAIK), but we require a 64bit platform then.
  • not sure if we can already do that. a lot of platforms already dropped 32bit support, but for some this is still in the works (e.g. SBC like the raspberry pi).
  • otoh, development of borg 2.0 will take a while, so there's a good chance all 32bit platforms are gone when it will be released. and even if not, borg 1.2 will still exist also.
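Editor's note: the y2038 limit comes from signed 32-bit second counters. The sketch below computes where such a counter ends; it uses plain datetime arithmetic so it does not depend on the platform's time_t width. The helper name is made up for this example.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)
INT32_MAX = 2**31 - 1
INT64_MAX = 2**63 - 1


def utc_from_epoch_seconds(seconds: int) -> datetime:
    """Convert seconds since the Unix epoch to an aware UTC datetime."""
    return EPOCH + timedelta(seconds=seconds)


# Last moment representable by a signed 32-bit seconds counter.
last_32bit_moment = utc_from_epoch_seconds(INT32_MAX)
print(last_32bit_moment.isoformat())  # 2038-01-19T03:14:07+00:00

# A signed 64-bit counter covers far more years than any backup will need.
SECONDS_PER_YEAR = 365 * 24 * 3600
years_covered_by_64bit = INT64_MAX // SECONDS_PER_YEAR
```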

stuff that is out of scope

as you see above, there is already a huge scope of what should be done.

to not grow the scope even further, some stuff shall not be done (now):

  • no public key cryptography (neither by gpg nor by reinventing gpg for borg)
  • no multithreading
@elho
Contributor

elho commented Apr 16, 2022

I do not mind breaking for the better at all, but some of the outlined details do not qualify for that IMHO.

When it comes to crypto, breakage should not be used to replace one algorithm with a limited life span with another one with a limited life span, which means planning for breakage every few years. Instead, breakage should be done to end up with a repo format that supports multiple algorithms and easy and feasible changing of keys as well as of the algorithms used. That could e.g. be done by at least temporarily allowing multiple algorithms to be "active" in a repo at the same time.

When it comes to repo format, a breakage should not be the excuse to just dump a bit of code that still supports reading PUTs besides PUT2s, but an occasion to question the format as a whole and try to address issues such as the current limitations of append-only, secure multi-client usage, and compaction that is infeasible with huge repos.
Ideas here would be:

  • split into server managed (optionally hard add-only) content chunk "pool" and a per-client (separate crypto) meta-data stores (that could or could not be chunk based).
  • consider use of public-key crypto especially in such a split storage model, where clients do not have the secret key that would be needed to read back chunks from the content pool, but only the one for the meta-data store, with the meta-data being encrypted to both that one and one of the borg server for the meta-data, so that it can both use the meta-data for e.g. compaction of the content chunk pool and auditing of client behavior (limit access to content chunks to those the client does have "written" before). The devil is in the details, but again, just ideas.
  • evaluate radically different repo formats, e.g. a hierarchy of nested directories plus files named by chunk-id (prefix), leading in the extreme to a segment-less format where chunk existence is a single stat, chunk deletion a single rm of a single file and reading a chunk would simply be reading a single file entirely, or in less extreme cases (should practical testing reveal that filesystems would be a limiting factor) to segments that are not filled one after another, but collect chunks with the same prefix. The extreme case would on the plus side lend itself perfectly to object storage backends, which people keep asking for. (Both of these could turn out to be terrible ideas, but they and other wild ideas should be considered and looked into to make sure a completely breaking repo format is one to be kept for a long time and worth the breakage)
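Editor's note: the segment-less idea in the last item can be sketched in a few lines. The two-level directory layout and all function names are assumptions for illustration; this is not how borg or borgstore lay out their data.

```python
import hashlib
from pathlib import Path


def chunk_path(root: Path, chunk_id: bytes, levels: int = 2) -> Path:
    """Map a chunk id to root/aa/bb/<full hex id>, using id prefixes as directories."""
    hex_id = chunk_id.hex()
    prefixes = [hex_id[2 * i : 2 * i + 2] for i in range(levels)]
    return root.joinpath(*prefixes, hex_id)


def store_chunk(root: Path, chunk_id: bytes, data: bytes) -> None:
    """Write one chunk as one file."""
    path = chunk_path(root, chunk_id)
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_bytes(data)


def has_chunk(root: Path, chunk_id: bytes) -> bool:
    """Chunk existence is a single stat on the file."""
    return chunk_path(root, chunk_id).exists()


def load_chunk(root: Path, chunk_id: bytes) -> bytes:
    """Reading a chunk is reading one file entirely."""
    return chunk_path(root, chunk_id).read_bytes()


def delete_chunk(root: Path, chunk_id: bytes) -> None:
    """Chunk deletion is removing one file."""
    chunk_path(root, chunk_id).unlink()
```

The sketch ignores transactions and atomic writes, which is where most of the real complexity would sit.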

When it comes to compression, what really should go is the auto mode - or be reimplemented with useful parameters, which IMO are hard to come up with in the light of ZSTD performance.

About "scp syntax":
On the one hand I think it does not matter much, any sane setup does have wrapper scripts around it to make you only ever see and use the repo URL once in the life of the repo.
On the other hand, its use in scp/rsync etc. makes that non-URL syntax so much more common to users, and while the code handling things leaves a lot of room for improvement, a lot of that has nothing to do with the non-URL syntax as such.

@ThomasWaldmann
Member Author

ThomasWaldmann commented Apr 16, 2022

Crypto:

AES-CTR does not have a limited timespan. Why we are doing this is to get rid of the fundamental counter management issues:

  • you can't lose the local counter memory and not trust (and continue to use) the remote repo.
  • also you can't use multiple clients for one repo and not trust the repo.

There's also a slight ugliness of only storing a part of the IV within the old format, but that is just a minor detail.

The new AEAD algorithms with session keys solve that.
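Editor's note: a minimal sketch of why session keys remove the need for persistent counter management. Each session derives a fresh key from the master key and a random session id, so a per-session message counter can start at 0 without any locally remembered state. The HKDF-SHA256 construction (RFC 5869), the id length and all names below are assumptions for illustration and not necessarily what borg does.

```python
import hashlib
import hmac
import os


def hkdf_sha256(key_material: bytes, salt: bytes, info: bytes, length: int = 32) -> bytes:
    """HKDF (RFC 5869) with HMAC-SHA256: extract, then expand to `length` bytes."""
    prk = hmac.new(salt, key_material, hashlib.sha256).digest()
    okm, block, counter = b"", b"", 1
    while len(okm) < length:
        block = hmac.new(prk, block + info + bytes([counter]), hashlib.sha256).digest()
        okm += block
        counter += 1
    return okm[:length]


def new_session(master_key: bytes) -> tuple[bytes, bytes]:
    """Start a session: random session id plus the key derived from it."""
    session_id = os.urandom(24)  # stored next to the ciphertext, not secret
    session_key = hkdf_sha256(master_key, salt=session_id, info=b"example session key")
    return session_id, session_key


master_key = os.urandom(32)
sid_a, key_a = new_session(master_key)  # e.g. client A
sid_b, key_b = new_session(master_key)  # e.g. client B, or A after losing local state

# A reader that knows the master key re-derives the session key from the stored id.
rederived_a = hkdf_sha256(master_key, salt=sid_a, info=b"example session key")
```

Because two sessions practically never share a key, a counter collision between clients, or after losing the local counter memory, no longer leads to nonce reuse under the same key.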

We could have all 3 crypto algorithms in parallel in the borg code (but currently not in the same repo), but there are other things on the above list that are best solved with tar-export/import or borg transfer and a new repo, and if one does that anyway, one can as well go for the better crypto in one go (instead of having to do the export/import again some time later).

I don't think it would be a good idea to use different encryption algorithms in the same repo and especially not with the same key - so if we would go for the complexity of supporting repos with that, we would need multiple (master) keys for one repo, making it more complex for borg and also for the users.

You also can't just "change the keys / algos" in the same repo. Due to dedup, a lot of data would still be encrypted by the old key and old algorithm. To really get rid of it you'd need some global migration, touching a lot of data and needing some management for the case of interruptions of that process. That's about as much I/O and time as the export/import needs, just with much more complexity.

@ThomasWaldmann
Member Author

Repository:

It's not just about the "reading PUTs" - it is at quite some places, including borg check (which is already quite complex).

I can imagine doing some more and even radical changes to the repo format if we re-start with new repos and require export/import anyway. I am not too happy with the complexities of segment file handling either.

In the end this will depend on some developers architecting and implementing it though, and we should try not to make the scope too big or it'll never get releasable.

Repos: interesting ideas. Needs more analysis I guess, esp. since we likely want to keep the transactional behaviour and maybe also the LOG like behaviour.

Segmentless repos: if everybody had a great repo filesystem and enough storage, I guess that could be done (but it would mean that if the source has a million files, the repo could have XX million chunks). Super simple for borg, but a huge load on the repo fs (did that within my zborg experiment back then). Could also be quite a bit slower due to more random accesses and more file opening, and use a lot more space due to fs allocation overheads if one has a significant amount of small files.

Cloud storage: I don't want to maintain such code myself, that's just a rabbit hole I don't want to get into. So, for me it is "local directory" as the repo (plus some method of remoting that, not necessarily the hard to debug current remote.py code).

@ThomasWaldmann
Member Author

ThomasWaldmann commented Apr 16, 2022

Compression: auto mode should go? do we have a ticket about that?

@ThomasWaldmann
Member Author

@elho thanks for the detailed feedback btw!

This ticket is primarily meant for the to-break-or-not-to-break decision. Once we decide to do a breaking release, requiring new repos, key, export/import, we can do a lot of changes and need to discuss the details in more specific tickets.

We should somehow try to limit the scope though, so it won't take forever.

@RonnyPfannschmidt
Contributor

@ThomasWaldmann if instead of segments something like git packs could be used, then with the new encryption session stuff it may even become feasible to push packs instead of archives between repos without necessarily requiring de/encryption

@RonnyPfannschmidt
Contributor

Potentially this would also enable dumb remotes like s3, sshfs, with the caveat of having more pain with post prune gc and repacking

@ThomasWaldmann
Member Author

@RonnyPfannschmidt encrypted chunks can be transferred between related repos using the same key material, there is a ticket about that already. I don't know the git pack format, so not sure how that is relevant for (re-)encrypting. But if we want to transfer a full "pack", there might be requirements due to that (opposed to just transferring a single chunk).

@elho
Contributor

elho commented Apr 16, 2022

I would be happy with a borg1.3 that, on first use of serve on (or direct local access to) a v1 repo, would start out (maybe after some confirmation) by iterating over all segments: for each one, create a new replacement segment file, fill it with the same content except for using PUT2 whenever a PUT is read from the old one, do some sort of verify pass making sure the new segment as it arrived on disk has the same data as the old one, and only then atomically mv the new one over the old one. Once the last segment file is done without interruption, switch the repo version from v1 to v2.
No other command or code path would need to support v1 and PUT in that scenario.
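Editor's note: the write / verify / atomic-rename pattern described above can be sketched for a single file as follows. The transform callback stands in for the PUT to PUT2 conversion, which is not implemented here; the function name is made up for this example.

```python
import os
import tempfile
from typing import Callable


def rewrite_file_atomically(path: str, transform: Callable[[bytes], bytes]) -> None:
    """Replace `path` with transform(old content) via a verified temp file and os.replace."""
    with open(path, "rb") as f:
        new_content = transform(f.read())

    # Create the temp file in the same directory so the final rename stays on one filesystem.
    dirname = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=dirname, prefix=".rewrite-")
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(new_content)
            tmp.flush()
            os.fsync(tmp.fileno())  # ask the OS to push the data to stable storage
        # Verify pass: read the new file back and compare it with what was intended.
        with open(tmp_path, "rb") as check:
            if check.read() != new_content:
                raise OSError("verification of rewritten file failed")
        os.replace(tmp_path, path)  # atomic on POSIX when both are on one filesystem
    except BaseException:
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise
```

If the process is interrupted before os.replace, the old file is still intact and only a stray temp file may remain. Note that the read-back may be served from the page cache, so it is a sanity check rather than proof of what is on the medium.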

@ThomasWaldmann
Member Author

Note: I updated the topmost post with feedback from you all (thanks!) and also with new insights. I also edited some other posts to remove duplicate / outdated information to keep this issue short.

@borgbackup borgbackup deleted a comment from RonnyPfannschmidt May 3, 2022
@ThomasWaldmann ThomasWaldmann self-assigned this May 6, 2022
@ThomasWaldmann ThomasWaldmann changed the title from "thought experiment: breaking with the past" to "breaking with the past" May 6, 2022
@ThomasWaldmann
Member Author

ThomasWaldmann commented May 6, 2022

Progress in #6663 and #6668 looks quite good.

About version: if we require people to transfer their repos using borg transfer, guess that must be borg 2.0 because you can't just continue with an existing repo as it is.

So, if we merge these, next release from master will not be 1.3, but 2.0.

@ThomasWaldmann ThomasWaldmann changed the title from "breaking with the past" to "borg 2.0: breaking with the past" May 6, 2022
@horazont
Contributor

horazont commented May 6, 2022

not sure if we can already do that. a lot of platforms already dropped 32bit support, but for some this is still in the works (e.g. SBC like the raspberry pi).

I think especially SBCs will stay 32bit for a while, because the savings in having a smaller pointer width are relevant on low-memory platforms.

Aren't there clock system calls which return a 64-bit wide integer even on 32-bit ABIs?

@ThomasWaldmann
Member Author

Well, it's not just like borg needs to get the 64bit time by doing a call, it rather is the whole system of kernel / libc / python needing to work with timestamps of reasonable length. E.g. timestamps in os.stat output, python time and datetime stuff, etc.

@elho
Contributor

elho commented May 7, 2022

So, if we merge these, next release from master will not be 1.3, but 2.0.

Changing the module name from borg to borg2 at this point is something to be thoroughly considered.

Both, to eventually play with potential (meanwhile obsoleted already) export/import tar magic, but also to be able to test 1.2 in parallel with 1.1 in production across all my systems in a sane manner, I went on the surprisingly painful adventure to create myself a variant of the distribution's package that can be installed and used in parallel with the stock 1.1 one. In a hackish manner, one could install borg below a different path, but that is nothing any distribution would do, I went the painful way to do such a rename in there.
(IOW happy to clean that up and even break out some of the cases where absolute imports were used without need and against the common practice in most other similar places in the code).

For the original idea of export-import migration this would be a requirement, here it is not, but in practice, for people backing up to multiple repos, scenarios like migrating the local one to 2.0 while still waiting an undefined time for the borg storage provider the external one resides on to support 2.0 could be very common.

@ThomasWaldmann
Member Author

Guess it is not just about the module name, but also the cli cmd name. OTOH, I'd dislike to put the version number into the cli cmd name.

For testing, one could also use the fat binary and rename that to borg2.

@elho
Contributor

elho commented May 7, 2022

Guess it is not just about the module name, but also the cli cmd name. OTOH, I'd dislike to put the version number into the cli cmd name.

It is, but the command name is something that can just be changed without requiring any modification of the command itself to keep it working, and on the other hand is something distributions have support for.
E.g. in Debian, a borg2 package would ship borg2 etc. commands, but (along with a packaging update to the 1.x version to be shipped in parallel) make use of the alternatives system of managed symlinks to have borg commands available to the user that point to whichever version is installed on its own, or to (probably best for compatibility) borg1 if both are installed, with the option for the user to easily switch that (along with the corresponding manpages) according to their preference.

Aware wrappers that consequently have an idea of the configured repo(s) being version 1 or 2 would know to invoke the corresponding versioned command name in all cases.

For testing, one could also use the fat binary and rename that to borg2.

Testing as in "is this for me" or "does this work at all", yes. But not for testing as in "let me run this in parallel to 1.1 for a couple months and see whether any issues arise before ditching 1.1", ie. a point where 1.2 can be regarded to be at currently.

@horazont
Contributor

horazont commented May 7, 2022

Well, it's not just like borg needs to get the 64bit time by doing a call, it rather is the whole system of kernel / libc / python needing to work with timestamps of reasonable length. E.g. timestamps in os.stat output, python time and datetime stuff, etc.

The statx syscall already has 64-bit wide timestamps (it uses __s64 for the seconds instead of time_t). Since kernel 5.1, 64-bit wide time structs are available on a bunch of other system calls.

So the kernel can (probably; I saw patches for utimes64, not sure if those have been applied, it hasn't been mentioned in that post above) do it.

I'm not sure what the current status is on the glibc side of things (the page looks a bit unclear on progress), but it may be worth pushing python on 32bit architectures to use it if glibc is ready.

All I'm saying: don't drop support for 32-bit architectures, but go for dropping support for 32-bit timestamps, which don't have to be the same thing anymore in this day and age.

@ThomasWaldmann
Member Author

ThomasWaldmann commented Jul 4, 2022

Note: I updated the top post with the current progress and also released 2.0.0a3 - if no one is holding me back with negative testing results, I'll soon merge the borg2 branch into master.

@ThomasWaldmann ThomasWaldmann added this to the 2.0.0b1 milestone Jul 4, 2022
@ThomasWaldmann
Member Author

ThomasWaldmann commented Aug 22, 2022

@RubenKelevra well, I see what you mean, but that is not how "borg create" works.

But maybe check the issue tracker if we have a ticket about this and if not, create a new one, so we can collect ideas there.

@RubenKelevra

@RubenKelevra well, I see what you mean, but that is not how "borg create" works.

Interesting, can you elaborate or point me to the part which is different than I think, so I can take a look? 🤔

But maybe check the issue tracker if we have a ticket about this and if not, create a new one, so we can collect ideas there.

Will do

@arodland

arodland commented Jan 6, 2023

There shouldn't be any need to drop 32-bit support to be y2038-clean. 32-bit platforms can still have a 64-bit time_t, and most of them do, and have done for 5-10 years at least.

@enkore
Contributor

enkore commented May 24, 2023

Have there been any major complaints / pain points with the JSON API? The only things I've found are

(a) (largely hypothetical) encoding woes when involving file names (obviously file names don't have to be representable in unicode regardless of locale) and on weird systems (#2273) and
(b) it's annoying to parse when stdout and stderr are multiplexed, because stdout uses pretty printing (#6053, #3605)

@ThomasWaldmann
Member Author

@enkore the json encoding issues for e.g. path and also some other things that can not be represented as valid unicode (== without surrogate escapes) were solved some months ago, e.g.:

  • have path member with a unicode approximation ("printable")
  • have path_b64 with an exact base64 representation of the original bytes

Especially on samba servers this is not at all hypothetical, but a very practical issue, because servers that have existed for decades have collected all sorts of historical path encodings.
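Editor's note: a minimal sketch of the path / path_b64 pair described above. The field names come from the comment; how borg builds the printable approximation is not stated there, so this sketch simply substitutes U+FFFD for undecodable bytes, and the function name is made up for this example.

```python
import base64
import json


def path_json_fields(raw_path: bytes) -> dict:
    """Build a printable approximation plus an exact base64 form of a raw path."""
    return {
        # Undecodable bytes become U+FFFD, so the result is always valid unicode.
        "path": raw_path.decode("utf-8", errors="replace"),
        # Exact representation of the original bytes, safe to embed in JSON.
        "path_b64": base64.b64encode(raw_path).decode("ascii"),
    }


# A latin-1 encoded file name, which is not valid UTF-8.
fields = path_json_fields(b"caf\xe9.txt")
encoded = json.dumps(fields)

# A consumer recovers the exact original bytes from path_b64.
original = base64.b64decode(json.loads(encoded)["path_b64"])
```

The path member is only meant for display; anything that needs to open or match the file should use the bytes recovered from path_b64.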

@ThomasWaldmann ThomasWaldmann changed the title from "borg 2.0: it's coming!" to "borg2: it's coming!" Jun 1, 2023
@issmirnov

@ThomasWaldmann my vote is on delaying the release and only doing one breaking change. Otherwise, your users will have to migrate v1-v2 with breaking changes, and then within a "short" time (6-12 months?) have to migrate v2-v3. Some users will be on v1, so you'd also have to build out v1-v3 upgrade paths and checks.

Borg v1 works great, we've waited this long, we can wait a little longer to just have to pay the pain of migration once.

Everyone, feel free to thumbs up / thumbs down this comment to express your opinion.

@tmm360

tmm360 commented Sep 12, 2023

I think it is all a matter of timing. How much is "a lot"? If it is 6 months, merge them. If it is a fundamental rewrite and will take 2 or more years to be stable, do two separate releases.

@knutov

knutov commented Sep 12, 2023

my two cents: if it's ready - it's ready.

some changes will happen eventually, there is no problem to do small updates in scripts.

The new version has a lot of benefits, why wait to use it?

@ThomasWaldmann
Member Author

@tmm360 What I have in mind is a big change (not even sure how big), and my and other contributors' free time is a bit hard to predict, so the overall time needed is somewhat unpredictable.

Maybe forking off some new borg-ng branch from master and just starting that development there, while fixing bugs and missing stuff in master branch would be an option. Depending on more insights developing over time, a release could be made from either branch.

@tmm360

tmm360 commented Sep 12, 2023

@ThomasWaldmann at this point I have no doubt it should be another release, with time to develop it without the need to hurry. It looks like something huge, and if borg2 is ready, my view is that it should be released as is.

@darkk

darkk commented Sep 29, 2023

Speaking of pros and cons, I'd also add that migration might have the desirable side effect of backup verification.

However, I understand that the process might require twice as much storage space under certain conditions.

@RafaelKr

was 2.0 put on hold?

The commit history tells it is actively worked on: https://github.com/borgbackup/borg/commits/master/

@ThomasWaldmann
Member Author

@j1warren As my work on borg2 will likely take quite a bit longer, I temporarily switched focus to borg 1.x and made a "refresh" there in form of borg 1.4 (currently beta 1), see #7975 for more details. Will continue to work on borg2 soon.

@ThomasWaldmann
Member Author

#8332 has some more radical changes needing review.

@elho maybe have a look, close to your 3rd item in #6602 (comment) .

The borg 2.0 code will still need to deal with reading borg 1.x archives for borg transfer to migrate them into borg 2 repo, thus we have to be a bit careful not to tear down some stuff we still need.

@struthio

The borg 2.0 code will still need to deal with reading borg 1.x archives for borg transfer to migrate them into borg 2 repo, thus we have to be a bit careful not to tear down some stuff we still need.

Can it be released in two steps?
Step 1. Borg2 with the new archive format etc. is ready, but only for new repositories.
Step 2. Compatibility with borg1 archives.

@ThomasWaldmann
Member Author

@struthio well, the compatibility code is just the old code, so it is already present. And unit tests say that borg transfer works in my #8332 PR, so no need for 2 steps.

@ThomasWaldmann
Member Author

ThomasWaldmann commented Sep 8, 2024

#8332 experiment was successful (AFAIK) and was merged into master, I will update top post here accordingly.

everybody can help beta testing this huge change in 2.0.0b10+.

@ThomasWaldmann
Member Author

ThomasWaldmann commented Sep 22, 2024

Via the borgstore rclone backend, borg just got cloud storage support (for 100+ cloud storage providers).

@borgbackup borgbackup deleted a comment from neuhaus Oct 5, 2024
@borgbackup borgbackup deleted a comment from AlphaJack Oct 5, 2024
@borgbackup borgbackup deleted a comment from FabioPedretti Oct 5, 2024
@borgbackup borgbackup deleted a comment from FabioPedretti Oct 5, 2024
@borgbackup borgbackup deleted a comment from RafaelKr Oct 5, 2024
@borgbackup borgbackup deleted a comment from struthio Oct 5, 2024
@borgbackup borgbackup deleted a comment from struthio Oct 5, 2024
@borgbackup borgbackup deleted a comment from mibmo Oct 5, 2024
@borgbackup borgbackup deleted a comment from FabioPedretti Oct 18, 2024