use borgstore and other big changes #8332
Codecov Report
Attention: Patch coverage is
@@            Coverage Diff             @@
##           master    #8332      +/-   ##
==========================================
+ Coverage   81.63%   81.73%   +0.10%
==========================================
  Files          67       70       +3
  Lines       12158    12648     +490
  Branches     2194     2287      +93
==========================================
+ Hits         9925    10338     +413
- Misses       1647     1665      +18
- Partials      586      645      +59
Guess this needs some review. Anybody?
Simplify the repository a lot: No repository transactions, no log-like appending, no append-only, no segments, just a key/value store for the individual chunks. No locking yet.
Also:
mypy: ignore missing import. There are no library stubs for borgstore yet, so mypy errors without that option.
pyproject.toml: install borgstore directly from GitHub. There is no PyPI release yet.
Use pip install -e . rather than python setup.py develop. The latter is deprecated and had issues installing the "borgstore from github" dependency.
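To make "just a key/value store for the individual chunks" more concrete, here is a minimal sketch of a file-backed key/value store, assuming one file per key. The class `FileKVStore`, its method names and the `data/<hex id>` key layout are hypothetical illustrations; this is not borgstore's actual API.

```python
import os
import tempfile
from pathlib import Path


class FileKVStore:
    """Hypothetical minimal key/value store: one file per key (NOT the borgstore API)."""

    def __init__(self, root):
        self.root = Path(root)

    def _path(self, name):
        # keys like "data/<hex id>" map to files below the store root
        return self.root / name

    def store(self, name, value: bytes):
        path = self._path(name)
        path.parent.mkdir(parents=True, exist_ok=True)
        # write to a temp file, then rename: readers never see a partially written value
        tmp = path.with_name(path.name + ".tmp")
        tmp.write_bytes(value)
        os.replace(tmp, path)

    def load(self, name) -> bytes:
        return self._path(name).read_bytes()

    def delete(self, name):
        self._path(name).unlink()

    def exists(self, name) -> bool:
        return self._path(name).exists()

    def list(self, prefix):
        # names of all keys directly below a "directory" prefix such as "data"
        directory = self._path(prefix)
        if not directory.is_dir():
            return []
        return sorted(p.name for p in directory.iterdir() if not p.name.endswith(".tmp"))


with tempfile.TemporaryDirectory() as tmpdir:
    kv = FileKVStore(tmpdir)
    kv.store("data/00ff", b"chunk bytes")
    print(kv.load("data/00ff"))  # b'chunk bytes'
    print(kv.list("data"))       # ['00ff']
```

Writing to a temporary file and renaming it means a reader sees either no value or the complete value, never a half-written object, which fits a design without repository transactions.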
It uses xxh64 hashes of the meta and data parts to verify their validity. On a server with borg, this can be done server-side without the borg key. The new RepoObj header has meta_size, data_size, meta_hash and data_hash.
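A minimal sketch of such a server-side check, assuming a header made of two little-endian 32-bit sizes followed by two 8-byte hashes (this layout and all names below are assumptions, not borg's code). xxh64 is not in Python's standard library, so `hashlib.blake2b(digest_size=8)` stands in for it; the point is that verification only needs the header and the stored bytes, not the borg key.

```python
import hashlib
import struct
from collections import namedtuple

# Assumed layout: meta_size, data_size (uint32 each), meta_hash, data_hash (8 bytes each)
HEADER_FORMAT = "<II8s8s"
HEADER_SIZE = struct.calcsize(HEADER_FORMAT)  # 24 bytes
Header = namedtuple("Header", "meta_size data_size meta_hash data_hash")


def hash64(data: bytes) -> bytes:
    # stand-in for xxh64: any fast, unkeyed 64-bit hash works for this sketch
    return hashlib.blake2b(data, digest_size=8).digest()


def pack_obj(meta: bytes, data: bytes) -> bytes:
    header = struct.pack(HEADER_FORMAT, len(meta), len(data), hash64(meta), hash64(data))
    return header + meta + data


def verify_obj(obj: bytes) -> bool:
    """Check sizes and hashes of the (possibly encrypted) meta/data parts; no key needed."""
    if len(obj) < HEADER_SIZE:
        return False
    hdr = Header(*struct.unpack(HEADER_FORMAT, obj[:HEADER_SIZE]))
    meta_end = HEADER_SIZE + hdr.meta_size
    if len(obj) != meta_end + hdr.data_size:
        return False  # truncated or padded object
    meta, data = obj[HEADER_SIZE:meta_end], obj[meta_end:]
    return hash64(meta) == hdr.meta_hash and hash64(data) == hdr.data_hash


obj = pack_obj(b"encrypted meta", b"encrypted data")
print(verify_obj(obj))       # True
print(verify_obj(obj[:-1]))  # False
```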
…ackup#8347 otherwise the lock might become stale and could get killed by any other borg process. note: the ThreadRunner class was written by PyCharm AI and only needed small enhancements. nice.
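The commit message above is truncated, so the exact mechanism is not visible here. Assuming the runner's job is to refresh the repository lock periodically while the main thread is busy, a minimal sketch of such a background runner could look like this; `ThreadRunner` below is a hypothetical reimplementation, not borg's class.

```python
import threading
import time


class ThreadRunner:
    """Hypothetical sketch: call `target` every `interval` seconds in a background thread."""

    def __init__(self, interval: float, target):
        self.interval = interval
        self.target = target
        self._stop_event = threading.Event()
        self._thread = threading.Thread(target=self._run, daemon=True)

    def _run(self):
        # Event.wait() returns False on timeout (call target again) and True after stop()
        while not self._stop_event.wait(self.interval):
            self.target()

    def start(self):
        self._thread.start()

    def stop(self):
        self._stop_event.set()
        self._thread.join()


# usage: keep a (hypothetical) lock fresh while the main thread does long-running work
refresh_count = 0


def refresh_lock():
    global refresh_count
    refresh_count += 1  # a real implementation would update the lock's timestamp here


runner = ThreadRunner(0.01, refresh_lock)
runner.start()
time.sleep(0.1)  # stand-in for a long operation that would otherwise let the lock go stale
runner.stop()
print(refresh_count > 0)
```

Using `Event.wait()` with a timeout instead of `time.sleep()` lets `stop()` end the thread immediately rather than after a full interval.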
Archives was built with a dictionary-like API, but in the future we want to move away from a read-modify-write archives list.
previously, borg always read all archives entries, modified the list in memory and wrote it back to the repository (similar to what borg 1.x did). now borg works directly with archives/* in the borgstore.
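The difference between the two approaches can be sketched with a plain dict standing in for the key/value store (all names and the JSON encoding are hypothetical). It also illustrates one problem of read-modify-write: two interleaved writers can lose an update, while per-archive keys do not conflict.

```python
import json


def add_archive(store: dict, archive_id: str, name: str) -> None:
    """New style: one key per archive below archives/; only that key is written."""
    store[f"archives/{archive_id}"] = json.dumps({"name": name}).encode()


def delete_archive(store: dict, archive_id: str) -> None:
    del store[f"archives/{archive_id}"]


def list_archives(store: dict) -> list:
    return sorted(
        json.loads(value)["name"] for key, value in store.items() if key.startswith("archives/")
    )


# Old style: the whole list lives in one object. Two interleaved read-modify-write
# cycles: the second write-back overwrites the first writer's change.
old = {}
snap_a = json.loads(old.get("manifest", b"{}"))
snap_b = json.loads(old.get("manifest", b"{}"))
snap_a["id1"] = "host-a"
snap_b["id2"] = "host-b"
old["manifest"] = json.dumps(snap_a).encode()
old["manifest"] = json.dumps(snap_b).encode()
print(sorted(json.loads(old["manifest"])))  # ['id2'] ("id1" was lost)

# New style: the two additions touch different keys, nothing is lost.
new = {}
add_archive(new, "id1", "host-a")
add_archive(new, "id2", "host-b")
print(list_archives(new))  # ['host-a', 'host-b']
```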
old borg just didn't commit the transaction if not in repair mode, thus causing a transaction rollback. we can't do that anymore, so we must avoid modifying the repo if not in repair mode.
not for check and compact, these need an exclusive lock. to try parallel repo access on the same machine as the same user, one needs to use a non-locking cache implementation (export BORG_CACHE_IMPL=adhoc). this is slow due to the missing files cache in that implementation, but unproblematic because no caches/indexes are persisted.
if the manifest file was missing, check generated *.1, *.2, ... archives although an entry with the correct name and id was already present. BUG! this is because a lost manifest no longer implies that the complete archives directory is also lost, as it did in borg 1.x. Also improved log messages a bit.
borg delete and borg prune do a quick and dirty archive deletion, just removing the archives directory entry for them. --undelete-archives can still find the archive metadata objects by completely scanning the repository and re-create missing archives directory entries, but only until borg compact removes all unused data. if only the manifest is missing or corrupted, do not run that scan; it is no longer required for the manifest.
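A sketch of re-creating missing archives directory entries by scanning all repository objects, assuming in-memory dicts stand in for the repository objects and for archives/*. Entries whose archive id is already present are skipped, which avoids the spurious *.1, *.2 archives described above; appending .1, .2, ... only on a real name collision is an assumption of this sketch. All names are hypothetical.

```python
def rebuild_archives_directory(objects: dict, archives: dict) -> list:
    """
    objects:  repo object id -> (kind, metadata dict), a hypothetical stand-in for the repo
    archives: archives directory, archive id -> name (modified in place)
    Returns the names of the re-created entries.
    """
    recreated = []
    for obj_id, (kind, meta) in sorted(objects.items()):
        if kind != "archive":
            continue  # only archive metadata objects matter here
        if obj_id in archives:
            continue  # an entry for this id already exists: do not create a duplicate
        name = meta["name"]
        existing_names = set(archives.values())
        candidate, i = name, 0
        while candidate in existing_names:  # name taken by a *different* archive id
            i += 1
            candidate = f"{name}.{i}"
        archives[obj_id] = candidate
        recreated.append(candidate)
    return recreated


objects = {
    "a1": ("archive", {"name": "host-mon"}),
    "a2": ("archive", {"name": "host-tue"}),  # its archives/ entry was deleted
    "c1": ("chunk", {}),
}
archives = {"a1": "host-mon"}
print(rebuild_archives_directory(objects, archives))  # ['host-tue']
print(archives)  # {'a1': 'host-mon', 'a2': 'host-tue'}
```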
I'll merge this now. b10 tomorrow and I also might work on some other stuff.
Fixes #8330 (no cache sync anymore).
Fixes #8325 (no segments, replaying segments anymore).
Fixes #7377, fixes #7379 (no transactions, no refcounting anymore).
Fixes #7154 (new repo locking code).
Fixes #6983 (no repo index any more).
Fixes #6899 (no compact segments any more).
Fixes #6121, fixes #7278 (no cache sync any more, no archives fetching).
Fixes #6567 (not needed any more).
Fixes #6331 (hints file not used anymore).
Fixes #6291, fixes #6289 (no segment files any more, no DEL tags).
Fixes #6288 in a different way (borg rspace command to reserve disk space that can be freed in case of emergencies).
Fixes #5654, fixes #6057, fixes #6094, fixes #7154 (new lock implementation for borgstore).
Fixes #5514 (new locking system allows shared locks for most).
Fixes #5261 (bypass-lock was removed).
Fixes #5050 (no hints any more).
Fixes #4827 (no persistent chunks cache, no cache sync anymore).
Fixes #4438 (not possible, delete/prune just kill the root reference now and don't look at anything else).
Fixes #4428 (separate objects in config/*).
Fixes #4004 - most borg commands now can use a shared repository lock (exceptions: borg check and compact).
Fixes #3128, fixes #3196 (no cache sync anymore).
Fixes #2454, fixes #2398, fixes #3036 (no commits, no transactions any more, no log-like/append-only segments, new check implementation).
Fixes #2681, fixes #2571 (no cache sync, no chunks.archive.d).
Fixes #2454 (no commits, no transactions anymore).
Fixes #2444 (new borg check implementation, no LoggedIO code used any more).
Fixes #1293 (solved in a different way, delete/prune are super fast now).
Fixes #1244 (no transactions anymore).
Fixes #916, fixes #474, fixes #1766 (no LocalCache any more, no cache transaction any more).
Fixes #768 - most borg commands now can use a shared repository lock (exceptions: borg check and compact).
Maybe builds the foundation to solve / work on:
new repository based on borgstore project
stores chunks into separate files (not: segment/pack files, at least for now).
borgstore has a very simple api that makes implementing backends easy.
in borg 2.0, this will primarily use the "file:" backend from borgstore to implement file: and ssh: repositories, but long term we might move away from the borg.remote code (RPC api via ssh) and just use a "remote" borgstore.
there is also an sftp: repository now implemented via the respective borgstore backend. more might be coming, even cloud stuff should be easily possible with that (PRs welcome!).
repository: convergence rather than transactions
borg 1.x used the segment files in a log-like way (only appending new stuff at the end) and implemented transactions via a COMMIT tag - if the transaction was not completed (no COMMIT at the end), it rolled back the incomplete transaction to the last commit.
the code implementing transactions was rather complex and required an exclusive lock on the repo for correct operations.
borg2 now just adds repo objects in the right order, first pushing referenced objects, then the references to them. even if an operation is interrupted, nothing bad happens.
there might be some unreferenced objects for a while, but they will get referenced if the operation is retried later and completes. borg compact will deal with anything not needed.
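The write ordering can be sketched as follows, with a dict standing in for the store and a hypothetical `fail_after` hook simulating an interruption. Whatever the interruption point, no archive ever references a missing object, and simply re-running the operation converges to the complete state. The key layout and function names are assumptions for illustration.

```python
class Interrupted(Exception):
    """Simulated crash or connection loss."""


def backup(store: dict, archive_id: str, chunks: dict, fail_after=None) -> None:
    """Write in dependency order: chunks, archive metadata, then the archives/ entry.

    fail_after (hypothetical test hook) interrupts the run after that many writes.
    """
    writes = 0

    def put(key, value):
        nonlocal writes
        if fail_after is not None and writes >= fail_after:
            raise Interrupted()
        store[key] = value
        writes += 1

    for chunk_id, data in chunks.items():
        put(f"data/{chunk_id}", data)            # referenced objects first
    put(f"data/{archive_id}", sorted(chunks))    # archive metadata: list of chunk ids
    put(f"archives/{archive_id}", archive_id)    # root reference last


def consistent(store: dict) -> bool:
    """True if every archive in archives/ only refers to objects that exist."""
    for key, archive_id in store.items():
        if key.startswith("archives/"):
            meta = store.get(f"data/{archive_id}")
            if meta is None or any(f"data/{c}" not in store for c in meta):
                return False
    return True


chunks = {"c1": b"aaa", "c2": b"bbb"}
for n in range(4):  # interrupt before the 1st, 2nd, 3rd or 4th write
    store = {}
    try:
        backup(store, "arch1", chunks, fail_after=n)
    except Interrupted:
        pass
    assert consistent(store)        # at worst unreferenced objects, never dangling references
    backup(store, "arch1", chunks)  # re-running converges to the complete state
    assert store["archives/arch1"] == "arch1" and consistent(store)
```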
no checkpoint archives, no .borg_part files anymore
saving them was only needed due to the transactional/rollback behavior of borg1.
borg2 does not do that rollback any more, so the checkpoints are not needed.
the user can just re-run the interrupted command and it will notice that some stuff is already present in the repo and only transfer new stuff.
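A minimal sketch of why a re-run only transfers new data, assuming chunks are stored under keys derived from their content. borg computes chunk ids with a keyed hash; plain SHA-256 is used here as a stand-in, and the function names are hypothetical.

```python
import hashlib


def chunk_id(data: bytes) -> str:
    # stand-in: borg uses a keyed id hash, plain SHA-256 keeps the sketch self-contained
    return hashlib.sha256(data).hexdigest()


def upload(store: dict, chunks: list) -> int:
    """Store chunks that are not yet in the repo; return how many were transferred."""
    transferred = 0
    for data in chunks:
        key = f"data/{chunk_id(data)}"
        if key in store:
            continue  # already present, e.g. from an interrupted earlier run
        store[key] = data
        transferred += 1
    return transferred


store = {}
files = [b"alpha", b"beta", b"gamma"]
print(upload(store, files[:2]))  # 2: first run, interrupted after two chunks
print(upload(store, files))      # 1: the re-run only transfers the missing chunk
```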
new borg compact doing garbage collection
borg compact is still needed to free space in the repo, but it doesn't really need to "compact segments" as there are no segment files anymore. so it will do less I/O to move stuff around in the repo.
but maybe some sort of segment/pack files will come back later, so we'll just keep the command name.
borg compact is now doing more work that was previously done by borg delete and borg check: it determines which chunks are not used anymore (and removes them). because it needs to read the archives for that, borg compact now needs the borg key.
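What compact does can be sketched as a mark-and-sweep garbage collection over a dict standing in for the repository (the `data/<id>` and `archives/<id>` layout and the `load_archive` callback are hypothetical). Marking requires reading each archive's metadata, which in borg means decrypting it, hence the need for the key. The usage part also shows a fast delete: only the archives/ entry is removed, and compact frees the rest.

```python
def compact(store: dict, load_archive) -> int:
    """Mark and sweep: keep objects reachable from archives/*, delete all other data/ objects.

    load_archive(archive_id) must return the ids of the objects the archive references;
    in borg, reading that metadata needs the key.
    """
    referenced = set()
    for key, archive_id in store.items():  # mark
        if key.startswith("archives/"):
            referenced.add(archive_id)                   # the archive metadata object
            referenced.update(load_archive(archive_id))  # everything it references
    unused = [
        key for key in store
        if key.startswith("data/") and key[len("data/"):] not in referenced
    ]
    for key in unused:  # sweep
        del store[key]
    return len(unused)


store = {
    "data/c1": b"...", "data/c2": b"...", "data/c3": b"...",
    "data/a1": ["c1", "c2"], "data/a2": ["c2", "c3"],  # archive metadata objects
    "archives/a1": "a1", "archives/a2": "a2",
}


def load(archive_id):
    return store[f"data/{archive_id}"]


del store["archives/a2"]     # a fast delete: only the root reference goes away
print(compact(store, load))  # 2 (data/c3 and data/a2)
print(sorted(store))         # ['archives/a1', 'data/a1', 'data/c1', 'data/c2']
```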
super fast borg delete and borg prune
as borg does not do precise refcounting any more, delete now just kills the archive from archives/* (removing the reference to its root) and lets borg compact clean up all the now unreferenced chunks.
new repo locking code
repository locking code using borgstore locks/*.
locks auto-expire and get deleted if they don't get refreshed regularly (this is good to clean up stale locks of now-dead remote borg processes).
locks also get deleted if their owner process is dead (this is good to clean up stale locks of now-dead local borg processes).
most borg commands now can use a shared repository lock (exceptions: borg check and compact, which must use an exclusive lock).
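The locking rules above (expiry without refresh, cleanup when a local owner process is dead, shared vs. exclusive) can be sketched like this. The dict standing in for locks/*, `LockInfo`, the 30 minute `STALE_AFTER` value and the function names are all assumptions, and the `os.kill(pid, 0)` liveness check is POSIX-only. A real implementation must also handle two processes acquiring at the same moment, which this sketch ignores.

```python
import os
import socket
import time
from dataclasses import dataclass

STALE_AFTER = 30 * 60  # hypothetical expiry in seconds


@dataclass
class LockInfo:
    host: str
    pid: int
    exclusive: bool
    last_refresh: float


def pid_alive(pid: int) -> bool:
    """POSIX-only: signal 0 checks whether the process exists without signalling it."""
    if pid <= 0:
        return False  # 0 and negative pids would address process groups
    try:
        os.kill(pid, 0)
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # process exists but belongs to another user
    return True


def is_stale(lock: LockInfo, now: float) -> bool:
    if now - lock.last_refresh > STALE_AFTER:
        return True  # not refreshed for too long, e.g. owned by a dead remote process
    # a dead owner can only be detected directly for processes on this host
    return lock.host == socket.gethostname() and not pid_alive(lock.pid)


def try_acquire(locks: dict, key: str, exclusive: bool, now: float) -> bool:
    """locks: stand-in for locks/* (key -> LockInfo). Returns True if acquired."""
    for k in [k for k, lock in locks.items() if is_stale(lock, now)]:
        del locks[k]  # clean up stale locks first
    if locks and (exclusive or any(lock.exclusive for lock in locks.values())):
        return False  # exclusive conflicts with any lock, shared only with exclusive ones
    locks[key] = LockInfo(socket.gethostname(), os.getpid(), exclusive, now)
    return True


now = time.time()
locks = {}
print(try_acquire(locks, "locks/a", exclusive=False, now=now))  # True: shared
print(try_acquire(locks, "locks/b", exclusive=False, now=now))  # True: shared + shared
print(try_acquire(locks, "locks/c", exclusive=True, now=now))   # False: others hold locks
```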
new repo config / repo key storage
Config is now stored in separate files config/*, so there is less risk (e.g. for the repokey) if other config items need updating.
The repokey is now stored in keys/* (only 1 key for now).
new manifest storage
the manifest chunk that also had the list of archives inside was split into config/manifest and separate files archives/*.
some features might come back later (stats, quota, append-only, ...)