Upgrading binary delta support #1902
Comments
One more thing that occurred to me is that we could support deltas that work for multiple versions at the same time. The diffs could use files or file ranges that are common in multiple versions. This way there would be fewer delta files to upload and serve.
To keep downloads as small as possible, doesn't that imply we'd have to download a byte range of the file, and that servers would have to support that? Anyway, I think we should first decide on a container format to move away from xar, and get a development branch that uses the new container with the unit tests passing. Is choosing the container format going to be intertwined with the delta compression we choose? I guess we will want to compress the entire container. When we generate binary diffs for a specific set of individual files, I'm not sure whether those will be compressed or not.
I didn't even think about using HTTP range requests for this. What I had in mind is creating one patch file that is universal enough to apply to different source versions (i.e. as a base, use files that are the same in all applicable versions, and don't reuse files, or fragments of files, that differ between versions).
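To make the "universal base" idea above concrete, here is a minimal sketch in Python. It assumes each version is already loaded as a mapping from relative path to file contents; the function name `common_base` and the sample trees are hypothetical and not part of Sparkle's tooling. It only works at whole-file granularity and does not attempt the "fragments of files" variant.

```python
from __future__ import annotations


def common_base(versions: list[dict[str, bytes]]) -> dict[str, bytes]:
    """Return the files whose path and contents are identical in every version.

    Only these files would be safe to reference from a patch that has to
    apply to all of the given source versions.
    """
    if not versions:
        return {}
    first, rest = versions[0], versions[1:]
    return {
        path: data
        for path, data in first.items()
        if all(other.get(path) == data for other in rest)
    }


# Hypothetical app trees: relative path -> file contents.
v1 = {"Info.plist": b"1.0", "Resources/icon.png": b"ICON", "MacOS/app": b"code-v1"}
v2 = {"Info.plist": b"1.1", "Resources/icon.png": b"ICON", "MacOS/app": b"code-v2"}
v3 = {"Info.plist": b"1.2", "Resources/icon.png": b"ICON", "MacOS/app": b"code-v2"}

base = common_base([v1, v2, v3])
print(sorted(base))  # ['Resources/icon.png']
```

Anything outside `base` would have to be shipped in full (or diffed against a file in `base`), which is where the size overhead of a multi-version patch comes from.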
Maybe, if size is not sacrificed too much, since the point of binary delta support is providing the smallest possible download size. This will also likely require significant changes to our current tool. I think the container format and compression format are more important right now.
I think libarchive with tar is probably a decent contender; it has a zstd filter. I haven't assessed the size of the dependency though, or whether the OS-provided version is worthwhile and safe to use. Our bsdiff currently does not do compression, as I expected, since we compress the entire container.
What I had in mind is using zstd prefix feature instead of bsdiff, so getting rid of bsdiff entirely. But that is not compatible with any general archive format, especially not tar.
Also tar as a format is a bit of a mess, with tons of legacy limitations and incompatible workarounds for them.
I've prototyped zstd as a diffing engine. It's a mixed bag. If files differ significantly, then zstd+prefix is ~10% better than bsdiff+zstd. When files are quite similar, bsdiff works better, even as bsdiff+gzip. I've also prototyped universal diffs. If you allow 20% overhead, then one diff file can work for 3-5 versions when differences are small, and even 10 versions when differences are big. However, I'm not sure that feature is desirable enough on its own to justify a format switch. For ImageOptim.app I'm getting delta files as large as 50% of the full archive. I had assumed that was because either bsdiff or xar wasn't compressing them well enough. However, now I see the data is actually quite tricky to compress, and I no longer think there's low-hanging fruit here.
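For readers unfamiliar with the "prefix" approach, the sketch below illustrates the idea: compress the new file while letting the compressor reference the old file as prior context, so unchanged regions shrink to back-references. It deliberately swaps in zlib's preset dictionary (`zdict`) from the Python standard library as a stand-in for zstd's prefix feature, so it is not the prototype described above. zlib only uses the last 32 KiB of the dictionary, while zstd can reference a much larger window. The helper names and sample data are made up for illustration.

```python
import hashlib
import zlib


def make_patch(old: bytes, new: bytes) -> bytes:
    """Compress `new`, using `old` as a preset dictionary (the 'prefix')."""
    comp = zlib.compressobj(level=9, zdict=old)
    return comp.compress(new) + comp.flush()


def apply_patch(old: bytes, patch: bytes) -> bytes:
    """Rebuild `new` from the patch; requires the exact same `old` bytes."""
    decomp = zlib.decompressobj(zdict=old)
    return decomp.decompress(patch) + decomp.flush()


# Deterministic, poorly compressible sample data (8 KiB of hash output).
old = b"".join(hashlib.sha256(str(i).encode()).digest() for i in range(256))
new = old[:4000] + b"a small change" + old[4000:]

patch = make_patch(old, new)
standalone = zlib.compress(new, 9)

assert apply_patch(old, patch) == new
print(len(new), len(standalone), len(patch))
```

On this sample the patch is much smaller than the standalone compressed file, because almost all of `new` is found in the dictionary. Real app binaries shift and rewrite content in less convenient ways, which is consistent with the mixed results reported above.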
Shorter term, I think just replacing the container format and keeping bsdiff is a reasonable first goal (getting off of xar is important). Then replacing bsdiff, or adding prefix-diff or something else, could be a next step; as your interesting results indicate, this could be tricky. I'm not sure this works if we end up wanting to use zstd prefix and it's not compatible with a general archive format, like you say. Regardless of that, I was also thinking about compression over everything, which I'm not sure you factored in. If you have a delta of 5 files, and 3 are binary diffed, but 2 are new:
We just need some format that preserves file permissions, type (regular, dir, symlink), path name, and contents, and has a way to add additional properties (like whether this is a removal, insertion, or diff). I don't think we want xattrs or ACLs. Developing our own, which is an option, may carry some risk, although maybe we could make something more efficient.
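As a rough illustration of how small such a format could be, here is a sketch of a record-per-entry container carrying exactly the fields listed above. The header layout, the numeric codes, and all names (`Entry`, `write_entry`, `read_entries`) are assumptions made up for this example; this is not the format Sparkle ended up adopting.

```python
import io
import struct
from dataclasses import dataclass

# Hypothetical codes for entry type and operation.
KIND_FILE, KIND_DIR, KIND_SYMLINK = 0, 1, 2
OP_INSERT, OP_REMOVE, OP_DIFF = 0, 1, 2

# Little-endian, unpadded: op, kind, permission bits, path length, payload length.
HEADER = struct.Struct("<BBHHQ")


@dataclass
class Entry:
    op: int
    kind: int
    mode: int            # POSIX permission bits, e.g. 0o644
    path: str            # relative path inside the bundle
    payload: bytes = b""  # file contents, diff data, or symlink target


def write_entry(out, entry: Entry) -> None:
    path = entry.path.encode("utf-8")
    out.write(HEADER.pack(entry.op, entry.kind, entry.mode, len(path), len(entry.payload)))
    out.write(path)
    out.write(entry.payload)


def read_entries(src):
    """Yield entries until the stream ends; raise on a truncated record."""
    while True:
        raw = src.read(HEADER.size)
        if not raw:
            return
        if len(raw) != HEADER.size:
            raise ValueError("truncated header")
        op, kind, mode, path_len, payload_len = HEADER.unpack(raw)
        path = src.read(path_len)
        payload = src.read(payload_len)
        if len(path) != path_len or len(payload) != payload_len:
            raise ValueError("truncated entry")
        yield Entry(op, kind, mode, path.decode("utf-8"), payload)


entries = [
    Entry(OP_INSERT, KIND_FILE, 0o644, "Contents/Info.plist", b"<plist/>"),
    Entry(OP_DIFF, KIND_FILE, 0o755, "Contents/MacOS/app", b"...diff bytes..."),
    Entry(OP_REMOVE, KIND_DIR, 0o755, "Contents/Old"),
]
buf = io.BytesIO()
for e in entries:
    write_entry(buf, e)
buf.seek(0)
assert list(read_entries(buf)) == entries
```

Because the records are a plain sequential stream, the whole container can be piped through a single compressor, which matches the "compress the entire container" approach discussed earlier in the thread.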
I think rolling our own format would be fine. It can be something really simple like a series of |
Agreed! First, the container format needs to change. It still doesn't work in macOS beta 4.
That was fixed in #1906; you'll need to use a nightly version of BinaryDelta (or compile the latest code from source yourself) and generate new patches to work on that Monterey beta. If it still doesn't work, file a new bug with a repro case (the old and new app).
Two more things:
Don't worry about upgrading SHA here. We have a proper hash in EdDSA signing. Here it's just a consistency check, and for that case SHA-1 isn't broken. Have a look at Docker: some releases used binary delta (I don't see any currently, not sure why). They have lots of small files. The metadata of these files alone may add up to a significant overhead.
I wrote a new implementation for the container/archive in #2051 and added file clone tracking (this subsumes file renames).
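For context on why clone tracking subsumes renames: if a file in the new version has the same contents as some file in the old version at a different path, the patch only needs to say "copy it from there", whether or not the original path still exists. The sketch below shows that planning step under simplifying assumptions (whole trees held in memory, content matched by SHA-256). The function `plan_entries` and the action names are hypothetical and do not reflect the implementation in #2051.

```python
from __future__ import annotations

import hashlib


def plan_entries(old: dict[str, bytes], new: dict[str, bytes]) -> dict[str, tuple]:
    """Decide, per path in `new`, how a patch could produce that file."""
    # Index old files by content hash; the first path in sorted order wins.
    by_hash: dict[str, str] = {}
    for path in sorted(old):
        by_hash.setdefault(hashlib.sha256(old[path]).hexdigest(), path)

    plan = {}
    for path, data in new.items():
        if old.get(path) == data:
            plan[path] = ("keep", path)      # unchanged in place
            continue
        source = by_hash.get(hashlib.sha256(data).hexdigest())
        if source is not None:
            plan[path] = ("clone", source)   # same bytes exist elsewhere in old
        elif path in old:
            plan[path] = ("diff", path)      # changed contents, binary diff
        else:
            plan[path] = ("add", None)       # brand new, ship in full
    return plan


old_tree = {"a.dylib": b"LIB", "b.txt": b"old"}
new_tree = {
    "a.dylib": b"LIB",
    "Frameworks/a.dylib": b"LIB",  # a copy (or rename target) of a.dylib
    "b.txt": b"new",
    "c.txt": b"fresh",
}
plan = plan_entries(old_tree, new_tree)
print(plan["Frameworks/a.dylib"])  # ('clone', 'a.dylib')
```

A rename is simply a clone whose source path is absent from the new tree, so no separate "rename" operation is needed.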
The implementation was merged; now there's just some polish work left. Short summary in 23fc577
Support for the new format, including tools support (generate_appcast, BinaryDelta), has landed in 2.1.0-beta.1 to try out.
From #1899, it is noted xar is deprecated, and our current delta implementation is lacking.
One suggestion is:
zstd sounds promising. https://github.com/facebook/zstd/wiki/Zstandard-as-a-patching-engine has more info about using it for patching files like Sparkle currently does. I personally don't enjoy the slow bsdiff generation times, so I'm happy to try something else.
We need to also (perhaps more importantly) switch to a different container format than xar, ideally one that is small(?) in surface and not a hassle to embed / maintain (we never had to embed xar). What are our options here?
Maybe we could upgrade and introduce new major versions of the format piecemeal, starting with changing the container format.
We also need to upgrade generate_appcast so it picks the correct version to generate patches from based on the Sparkle bundle in the old app (if it doesn't do that already).
We can drop support for the version 1 format that nobody should be using anymore (like from Sparkle 1.9 or something).
Tentatively, I'm going to classify this as a long-term goal and a major undertaking, for the "2.1" milestone, but would like to be proven otherwise :)