Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

perf: Faster cde rejection #255

Merged
merged 5 commits into from
Nov 19, 2024
Merged

Conversation

caldwell
Copy link
Contributor

Hi, this is a followup to #85, #47, and zip-old#412.

First off, #93 seems to have sped up the code significantly. Thanks!
I have some tweaks here that can improve the performance even more. But even if these changes do not get accepted, I think #93 is probably fast enough for my use case—the old-zip code was taking somewhere between 200ms and 500ms to search for the end-of-central-directory-record, the new code is in the 30ms to 60ms range. It's not amazing but it's 10x faster and (more importantly) just under the human perception of lag.

That said, there are 3 commits here:

  1. A benchmark for the particular behavior I am seeing. It creates a 17MB file of zeros and measures how fast the code fails the open. I used a temporary file instead of a large buffer so that it would include real read() round-trips from the kernel.
    Note: I had issues running the benchmarks. It seems like this change might have broken one of them. As an aside, when I was reading the pkzip appnote for this PR, I happened across section 4.4.1.4 which sounds like it might apply to this breakage.
    To get past this I always just ran the one benchmark I cared about:

    cargo bench --bench read_metadata -- parse_large_non_zip
    

    This new benchmark code uses TempDir so I made this branch on top of deps(dev): Use the tempfile crate instead of the deprecated tempdir crate #254. Apologies if that makes it more annoying to merge.

  2. A patch to enlarge the END_WINDOW_SIZE in Zip32CentralDirectoryEnd::find_and_parse(). I benchmarked several different sizes from 512 to 65536. Here are the results I got:

    Machine END_WINDOW_SIZE Benchmark results
    Mac 512 30,450,608 ns/iter (+/- 673,910)
    Mac 4096 7,741,366 ns/iter (+/- 521,101)
    Mac 8192 5,773,801 ns/iter (+/- 411,277)
    Mac 16384 4,792,533 ns/iter (+/- 332,918)
    Mac 32768 4,324,072 ns/iter (+/- 384,048)
    Mac 65536 4,047,872 ns/iter (+/- 352,416)
    linux 512 65,132,581 ns/iter (+/- 7,429,976)
    linux 4096 14,109,503 ns/iter (+/- 2,892,086)
    linux 8192 9,942,500 ns/iter (+/- 1,886,063)
    linux 16384 8,205,851 ns/iter (+/- 2,902,041)
    linux 32768 7,012,011 ns/iter (+/- 2,222,879)
    linux 65536 6,577,275 ns/iter (+/- 881,546)

    The linux box is an x86_64 running Debian testing. Its /tmp is a tmpfs mount. The mac is an M1 running macOS 15.0.1. Its /tmp is an apfs backed by an SSD.

    From those times, 8192 seems to be the sweet spot. It's significantly (~6x) faster than 512 and increasing it doesn't further improve performance by very much. So the patch uses 8192.

    I don't anticipate this patch being controversial as it doesn't really affect anything except using a tiny bit more RAM while loading the zip.

  3. A patch to shorten the search window at the end. I tested 3 configurations here: searching the whole file, searching the last 128K and searching the last 66,000 bytes. I did these benchmarks with all the END_WINDOW_SIZEs combos shown above, but they aren't interesting so I'm just going to show the 8192 size here. Here are the results I got:

    Machine Search Technique Benchmark results
    mac whole file 5,773,801 ns/iter (+/- 411,277)
    mac last 128k 54,402 ns/iter (+/- 4,126)
    mac last 66k 36,152 ns/iter (+/- 4,293)
    linux whole file 9,942,500 ns/iter (+/- 1,886,063)
    linux last 128k 73,604 ns/iter (+/- 16,662)
    linux last 66k 41,483 ns/iter (+/- 19,692)

    As you might expect, closing this windows makes a extremely large difference when the file is large.

    Since 7a55945, current code searches the entire file. Previously it searched the last 66,000 bytes (5237543). Before that, it searched 65557 bytes, as discussed in zip-old#183.

    pkware's zip implementation isn't open source so it's hard to check, but a comment in zip-old#183 claims it uses 66,000 bytes which is where that number came from.

    Info-Zip, however, seems to be the main zip program shipped in Debian, Homebrew, Nix, Fedora, etc. Its official source is only available on an ftp site, though there's someone's random fork on github that I'll link to. The relevant part is here and is the same as the info-zip source from the ftp site. They search back 0x20000 bytes, which is 128K.

    Given that is the de-factor zip in the open source would and seems to be more generous than pkzip, it seems like this is a reasonable number to split the difference between (overly?) strict spec adherence (65557 bytes) and searching the whole file.

    That was a long winded way of saying that in the 3rd patch I chose to limit the search to the last 128K of the file.

I understand the last one might be more controversial given that it could impact compatibility. I believe the 128K is pretty reasonable number for the reasons give above, but I've left the PR in 3 commits so its easier if you don't want that last one.

Local test checklist:

  • cargo test
  • cargo test --no-default-features
  • cargo test --all-features
  • cargo clippy --all-targets
  • cargo clippy --all-targets --no-default-features
  • cargo clippy --all-targets --all-features
  • cargo doc --no-deps
  • cargo doc --no-deps --no-default-features
  • cargo doc --no-deps --all-features
  • cargo fmt --check --all

†: Had warnings unrelated to these changes.

caldwell and others added 5 commits October 22, 2024 13:13
I tested several END_WINDOW_SIZEs across 2 machines:

Machine 1: macOS 15.0.1, aarch64 (apfs /tmp)
512:   test parse_large_non_zip  ... bench:  30,450,608 ns/iter (+/- 673,910)
4096:  test parse_large_non_zip  ... bench:   7,741,366 ns/iter (+/- 521,101)
8192:  test parse_large_non_zip  ... bench:   5,807,443 ns/iter (+/- 546,227)
16384: test parse_large_non_zip  ... bench:   4,794,314 ns/iter (+/- 419,114)
32768: test parse_large_non_zip  ... bench:   4,262,897 ns/iter (+/- 397,582)
65536: test parse_large_non_zip  ... bench:   4,060,847 ns/iter (+/- 280,964)

Machine 2: Debian testing, x86_64 (tmpfs /tmp)
512:   test parse_large_non_zip  ... bench:  65,132,581 ns/iter (+/- 7,429,976)
4096:  test parse_large_non_zip  ... bench:  14,109,503 ns/iter (+/- 2,892,086)
8192:  test parse_large_non_zip  ... bench:   9,942,500 ns/iter (+/- 1,886,063)
16384: test parse_large_non_zip  ... bench:   8,205,851 ns/iter (+/- 2,902,041)
32768: test parse_large_non_zip  ... bench:   7,012,011 ns/iter (+/- 2,222,879)
65536: test parse_large_non_zip  ... bench:   6,577,275 ns/iter (+/- 881,546)

In both cases END_WINDOW_SIZE=8192 performed about 6x better than 512 and >8192
didn't make much of a difference on top of that.
I benchmarked several search sizes across 2 machines
(these benches are using an 8192 END_WINDOW_SIZE):

Machine 1: macOS 15.0.1, aarch64 (apfs /tmp)
whole file:   test parse_large_non_zip              ... bench:   5,773,801 ns/iter (+/- 411,277)
last 128k:    test parse_large_non_zip              ... bench:      54,402 ns/iter (+/- 4,126)
last 66,000:  test parse_large_non_zip              ... bench:      36,152 ns/iter (+/- 4,293)

Machine 2: Debian testing, x86_64 (tmpfs /tmp)
whole file:   test parse_large_non_zip              ... bench:   9,942,306 ns/iter (+/- 1,963,522)
last 128k:    test parse_large_non_zip              ... bench:      73,604 ns/iter (+/- 16,662)
last 66,000:  test parse_large_non_zip              ... bench:      41,349 ns/iter (+/- 16,812)

As you might expect these significantly increase the rejection speed for
large non-zip files.

66,000 was the number previously used by zip-rs. It was changed to zero in
7a55945.

128K is what Info-Zip uses[1]. This seems like a reasonable (non-zero)
choice for compatibility reasons.

[1] Info-zip is extremely old and doesn't not have an official git repo to
    link to. However, an unofficial fork can be found here:
    https://github.com/hiirotsuki/infozip/blob/bb0c4755d44f21bda0744a5e1868d25055a543cc/zipfile.c#L4073
@Pr0methean Pr0methean enabled auto-merge November 19, 2024 16:01
@Pr0methean Pr0methean added this pull request to the merge queue Nov 19, 2024
Merged via the queue into zip-rs:master with commit 73143a0 Nov 19, 2024
38 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants