Commit
* Use the tempfile crate instead of the tempdir crate (which is deprecated)

  https://github.com/rust-lang-deprecated/tempdir?tab=readme-ov-file#deprecation-note

* perf: Add benchmark that measures the rejection speed of a large non-zip file

* perf: Speed up non-zip rejection by increasing END_WINDOW_SIZE

  I tested several END_WINDOW_SIZEs across 2 machines:

  Machine 1: macOS 15.0.1, aarch64 (apfs /tmp)
  512:   test parse_large_non_zip ... bench:  30,450,608 ns/iter (+/- 673,910)
  4096:  test parse_large_non_zip ... bench:   7,741,366 ns/iter (+/- 521,101)
  8192:  test parse_large_non_zip ... bench:   5,807,443 ns/iter (+/- 546,227)
  16384: test parse_large_non_zip ... bench:   4,794,314 ns/iter (+/- 419,114)
  32768: test parse_large_non_zip ... bench:   4,262,897 ns/iter (+/- 397,582)
  65536: test parse_large_non_zip ... bench:   4,060,847 ns/iter (+/- 280,964)

  Machine 2: Debian testing, x86_64 (tmpfs /tmp)
  512:   test parse_large_non_zip ... bench:  65,132,581 ns/iter (+/- 7,429,976)
  4096:  test parse_large_non_zip ... bench:  14,109,503 ns/iter (+/- 2,892,086)
  8192:  test parse_large_non_zip ... bench:   9,942,500 ns/iter (+/- 1,886,063)
  16384: test parse_large_non_zip ... bench:   8,205,851 ns/iter (+/- 2,902,041)
  32768: test parse_large_non_zip ... bench:   7,012,011 ns/iter (+/- 2,222,879)
  65536: test parse_large_non_zip ... bench:   6,577,275 ns/iter (+/- 881,546)

  In both cases END_WINDOW_SIZE=8192 performed about 6x better than 512,
  and sizes above 8192 didn't make much of a difference on top of that.

* perf: Speed up non-zip rejection by limiting search for EOCDR.

  I benchmarked several search sizes across 2 machines
  (these benches use an END_WINDOW_SIZE of 8192):

  Machine 1: macOS 15.0.1, aarch64 (apfs /tmp)
  whole file:  test parse_large_non_zip ... bench:   5,773,801 ns/iter (+/- 411,277)
  last 128k:   test parse_large_non_zip ... bench:      54,402 ns/iter (+/- 4,126)
  last 66,000: test parse_large_non_zip ... bench:      36,152 ns/iter (+/- 4,293)

  Machine 2: Debian testing, x86_64 (tmpfs /tmp)
  whole file:  test parse_large_non_zip ... bench:   9,942,306 ns/iter (+/- 1,963,522)
  last 128k:   test parse_large_non_zip ... bench:      73,604 ns/iter (+/- 16,662)
  last 66,000: test parse_large_non_zip ... bench:      41,349 ns/iter (+/- 16,812)

  As you might expect, limiting the search significantly increases the
  rejection speed for large non-zip files.

  66,000 was the number previously used by zip-rs. It was changed to
  zero in 7a55945.

  128K is what Info-ZIP uses[1]. This seems like a reasonable (non-zero)
  choice for compatibility reasons.

  [1] Info-ZIP is extremely old and does not have an official git repo to
  link to. However, an unofficial fork can be found here:
  https://github.com/hiirotsuki/infozip/blob/bb0c4755d44f21bda0744a5e1868d25055a543cc/zipfile.c#L4073

---------

Co-authored-by: Chris Hennick <4961925+Pr0methean@users.noreply.github.com>
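
For illustration only, the two perf changes above (a larger read window and a bounded backward search for the end-of-central-directory record) could be sketched roughly as follows. This is not the code from this commit: `find_eocdr` and `MAX_SEARCH_DISTANCE` are hypothetical names, and the sketch only locates the signature bytes without validating the record that follows them.

```rust
use std::io::{self, Read, Seek, SeekFrom};

/// End-of-central-directory record signature, "PK\x05\x06".
const EOCDR_SIGNATURE: [u8; 4] = [0x50, 0x4b, 0x05, 0x06];
/// Bytes read per backward step (the value the benchmarks above settle on).
const END_WINDOW_SIZE: usize = 8192;
/// Hypothetical name: how far back from the end of the file to search (128 KiB).
const MAX_SEARCH_DISTANCE: u64 = 128 * 1024;

/// Scans backward from the end of `reader`, one window at a time, and returns
/// the offset of the last EOCDR signature inside the search limit, if any.
fn find_eocdr<R: Read + Seek>(reader: &mut R) -> io::Result<Option<u64>> {
    let file_len = reader.seek(SeekFrom::End(0))?;
    // Never look further back than MAX_SEARCH_DISTANCE from the end.
    let lower_bound = file_len.saturating_sub(MAX_SEARCH_DISTANCE);
    let mut buf = vec![0u8; END_WINDOW_SIZE];
    let mut window_end = file_len;

    while window_end > lower_bound {
        let window_start = window_end
            .saturating_sub(END_WINDOW_SIZE as u64)
            .max(lower_bound);
        let len = (window_end - window_start) as usize;
        reader.seek(SeekFrom::Start(window_start))?;
        reader.read_exact(&mut buf[..len])?;

        // Search this window from its end toward its start.
        if let Some(pos) = buf[..len]
            .windows(EOCDR_SIGNATURE.len())
            .rposition(|w| w == &EOCDR_SIGNATURE[..])
        {
            return Ok(Some(window_start + pos as u64));
        }
        if window_start == lower_bound {
            break; // Search limit reached: reject without reading the rest.
        }
        // Overlap the next window by 3 bytes so that a signature straddling
        // two windows is still found.
        window_end = window_start + (EOCDR_SIGNATURE.len() as u64 - 1);
    }
    Ok(None)
}
```

With these constants, a large non-zip file is rejected after at most 17 window reads (16 full windows plus a short remainder caused by the 3-byte overlap), regardless of the file's total size.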