
deflate_compress: Significant performance drop + increase in ratio compared to zlib on levels 8 and 9 with some data #85

Closed
dktapps opened this issue Oct 10, 2020 · 3 comments


@dktapps
Contributor

dktapps commented Oct 10, 2020

I'm currently working on a PHP binding for libdeflate.

I'm compressing the following file using zlib vs libdeflate: https://github.com/pmmp/BedrockData/blob/afc885cccae38048d309911f2c5ddcdcb6af8152/required_block_states.nbt

From levels 1 to 7, zlib and libdeflate achieve comparable compression ratios, and libdeflate consistently takes 40-60% of the time that zlib does.

However, levels 8 and 9 show significantly worse performance, and the compressed output actually gets larger, not just compared to zlib but also compared to lower levels of libdeflate.

I wouldn't have thought much of it, except that performance falls off a cliff at those high levels, enough to make me wonder whether there's a bug in the code paths used there; it also seems peculiar that the output would grow.

The following Callgrind profile indicates that the bulk of the CPU time in my benchmark is being spent in deflate_compress_near_optimal:
[Callgrind profile screenshot]

OS is Ubuntu 20.04.

Attached are some compression numbers from a quick and dirty impl.
libdeflate-test.txt
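
For reference, here is a minimal C sketch of the kind of per-level sweep being measured, using libdeflate's public API; it is illustrative only, not the harness that produced the attached numbers.

```c
/* Hypothetical sketch: compress one file at each libdeflate level and
 * report compressed size and elapsed time. */
#include <stdio.h>
#include <stdlib.h>
#include <time.h>
#include <libdeflate.h>

int main(int argc, char **argv)
{
    if (argc != 2) {
        fprintf(stderr, "usage: %s FILE\n", argv[0]);
        return 1;
    }

    /* Read the whole input file into memory. */
    FILE *f = fopen(argv[1], "rb");
    if (!f) { perror("fopen"); return 1; }
    fseek(f, 0, SEEK_END);
    long in_size = ftell(f);
    rewind(f);
    unsigned char *in = malloc(in_size);
    if (!in || fread(in, 1, in_size, f) != (size_t)in_size) {
        fprintf(stderr, "read error\n");
        return 1;
    }
    fclose(f);

    for (int level = 1; level <= 12; level++) {
        struct libdeflate_compressor *c = libdeflate_alloc_compressor(level);
        size_t bound = libdeflate_deflate_compress_bound(c, in_size);
        unsigned char *out = malloc(bound);

        clock_t start = clock();
        size_t out_size = libdeflate_deflate_compress(c, in, in_size, out, bound);
        double ms = 1000.0 * (clock() - start) / CLOCKS_PER_SEC;

        printf("level %2d: %zu bytes, %.1f ms\n", level, out_size, ms);

        free(out);
        libdeflate_free_compressor(c);
    }
    free(in);
    return 0;
}
```

Building would look something like `cc -O2 bench.c -ldeflate`, assuming libdeflate is installed where the compiler can find it.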

Is this a bug or should I be expecting this kind of behaviour?

@ebiggers
Owner

The algorithm changes at level 8 and above, which is why there can be a discontinuity between levels 7 and 8. The algorithm used at levels 8 and above is slower than the one used at levels 7 and below, but on most inputs it produces a significantly better compression ratio, and it's usually still slightly faster than zlib.

Compression is all about heuristics, so it will always be possible to find inputs where the algorithms don't work as well as expected, which is why I'm not sure it's fair to call this a "bug". But when I have a chance, I'll try to add some more heuristics to improve on this input, as well as the input given in #57 on which libdeflate also does poorly -- probably something along the lines of dynamically selecting the algorithm and minimum match length based on a quick analysis of the input data. The file you gave does well with the "lazy" algorithm and a minimum match length of 4 bytes.
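
As a sketch of what such a quick analysis could look like (purely illustrative; this is not libdeflate's actual heuristic, and the statistic and thresholds below are made up):

```c
/* Illustration only: derive a cheap statistic from a prefix of the input and
 * use it to pick the parser's minimum match length.  The mapping from the
 * statistic to a length is invented here; the point is just that a quick
 * scan can drive the choice. */
static unsigned choose_min_match_len(const unsigned char *in, size_t n)
{
    unsigned char seen[256] = {0};
    unsigned distinct = 0;
    size_t sample = n < 4096 ? n : 4096;   /* only scan a short prefix */

    for (size_t i = 0; i < sample; i++) {
        if (!seen[in[i]]) {
            seen[in[i]] = 1;
            distinct++;
        }
    }

    /* Hypothetical thresholds mapping literal diversity to a match length. */
    if (distinct > 128)
        return 3;
    if (distinct > 64)
        return 4;
    return 5;
}
```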

@dktapps
Contributor Author

dktapps commented Oct 22, 2020

Sorry about that, misclicked on my phone...

ebiggers added a commit that referenced this issue Dec 31, 2021
Instead of switching directly from the lazy compressor at level 7 to the
near-optimal compressor at level 8, use the lazy2 compressor at levels
8-9 and don't switch to near-optimal until level 10.

This avoids poor compression ratio and bad performance (both
significantly worse than level 7, and significantly worse than zlib) at
levels 8-9 on data where the near-optimal compressor doesn't do well
until the parameters are cranked up.

On data where the near-optimal compressor *does* do well, this change
worsens the compression ratio of levels 8-9, but also speeds them up a
lot, thus positioning them similarly vs. zlib as the lower levels (i.e.
much faster and slightly stronger, rather than slightly faster and much
stronger).  The difference between levels 9 and 10 is increased, but
that's perhaps the least bad place to have a discontinuity.

Resolves #85
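
In pseudocode terms, the mapping described above looks roughly like the sketch below. The names and the handling of the lowest levels are illustrative assumptions; only the 7 / 8-9 / 10+ boundaries come from the commit message.

```c
/* Rough sketch of the level-to-compressor mapping the commit describes. */
enum deflate_algo { ALGO_GREEDY, ALGO_LAZY, ALGO_LAZY2, ALGO_NEAR_OPTIMAL };

static enum deflate_algo algo_for_level(int level)
{
    if (level <= 2)
        return ALGO_GREEDY;        /* assumed: fastest parser at the bottom */
    if (level <= 7)
        return ALGO_LAZY;          /* unchanged: lazy up to level 7 */
    if (level <= 9)
        return ALGO_LAZY2;         /* new: lazy2 at levels 8-9 */
    return ALGO_NEAR_OPTIMAL;      /* near-optimal only from level 10 up */
}
```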
@ebiggers
Owner

ebiggers commented Jan 1, 2022

f7d3a70 fixed this by changing the algorithm used at levels 8 and 9; other commits made further improvements to the compression algorithms as well.

Here are the results of DEFLATE compression on required_block_states.nbt. Table entries contain compressed size in bytes / compression time in milliseconds; "new" is commit 3dca7de and "old" is commit 047aa84:

Level  libdeflate (new)  libdeflate (old)  zlib
 1      118528 /   3      119349 /   3     118729 /  3
 2      106640 /   3      107242 /   3     110383 /  4
 3       99915 /   3      100406 /   3     102669 /  4
 4       92241 /   4       93719 /   4      97863 /  6
 5       91444 /   4       90841 /   4      90502 /  7
 6       81281 /   5       80877 /   5      74639 / 10
 7       74085 /   8       74058 /   8      73230 / 13
 8       71513 /  17       91419 /  36      71121 / 21
 9       70995 /  22       88240 /  46      71129 / 23
10       72525 /  92       77829 /  72      N/A
11       69871 / 184       72562 / 115      N/A
12       68442 / 397       70696 / 190      N/A

Note: level 10 is the odd one out now (worse than level 9), but not by very much. It's hard to handle this file perfectly, as it's a bit unusual.

Edit: changed table to be easier to read

jkbonfield added a commit to jkbonfield/htslib that referenced this issue Jan 20, 2022
Previously, for read-name compression at levels between 4 and 7, zlib
considerably beat libdeflate due to libdeflate's poor selection of
minimum match length. This was raised as issue ebiggers/libdeflate#85.
It has now been resolved in libdeflate 1.9 (along with some general
improvements elsewhere), and in all cases libdeflate is the better
choice.

Also fixed the mapping of levels 1..9 (standard zlib) to 1..12
(libdeflate).  The maths in the comment was incorrect, as it's an
integer calculation, not a floating-point one.

Figures from converting 1 million NovaSeq records from BAM to CRAM 3.0:

                     Time       Size        RN
Libdeflate 1.9+PR-7  0m43.816s  204732408   48381374 (RN=libdeflate)
Libdeflate 1.8-7     0m45.379s  206626451   50580708 (RN=zlib)
Libdeflate 1.8-7     1m1.431s   210172035  *54126292 (RN=libdeflate, forced)
Zlib only -7         0m48.531s  207189920   50580708 (RN=zlib)

(Default level)
                     Time       Size        RN
Libdeflate 1.9+PR-5  0m30.323s  207793059   51023626 (RN=libdeflate)
Libdeflate 1.8-5     0m33.265s  208714328   51612215 (RN=zlib, as devel)
Libdeflate 1.8-5     0m29.753s  213024792  *55922679 (RN=libdeflate, forced)
Zlib only -5         0m40.353s  208499406   51612215 (RN=zlib)

We can clearly see the problem(*) in using libdeflate for read-names in
1.8, how it's fixed in 1.9, and how it is now smaller/faster than zlib
again.

At level 9 it was using libdeflate for RN already, but we see
improvements to both RN and elsewhere which are simply down to other
changes in the library:

                     Time       Size        RN
Libdeflate 1.9+PR-9  2m21.757s  202890458   47327597 (RN=libdeflate)
Libdeflate 1.8-9     2m6.304s   204292448   48541687 (RN=libdeflate)
Zlib only -9         1m20.966s  206482425   49988310 (RN=zlib)

Finally, the impact of switching level 9 from the old mapping (11) to
new (12; "9+"), along with a more complete table for curiosity's sake:

                     Time       Size        RN
Libdeflate 1.9+PR-9+ 2m54.664s  202315823   46783148
Libdeflate 1.9+PR-9  2m21.757s  202890458   47327597
Libdeflate 1.9+PR-8  1m39.040s  202934405   47247996
Libdeflate 1.9+PR-7  0m43.816s  204732408   48381374
Libdeflate 1.9+PR-6  0m31.521s  207437149   50768595
Libdeflate 1.9+PR-5  0m30.323s  207793059   51023626 (default level)
Libdeflate 1.9+PR-4  0m29.478s  210425588   52946850
Libdeflate 1.9+PR-1  0m27.460s  215975209   57142706 (no change)

(Note: "1.9" here is actually master, which is a few commits on from
the tag, but the main gist of it is the same.)
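
As a toy illustration of the integer-vs-floating-point point above (not necessarily the exact expression htslib uses for its 1..9 to 1..12 mapping), the following shows how truncating integer division drifts away from the floating-point version of the same formula:

```c
#include <stdio.h>

/* Toy illustration only: compare an integer level mapping with the
 * floating-point version of the same expression. */
int main(void)
{
    for (int level = 1; level <= 9; level++) {
        int    as_int   = level * 12 / 9;       /* what the code computes */
        double as_float = level * 12.0 / 9.0;   /* what a naive comment assumes */
        printf("zlib %d -> libdeflate %d (float: %.2f)\n",
               level, as_int, as_float);
    }
    return 0;
}
```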