-
Notifications
You must be signed in to change notification settings - Fork 2.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Don't shrink window log when streaming with a dictionary #2451
Conversation
d13d1cd
to
6ce1fee
Compare
@Cyan4973 when a user calls So far I've left it alone. But I'm thinking we should change |
I agree with the consistency argument. |
58bd626
to
58476bc
Compare
I've made that change and added a test for |
* Add a test that runs without a pledgedSrcSize and with a dictionary. * Add github.tar data with uses the github dictionary while compressing github.tar, instead of each file individually.
Fixes facebook#2442. 1. When creating a dictionary keep the same behavior as before. Assume the source size is 513 bytes when adjusting parameters. 2. When calling ZSTD_getCParams() or ZSTD_adjustCParams() keep the same behavior as before. 3. When attaching a dictionary keep the same behavior of ignoring the dictionary size. When streaming this will select the largest parameters and not adjust them down. But, the CDict will use the correctly sized parameters, which seems like the right tradeoff. 4. When not attaching a dictionary (either forced not to, or using a prefix dictionary) we select parameters based on the dictionary size + source size, and assume the source size is small, which is the same behavior as before. But, now we don't adjust the window log (and hash and chain log) down when the source size is unknown. When the source size is unknown all cdicts should attach, except when the user disables attaching, or `forceWindow` is used. This means that when streaming with a CDict we end up in the good case where we get small CDict parameters, and large source parameters. TODO: Add a streaming + dictionary regression test case.
Treat ZSTD_getCParams() and ZSTD_adjustCParams() in the same way we treat streaming compression. Choose parameters based on the dictionary size + source size, and assume the source size is small if unkown. But, don't shrink the window log down in ZSTD_adjustCParams_internal().
Thanks, with these changes it seems to be mostly ok: input 159219 bytes, without dict 29518 bytes, streaming api with dict 24177 bytes, simple api with dict 23866 bytes. For some reason it's still slightly larger when using the streaming api with |
You should expect small differences in compressed size between the single-buffer and streaming compression. When zstd knows the source size (like in single-buffer mode), it will optimize the compression parameters for that size. When using streaming, without setting the pledged source size, zstd uses the "generic" parameters, so it will likely compress slightly worse. |
Fixes #2442.
Assume the source size is 513 bytes when adjusting parameters.
the same logic as case 4 (not attaching a dictionary).
the dictionary size. When streaming this will select the
largest parameters and not adjust them down. But, the CDict
will use the correctly sized parameters, which seems like the
right tradeoff.
using a prefix dictionary) we select parameters based on the
dictionary size + source size, and assume the source size is
small, which is the same behavior as before. But, now we don't
adjust the window log (and hash and chain log) down when the
source size is unknown.
When the source size is unknown all cdicts should attach, except
when the user disables attaching, or
forceWindow
is used. Thismeans that when streaming with a CDict we end up in the good case
where we get small CDict parameters, and large source parameters.
I've added a test case that catches this bug. It compresses using
a dictionary, without setting the pledged src size, and with a large
source. See the changes to
results.csv
in the"Don't shrink window log when streaming with a dictionary"
commit.
I've also added a test to
fuzzer.c
to check thatZSTD_getCParams()
and
ZSTD_adjustCParams()
don't shrink the window log down.