Add memory monotonicity test over srcSize #2538
Conversation
Is that the only instance of non-monotonicity for source size? |
I think we need monotonicity in source size, at the very least when reporting … Right now if you use … The only reason BtrFS (which pre-allocates workspaces) isn't bitten by this is because they use streaming: when the smaller parameters are selected, the window is also shrunk enough to make up for the lack of table space. |
Mem usage also decreases for level 4 as we cross the 128 KB parameter boundary: 1559264 bytes -> 1297120 bytes reported by … |
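For context, a minimal sketch of how one might query those estimates on either side of the 128 KB boundary for level 4 (this assumes the experimental estimation helpers, so the file must be built with ZSTD_STATIC_LINKING_ONLY defined; the exact byte counts depend on the zstd version):
#define ZSTD_STATIC_LINKING_ONLY
#include <zstd.h>
#include <stdio.h>

int main(void) {
    /* Estimated one-shot CCtx size for level 4, just at and just past 128 KB. */
    unsigned long long const srcSizes[2] = { 128 * 1024, 128 * 1024 + 1 };
    for (int i = 0; i < 2; ++i) {
        ZSTD_compressionParameters const cp = ZSTD_getCParams(4, srcSizes[i], 0);
        size_t const est = ZSTD_estimateCCtxSize_usingCParams(cp);
        printf("srcSize=%llu -> estimated CCtx size: %zu bytes\n", srcSizes[i], est);
    }
    return 0;
}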
I think this actually might be fine? |
zstd/lib/compress/zstd_compress.c Lines 5059 to 5062 in 413b319
zstd/lib/compress/zstd_compress.c Lines 5137 to 5140 in 413b319
When using … |
I think monotonicity in the table wrt source size makes sense. Here is the scenario I'm imagining:
auto cparams = ZSTD_getCParams(1, 0, 0);                 /* srcSize unknown: worst-case cparams */
auto size = ZSTD_estimateCCtxSize_usingCParams(cparams); /* size the workspace once, up front */
void* wksp = malloc(size);
...
auto params = ZSTD_getParams(1, srcSize, 0);             /* params chosen from the actual srcSize */
auto cctx = ZSTD_initStaticCCtx(wksp, size);
ZSTD_compress_advanced(cctx, dst, dstCapacity, src, srcSize, NULL, 0, params);
This shows up in the Linux kernel in https://github.com/torvalds/linux/blob/1df27313f50a57497c1faeb6a6ae4ca939c85a7d/crypto/zstd.c#L35-L36. You might say: just use a compression level instead. Currently the kernel is limited to using cparams because of its API (though we could try to update it after the version update). And, I would intuitively expect that for all … |
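In other words, in this scenario the workspace is sized from the srcSize-unknown estimate, while the parameters actually used are chosen per call from the real srcSize. That pattern is only safe if no specific srcSize ever needs more memory than the srcSize-unknown estimate, which is the monotonicity (or at least upper-bound) property being asked for here.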
So the reason why level 1 with a 16 KB source is better with a larger hash table seems to be branch misses. With 32K entries in the hash table, collisions are rare, so you generally either end up with a match, or end up with … With 8K entries, compression has about 5.5% missed branches. With 32K entries it has about 4.0% missed branches. With 64K entries it has 3.6% missed branches and is slightly faster than hashLog=15. |
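For anyone wanting to rerun that kind of comparison, one option (not necessarily how these numbers were gathered) is to pin hashLog through the advanced API and benchmark each setting on the same 16 KB input, taking branch-miss counts from an external profiler such as perf:
#include <zstd.h>

/* Compress `src` once at level 1 with an explicit hashLog override (e.g. 13, 15, 16).
 * Returns the compressed size, or 0 on error. Name and harness are illustrative. */
static size_t compressWithHashLog(void* dst, size_t dstCapacity,
                                  const void* src, size_t srcSize, int hashLog) {
    ZSTD_CCtx* const cctx = ZSTD_createCCtx();
    size_t result = 0;
    if (cctx == NULL) return 0;
    ZSTD_CCtx_setParameter(cctx, ZSTD_c_compressionLevel, 1);
    ZSTD_CCtx_setParameter(cctx, ZSTD_c_hashLog, hashLog);
    {   size_t const r = ZSTD_compress2(cctx, dst, dstCapacity, src, srcSize);
        if (!ZSTD_isError(r)) result = r;
    }
    ZSTD_freeCCtx(cctx);
    return result;
}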
@senhuang42 do you have the patch for … |
Thanks for the reminder - I was looking at this again, and I actually wasn't able to repro any issue with the existing code here: zstd/lib/compress/zstd_compress.c Lines 1336 to 1342 in 5158071
If … Currently, adjusting … |
Didn't 128 KB level 4 use more memory than 256 KB level 4? In which case the looping is necessary because … |
128K L4 used more than 256K L4, but not more than >256K L4, so estimateCCtxSize(4) would still be fine in that case, I think. But I think the looping would still be nice to have (row hash might introduce oddities?) |
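A rough sketch of what that looping could look like: take the worst case over the srcSize tiers at which the parameter tables switch (16 KB / 128 KB / 256 KB / unknown), rather than trusting the estimate for any single srcSize. This is only an illustration against the public experimental API, not the library's actual implementation:
#define ZSTD_STATIC_LINKING_ONLY
#include <zstd.h>

/* Worst-case one-shot CCtx estimate for a level, across the srcSize tiers. */
static size_t estimateCCtxSizeMaxOverTiers(int level) {
    unsigned long long const tiers[] = { 16 * 1024, 128 * 1024, 256 * 1024, 0 /* unknown */ };
    size_t worst = 0;
    for (size_t i = 0; i < sizeof(tiers) / sizeof(tiers[0]); ++i) {
        ZSTD_compressionParameters const cp = ZSTD_getCParams(level, tiers[i], 0);
        size_t const est = ZSTD_estimateCCtxSize_usingCParams(cp);
        if (est > worst) worst = est;
    }
    return worst;
}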
@terrelln and I were discussing the monotonicity of memory usage as source size increases, as well as when the compression level increases. This PR adds a test for the former case: mem usage monotonically increases with srcSize (a rough sketch of the idea appears after this description).
It actually appears that we don't currently guarantee this. Between the 16 KB and 128 KB parameters for level 1, if the source file is > 16 KB and <= 128 KB, we actually use less memory compared to source files <= 16 KB. The parameters in zstd_compress.c confirm this: wLog: 14, cLog: 14, hLog: 15 -> wLog: 17, cLog: 12, hLog: 13 is a decrease in memory usage.
Is this something we would want to fix? Do we also want to maintain memory usage monotonicity as compression level increases?
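Roughly, the shape of the check described above (a sketch of the idea, not the actual test code in this PR): walk srcSize upward and assert that the estimated CCtx size never shrinks.
#define ZSTD_STATIC_LINKING_ONLY
#include <zstd.h>
#include <assert.h>

/* Assert that the estimated CCtx size never decreases as srcSize grows. */
static void checkCCtxSizeMonotonicOverSrcSize(int level) {
    size_t prev = 0;
    unsigned long long srcSize;
    for (srcSize = 1024; srcSize <= 1024 * 1024; srcSize += 1024) {
        ZSTD_compressionParameters const cp = ZSTD_getCParams(level, srcSize, 0);
        size_t const est = ZSTD_estimateCCtxSize_usingCParams(cp);
        assert(est >= prev);
        prev = est;
    }
}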