
Add memory monotonicity test over srcSize #2538

Merged

senhuang42 merged 2 commits into facebook:dev from monotonicity_test on Mar 22, 2021

Conversation

@senhuang42 (Contributor) commented Mar 16, 2021

@terrelln and I were discussing the monotonicity of memory usage as source size increases, and as compression level increases. This PR adds a test for the former case: memory usage should monotonically increase with srcSize.
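A minimal sketch of such a check (not the PR's actual test), using zstd's static-linking-only estimate API:

#define ZSTD_STATIC_LINKING_ONLY /* ZSTD_getCParams(), ZSTD_estimateCCtxSize_usingCParams() */
#include <zstd.h>
#include <assert.h>

/* For a fixed level, the estimated workspace should never shrink as srcSize grows. */
static void checkMemMonotonicityOverSrcSize(int level)
{
    size_t prevEst = 0;
    unsigned long long srcSize;
    for (srcSize = 1024; srcSize <= (1ULL << 19); srcSize += 1024) {
        size_t const est =
            ZSTD_estimateCCtxSize_usingCParams(ZSTD_getCParams(level, srcSize, 0));
        assert(est >= prevEst);
        prevEst = est;
    }
}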

It appears that we don't currently. Between the <= 16 KB and <= 128 KB parameter sets for level 1, a source file that is > 16 KB and <= 128 KB actually uses less memory than a source file <= 16 KB. The parameter tables in zstd_compress.c confirm it: wLog=14, cLog=14, hLog=15 -> wLog=17, cLog=12, hLog=13 is a decrease in memory usage.

Is this something we would want to fix? Do we also want to maintain memory usage monotonicity as compression level increases?

@terrelln (Contributor)

Is that the only instance of non-monotonicity for source size?

@terrelln (Contributor) commented Mar 16, 2021

I think we need monotonicity in source size, at the very least when reporting ZSTD_estimateCCtxSize() et al. And for simplicity, it seems like we should guarantee it in the parameters as well.

Right now, if you allocate a workspace using ZSTD_estimateCCtxSize(1), you could fail to compress a source of <= 16 KB with ZSTD_compressCCtx() (unless I'm missing something).

The only reason Btrfs (which pre-allocates workspaces) isn't bitten by this is that it uses streaming: when the smaller parameters are selected, the window is also shrunk enough to make up for the larger tables.

@senhuang42 (Contributor, Author) commented Mar 16, 2021

Memory usage also decreases for level 4 as we cross the 128 KB parameter boundary: 1559264 bytes -> 1297120 bytes, as reported by ZSTD_estimateCCtxSize_usingCParams(). These appear to be the only two instances.

@senhuang42 (Contributor, Author) commented Mar 16, 2021

Right now if you use ZSTD_estimateCCtxSize(1), you could not compress a source of <= 16 KB with ZSTD_compressCCtx() (unless I'm missing something).

I think this actually might be fine? ZSTD_estimateCCtxSize() assumes ZSTD_CONTENTSIZE_UNKNOWN and therefore should select the > 256 KB parameter set. But for the specific case tested in the unit test (generating cparams for level 1 with a > 16 KB, <= 128 KB srcSize, creating a static CCtx from a size estimate for those, then trying to compress <= 16 KB), things could definitely go badly.

@terrelln (Contributor)

I think this actually might be fine? ZSTD_estimateCCtxSize() assumes ZSTD_CONTENTSIZE_UNKNOWN and therefore should select the > 256K parameter set. But definitely for the specific case tested in the unit test, generating CParams, then creating a static CCtx based on a size estimate for that, could go badly.

{ /* "default" - for any srcSize > 256 KB */
/* W, C, H, S, L, TL, strat */
{ 19, 12, 13, 1, 6, 1, ZSTD_fast }, /* base for negative levels */
{ 19, 13, 14, 1, 7, 0, ZSTD_fast }, /* level 1 */

{ /* for srcSize <= 16 KB */
/* W, C, H, S, L, T, strat */
{ 14, 12, 13, 1, 5, 1, ZSTD_fast }, /* base for negative levels */
{ 14, 14, 15, 1, 5, 0, ZSTD_fast }, /* level 1 */

When using ZSTD_estimateCCtxSize(1) the window size isn't counted (it is with ZSTD_estimateCStreamSize()). So the <= 16 KB params use more memory than the > 256 KB params for single-pass compression.
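For reference, counting just the hash tables implied by the excerpts above (ZSTD_fast allocates no chain table, and entries are 4-byte U32s):

/* <= 16 KB params: hashLog=15 -> (1 << 15) * 4 = 128 KB
 * default params:  hashLog=14 -> (1 << 14) * 4 =  64 KB */

As discussed further down the thread, the smaller windowLog also shrinks tokenSpace, which changes the overall comparison.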

@terrelln (Contributor) commented Mar 17, 2021

I think monotonicity in the table wrt source size makes sense. Here is the scenario I'm imagining:

ZSTD_compressionParameters const cparams = ZSTD_getCParams(1, 0, 0); /* srcSize unknown */
size_t const size = ZSTD_estimateCCtxSize_usingCParams(cparams);
void* const wksp = malloc(size);

...

ZSTD_parameters const params = ZSTD_getParams(1, srcSize, 0);
ZSTD_CCtx* const cctx = ZSTD_initStaticCCtx(wksp, size);
ZSTD_compress_advanced(cctx, dst, dstCapacity, src, srcSize, NULL, 0, params);

This shows up in the Linux kernel in crypto/zstd.c. It can't "tune" the parameters for the source size because of this limitation.

https://github.com/torvalds/linux/blob/1df27313f50a57497c1faeb6a6ae4ca939c85a7d/crypto/zstd.c#L35-L36
https://github.com/torvalds/linux/blob/1df27313f50a57497c1faeb6a6ae4ca939c85a7d/crypto/zstd.c#L155

You might say: just use a compression level instead. Currently the kernel is limited to using cparams because of its API (though we could try to update that after the version update). And I would intuitively expect that, for all level and srcSize, memory(ZSTD_getCParams(level, 0, 0)) >= memory(ZSTD_getCParams(level, srcSize, 0)).
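That expectation could be written down as a check along these lines (a sketch, not part of this PR; the level and size bounds are arbitrary):

#define ZSTD_STATIC_LINKING_ONLY
#include <zstd.h>
#include <assert.h>

/* The unknown-srcSize estimate should upper-bound the estimate for any
 * concrete srcSize at the same level. */
static void checkUnknownSizeUpperBounds(void)
{
    int level;
    for (level = 1; level <= ZSTD_maxCLevel(); ++level) {
        size_t const unknownEst =
            ZSTD_estimateCCtxSize_usingCParams(ZSTD_getCParams(level, 0, 0));
        unsigned long long srcSize;
        for (srcSize = 1; srcSize <= (1ULL << 20); srcSize *= 2) {
            size_t const knownEst =
                ZSTD_estimateCCtxSize_usingCParams(ZSTD_getCParams(level, srcSize, 0));
            assert(unknownEst >= knownEst);
        }
    }
}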

@terrelln (Contributor)

So the reason level 1 with a 16 KB source does better with a larger hash table seems to be branch misses. With 32K entries in the hash table, collisions are rare, so a lookup generally either yields a match or an index < lowLimit.

With 8K entries compression has about 5.5% missed branches. With 32K entries it has about 4.0% missed branches. With 64K entries it has 3.6% missed branches and is slightly faster than hashLog=15.

@terrelln (Contributor)

@senhuang42 do you have the patch for estimateCCtxSize() ready?

@senhuang42 (Contributor, Author) commented Mar 19, 2021

@senhuang42 do you have the patch for estimateCCtxSize() ready?

Thanks for the reminder. I was looking at this again and wasn't able to repro any issue with the existing estimateCCtxSize(1). Looking at the code, doesn't it take windowLog into account here, or am I missing something?

size_t const windowSize = MAX(1, (size_t)MIN(((U64)1 << cParams->windowLog), pledgedSrcSize));
size_t const blockSize = MIN(ZSTD_BLOCKSIZE_MAX, windowSize);
U32 const divider = (cParams->minMatch==3) ? 3 : 4;
size_t const maxNbSeq = blockSize / divider;
size_t const tokenSpace = ZSTD_cwksp_alloc_size(WILDCOPY_OVERLENGTH + blockSize)
+ ZSTD_cwksp_alloc_size(maxNbSeq * sizeof(seqDef))
+ 3 * ZSTD_cwksp_alloc_size(maxNbSeq * sizeof(BYTE));

If windowLog < 17, we effectively reduce tokenSpace, so in this case the 16 KB params would use less memory than the > 256 KB params (though the matchState would still be larger), since the increase in matchState size from the larger hashLog and chainLog is offset by the smaller tokenSpace that a small enough windowLog produces.

Currently, adjusting ZSTD_estimateCCtxSize_internal() so it assumes a 16 KB srcSize, I get 211680 bytes, versus 576224 bytes for ZSTD_CONTENTSIZE_UNKNOWN. So it seems my patch doesn't actually change anything currently, but it makes the size estimation a bit safer in case the default params ever get adjusted to a point where this becomes an issue.
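Rough arithmetic from the snippet above is consistent with those numbers (assuming WILDCOPY_OVERLENGTH = 32, sizeof(seqDef) = 8, divider = 4, and no chain table for ZSTD_fast; alignment padding ignored):

/* <= 16 KB params (wLog=14, hLog=15), 16 KB pledged:
 *   blockSize  = 16 KB
 *   tokenSpace = (32 + 16384) + 4096*8 + 3*4096    =  61472
 *   hash table = (1 << 15) * 4                     = 131072
 *                                            total = 192544
 * default params (wLog=19, hLog=14), srcSize unknown:
 *   blockSize  = 128 KB
 *   tokenSpace = (32 + 131072) + 32768*8 + 3*32768 = 491552
 *   hash table = (1 << 14) * 4                     =  65536
 *                                            total = 557088
 * Both reported figures (211680 and 576224) exceed these sums by the same
 * ~19 KB of fixed overhead (entropy workspace, etc.). */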

senhuang42 force-pushed the monotonicity_test branch 3 times, most recently from 5c2de40 to dff4a0e on March 21, 2021
@terrelln (Contributor)

Didn't 128KB level 4 use more memory than 256KB level 4? In which case the looping is necessary because tokenSpace would be the same.

@senhuang42 (Contributor, Author)

Didn't 128KB level 4 use more memory than 256KB level 4? In which case the looping is necessary because tokenSpace would be the same.

128K L4 used more than 256K L4, but not more than >256K L4, so estimateCCtxSize(4) would still be fine in that case, I think.

But I think the looping would still be nice to have (row hash might introduce oddities?)

@senhuang42 merged commit c48889f into facebook:dev on Mar 22, 2021