Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix compression ratio inefficiency #1709

Merged
merged 12 commits into from
Aug 5, 2019
Merged

Fix compression ratio inefficiency #1709

merged 12 commits into from
Aug 5, 2019

Conversation

Cyan4973
Copy link
Contributor

@Cyan4973 Cyan4973 commented Aug 1, 2019

Fix compression ratio inefficiency
introduced in #1624
under following scenario :

  • dictionary compression
  • at levels 1 to 3
  • with medium size payloads (larger than attachDict threshold)

The patch restore compression ratio to previous level.

A new compression efficiency test is added into fuzzer/unitTests in order to control gross negative variations in compression ratio, playing all levels with normal compression and flat dictionary compression.

to detect gross CR variations after a patch.

Tests normal and dictionary compression.
invalid on void ptr
@terrelln
Copy link
Contributor

terrelln commented Aug 1, 2019

Do you want to fix the rest of the strategies in this diff as well? The repro that exposes the bad behavior for every level is:

> head -c 40000 /dev/urandom > rand
> cp rand dict
> head -c 40000 /dev/urandom >> dict
> ./zstd -$LEVEL rand -D dict -o /dev/null
rand                 :100.04%   ( 40000 =>  40014 bytes, /dev/null)

@Cyan4973
Copy link
Contributor Author

Cyan4973 commented Aug 1, 2019

Yes, I will

restore dictionary compression ratio
to reduce chances of differences between 32 and 64-bit fuzzer tests
@Cyan4973
Copy link
Contributor Author

Cyan4973 commented Aug 2, 2019

All strategies fixed, able to access the entire dictionary as long as at least one byte is within window size.

Still need to understand why the new size-control test fails on gcc-6 + 32-bit.

Also worthwhile to note :
the usan 32-bit build complains about integer overflow issues when reaching overflow protection tests for large windowLog.
It's a completely separate issue, but might be worth looking into (as part of another PR).

to produce same content on both 32 and 64-bit platforms
by removing floating from literal table determination.

also : added checksum trace in compression control test,
so that it's easier to determine if test fails
as a consequence of compressing a different sample.
@Cyan4973
Copy link
Contributor Author

Cyan4973 commented Aug 2, 2019

The failing test due to lack of reproducibility between 32 and 64-bit builds is now fixed.

And now, another completely unrelated test fails later in the serie.
As one can already guess, this test wasn't modified by a single line.

Great outcome, such a joy ....

just for clarity, for the currently failing unit test
@terrelln
Copy link
Contributor

terrelln commented Aug 2, 2019

As one can already guess, this test wasn't modified by a single line.

I'd love to switch these tests to gtest, given the time.

Copy link
Contributor

@terrelln terrelln left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just one comment, but I'll go ahead and accept so you can merge when its fixed.

U32 const lowestValid = ms->window.lowLimit;
U32 const withinWindow = (current - lowestValid > maxDistance) ? current - maxDistance : lowestValid;
U32 const isDictionary = (ms->loadedDictEnd != 0);
U32 const windowLow = isDictionary ? lowestValid : withinWindow;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The only variable you need for the rest of the function is windowLow right?

Can you extract this out into a function since it is duplicated in every match finder?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep, that's a good point

it's an unsupported scenario.
@Cyan4973
Copy link
Contributor Author

Cyan4973 commented Aug 5, 2019

Last item completed, with windowLow logic factored into zstd_compress_internal.h .

@Cyan4973 Cyan4973 merged commit b2e71fa into dev Aug 5, 2019
@felixhandte felixhandte mentioned this pull request Aug 19, 2019
@Cyan4973 Cyan4973 deleted the fix1624 branch August 27, 2019 17:25
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants