
Add CompressionLevel.SmallestSize #41960

Merged (10 commits) on Dec 7, 2020

Conversation

@huoyaoyuan (Member)

Closes #1549.
pal_zlib.c passes it as an integer directly to zlib, so no additional change required?
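For context, the pass-through can be sketched in Python, whose zlib module wraps the same library the native shim calls into. Mapping the new SmallestSize member to level 9 (Z_BEST_COMPRESSION) is an assumption for illustration, not something stated in this PR:

```python
import zlib

data = b"example payload " * 64

# zlib accepts the compression level as a plain integer in 0..9;
# level 9 is Z_BEST_COMPRESSION, the natural target for a
# "smallest size" setting (this exact mapping is assumed here).
compressed = zlib.compress(data, 9)

# The level is accepted and the data round-trips intact.
assert zlib.decompress(compressed) == data
assert len(compressed) < len(data)
```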

@Dotnet-GitSync-Bot (Collaborator)

Note regarding the new-api-needs-documentation label:

This is a reminder: when your PR modifies a ref *.cs file and adds or modifies public APIs, please make sure the API implementation in the src *.cs file is documented with triple-slash comments, so the PR reviewers can sign off on that change.

@Dotnet-GitSync-Bot (Collaborator)

I couldn't figure out the best area label to add to this PR. If you have write permissions, please help me learn by adding exactly one area label.

@danmoseley (Member)

Can you please add a test (or tests) for this new value?

I see our existing tests are not great. For Deflate, we apparently don't test passing CompressionLevel at all, and for GZip we test only CompressionLevel.NoCompression! It would be nice to improve them a little: the tests could simply compress, decompress, and verify the result is the same. That would at least verify that the various settings are accepted and don't fail.
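A round-trip check of that shape can be sketched against raw zlib in Python (the real tests would of course exercise DeflateStream/GZipStream in C#; the integer levels here stand in for the enum values):

```python
import zlib

payload = bytes(range(256)) * 100  # arbitrary sample data

# Exercise every zlib level: the goal is only to confirm each
# setting is accepted and round-trips, not to pin exact output bytes.
for level in range(10):
    compressed = zlib.compress(payload, level)
    assert zlib.decompress(compressed) == payload
```

Because only the round trip is asserted, the test stays stable even if a zlib update changes the compressed bytes.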

@huoyaoyuan (Member, Author)

Can you please add a test(s) for this new value?

I'd like to ask how I would test it. By adding new items to TestData? How would it be produced?

@danmoseley (Member) commented Sep 10, 2020

Why does it need new items in testdata? The tests can compress any existing test data and decompress again. The goal is simply to check that it works with the specified value.

@huoyaoyuan (Member, Author)

Why does it need new items in testdata?

If I read everything correctly, the unit tests compress the sample files and compare them with precompressed results. If we add a new compression level, compressed results for it should be added to the test data.

@danmoseley (Member)

Right, but given the limited coverage of these flags anyway, simply decompressing again would have value, and a future PR could update the test data to verify the compressed result as well. I can certainly dig up instructions for updating it, but I figured that could be done separately.

@ericstj (Member) commented Sep 10, 2020

There is coverage here and it needs to be updated: https://github.com/dotnet/runtime/blob/master/src/libraries/Common/tests/System/IO/Compression/CompressionStreamUnitTestBase.cs#L344. Search for CompressionLevel in this directory and you should find multiple places to update.

To update the test data, submit a PR here: https://github.com/dotnet/runtime-assets/tree/master/src/System.IO.Compression.TestData/GZipTestData. However, actually validating the binary content of the compression output leads to fragile tests. We don't guarantee the results are binary-identical, as we call different zlib implementations on different platforms.

I think it’s fair to assert in a test that the new enum does better on a set of payloads than other flags, since that is its promise. We should add a case to cover this.

@carlossanlop (Member)

I think it’s fair to assert in a test that the new enum does better on a set of payloads than other flags, since that is its promise. We should add a case to cover this.

@ericstj would it make sense to iterate through all the CompressionLevel enum values, and compare the resulting sizes of all the files? Meaning that if we rank the files by size, we should get:

  • NoCompression <-- Largest file
  • Fastest
  • Optimal
  • SmallestSize <-- Smallest file
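That ranking idea can be sketched against raw zlib in Python. Using levels 0/1/6/9 as stand-ins for NoCompression/Fastest/Optimal/SmallestSize is an assumption; only the weak inequalities are asserted, since the full ordering between the middle settings is not guaranteed:

```python
import random
import zlib

rng = random.Random(0)
words = [b"alpha ", b"beta ", b"gamma "]
# Moderately compressible payload: repeated words in random order.
payload = b"".join(rng.choice(words) for _ in range(5000))

# Hypothetical level mapping: 0/1/6/9 for the four enum values.
sizes = {level: len(zlib.compress(payload, level)) for level in (0, 1, 6, 9)}

# Weak claims only: storing is largest, best compression is no
# larger than either of the faster settings.
assert sizes[0] > sizes[9]
assert sizes[9] <= sizes[1]
assert sizes[9] <= sizes[6]
```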

@danmoseley (Member)

We could try, though maybe with a weaker ordering, e.g. that Optimal's output is no smaller than SmallestSize's.

BTW, since nobody is testing the algorithm implementations, I would imagine that you could simply compress a file generated in memory in some creative way (not too much entropy, not too little).
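One way to sketch such an in-memory payload with middling entropy (the generator below is hypothetical, not from the PR; any mix of redundancy and noise would do):

```python
import random
import zlib

def make_payload(blocks: int, seed: int = 42) -> bytes:
    """Interleave redundant text with random bytes: compressible, but not trivially."""
    rng = random.Random(seed)
    out = []
    for _ in range(blocks):
        if rng.random() < 0.7:
            out.append(b"the quick brown fox jumps over the lazy dog ")  # redundancy
        else:
            out.append(bytes(rng.randrange(256) for _ in range(32)))  # noise
    return b"".join(out)

payload = make_payload(500)
ratio = len(zlib.compress(payload)) / len(payload)
assert 0 < ratio < 1  # shrinks, but does not collapse to nothing
```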

@ericstj (Member) commented Sep 10, 2020

Agree with @danmosemsft: there are no claims about sizes between Optimal and Fastest. Those are statements about the time they take and the balance of the parameters. SmallestSize is making a claim about size, and we'd better back that up with tests.

@huoyaoyuan (Member, Author)

It sounds like we are testing the behavior of zlib, which we don't own.

@danmoseley (Member)

@huoyaoyuan we have no interest in testing zlib. However, we would like to know that "something" sensible happens when we pass each of these settings: at a minimum, that the operation succeeds, and ideally also that we are passing the value through. The size test was an idea for doing that which hopefully would be both simple and stable in the face of any zlib change.

@stephentoub (Member) left a comment

src/ref changes look good. Needs tests as highlighted in other comments. Thanks!

@carlossanlop (Member) left a comment

there's no claims about sizes between optimal and fastest.

I see what you mean, @ericstj. Maybe the tests could do something like this:

  • Iterate through all the CompressionLevel enum values.
  • For each enum value, generate a compressed file containing a few text files.
  • Verify the compressed files were created.
  • Open the compressed files, open the text files inside them.
  • Verify that the text files are not corrupted (their contents are the same as the original text files).

@danmoseley (Member)

Right, and presumably nothing needs to hit the disk.
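Those steps can be sketched entirely in memory, here using Python's gzip module over a BytesIO buffer as a stand-in for GZipStream over MemoryStream:

```python
import gzip
import io

# Hypothetical in-memory "text files" to round-trip.
files = {"a.txt": b"hello " * 200, "b.txt": b"world " * 200}

for level in range(10):  # stand-in for iterating the CompressionLevel values
    for name, original in files.items():
        buf = io.BytesIO()
        # Compress into memory: nothing touches the disk.
        with gzip.GzipFile(fileobj=buf, mode="wb", compresslevel=level) as gz:
            gz.write(original)
        buf.seek(0)
        # Reopen and verify the contents are not corrupted.
        with gzip.GzipFile(fileobj=buf, mode="rb") as gz:
            assert gz.read() == original
```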

@danmoseley (Member)

@dotnet/dnceng is there a way to figure out why we got 'The "WaitForHelixJobCompletion" task returned false but did not log an error'? I would expect a hung test to time out and give a log.
https://dev.azure.com/dnceng/public/_build/results?buildId=808983&view=logs&jobId=694d544e-ff71-5faf-b01a-5137c04e57c6&j=694d544e-ff71-5faf-b01a-5137c04e57c6&t=ae305e20-d07e-5652-59a0-399e9617bb30

@MattGal (Member) commented Sep 14, 2020

@dotnet/dnceng is there a way to figure out why we got The "WaitForHelixJobCompletion" task returned false but did not log an error. ? I would expect a hung test to time out and give a log.
https://dev.azure.com/dnceng/public/_build/results?buildId=808983&view=logs&jobId=694d544e-ff71-5faf-b01a-5137c04e57c6&j=694d544e-ff71-5faf-b01a-5137c04e57c6&t=ae305e20-d07e-5652-59a0-399e9617bb30

This returned false because the job (and therefore the execution of the "WaitForHelixJobCompletion" MSBuild task) was cancelled. It did not log an error because it had not encountered an error by the time this occurred. I'll also poke at your jobs to see if the timeout was unusual because 2.5 hours is a long time.

@danmoseley (Member)

Right, I'm just wondering how we might tell why it ran so slowly (or hung).

@MattGal (Member) commented Sep 14, 2020

Right, I'm just wondering how we might tell why it ran so slow (or hung)

It's definitely something going catastrophically, machine-tearing-down bad with System.Threading.Tests in the windows.10.amd64.server19h1.es.open run, but it's just so bad I haven't got anything useful to say about the whats and whys yet.

@MattGal (Member) commented Sep 14, 2020

@danmosemsft the best guess I have for this is:

  • Something messed up memory management on the machine in general, and it hung while peek-locking the work item ad infinitum, only managing to send one telemetry entry ("Only part of a ReadProcessMemory or WriteProcessMemory request was completed"):
[WinError 299] Only part of a ReadProcessMemory or WriteProcessMemory request was completed: '(originated from ReadProcessMemory)'
  • This work item locked up the whole machine for all this time, but eventually the VM was either deleted or rebooted by the "sick VM cleaner".
  • Once it managed to start again, the Event Hub telemetry SAS token it had was so old it had expired, so no further events got sent ('Server failed to authenticate the request. Make sure the value of Authorization header is formed correctly including the signature').

All sorts of funky workloads get run on these machines, so I don't necessarily think System.Threading caused a memory access panic; it just happened to be the work item running when something catastrophic enough to break Python 3's memory management occurred.

@danmoseley (Member)

How curious, @MattGal. OK, it sounds like watchful waiting to see whether we see that again, unless you see some obvious step we should take now that would give more info next time.

@huoyaoyuan (Member, Author)

A test has been added to validate the enum member. Does this PR still need more tests? Comparing sizes could be added as an enhancement.

@stephentoub (Member)

thank you! I'll rerun those.

@danmosemsft, FYI, unless things have changed recently, re-running via DevOps doesn't rebase on top of changes that have come in since the last commit, so rerunning in that manner doesn't pick up fixes that have since been merged. I synced the change and pushed a rebase.

@danmoseley (Member)

@stephentoub ah yes -- does close/reopen rebase, though? (@MattGal ?)

@MattGal (Member) commented Oct 23, 2020

@stephentoub ah yes -- does close/reopen rebase, though? (@MattGal ?)

Sorry I don't know the answer here, someone else on @dnceng might.

@danmoseley danmoseley closed this Oct 23, 2020
@danmoseley danmoseley reopened this Oct 23, 2020
@danmoseley (Member)

Let's see!

@danmoseley (Member)

OK, it's mergeable. @carlossanlop could you or someone please review?

@carlossanlop (Member) left a comment

Left a comment for the unit test.

return mms.Length;
}

long noCompressionLength = await GetLengthAsync(CompressionLevel.NoCompression);
(Member)

From this comment: #41960 (comment)

We cannot guarantee size order. This test should instead just verify that the contents of the generated compressed files are not corrupt.

(Member, Author)

@danmosemsft do you agree?

(Member)

It does seem likely, though, that compressed sizes shrink monotonically as the level increases; anything else would break my expectations. My suggestion is to keep the test as is, and if it breaks at some future point, we can just loosen the test then. Does that seem reasonable, @carlossanlop?

(Member)

Fine by me. @ericstj ?

(Member)

Sounds reasonable.

@carlossanlop (Member) left a comment

LGTM, but I would also like to wait on @ericstj's opinion in the comment above before merging.

@danmoseley (Member)

@ericstj is this OK to merge?

@ericstj (Member) commented Dec 7, 2020

/azp run runtime

@azure-pipelines

Azure Pipelines successfully started running 1 pipeline(s).

@ghost commented Dec 7, 2020

Hello @ericstj!

Because this pull request has the auto-merge label, I will be glad to help merge it once all check-in policies pass.

P.S. You can customize the way I help with merging this pull request, such as holding it until a specific person approves. Simply @mention me (@msftbot) and give me an instruction to get started! Learn more here.

@ghost ghost merged commit 7d527c3 into dotnet:master Dec 7, 2020
@huoyaoyuan huoyaoyuan deleted the compression-level-smallest branch December 8, 2020 06:27
@ghost ghost locked as resolved and limited conversation to collaborators Jan 7, 2021
This pull request was closed.
Successfully merging this pull request may close these issues.

API request: CompressionLevel.Highest