
Add CompressionLevel.SmallestSize #41960

Merged (10 commits) on Dec 7, 2020

Conversation

@huoyaoyuan (Member)

Closes #1549.
pal_zlib.c passes it as an integer directly to zlib, so no additional change required?
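For context, the pass-through can be sketched in Python, whose zlib module wraps the same library the native shim calls into. Mapping the new SmallestSize member to level 9 (Z_BEST_COMPRESSION) is an assumption for illustration, not something stated in this PR:

```python
import zlib

data = b"example payload " * 64

# zlib accepts the compression level as a plain integer in 0..9;
# level 9 is Z_BEST_COMPRESSION, the natural target for a
# "smallest size" setting (this exact mapping is assumed here).
compressed = zlib.compress(data, 9)

# The level is accepted and the data round-trips intact.
assert zlib.decompress(compressed) == data
assert len(compressed) < len(data)
```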

@Dotnet-GitSync-Bot (Collaborator)

Note regarding the new-api-needs-documentation label:

This is a reminder: when your PR modifies a ref *.cs file and adds or modifies public APIs, please make sure the API implementation in the src *.cs file is documented with triple-slash comments, so the PR reviewers can sign off on that change.

@Dotnet-GitSync-Bot (Collaborator)

I couldn't figure out the best area label to add to this PR. If you have write permissions, please help me learn by adding exactly one area label.

@danmoseley (Member)

Can you please add a test (or tests) for this new value?

I see our existing tests are not great. For Deflate, we apparently don't test passing CompressionLevel at all, and for GZip we test only CompressionLevel.NoCompression! It would be nice to improve them a little: the tests could simply compress, decompress, and verify the result is the same. That would at least verify that the various settings are accepted and don't fail.
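A round-trip check of that shape can be sketched against raw zlib in Python (the real tests would of course exercise DeflateStream/GZipStream in C#; the integer levels here stand in for the enum values):

```python
import zlib

payload = bytes(range(256)) * 100  # arbitrary sample data

# Exercise every zlib level: the goal is only to confirm each
# setting is accepted and round-trips, not to pin exact output bytes.
for level in range(10):
    compressed = zlib.compress(payload, level)
    assert zlib.decompress(compressed) == payload
```

Because only the round trip is asserted, the test stays stable even if a zlib update changes the compressed bytes.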

@huoyaoyuan (Member, Author)

Can you please add a test(s) for this new value?

I'd like to ask how I would test it. By adding new items to TestData? How would it be produced?

@danmoseley (Member) commented Sep 10, 2020

Why does it need new items in testdata? The tests can compress any existing test data and decompress again. The goal is simply to check that it works with the specified value.

@huoyaoyuan (Member, Author)

Why does it need new items in testdata?

If I read everything correctly, the unit tests compress the sample files and compare them with precompressed results. If we add a new compression level, compressed results for it should be added to the test data.

@danmoseley (Member)

Right, but given the limited coverage of these flags anyway, simply decompressing again would have value, and a future PR could update the test data to verify the compressed result as well. I can certainly dig up instructions for updating it, but I figured that could be done separately.

@ericstj (Member) commented Sep 10, 2020

There is coverage here and it needs to be updated: https://github.com/dotnet/runtime/blob/master/src/libraries/Common/tests/System/IO/Compression/CompressionStreamUnitTestBase.cs#L344. Search for CompressionLevel in this directory and you should find multiple places to update.

To update the test data, submit a PR here: https://github.com/dotnet/runtime-assets/tree/master/src/System.IO.Compression.TestData/GZipTestData. However, actually validating the binary content of the compression output leads to fragile tests. We don't guarantee the results are binary-identical, as we call different zlib implementations on different platforms.

I think it’s fair to assert in a test that the new enum does better on a set of payloads than other flags, since that is its promise. We should add a case to cover this.

@carlossanlop (Member)

I think it’s fair to assert in a test that the new enum does better on a set of payloads than other flags, since that is its promise. We should add a case to cover this.

@ericstj would it make sense to iterate through all the CompressionLevel enum values, and compare the resulting sizes of all the files? Meaning that if we rank the files by size, we should get:

  • NoCompression <-- Largest file
  • Fastest
  • Optimal
  • SmallestSize <-- Smallest file
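That ranking idea can be sketched against raw zlib in Python. Using levels 0/1/6/9 as stand-ins for NoCompression/Fastest/Optimal/SmallestSize is an assumption; only the weak inequalities are asserted, since the full ordering between the middle settings is not guaranteed:

```python
import random
import zlib

rng = random.Random(0)
words = [b"alpha ", b"beta ", b"gamma "]
# Moderately compressible payload: repeated words in random order.
payload = b"".join(rng.choice(words) for _ in range(5000))

# Hypothetical level mapping: 0/1/6/9 for the four enum values.
sizes = {level: len(zlib.compress(payload, level)) for level in (0, 1, 6, 9)}

# Weak claims only: storing is largest, best compression is no
# larger than either of the faster settings.
assert sizes[0] > sizes[9]
assert sizes[9] <= sizes[1]
assert sizes[9] <= sizes[6]
```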

@danmoseley (Member)

We could try, though maybe with a weaker ordering, e.g. that Optimal's output is no smaller than SmallestSize's.

BTW, since nobody is testing the algorithm implementations, I would imagine that you could simply compress a file generated in memory in some creative way (not too much entropy, not too little).
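One way to sketch such an in-memory payload with middling entropy (the generator below is hypothetical, not from the PR; any mix of redundancy and noise would do):

```python
import random
import zlib

def make_payload(blocks: int, seed: int = 42) -> bytes:
    """Interleave redundant text with random bytes: compressible, but not trivially."""
    rng = random.Random(seed)
    out = []
    for _ in range(blocks):
        if rng.random() < 0.7:
            out.append(b"the quick brown fox jumps over the lazy dog ")  # redundancy
        else:
            out.append(bytes(rng.randrange(256) for _ in range(32)))  # noise
    return b"".join(out)

payload = make_payload(500)
ratio = len(zlib.compress(payload)) / len(payload)
assert 0 < ratio < 1  # shrinks, but does not collapse to nothing
```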

@ericstj (Member) commented Sep 10, 2020

Agree with @danmosemsft: there are no claims about sizes between Optimal and Fastest. Those are statements about the time they take and the balance of the parameters. SmallestSize is making a claim about size, and we'd better back that up with tests.

@huoyaoyuan (Member, Author)

It sounds like we are testing the behavior of zlib, which we don't own.

@danmoseley (Member)

@huoyaoyuan we have no interest in testing zlib. However, we would like to know that "something" sensible happens when we pass each of these settings: at a minimum, that the operation succeeds, and ideally also that we are passing the value through. The size test was an idea for doing that which hopefully would be both simple and stable in the face of any zlib change.

@stephentoub (Member) left a comment

src/ref changes look good. Needs tests as highlighted in other comments. Thanks!

@carlossanlop (Member) left a comment

there's no claims about sizes between optimal and fastest.

I see what you mean, @ericstj. Maybe the tests could do something like this:

  • Iterate through all the CompressionLevel enum values.
  • For each enum value, generate a compressed file containing a few text files.
  • Verify the compressed files were created.
  • Open the compressed files, open the text files inside them.
  • Verify that the text files are not corrupted (their contents are the same as the original text files).

@danmoseley (Member)

Right, and presumably nothing needs to hit the disk.
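Those steps can be sketched entirely in memory, here using Python's gzip module over a BytesIO buffer as a stand-in for GZipStream over MemoryStream:

```python
import gzip
import io

# Hypothetical in-memory "text files" to round-trip.
files = {"a.txt": b"hello " * 200, "b.txt": b"world " * 200}

for level in range(10):  # stand-in for iterating the CompressionLevel values
    for name, original in files.items():
        buf = io.BytesIO()
        # Compress into memory: nothing touches the disk.
        with gzip.GzipFile(fileobj=buf, mode="wb", compresslevel=level) as gz:
            gz.write(original)
        buf.seek(0)
        # Reopen and verify the contents are not corrupted.
        with gzip.GzipFile(fileobj=buf, mode="rb") as gz:
            assert gz.read() == original
```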

@danmoseley (Member)

@dotnet/dnceng is there a way to figure out why we got 'The "WaitForHelixJobCompletion" task returned false but did not log an error'? I would expect a hung test to time out and give a log.
https://dev.azure.com/dnceng/public/_build/results?buildId=808983&view=logs&jobId=694d544e-ff71-5faf-b01a-5137c04e57c6&j=694d544e-ff71-5faf-b01a-5137c04e57c6&t=ae305e20-d07e-5652-59a0-399e9617bb30

@MattGal (Member) commented Sep 14, 2020

@dotnet/dnceng is there a way to figure out why we got The "WaitForHelixJobCompletion" task returned false but did not log an error. ? I would expect a hung test to time out and give a log.
https://dev.azure.com/dnceng/public/_build/results?buildId=808983&view=logs&jobId=694d544e-ff71-5faf-b01a-5137c04e57c6&j=694d544e-ff71-5faf-b01a-5137c04e57c6&t=ae305e20-d07e-5652-59a0-399e9617bb30

This returned false because the job (and therefore the execution of the "WaitForHelixJobCompletion" MSBuild task) was cancelled. It did not log an error because it had not encountered an error by the time this occurred. I'll also poke at your jobs to see if the timeout was unusual because 2.5 hours is a long time.

@danmoseley (Member)

Right, I'm just wondering how we might tell why it ran so slowly (or hung).

@MattGal (Member) commented Sep 14, 2020

Right, I'm just wondering how we might tell why it ran so slow (or hung)

It's definitely something going catastrophically, machine-tearing-down bad with System.Threading.Tests in the windows.10.amd64.server19h1.es.open run, but it's just so bad I haven't got anything useful to say about the whats and whys yet.

@MattGal (Member) commented Sep 14, 2020

@danmosemsft the best guess I have for this is:

  • Something messed up memory management on the machine in general, and it hung while peek-locking the work item ad infinitum, only managing to send one telemetry entry ("Only part of a ReadProcessMemory or WriteProcessMemory request was completed"):
[WinError 299] Only part of a ReadProcessMemory or WriteProcessMemory request was completed: '(originated from ReadProcessMemory)'
  • This work item locked up the whole machine for all this time, but eventually the VM was either deleted or rebooted by the "sick VM cleaner".
  • Once it managed to start again, the Event Hub telemetry SAS token it had was so old it had expired, so no further events got sent ('Server failed to authenticate the request. Make sure the value of Authorization header is formed correctly including the signature').

All sorts of funky workloads get run on these machines, so I don't necessarily think System.Threading caused a memory access panic; it just happened to be the work item running when something catastrophic enough to break Python 3's memory management occurred.

@danmoseley (Member)

How curious, @MattGal. OK, it sounds like watchful waiting to see whether we see that again, unless you see some obvious step we should take now that would give more info next time.

@huoyaoyuan (Member, Author)

A test has been added to validate the enum member. Does this PR still need more tests? Comparing sizes could be added as an enhancement.

@stephentoub (Member)

thank you! I'll rerun those.

@danmosemsft, FYI, unless things have changed recently, re-running via DevOps doesn't rebase on top of changes that have come in since the last commit, so rerunning in that manner doesn't pick up fixes that have since been merged. I synced the change and pushed a rebase.

@danmoseley (Member)

@stephentoub ah yes -- does close/reopen rebase, though? (@MattGal ?)

@MattGal (Member) commented Oct 23, 2020

@stephentoub ah yes -- does close/reopen rebase, though? (@MattGal ?)

Sorry I don't know the answer here, someone else on @dnceng might.

@danmoseley danmoseley closed this Oct 23, 2020
@danmoseley danmoseley reopened this Oct 23, 2020
@danmoseley (Member)

Let's see!

@danmoseley (Member)

OK, it's mergeable. @carlossanlop could you or someone please review?

@carlossanlop (Member) left a comment

Left a comment for the unit test.

return mms.Length;
}

long noCompressionLength = await GetLengthAsync(CompressionLevel.NoCompression);
(Member)

From this comment: #41960 (comment)

We cannot guarantee size order. This test should instead just verify that the contents of the generated compressed files are not corrupt.

(Member, Author)

@danmosemsft do you agree?

(Member)

It does seem likely, though, that compressed sizes shrink monotonically as the level increases; anything else would break my expectations. My suggestion is to keep the test as is, and if it breaks at some future point, we can just loosen the test then. Does that seem reasonable, @carlossanlop?

(Member)

Fine by me. @ericstj ?

(Member)

Sounds reasonable.

@carlossanlop (Member) left a comment

LGTM, but I would also like to wait on @ericstj's opinion in the comment above before merging.

@danmoseley (Member)

@ericstj is this OK to merge?

@ericstj (Member) commented Dec 7, 2020

/azp run runtime

@azure-pipelines

Azure Pipelines successfully started running 1 pipeline(s).

@ghost commented Dec 7, 2020

Hello @ericstj!

Because this pull request has the auto-merge label, I will be glad to help merge it once all check-in policies pass.

P.S. You can customize the way I help with merging this pull request, such as holding it until a specific person approves. Simply @mention me (@msftbot) and give me an instruction to get started! Learn more here.

@ghost ghost merged commit 7d527c3 into dotnet:master Dec 7, 2020
@huoyaoyuan huoyaoyuan deleted the compression-level-smallest branch December 8, 2020 06:27
@ghost ghost locked as resolved and limited conversation to collaborators Jan 7, 2021
This pull request was closed.
Successfully merging this pull request may close these issues.

API request: CompressionLevel.Highest