Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Revisit default Quality used by BrotliStream for CompressionLevel.Optimal #64185

Closed
7 tasks
adamsitnik opened this issue Jan 24, 2022 · 2 comments · Fixed by #72266
Closed
7 tasks

Revisit default Quality used by BrotliStream for CompressionLevel.Optimal #64185

adamsitnik opened this issue Jan 24, 2022 · 2 comments · Fixed by #72266
Assignees
Labels
area-System.IO.Compression help wanted [up-for-grabs] Good issue for external contributors tenet-performance Performance related issue
Milestone

Comments

@adamsitnik
Copy link
Member

adamsitnik commented Jan 24, 2022

It has been noted by @stephentoub in #46595 (comment):

The quality values are set as:

        public const int Quality_Min = 0;
        public const int Quality_Default = 11;
        public const int Quality_Max = 11;

With the introduction of SmallestSize (#1549 (comment)) and the rationale employed for it that Optimal should really be a good balance between speed and size, we should define it closer to 6 rather than 11. The docs will also need to be updated.

The person who is willing to work on this issue should:

  • choose a set of metrics used to determine which value is optimal. Preferably compressed output size and time it takes to perform the compression.
  • choose a set of representative input files for the experiment while keeping in mind that .NET users might use this API for compressing all kinds of files.
  • run the experiment for all the files and quality levels, share the results and source code (so others can run the tests for different configs like ARM64).
  • send a PR with code change
  • send a PR with docs change to https://github.com/dotnet/dotnet-api-docs

Nice to have:

  • verify if 22 is the best default value for Window
  • see if there are any places in our code base where we are using Brotli for compression and we know the input size up-front. If so, consider using lower quality for small inputs.

The recommended tool for benchmarking the performance is BenchmarkDotNet. Since getting BDN to print some extra metrics like compressed output size requires some hacks, I am providing a link to source code that shows how to do it: adamsitnik/performance@b2e21df...f0c222b

Sample results:

Method quality window file Mean Size
Compress_WithoutState 4 21 alice29.txt 2,487,362.9 ns 54467
Decompress_WithoutState 4 21 alice29.txt 392,504.2 ns 54467
Compress_WithoutState 4 22 alice29.txt 2,511,040.1 ns 54467
Decompress_WithoutState 4 22 alice29.txt 391,846.0 ns 54467
Compress_WithoutState 4 23 alice29.txt 2,502,379.3 ns 54467
Decompress_WithoutState 4 23 alice29.txt 392,153.3 ns 54467
Compress_WithoutState 5 21 alice29.txt 4,250,425.3 ns 51989
Decompress_WithoutState 5 21 alice29.txt 378,212.0 ns 51989
Compress_WithoutState 5 22 alice29.txt 4,218,672.9 ns 51989
Decompress_WithoutState 5 22 alice29.txt 377,680.4 ns 51989
Compress_WithoutState 5 23 alice29.txt 4,251,750.8 ns 51989
Decompress_WithoutState 5 23 alice29.txt 377,350.3 ns 51989
Compress_WithoutState 6 21 alice29.txt 5,068,046.1 ns 51186
Decompress_WithoutState 6 21 alice29.txt 368,488.6 ns 51186
Compress_WithoutState 6 22 alice29.txt 5,110,149.7 ns 51186
Decompress_WithoutState 6 22 alice29.txt 371,180.5 ns 51186
Compress_WithoutState 6 23 alice29.txt 5,095,344.6 ns 51186
Decompress_WithoutState 6 23 alice29.txt 371,466.2 ns 51186
Compress_WithoutState 7 21 alice29.txt 7,774,881.0 ns 50700
Decompress_WithoutState 7 21 alice29.txt 365,068.9 ns 50700
Compress_WithoutState 7 22 alice29.txt 7,757,002.4 ns 50700
Decompress_WithoutState 7 22 alice29.txt 365,038.5 ns 50700
Compress_WithoutState 7 23 alice29.txt 7,754,481.9 ns 50700
Decompress_WithoutState 7 23 alice29.txt 364,923.5 ns 50700

Please consider using charts to visualize the results of the experiment. You can do that by using RPlotExporter or by getting BDN to export the data to CSV and importing it in Excel (or other tool of your choice)

@adamsitnik adamsitnik added area-System.IO.Compression tenet-performance Performance related issue help wanted [up-for-grabs] Good issue for external contributors labels Jan 24, 2022
@adamsitnik adamsitnik added this to the 7.0.0 milestone Jan 24, 2022
@dotnet-issue-labeler dotnet-issue-labeler bot added the untriaged New issue has not been triaged by the area owner label Jan 24, 2022
@ghost
Copy link

ghost commented Jan 24, 2022

Tagging subscribers to this area: @dotnet/area-system-io-compression
See info in area-owners.md if you want to be subscribed.

Issue Details

It has been noted by @stephentoub in #46595 (comment):

The quality values are set as:

        public const int Quality_Min = 0;
        public const int Quality_Default = 11;
        public const int Quality_Max = 11;

With the introduction of SmallestSize (#1549 (comment)) and the rationale employed for it that Optimal should really be a good balance between speed and size, we should define it closer to 6 rather than 11. The docs will also need to be updated.

The person who is willing to work on this issue should:

  • choose a set of metrics used to determine which value is optimal. Preferably compressed output size and time it takes to perform the compression.
  • choose a set of representative input files for the experiment while keeping in mind that .NET users might use this API for compressing all kinds of files.
  • run the experiment for all the files and quality levels, share the results and source code (so others can run the tests for different configs like ARM64).
  • send a PR with code change
  • send a PR with docs change to https://github.com/dotnet/dotnet-api-docs

The recommended tool for benchmarking the performance is BenchmarkDotNet. Since getting BDN to print some extra metrics like compressed output size requires some hacks, I am providing a link to source code that shows how to do it: adamsitnik/performance@b2e21df...f0c222b

Sample results:

Method quality window file Mean Size
Compress_WithoutState 4 21 alice29.txt 2,487,362.9 ns 54467
Decompress_WithoutState 4 21 alice29.txt 392,504.2 ns 54467
Compress_WithoutState 4 22 alice29.txt 2,511,040.1 ns 54467
Decompress_WithoutState 4 22 alice29.txt 391,846.0 ns 54467
Compress_WithoutState 4 23 alice29.txt 2,502,379.3 ns 54467
Decompress_WithoutState 4 23 alice29.txt 392,153.3 ns 54467
Compress_WithoutState 5 21 alice29.txt 4,250,425.3 ns 51989
Decompress_WithoutState 5 21 alice29.txt 378,212.0 ns 51989
Compress_WithoutState 5 22 alice29.txt 4,218,672.9 ns 51989
Decompress_WithoutState 5 22 alice29.txt 377,680.4 ns 51989
Compress_WithoutState 5 23 alice29.txt 4,251,750.8 ns 51989
Decompress_WithoutState 5 23 alice29.txt 377,350.3 ns 51989
Compress_WithoutState 6 21 alice29.txt 5,068,046.1 ns 51186
Decompress_WithoutState 6 21 alice29.txt 368,488.6 ns 51186
Compress_WithoutState 6 22 alice29.txt 5,110,149.7 ns 51186
Decompress_WithoutState 6 22 alice29.txt 371,180.5 ns 51186
Compress_WithoutState 6 23 alice29.txt 5,095,344.6 ns 51186
Decompress_WithoutState 6 23 alice29.txt 371,466.2 ns 51186
Compress_WithoutState 7 21 alice29.txt 7,774,881.0 ns 50700
Decompress_WithoutState 7 21 alice29.txt 365,068.9 ns 50700
Compress_WithoutState 7 22 alice29.txt 7,757,002.4 ns 50700
Decompress_WithoutState 7 22 alice29.txt 365,038.5 ns 50700
Compress_WithoutState 7 23 alice29.txt 7,754,481.9 ns 50700
Decompress_WithoutState 7 23 alice29.txt 364,923.5 ns 50700

Please consider using charts to visualize the results of the experiment. You can do that by using RPlotExporter or by getting BDN to export the data to CSV and importing it in Excel (or other tool of your choice)

Author: adamsitnik
Assignees: -
Labels:

area-System.IO.Compression, tenet-performance, up-for-grabs

Milestone: 7.0.0

@adamsitnik adamsitnik removed the untriaged New issue has not been triaged by the area owner label Jan 24, 2022
@EgorBo
Copy link
Member

EgorBo commented Jan 24, 2022

I always thought that there was some special secret meaning under Optimal being the same as Max like "Brotli only makes sense to use with -q11" 😄

@jeffhandley jeffhandley modified the milestones: 7.0.0, Future Jul 9, 2022
@stephentoub stephentoub self-assigned this Jul 15, 2022
@stephentoub stephentoub modified the milestones: Future, 7.0.0 Jul 15, 2022
@ghost ghost added the in-pr There is an active PR which will close this issue when it is merged label Jul 15, 2022
@ghost ghost removed the in-pr There is an active PR which will close this issue when it is merged label Jul 15, 2022
@ghost ghost locked as resolved and limited conversation to collaborators Aug 15, 2022
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
area-System.IO.Compression help wanted [up-for-grabs] Good issue for external contributors tenet-performance Performance related issue
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants