
Change Brotli Optimal CompressionLevel value #57440

Closed
wants to merge 2 commits

Conversation

i3arnon
Contributor

@i3arnon i3arnon commented Aug 15, 2021

Fix #46595

@ghost ghost added the community-contribution label Aug 15, 2021
@ghost

ghost commented Aug 15, 2021

Tagging subscribers to this area: @dotnet/area-system-io-compression
See info in area-owners.md if you want to be subscribed.

Issue Details

Fix #46595

Author: i3arnon
Assignees: -
Labels:

area-System.IO.Compression

Milestone: -

@adamsitnik
Member

@stephentoub since this is a breaking change I am moving it to 7.0.0

@adamsitnik adamsitnik modified the milestones: 6.0.0, 7.0.0 Aug 16, 2021
@stephentoub
Member

> since this is a breaking change

There are two changes here:

  1. Changing the speed/size ratio for Optimal. That's not breaking.
  2. Validating the argument to be [0,11]. a) That's not a breaking change because it's already done by the underlying encoder.SetQuality call on the very next line, and b) I don't believe that addresses the crux of my concern in the linked issue.

We can delete (2) from this PR as it doesn't do anything. (1) can be considered for 6.
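
For reference, a paraphrased sketch of the constructor path under discussion (an approximation, not a verbatim copy of the runtime source): the explicit [0, 11] check added in item (2) would sit immediately before a call that performs the same validation anyway.

```csharp
int quality = BrotliUtils.GetQualityFromCompressionLevel(compressionLevel);
// The check from item (2) would be redundant here:
// if (quality < 0 || quality > 11) throw new ArgumentOutOfRangeException(nameof(compressionLevel));
_encoder.SetQuality(quality); // already throws for out-of-range quality values
```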

@i3arnon
Contributor Author

i3arnon commented Aug 17, 2021

I removed the validation and kept the Optimal ratio change.
I'm intentionally keeping the title the same to avoid breaking email threads.

I opened another PR to validate that the CompressionLevel argument is a defined value: #57561
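
For illustration, "validate the argument to be a defined value" could look roughly like the following sketch (an assumption about the shape of #57561, not its actual diff):

```csharp
// Hypothetical defined-value check: rejects raw casts such as (CompressionLevel)5
// up front, instead of relying on the derived quality's range check alone.
if (compressionLevel is not (CompressionLevel.NoCompression or CompressionLevel.Fastest
        or CompressionLevel.Optimal or CompressionLevel.SmallestSize))
{
    throw new ArgumentException($"Unsupported compression level: {compressionLevel}.", nameof(compressionLevel));
}
```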

@carlossanlop
Member

/azp run

@azure-pipelines

You have several pipelines (over 10) configured to build pull requests in this repository. Specify which pipelines you would like to run by using the /azp run [pipelines] command. You can specify multiple pipelines using a comma-separated list.

@carlossanlop carlossanlop reopened this Oct 5, 2021
Member

@carlossanlop carlossanlop left a comment


Change looks solid; I just want to make sure the CI passes, and see if there are unit tests we need to adjust.

@jeffhandley
Member

> Change looks solid; I just want to make sure the CI passes, and see if there are unit tests we need to adjust.

@carlossanlop CI is green. Apparently no tests rely on the specific value. Is there anything else you wanted to confirm before we approve and merge this?

@stephentoub
Member

stephentoub commented Oct 12, 2021

Could we re-validate this has the expected impact in terms of a good default balance between size and speed on various inputs?

@carlossanlop carlossanlop changed the title Validate CompressionLevel in BrotliStream ctor Change Brotli Optimal CompressionLevel value Oct 16, 2021
@carlossanlop
Member

> Could we re-validate this has the expected impact in terms of a good default balance between size and speed on various inputs?

Sure!

6 is a much better value than 11, but it's not the optimal value.

I tested all the Google test files against values from 4 to 8, and concluded that a CompressionLevel of 5 is the value that offers the best balance between compression and speed.

10x10y

Winners: 5, 6

  • Both have the same compression as the current optimal (11).
  • Both are 1 ms faster than current optimal (11) (it was instant).

| CompressionLevel | Uncomp. Length (B) | Comp. Length (B) | Diff. Size (%) | Time (ms) |
| --- | --- | --- | --- | --- |
| CL.Fastest (1) => 1 | 20 | 22 | 110 | 0 |
| CL.NoCompression (0) => 2 | 20 | 24 | 120 | 0 |
| Two below suggested => 4 | 20 | 12 | 60 | 10 |
| One below suggested => 5 | 20 | 12 | 60 | 0 |
| Suggested optimal => 6 | 20 | 12 | 60 | 0 |
| One above suggested => 7 | 20 | 12 | 60 | 2 |
| Two above suggested => 8 | 20 | 12 | 60 | 4 |
| Current optimal => 11 | 20 | 12 | 60 | 1 |
64x

Winners: 4, 5, 6

  • Better compression than current.
  • 1 ms faster than current (instant).

| CompressionLevel | Uncomp. Length (B) | Comp. Length (B) | Diff. Size (%) | Time (ms) |
| --- | --- | --- | --- | --- |
| CL.Fastest (1) => 1 | 64 | 19 | 29.6875 | 0 |
| CL.NoCompression (0) => 2 | 64 | 66 | 103.125 | 0 |
| Two below suggested => 4 | 64 | 10 | 15.625 | 0 |
| One below suggested => 5 | 64 | 10 | 15.625 | 0 |
| Suggested optimal => 6 | 64 | 10 | 15.625 | 0 |
| One above suggested => 7 | 64 | 10 | 15.625 | 2 |
| Two above suggested => 8 | 64 | 10 | 15.625 | 4 |
| Current optimal => 11 | 64 | 11 | 17.1875 | 1 |
backward65536

Winners: 4, 5, 6

  • Better compression than current.
  • 32 ms faster than current.

| CompressionLevel | Uncomp. Length (B) | Comp. Length (B) | Diff. Size (%) | Time (ms) |
| --- | --- | --- | --- | --- |
| CL.Fastest (1) => 1 | 65792 | 31 | 0.0471 | 0 |
| CL.NoCompression (0) => 2 | 65792 | 93 | 0.1414 | 0 |
| Two below suggested => 4 | 65792 | 19 | 0.0289 | 2 |
| One below suggested => 5 | 65792 | 19 | 0.0289 | 2 |
| Suggested optimal => 6 | 65792 | 19 | 0.0289 | 2 |
| One above suggested => 7 | 65792 | 19 | 0.0289 | 4 |
| Two above suggested => 8 | 65792 | 19 | 0.0289 | 6 |
| Current optimal => 11 | 65792 | 20 | 0.0304 | 34 |
compressed_file

Winners: 4, 5

  • Same compression as current.
  • 43 ms faster than current (instant).

| CompressionLevel | Uncomp. Length (B) | Comp. Length (B) | Diff. Size (%) | Time (ms) |
| --- | --- | --- | --- | --- |
| CL.Fastest (1) => 1 | 50096 | 50100 | 100.008 | 0 |
| CL.NoCompression (0) => 2 | 50096 | 50100 | 100.008 | 0 |
| Two below suggested => 4 | 50096 | 50100 | 100.008 | 0 |
| One below suggested => 5 | 50096 | 50100 | 100.008 | 0 |
| Suggested optimal => 6 | 50096 | 50100 | 100.008 | 1 |
| One above suggested => 7 | 50096 | 50100 | 100.008 | 2 |
| Two above suggested => 8 | 50096 | 50100 | 100.008 | 4 |
| Current optimal => 11 | 50096 | 50100 | 100.008 | 43 |
compressed_repeated

Winners: 4, 5, 6

  • 0.2% worse compression than current.
  • 322 ms faster than current.

| CompressionLevel | Uncomp. Length (B) | Comp. Length (B) | Diff. Size (%) | Time (ms) |
| --- | --- | --- | --- | --- |
| CL.Fastest (1) => 1 | 144224 | 51342 | 35.5988 | 1 |
| CL.NoCompression (0) => 2 | 144224 | 52582 | 36.4586 | 0 |
| Two below suggested => 4 | 144224 | 50445 | 34.9768 | 4 |
| One below suggested => 5 | 144224 | 50445 | 34.9768 | 4 |
| Suggested optimal => 6 | 144224 | 50445 | 34.9768 | 4 |
| One above suggested => 7 | 144224 | 50445 | 34.9768 | 6 |
| Two above suggested => 8 | 144224 | 50445 | 34.9768 | 8 |
| Current optimal => 11 | 144224 | 50156 | 34.7765 | 326 |
empty

Winners: 4, 5, 6

  • Same compression as current.
  • Same time as current.

| CompressionLevel | Uncomp. Length (B) | Comp. Length (B) | Diff. Size (%) | Time (ms) |
| --- | --- | --- | --- | --- |
| CL.Fastest (1) => 1 | 0 | 1 |  | 0 |
| CL.NoCompression (0) => 2 | 0 | 1 |  | 0 |
| Two below suggested => 4 | 0 | 1 |  | 0 |
| One below suggested => 5 | 0 | 1 |  | 0 |
| Suggested optimal => 6 | 0 | 1 |  | 0 |
| One above suggested => 7 | 0 | 1 |  | 2 |
| Two above suggested => 8 | 0 | 1 |  | 3 |
| Current optimal => 11 | 0 | 1 |  | 0 |
mapsdatazrh

Winners: 4

  • 1.7% worse compression than current.
  • 1652 ms faster than current.

| CompressionLevel | Uncomp. Length (B) | Comp. Length (B) | Diff. Size (%) | Time (ms) |
| --- | --- | --- | --- | --- |
| CL.Fastest (1) => 1 | 285886 | 181028 | 63.3217 | 5 |
| CL.NoCompression (0) => 2 | 285886 | 187191 | 65.4775 | 3 |
| Two below suggested => 4 | 285886 | 172914 | 60.4835 | 16 |
| One below suggested => 5 | 285886 | 167912 | 58.7339 | 25 |
| Suggested optimal => 6 | 285886 | 167828 | 58.7045 | 29 |
| One above suggested => 7 | 285886 | 167591 | 58.6216 | 31 |
| Two above suggested => 8 | 285886 | 167602 | 58.6255 | 38 |
| Current optimal => 11 | 285886 | 159339 | 55.7352 | 1668 |
monkey

Winners: 5, 6

  • 2.4% worse compression than current.
  • 4 ms faster than current (instant).

| CompressionLevel | Uncomp. Length (B) | Comp. Length (B) | Diff. Size (%) | Time (ms) |
| --- | --- | --- | --- | --- |
| CL.Fastest (1) => 1 | 843 | 464 | 55.0415 | 0 |
| CL.NoCompression (0) => 2 | 843 | 535 | 63.4638 | 0 |
| Two below suggested => 4 | 843 | 447 | 53.0249 | 0 |
| One below suggested => 5 | 843 | 426 | 50.5338 | 0 |
| Suggested optimal => 6 | 843 | 426 | 50.5338 | 0 |
| One above suggested => 7 | 843 | 426 | 50.5338 | 2 |
| Two above suggested => 8 | 843 | 426 | 50.5338 | 3 |
| Current optimal => 11 | 843 | 405 | 48.0427 | 4 |
quickfox

Winners: 5, 6

  • 0.6% worse compression than current.
  • 2 ms faster than current (instant).

| CompressionLevel | Uncomp. Length (B) | Comp. Length (B) | Diff. Size (%) | Time (ms) |
| --- | --- | --- | --- | --- |
| CL.Fastest (1) => 1 | 43 | 47 | 109.3023 | 0 |
| CL.NoCompression (0) => 2 | 43 | 47 | 109.3023 | 0 |
| Two below suggested => 4 | 43 | 47 | 109.3023 | 0 |
| One below suggested => 5 | 43 | 46 | 106.9767 | 0 |
| Suggested optimal => 6 | 43 | 46 | 106.9767 | 0 |
| One above suggested => 7 | 43 | 46 | 106.9767 | 1 |
| Two above suggested => 8 | 43 | 46 | 106.9767 | 3 |
| Current optimal => 11 | 43 | 47 | 109.3023 | 2 |
quickfox_repeated

Winners: 5

  • 0.003% better compression than current.
  • 31 ms faster than current.

| CompressionLevel | Uncomp. Length (B) | Comp. Length (B) | Diff. Size (%) | Time (ms) |
| --- | --- | --- | --- | --- |
| CL.Fastest (1) => 1 | 176128 | 89 | 0.0505 | 1 |
| CL.NoCompression (0) => 2 | 176128 | 203 | 0.1153 | 0 |
| Two below suggested => 4 | 176128 | 52 | 0.0295 | 2 |
| One below suggested => 5 | 176128 | 51 | 0.029 | 2 |
| Suggested optimal => 6 | 176128 | 51 | 0.029 | 3 |
| One above suggested => 7 | 176128 | 51 | 0.029 | 4 |
| Two above suggested => 8 | 176128 | 51 | 0.029 | 6 |
| Current optimal => 11 | 176128 | 57 | 0.0324 | 33 |
random_org_10k.bin

Winners: 4, 5, 6

  • Same compression as current.
  • 40 ms faster than current (instant).

| CompressionLevel | Uncomp. Length (B) | Comp. Length (B) | Diff. Size (%) | Time (ms) |
| --- | --- | --- | --- | --- |
| CL.Fastest (1) => 1 | 10000 | 10004 | 100.04 | 0 |
| CL.NoCompression (0) => 2 | 10000 | 10004 | 100.04 | 0 |
| Two below suggested => 4 | 10000 | 10004 | 100.04 | 0 |
| One below suggested => 5 | 10000 | 10004 | 100.04 | 0 |
| Suggested optimal => 6 | 10000 | 10004 | 100.04 | 0 |
| One above suggested => 7 | 10000 | 10004 | 100.04 | 2 |
| Two above suggested => 8 | 10000 | 10004 | 100.04 | 3 |
| Current optimal => 11 | 10000 | 10004 | 100.04 | 40 |
x

Winners: current, 4, 5, 6

  • Same compression as current.
  • Same time as current.

| CompressionLevel | Uncomp. Length (B) | Comp. Length (B) | Diff. Size (%) | Time (ms) |
| --- | --- | --- | --- | --- |
| CL.Fastest (1) => 1 | 1 | 5 | 500 | 0 |
| CL.NoCompression (0) => 2 | 1 | 5 | 500 | 0 |
| Two below suggested => 4 | 1 | 5 | 500 | 0 |
| One below suggested => 5 | 1 | 5 | 500 | 0 |
| Suggested optimal => 6 | 1 | 5 | 500 | 0 |
| One above suggested => 7 | 1 | 5 | 500 | 1 |
| Two above suggested => 8 | 1 | 5 | 500 | 3 |
| Current optimal => 11 | 1 | 5 | 500 | 0 |
ukkonooa

Winners: 4

  • 16% better compression than current.
  • 2 ms faster than current (instant).

| CompressionLevel | Uncomp. Length (B) | Comp. Length (B) | Diff. Size (%) | Time (ms) |
| --- | --- | --- | --- | --- |
| CL.Fastest (1) => 1 | 119 | 84 | 70.5882 | 0 |
| CL.NoCompression (0) => 2 | 119 | 123 | 103.3613 | 0 |
| Two below suggested => 4 | 119 | 62 | 52.1008 | 0 |
| One below suggested => 5 | 119 | 69 | 57.9832 | 0 |
| Suggested optimal => 6 | 119 | 69 | 57.9832 | 0 |
| One above suggested => 7 | 119 | 71 | 59.6639 | 1 |
| Two above suggested => 8 | 119 | 71 | 59.6639 | 3 |
| Current optimal => 11 | 119 | 81 | 68.0672 | 2 |
xyzzy

Winners: 4, 5, 6

  • Same compression as current.
  • 1 ms faster than current (instant).

| CompressionLevel | Uncomp. Length (B) | Comp. Length (B) | Diff. Size (%) | Time (ms) |
| --- | --- | --- | --- | --- |
| CL.Fastest (1) => 1 | 5 | 9 | 180 | 0 |
| CL.NoCompression (0) => 2 | 5 | 9 | 180 | 0 |
| Two below suggested => 4 | 5 | 9 | 180 | 0 |
| One below suggested => 5 | 5 | 9 | 180 | 0 |
| Suggested optimal => 6 | 5 | 9 | 180 | 0 |
| One above suggested => 7 | 5 | 9 | 180 | 1 |
| Two above suggested => 8 | 5 | 9 | 180 | 3 |
| Current optimal => 11 | 5 | 9 | 180 | 1 |
zeros

Winners: 5, 6

  • 0.0003% better compression than current.
  • 97 ms faster than current.

| CompressionLevel | Uncomp. Length (B) | Comp. Length (B) | Diff. Size (%) | Time (ms) |
| --- | --- | --- | --- | --- |
| CL.Fastest (1) => 1 | 262144 | 47 | 0.0179 | 1 |
| CL.NoCompression (0) => 2 | 262144 | 196 | 0.0748 | 0 |
| Two below suggested => 4 | 262144 | 13 | 0.005 | 4 |
| One below suggested => 5 | 262144 | 13 | 0.005 | 3 |
| Suggested optimal => 6 | 262144 | 13 | 0.005 | 3 |
| One above suggested => 7 | 262144 | 13 | 0.005 | 4 |
| Two above suggested => 8 | 262144 | 13 | 0.005 | 7 |
| Current optimal => 11 | 262144 | 14 | 0.0053 | 100 |

Final results

| CompressionLevel | 4 | 5 | 6 | 7 | 8 |
| --- | --- | --- | --- | --- | --- |
| Total | 10 | 13 | 11 | 0 | 0 |

(Total is the number of test files for which that level was among the winners.)

Here's the code I used to generate the above tables. I ran it as a unit test inside BrotliGoogleTestData.cs.

```csharp
private static IEnumerable<string> GetGoogleTestPaths()
{
  yield return Path.Combine("UncompressedTestFiles", "GoogleTestData", "10x10y");
  yield return Path.Combine("UncompressedTestFiles", "GoogleTestData", "64x");
  yield return Path.Combine("UncompressedTestFiles", "GoogleTestData", "backward65536");
  yield return Path.Combine("UncompressedTestFiles", "GoogleTestData", "compressed_file");
  yield return Path.Combine("UncompressedTestFiles", "GoogleTestData", "compressed_repeated");
  yield return Path.Combine("UncompressedTestFiles", "GoogleTestData", "empty");
  yield return Path.Combine("UncompressedTestFiles", "GoogleTestData", "mapsdatazrh");
  yield return Path.Combine("UncompressedTestFiles", "GoogleTestData", "monkey");
  yield return Path.Combine("UncompressedTestFiles", "GoogleTestData", "quickfox");
  yield return Path.Combine("UncompressedTestFiles", "GoogleTestData", "quickfox_repeated");
  yield return Path.Combine("UncompressedTestFiles", "GoogleTestData", "random_org_10k.bin");
  yield return Path.Combine("UncompressedTestFiles", "GoogleTestData", "x");
  yield return Path.Combine("UncompressedTestFiles", "GoogleTestData", "ukkonooa");
  yield return Path.Combine("UncompressedTestFiles", "GoogleTestData", "xyzzy");
  yield return Path.Combine("UncompressedTestFiles", "GoogleTestData", "zeros");
}

// Maps a raw quality value to the row label used in the tables above.
private static string GetLevel(int level)
{
  return level switch
  {
      0 => "CL.Optimal (11) => 0",
      1 => "CL.Fastest (1) => 1",
      2 => "CL.NoCompression (0) => 2",
      3 => "CL.SmallestSize (11) => 3",
      4 => "Two below suggested => 4",
      5 => "One below suggested => 5",
      6 => "Suggested optimal => 6",
      7 => "One above suggested => 7",
      8 => "Two above suggested => 8",
      11 => "Current optimal => 11",
      _ => throw new ArgumentOutOfRangeException(nameof(level))
  };
}

private int[] Levels = new[] {
  0, // CompressionLevel.Optimal (11)
  1, // CompressionLevel.Fastest (1)
  2, // CompressionLevel.NoCompression (0)
  3, // CompressionLevel.SmallestSize (11)
  4, // Two below suggested optimal
  5, // One below suggested optimal
  6, // Suggested optimal
  7, // One above suggested optimal
  8, // Two above suggested optimal
  11 // Current optimal (should have same result as 0)
};

private const int Pad = 25;

private string[] Headers = new[] {
  "File".PadLeft(Pad),
  "CompressionLevel".PadLeft(Pad),
  "Uncomp. Length (B)".PadLeft(Pad),
  "Comp. Length (B)".PadLeft(Pad),
  "Diff. Size (%)".PadLeft(Pad),
  "Time (ms)".PadLeft(Pad)
};

[Fact]
public void Test()
{
  Stopwatch sw = new Stopwatch();

  // Emit a markdown table header so the output can be pasted into a comment.
  Console.WriteLine("|{0}|", string.Join('|', Headers));
  Console.WriteLine("|{0}|", string.Join('|', Enumerable.Repeat(new string('-', Pad), Headers.Length)));

  foreach (string filePath in GetGoogleTestPaths())
  {
      byte[] bytes = File.ReadAllBytes(filePath);
      foreach (int level in Levels)
      {
          long compressedLength;
          using (var memoryStream = new MemoryStream())
          {
              // Time only the compression itself.
              sw.Restart();
              using (var brotliStream = new BrotliStream(memoryStream, (CompressionLevel)level, leaveOpen: true))
              {
                  brotliStream.Write(bytes, 0, bytes.Length);
              }
              sw.Stop();

              // Capture the compressed size before the stream is disposed.
              compressedLength = memoryStream.Length;
          }

          double percent = Math.Round(compressedLength * 100.0 / bytes.Length, 4);

          var cols = new string[] {
              $"{Path.GetFileName(filePath),Pad}",
              $"{GetLevel(level),Pad}",
              $"{bytes.Length,Pad}",
              $"{compressedLength,Pad}",
              $"{percent,Pad}",
              $"{sw.ElapsedMilliseconds,Pad}"
          };
          Console.WriteLine("|{0}|", string.Join('|', cols));
      }
  }
}
```

@carlossanlop
Member

@i3arnon @jozkee @adamsitnik @stephentoub are the above results good enough to decide using 5 as Optimal?

@adamsitnik
Member

> are the above results good enough to decide using 5 as Optimal?

I have two concerns:

  • For some of the test cases, the reported time is 0 ms. That reads as if they were instant, while most probably they are not. Have you tried writing the benchmarks using BDN? (I know it's currently very hard to add user-defined metrics with it; see the sketch after this list.)
  • Some of the test cases are edge cases, e.g. empty content, pre-compressed content, etc. It's important to include them in the comparison, but we should introduce some kind of weight for each test case, where "typical" use cases count the most. That would give me more confidence for changing the default value.
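
A minimal BenchmarkDotNet sketch of what such a measurement could look like; the class name, parameter values, and file subset here are hypothetical, and compressed size would still need to be reported outside BDN's built-in metrics:

```csharp
using System.IO;
using System.IO.Compression;
using BenchmarkDotNet.Attributes;

public class BrotliLevelBenchmarks
{
    private byte[] _bytes;

    // Hypothetical subset of the Google test files used in the tables above.
    [Params("mapsdatazrh", "compressed_repeated", "quickfox_repeated")]
    public string FileName { get; set; }

    // Raw quality values, cast to CompressionLevel exactly as the test above does.
    [Params(1, 4, 5, 6, 11)]
    public int Quality { get; set; }

    [GlobalSetup]
    public void Setup() =>
        _bytes = File.ReadAllBytes(Path.Combine("UncompressedTestFiles", "GoogleTestData", FileName));

    [Benchmark]
    public long Compress()
    {
        using var destination = new MemoryStream();
        using (var brotli = new BrotliStream(destination, (CompressionLevel)Quality, leaveOpen: true))
        {
            brotli.Write(_bytes, 0, _bytes.Length);
        }
        // BDN reports wall-clock time and allocations; the compressed size is
        // returned only so it is not optimized away, and would need to be
        // logged separately as a custom metric.
        return destination.Length;
    }
}
```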

@AraHaan
Member

AraHaan commented Nov 30, 2021

> > Could we re-validate this has the expected impact in terms of a good default balance between size and speed on various inputs?
>
> Sure!
>
> 6 is a much better value than 11, but it's not the optimal value.
>
> I tested all the Google test files against values from 4 to 8, and concluded that a CompressionLevel of 5 is the value that offers the best balance between compression and speed.
>
> […]

Nice. Before I looked at this comment, I made a BrotliOptions type that I hardcoded to level 5 as the optimal value 👌.

As that change will be a breaking change, I feel this one should be merged first.

Note: my change deletes that function, as the range checking was moved to the BrotliOptions type I made.

```diff
@@ -9,7 +9,7 @@ internal static partial class BrotliUtils
     public const int WindowBits_Default = 22;
     public const int WindowBits_Max = 24;
     public const int Quality_Min = 0;
-    public const int Quality_Default = 11;
+    public const int Quality_Default = 6;
```
Member

Suggested change:

```diff
-public const int Quality_Default = 6;
+public const int Quality_Default = 5;
```
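
For context, Quality_Default feeds the mapping from CompressionLevel to a Brotli quality value. A rough sketch of that mapping follows; the constant name comes from the diff above, but the switch body is an assumption for illustration, not the exact runtime source:

```csharp
internal static int GetQualityFromCompressionLevel(CompressionLevel level) => level switch
{
    CompressionLevel.NoCompression => 0,         // Quality_Min
    CompressionLevel.Fastest => 1,
    CompressionLevel.Optimal => Quality_Default, // 11 today; 6 (or 5) after this change
    CompressionLevel.SmallestSize => 11,         // maximum quality
    _ => (int)level                              // raw values hit encoder.SetQuality's range check
};
```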

@adamsitnik
Member

I've created a new issue and described everything that needs to be done to change the default value: #64185 (comment)

Since this PR is not complete (it lacks good data explaining the decision), I am going to close it. Thanks

@adamsitnik adamsitnik closed this Jan 24, 2022
@i3arnon i3arnon deleted the brotli-compression-level branch January 25, 2022 13:23
@ghost ghost locked as resolved and limited conversation to collaborators Feb 25, 2022
Labels
area-System.IO.Compression, community-contribution
Development

Successfully merging this pull request may close these issues.

Revisit BrotliStream CompressionLevel values
6 participants