Boxing Cache? #7079

Closed
benaadams opened this issue Dec 2, 2016 · 17 comments
Labels
area-VM-coreclr design-discussion Ongoing discussion about design without consensus
Milestone

Comments

@benaadams
Member

Revisiting https://github.com/dotnet/coreclr/issues/111

Using a repo example in managed code shows that pre-cached boxes for common values can improve the performance of boxing (see below). Is there a way of building this into the JIT or runtime?

Or is this just madness?

Integer Box Caching

                   Method |      Mean |    StdDev |    Median | Scaled |            RPS |
------------------------- |---------- |---------- |---------- |------- |--------------- |
      Int32UncachedBoxing | 7.4069 ns | 0.0498 ns | 7.3884 ns |   1.00 | 135,008,742.93 |
        Int32CachedBoxing | 5.3065 ns | 0.0495 ns | 5.3282 ns |   0.72 | 188,448,857.58 |
 Int32CachedBoxExtenstion | 6.7465 ns | 0.0880 ns | 6.7784 ns |   0.91 | 148,226,092.97 |

Boolean Box Caching

                  Method |      Mean |    StdDev |    Median | Scaled |            RPS |
------------------------ |---------- |---------- |---------- |------- |--------------- |
      BoolUncachedBoxing | 7.3923 ns | 0.0391 ns | 7.3866 ns |   1.00 | 135,276,250.40 |
        BoolCachedBoxing | 4.5859 ns | 0.0310 ns | 4.5954 ns |   0.62 | 218,057,656.08 |
 BoolCachedBoxExtenstion | 4.5874 ns | 0.0428 ns | 4.5988 ns |   0.62 | 217,986,777.33 |

Suggestion, cache boxes for:

bool: true, false
byte: 0 to 255
char: 0 to 127
int/short: -128 to 127

Maybe others?
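
To make the suggestion concrete, here is a minimal user-level sketch of such a cache for `int` over the proposed -128 to 127 range. The `Int32BoxCache` class and its `Box` method are hypothetical names chosen for illustration; this is not the runtime change attempted in the linked commit, only an approximation of the idea in managed code.

```csharp
using System;

// Hypothetical user-level cache; not part of the runtime or of the linked commit.
public static class Int32BoxCache
{
    private const int Min = -128;
    private const int Max = 127;

    // One pre-allocated box per value in [Min, Max].
    private static readonly object[] Boxes = CreateBoxes();

    private static object[] CreateBoxes()
    {
        var boxes = new object[Max - Min + 1];
        for (int i = 0; i < boxes.Length; i++)
        {
            boxes[i] = i + Min; // boxed once, up front
        }
        return boxes;
    }

    // Returns a shared box for values in range and a fresh box otherwise.
    public static object Box(int value)
    {
        // The unsigned compare folds both range checks into one;
        // out-of-range values wrap to a large index and miss the cache.
        uint index = unchecked((uint)(value - Min));
        return index < (uint)Boxes.Length ? Boxes[index] : (object)value;
    }
}

public static class Program
{
    public static void Main()
    {
        object a = Int32BoxCache.Box(100);
        object b = Int32BoxCache.Box(100);
        object c = Int32BoxCache.Box(1000);
        object d = Int32BoxCache.Box(1000);

        Console.WriteLine(ReferenceEquals(a, b)); // True: same cached box
        Console.WriteLine(ReferenceEquals(c, d)); // False: outside the cached range
        Console.WriteLine(a.Equals(100));         // True: the boxed value is intact
    }
}
```

Because the cache is opt-in, only call sites that use `Int32BoxCache.Box` share instances; ordinary boxing elsewhere is unaffected.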

Gave it a go benaadams/coreclr@2f7726d, but not entirely sure what I'm doing, so you probably have a bunch of really weird Dr Watson reports...

Edit: Updated with metrics posted in https://github.com/dotnet/coreclr/issues/8423#issuecomment-264500921

@mikedn
Contributor

mikedn commented Dec 2, 2016

I would guess that this makes boxing slower. Who's benefiting from that? 👽s?

@jakobbotsch
Member

Even if it was faster, it is a bad idea since boxed value types can have their values modified. Not expressible in C# (without interface tricks), but possible in IL and C++/CLI:

void Test(Object^ obj)
{
	Int32^ i = (Int32^)obj;
	*i = 30;
}
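
For readers who have not seen the "interface tricks" mentioned above, the following is a small sketch in C#. It uses a hypothetical mutable struct `Counter` and interface `ICounter`, so it illustrates in-place mutation of a box for user-defined value types only; it does not modify a boxed `int` or `bool`.

```csharp
using System;

// Hypothetical types, for illustration only.
public interface ICounter
{
    int Value { get; }
    void Increment();
}

public struct Counter : ICounter
{
    private int _value;
    public int Value => _value;
    public void Increment() => _value++;
}

public static class Program
{
    public static void Main()
    {
        object boxed = new Counter(); // boxes a copy of the struct
        object alias = boxed;         // second reference to the same box

        // The interface call runs against the boxed instance itself,
        // so the box is mutated in place.
        ((ICounter)boxed).Increment();

        Console.WriteLine(((ICounter)alias).Value); // 1: the alias sees the change
        Console.WriteLine(((Counter)boxed).Value);  // 1: unboxing copies the mutated value
    }
}
```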

@jkotas
Member

jkotas commented Dec 2, 2016

Right, I was about to write the same comment as @JanielS

ldc.i4 0
box bool // cached box for false
unbox bool // address of the bool in the cached box
ldc.i4 1
stobj bool // cached box for false is true now

@mikedn
Contributor

mikedn commented Dec 2, 2016

> Not expressible in C# (without interface tricks),

Or reflection:

object x = 1;
x.GetType().GetField("m_value", BindingFlags.Instance | BindingFlags.NonPublic).SetValue(x, 42);
Console.WriteLine(x); // prints 42
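
To show why this matters for a cache, here is a hedged sketch that combines the reflection write above with a shared box. `CachedOne` is a hypothetical stand-in for a runtime-level cache entry, and `m_value` is a private implementation detail of `System.Int32` taken from the snippet above, so the code checks for its absence rather than assuming it exists.

```csharp
using System;
using System.Reflection;

public static class Program
{
    // Hypothetical shared box standing in for a cached entry for the value 1.
    private static readonly object CachedOne = 1;

    public static void Main()
    {
        object first = CachedOne;  // two unrelated callers that
        object second = CachedOne; // were handed the same box

        // "m_value" is an implementation detail and may not exist on every runtime.
        var field = typeof(int).GetField(
            "m_value", BindingFlags.Instance | BindingFlags.NonPublic);
        if (field == null)
        {
            Console.WriteLine("m_value not found on this runtime");
            return;
        }

        // The write goes into the shared box, not into a private copy.
        field.SetValue(first, 42);

        // The second caller now sees whatever was written through the first.
        Console.WriteLine(second);
    }
}
```

With ordinary boxing each caller would hold its own box, so the write through `first` could not leak into `second`.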

@benaadams
Member Author

benaadams commented Dec 2, 2016

Is changing the value of a boxed value directly that common?

To prevent subtle errors, the runtime could allocate the cached boxes from a single page and then mark that page as PAGE_READONLY (using VirtualProtect or mprotect), so you'd get an access violation if you tried... 😉

Obviously, if it's common practice then we couldn't do that...

@mikedn
Contributor

mikedn commented Dec 2, 2016

> Is that common?

I'd say not. But is it worth the risk? What kind of code does a lot of boxing, uses a limited enough range of numeric values and expects good performance?

@jakobbotsch
Member

jakobbotsch commented Dec 2, 2016

unbox is actually specced to return a controlled-mutability managed pointer in ECMA-335. Reading about those it means that the unboxed pointer is actually read-only for the types you outlined, so that makes this slightly more interesting...

I don't really think it matters all that much, but spec-wise, modifying these unboxed values seems to be disallowed.

@benaadams
Member Author

benaadams commented Dec 2, 2016

> What kind of code does a lot of boxing, uses a limited enough range of numeric values

String.Format https://github.com/dotnet/corefx/issues/1514
Common usages of string interpolation
Passing values to SQL https://github.com/dotnet/corefx/issues/8955
Reading values from SQL SqlClient/SqlBuffer.cs
Structured logging aspnet/Logging#523
Reflection https://github.com/dotnet/corefx/issues/14021
Typed Parsing System/Json/JavaScriptReader.cs
Typed Json Data Newtonsoft.Json/JsonReader.cs

> expects good performance

Well, that works both ways; introducing a cache would likely slow down boxing slightly; but ease up on GC. For reference I think the asm JIT_BoxFastMP_InlineGetThread is the fastest version of boxing?

Is common practice in Java; so

Integer i0 = 127;
Integer i1 = 127;
System.out.println(i0 == i1); // Prints true, reference equality

Integer i2 = 128;
Integer i3 = 128;
System.out.println(i2 == i3); // Prints false, different references

Which confuses people; but Integer has a different equality to int; which I think also confuses people. Whereas in C# for reference equality it would need to be a clearer object == object test; which makes more sense that it's reference equals.
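
For comparison with the Java snippet, a short C# sketch of the same experiment. With the runtime behaving as it does today (no box caching), each assignment to `object` allocates a new box, and `==` on two `object` operands is a reference comparison.

```csharp
using System;

public static class Program
{
    public static void Main()
    {
        object a = 127; // each assignment boxes into a new object
        object b = 127;

        Console.WriteLine(a == b);                // False: == on object compares references
        Console.WriteLine(ReferenceEquals(a, b)); // False: two distinct boxes
        Console.WriteLine(a.Equals(b));           // True: Int32.Equals compares values
        Console.WriteLine((int)a == (int)b);      // True: comparison of the unboxed values
    }
}
```

If small values were cached as in Java, the first two lines would flip to True for 127 while staying False for larger values.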

@mikedn
Contributor

mikedn commented Dec 2, 2016

> String.Format

String.Format is far from having stellar performance anyway. And not because of boxing. If you try something like:

int x = 42;
for (int i = 0; i < 10000000; i++)
{
    String.Format("hello {0}", x);
}

and change the type of x from int to object you won't notice any difference.
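
A minimal way to try that comparison locally is sketched below. The `FormatInt` and `FormatObject` helpers and the warm-up pass are assumptions added for illustration, and `Stopwatch` timing is much cruder than a proper benchmark harness, so treat the numbers as indicative only.

```csharp
using System;
using System.Diagnostics;

public static class Program
{
    // x is an int, so it is boxed on every String.Format call.
    public static TimeSpan FormatInt(int x, int iterations)
    {
        var sw = Stopwatch.StartNew();
        for (int i = 0; i < iterations; i++)
        {
            String.Format("hello {0}", x);
        }
        return sw.Elapsed;
    }

    // x is already an object, so no box is allocated inside the loop.
    public static TimeSpan FormatObject(object x, int iterations)
    {
        var sw = Stopwatch.StartNew();
        for (int i = 0; i < iterations; i++)
        {
            String.Format("hello {0}", x);
        }
        return sw.Elapsed;
    }

    public static void Main()
    {
        const int iterations = 10_000_000;

        // Warm-up so JIT compilation is not included in the measurement.
        FormatInt(42, 100_000);
        FormatObject(42, 100_000);

        Console.WriteLine($"int argument:    {FormatInt(42, iterations).TotalMilliseconds:F0} ms");
        Console.WriteLine($"object argument: {FormatObject(42, iterations).TotalMilliseconds:F0} ms");
    }
}
```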

> but ease up on GC

Eh, the old story about GC. Let's cache everything because GC can't handle it. But GC handles short lived objects pretty well so it's not clear how much this will "ease up on GC".

> Is common practice in Java; so

Considering that even with generics Java can't store integers in collections without boxing, I'd say that it needs such caching more than .NET does. Besides, that Stack Overflow question is a good enough reason not to attempt such a trick in .NET. Because this is a trick, and a rather ugly one.

This is something that user code can do reasonably well when truly needed. For example, WPF does this for bool and a couple of WPF-specific enums (AFAIR Visibility is one of those enums). It does this because of the way it stores property values: always in boxed form. That's a case where having just 2 instances of bool instead of a zillion does actually save something: memory, GC time, etc.
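
A rough sketch of that pattern follows. The `BooleanBoxes` helper and the dictionary-based property store are hypothetical and only modeled on the description above; this is not WPF's actual code.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper modeled on the pattern described above.
public static class BooleanBoxes
{
    public static readonly object True = true;
    public static readonly object False = false;

    public static object Box(bool value) => value ? True : False;
}

public static class Program
{
    public static void Main()
    {
        // A property store that keeps every value in boxed form.
        var properties = new Dictionary<string, object>();
        for (int i = 0; i < 1000; i++)
        {
            properties["IsEnabled" + i] = BooleanBoxes.Box(i % 2 == 0);
        }

        // Count the entries that point at one of the two shared boxes.
        int shared = 0;
        foreach (object value in properties.Values)
        {
            if (ReferenceEquals(value, BooleanBoxes.True) ||
                ReferenceEquals(value, BooleanBoxes.False))
            {
                shared++;
            }
        }

        Console.WriteLine(shared); // 1000: no per-entry box was allocated
    }
}
```

The saving comes from the values being long-lived: a thousand stored properties keep two boxes alive instead of a thousand.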

@benaadams
Member Author

benaadams commented Dec 2, 2016

> String.Format is far from having stellar performance anyway.

Added a few more examples; longer-lived cases would be typed documents held in memory (Json, Csv, Xaml, etc.), like your WPF example.

> This is something that user code can do reasonably well when truly needed. For example WPF does this for bool and ...

And people create their own boxing caches again and again https://github.com/dotnet/corefx/issues/6533, LinqExpressions

@jkotas
Member

jkotas commented Dec 2, 2016

> Eh, the old story about GC. Let's cache everything because GC can't handle it.

Exactly. You do not actually care whether there are thousands of GCs or no GCs. What you care about is how fast a thing runs and how long the GC pauses are. And caching short-lived objects does not improve either.

> And people create their own boxing caches again and again

People who do not measure do... The example from Expressions makes sense because these objects seem to be long-lived. It is a rare case. The CustomAttributeDecoder example makes less sense.

@benaadams
Member Author

benaadams commented Dec 2, 2016

Maybe I should have led with a repo you can try locally and some metrics for it

Integer Box Caching

                   Method |      Mean |    StdDev |    Median | Scaled |            RPS |
------------------------- |---------- |---------- |---------- |------- |--------------- |
      Int32UncachedBoxing | 7.4069 ns | 0.0498 ns | 7.3884 ns |   1.00 | 135,008,742.93 |
        Int32CachedBoxing | 5.3065 ns | 0.0495 ns | 5.3282 ns |   0.72 | 188,448,857.58 |
 Int32CachedBoxExtenstion | 6.7465 ns | 0.0880 ns | 6.7784 ns |   0.91 | 148,226,092.97 |

Boolean Box Caching

                  Method |      Mean |    StdDev |    Median | Scaled |            RPS |
------------------------ |---------- |---------- |---------- |------- |--------------- |
      BoolUncachedBoxing | 7.3923 ns | 0.0391 ns | 7.3866 ns |   1.00 | 135,276,250.40 |
        BoolCachedBoxing | 4.5859 ns | 0.0310 ns | 4.5954 ns |   0.62 | 218,057,656.08 |
 BoolCachedBoxExtenstion | 4.5874 ns | 0.0428 ns | 4.5988 ns |   0.62 | 217,986,777.33 |

Updated summary

@jkotas
Member

jkotas commented Dec 2, 2016

Microbenchmark is good, but it does not tell the full story. This would need to be looked at in the context of real workloads like Roslyn or ASP.NET.

@benaadams
Member Author

benaadams commented Dec 3, 2016

Roslyn caches boxes for true, false, all zeros, int32 1, chars 0 - 127 https://github.com/dotnet/roslyn/blob/master/src/Compilers/Core/Portable/Collections/Boxes.cs

But I take your point

@mikedn
Contributor

mikedn commented Dec 3, 2016

> And people create their own boxing caches again and again dotnet/corefx#6533, LinqExpressions

That might mean that the framework should provide some kind of mechanism to help with this issue. But it doesn't necessarily mean that the mechanism should be built into the boxing operation itself. A Box method that can be called from user code as needed might be just enough. People who need this and are happy with what it offers will use it. Those who aren't happy with what it offers may still do their own thing.

@benaadams
Member Author

benaadams commented Dec 3, 2016

> That might mean that the framework should provide some kind of mechanism to help with this issue.

From my benchmark tests, what I was attempting was the wrong approach: it was a (virtual) call in the runtime, and about half of the gain is already lost if it's not inlined.

So (if automatic) it would either need to be a code replacement in the JIT or something like a Roslyn generator, depending on how that plays out? dotnet/roslyn#5561

And... I'd guess the second would be preferred, as it would be a user choice/reference install?

@benaadams reopened this Dec 3, 2016
@gafter
Member

gafter commented Oct 24, 2019

Boxing is required in the C# language specification and in the CLR specification to allocate a new object instance. This is trivially observable in user code. So the proposed "optimization" would violate both specifications.

@msftgits transferred this issue from dotnet/coreclr Jan 31, 2020
@msftgits added this to the Future milestone Jan 31, 2020
@ghost locked as resolved and limited conversation to collaborators Dec 27, 2020