JIT: more general value class devirtualization #52210
When devirtualization knows that the `this` object is a boxed value class, it now attempts to update the call to invoke the unboxed entry (when there is one). This extends an existing optimization that worked on calls where the box was local and only fed the call. We now handle calls that dispatch on boxed value classes more generally, even if their creation is not local or is local but in a form the existing opt could not handle. These new cases either come from failed local box removal opts (say multi-use boxes) or from guarded devirtualization. The "boxed" entry for value class methods is an un-inlineable VM stub. This transformation effectively inlines the stub and unblocks inlining of the underlying method.
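To make the boxed/unboxed entry distinction concrete, here is a minimal C# sketch of the idea. It is only a source-level model: `BoxedEntryModel`, `BoxedEntry`, and `UnboxedEntry` are hypothetical names invented for illustration, and the real boxed entry is a VM stub rather than C# code.

```csharp
using System;

public interface IFoo
{
    int GetV();
}

public struct Foo : IFoo
{
    public int A;
    public int GetV() => A;
}

// Hypothetical model, not the runtime's actual implementation.
public static class BoxedEntryModel
{
    // Models the "unboxed entry": the real method body, which expects
    // `this` to refer directly to the struct payload.
    public static int UnboxedEntry(ref Foo self) => self.GetV();

    // Models the "boxed entry" stub: it receives the box, gets at the
    // payload, and forwards to the unboxed entry. Note that the cast here
    // copies the payload, whereas the real stub only adjusts the pointer.
    public static int BoxedEntry(object box)
    {
        Foo payload = (Foo)box;
        return UnboxedEntry(ref payload);
    }

    // An interface call on a boxed Foo; this is the kind of call site
    // that devirtualization can now retarget to the unboxed entry.
    public static int CallViaInterface(IFoo o) => o.GetV();

    public static void Main()
    {
        IFoo boxed = new Foo { A = 42 };
        Console.WriteLine(CallViaInterface(boxed));
        Console.WriteLine(BoxedEntry(boxed));
    }
}
```

Both calls return the same value. The point of the sketch is that once the stub's work is visible at the call site, the underlying method becomes an ordinary inline candidate.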
@EgorBo PTAL

The back-and-forth between devirtualization, unboxing opts, guarded devirtualization, and inlining is getting a bit convoluted. I will look into streamlining this but don't have any great ideas just yet.

Diffs for this change are tricky to assess: since it changes the jit interface traffic, SPMI is not very useful, and since it's often enabled by PGO, PMI is also not very useful. There are some PMI diffs, mainly cases where the jit now invokes the unboxed entry point; generally this causes small code size increases, as we're basically inlining the unboxing wrapper.
I also did a special SPMI collection over asp.net with this change; replaying that vs the baseline jit showed no diffs.

One simple example where this kicks in is the following method:

    public static int Sum(IEnumerable<int> data)
    {
        int r = 0;
        foreach (int x in data)
        {
            r += x;
        }
        return r;
    }

when called with
Looks like something about the back and forth between boxed and unboxed entries is confusing crossgen2. Debugging.
The issue is that we invoke
That succeeds, and we subsequently do the guarded devirt transform. During that we call back into
So the second time around we pass in the derived method, not the base method; evidently crossgen2 doesn't handle that and fails to devirtualize, so we assert because we didn't expect it to fail. It seems simple enough to also keep track of the base method here, and pass that back instead of the derived method.
Minimal repro for the CI failure (JIT, TC=0):

    using System;
    using System.Collections.Generic;

    class Program
    {
        public static void GetEnumerator_TypeProperties<T>()
        {
            var arraySegment = new ArraySegment<T>(new T[1], 0, 1);
            var ienumerableoft = (IEnumerable<T>)arraySegment;
            Console.WriteLine(ienumerableoft.GetEnumerator().GetType());
        }

        static void Main(string[] args)
        {
            GetEnumerator_TypeProperties<string>();
            Console.WriteLine("done");
        }
    }

it prints:
instead of:
Thanks @EgorBo. The issue is that we can't pass a "shared" MT as the generic context. Instead we'd have to pass the actual runtime method table (which is doable, but more complex). I'll just defer on that case for now.

The other surprise here is that devirtualization is enabled at Tier0. I'd been intending to experiment with this anyway as a way of cutting down on the volume of class profile data, but I guess we're already doing that.
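As a rough illustration of why the exact method table matters, the sketch below (not part of the PR; `SharedGenericsSketch` and `EnumeratorType` are hypothetical names) calls the same generic method with two reference-type arguments. Instantiations over reference types share one compiled body, yet each must report its own exact enumerator type, which is only possible if the correct generic context reaches the callee.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper used only for illustration.
public static class SharedGenericsSketch
{
    // Mirrors the repro above, but returns the runtime type of the boxed
    // enumerator instead of printing it.
    public static Type EnumeratorType<T>()
    {
        var segment = new ArraySegment<T>(new T[1], 0, 1);
        IEnumerable<T> seq = segment;
        using IEnumerator<T> e = seq.GetEnumerator();
        return e.GetType();
    }

    public static void Main()
    {
        // On a correct runtime each instantiation sees its own exact type,
        // even though string and object instantiations share code.
        Console.WriteLine(EnumeratorType<string>() == typeof(ArraySegment<string>.Enumerator));
        Console.WriteLine(EnumeratorType<object>() == typeof(ArraySegment<object>.Enumerator));
        Console.WriteLine(EnumeratorType<string>() == EnumeratorType<object>());
    }
}
```

If a shared method table were passed as the generic context, the first two comparisons could no longer be guaranteed to hold, which matches the kind of wrong-type output the repro exposed.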
To handle the more general case we'd need to spill the
to
We might be able to do this with a comma form temp, but we'd need to be careful to make sure the comma was evaluated in the right order.
Still one last test failure to track down; it only happens on arm32.
Still debugging the failure, but CSE is doing something quite odd:
I suppose it works out (?) but it seems completely unnecessary:
The issue is that for explicit tail calls we have a side data structure ( It also tracks the kind of call being made, so it really should be updated generally when we devirtualize. |
Above is the right idea but the ordering of things doesn't work out. In the importer we
The "updates" done by devirtualization don't flow back to the caller, so the subsequent tail call setup still uses stale info. Simplest thing at this point is just to forgo updating call target for explicit tail calls...
Installer build failure looks like a CI hiccup / timing issue. Will retry.
@EgorBo all issues should be resolved, please take a look.
Looks like for some reason the publish step for the linux x64 release didn't publish anything, so rerunning the consuming task is not going to resolve anything. And there doesn't seem to be a way to rerun a task that claims to have passed. So I am going to ignore this failure.
I like the fact it improves codegen even with TC=0, e.g.:

    public static int Sum()
    {
        IFoo o = new Foo {a = 42};
        Console.WriteLine(o);
        return o.GetV();
    }

    public interface IFoo
    {
        int GetV();
    }

    public struct Foo : IFoo
    {
        public int a;
        public int GetV() => 42;
    }

Codegen diff: https://www.diffchecker.com/pMwW2QFI (GetV is inlined)
I played with your branch locally and it looks good as far as I can tell. I'm going to take a closer look at impDevirtCall later when I work on #50915 (comment), where basically we do nothing for: