-
Notifications
You must be signed in to change notification settings - Fork 4.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[RyuJIT] Fix rel32 for calls/data #49549
Conversation
This does not sound right. We have done work in .NET Core 1.0 to make it work - by allocating codeheap close to libcoreclr.so. |
Here is a repro: using System.Runtime.CompilerServices;
class Program
{
static void Main() => Test();
static int s_Field;
[MethodImpl(MethodImplOptions.NoInlining)]
static int Test() => s_Field;
} here is the codegen for 48B8142EB71601000000 mov rax, 0x116B72E14
8B00 mov eax, dword ptr [rax]
C3 ret
; Total bytes of code 13 ;; (should be 7 bytes with rel32) From my understanding, this PR - dotnet/coreclr#12831 could fix it. |
Co-authored-by: Jan Kotas <jkotas@microsoft.com>
This should be covered by |
That specific rerpro relies on VM's runtime/src/coreclr/utilcode/util.cpp Lines 446 to 449 in 6791d05
USE_UPPER_ADDRESS is always 0 on *nix (this PR wanted to enable it back https://github.com/dotnet/coreclr/pull/12831/files)and runtime/src/coreclr/vm/jitinterface.cpp Lines 11741 to 11747 in 6791d05
|
I guess |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM,
turn fAllowRel32 off globally.
it sounds pretty drastically, do you know if Mono has such deoptimizations that affect all methods after the current? Do we now if we hit fAllowRel32 = false
when running benchmarks?
Just checked - Mono doesn't support it in JIT mode (emits zero-relative addresses). I guess disabling it globally makes sense for large apps so you won't have to recompile each method twice at some point.
Yeah we do, we have plenty of pinvokes+suppressgc in the corelib which are triggered during start up of any more or less non-hello world app. E.g. I noticed a tiny improvement for a simple benchmark where I just benchmark |
JIT supports relative addresses for calls/data (see jump-stubs.md), it allows it to produce smaller codegen in some cases. Consider the following example:
(ignore that pinvoke for now)
Let's take a look at the
GetField
codegen:Cool, it uses (reloc)! Now, let's uncomment that "unrelated"
GetLogicalDrives
call, which should not affectGetField
codegen anyhow and run the app again (and ask forGetField
codegen):Boom!
rel32
is disabled globally for the whole app and won't be emitted anywhere (so e.g. all static fields, all PGO counters won't benefit from it).Why it happens
For pinvokes + SuppressGCTransition we emit direct calls to those pinvokes (since we don't have to switch coop to preemptive and back) with the absolute address of the external target (in our case -
kernel32
'sGetLogicalDrives
) here.Then here emitter tries to use rel32 for this call and asks VM to validate and record the relocation here. So we hit this code-path and basically ask VM to recompile the whole method and disable
fAllowRel32
globally for everybody (see here). My fix just setsgtControExpr
for such calls so the emitter will just emit:instead of trying to use rel32 (jump-stubs) and fail.
Jit-diff (jit-utils --pmi -f)
Quite a huge diff, but from my understanding, it happens because of recompilation, e.g. here is the diff result for
DateTime.UtcNow
:base: https://gist.github.com/EgorBo/106aede86c783f48e5966a0c81f11884
diff: https://gist.github.com/EgorBo/e37bc359c6c654a0bea27858e1724576
As you can see, in the
base
it's presented twice (the first one got rejected by VM).Here is the diff between the second version of
base
and thediff
: https://www.diffchecker.com/UgPKqFpnas you can see gc-polls use reloc only in the "diff" (this PR) version. So, the actual codegen diff is
-16
bytes rather than-530
as of reported.In the real-word such recompilations happen only once (once we hit any pinvoke+suppressgc anywhere, BCL has plenty of them, e.g. in the IO subsystem - I guess we always hit one pretty early) and then
fAllowRel32
is disabled for future compilation. But jit-diff compiles each method separately so the diff is full of failed duplicates./cc @dotnet/jit-contrib @jkotas
PS: we don't support rel32 for Linux/macOS x64 at least for static fields, but it'd be nice to have, especially for PGO where each counter would benefit from it (
inc dword ptr [reloc]
) #8545.Fixes #49476