Implement RuntimeHelpers.GetHashCode() happy path in C# #55273
I couldn't figure out the best area label to add to this PR. If you have write-permissions please help me learn by adding exactly one area label.
if (o is not null)
{
ref IntPtr startOfDataRef = ref Unsafe.As<byte, IntPtr>(ref Unsafe.As<RawData>(o).Data);
ref IntPtr objectHeaderRef = ref Unsafe.Add(ref startOfDataRef, -2);
This is a GC hole. This byref will point to the previous object, so it won't move together with the object that you are computing the hash code for.
Ouch 😅
We were just wondering whether the GC could track refs pointing to the object header. I wasn't completely sure, but figured it might work given that the data was still part of the same object. Guess I know now, haha.
I've tried using fixed there to fix that, but as expected that's pretty slow and loses virtually all of the performance improvement over the current solution, so it's not really worth it anymore. Will close the PR for now then.
While on the topic - @SingleAccretion found a GT_START_NONGC node in the emitter, and together with @EgorBo we were wondering whether it might make sense and/or be doable at all to introduce a new JIT intrinsic to be able to leverage that? It might make things like this possible without having to pin stuff and lose all the performance gains. At the very least it sounds like a good learning opportunity, so I thought I'd ask! Thanks! 😄
I do not think intrinsics for GT_START_NONGC make sense. It is so subtle and hard to get these things right. I have no problems with giving up the bit of performance that we would get.
Right, yeah that makes perfect sense and it'd also be extremely niche anyway. Thanks! 🙂
I've tried using fixed there to fix that but as expected that's pretty slow
Can the JIT do more optimizations around fixed to make it faster?
Codegen for "fixed" not much to optimize:
The spill to stack can be optimized. Nothing fundamental says that the pinned slot has to be on stack. The pinned value can be in register.
Codegen for "fixed" not much to optimize:
The tailcall optimization can be done as well, at least in theory. I guess it may be hard to do today since we do not know whether there is anything that matters pinned when we are deciding whether to tailcall. Maybe it can be helped by moving the pinning into a separate (inlineable) method to make it easier for the JIT to see that there is nothing actually pinned to block the tailcall optimization?
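To make the code shape suggested above concrete, here is a minimal sketch in which the only `fixed` statement lives in a small helper marked for inlining, while the caller ends with a plain call in tail position. All names here (`PinningShape`, `ReadFirstPinned`, `Finish`, `Process`) are hypothetical, the example pins an ordinary array rather than anything from the PR, and it needs unsafe code enabled to compile. It only illustrates the layout being discussed; it does not show that the JIT actually performs the tail call.

```csharp
using System;
using System.Runtime.CompilerServices;

internal static class PinningShape
{
    // Hypothetical helper: the only method that contains a pinned local.
    [MethodImpl(MethodImplOptions.AggressiveInlining)]
    private static unsafe int ReadFirstPinned(int[] values)
    {
        // Pinning an empty array yields a null pointer, so handle it up front.
        if (values.Length == 0)
        {
            return 0;
        }

        fixed (int* p = values)
        {
            return *p;
        }
    }

    // Kept out of line so the call in Process is a real call in tail position.
    [MethodImpl(MethodImplOptions.NoInlining)]
    private static int Finish(int first) => first * 2 + 1;

    // No fixed statement appears here; the pinned region has ended inside
    // ReadFirstPinned before the final call is made.
    public static int Process(int[] values)
    {
        int first = ReadFirstPinned(values);
        return Finish(first);
    }
}
```

The design point is purely structural: by the time `Finish` is called, nothing is pinned any more, which is the property the comment hopes the JIT could recognize.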
We have a similar issue for RuntimeHelpers.GetMethodTable. RuntimeHelpers.GetMethodTable cuts corners and I believe it does not compile into as efficient code as possible. I think it would be ok to introduce static T ReadAtByteOffset<T>(object o, int offset) intrinsic that would read T at given offset, without materializing o + offset as byref, as efficiently as possible. We can then use that intrinsic for both syncblock reading and GetMethodTable.
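For readers who want to see the intended semantics of such a helper, below is a rough emulation, not the proposed intrinsic. The signature is taken from the comment above; everything else is an assumption: `RawData` is a hypothetical stand-in for the helper type used in the diff, `Box` and `ObjectReader` are made-up names, and `offset` is taken to be in bytes relative to the first byte of instance data (the comment does not say what the offset is relative to). Unlike the intrinsic being proposed, this version pins the object with `fixed`, so it is GC-safe but pays exactly the cost discussed earlier in the thread. It needs unsafe code enabled to compile.

```csharp
using System;
using System.Runtime.CompilerServices;

// Hypothetical stand-in for the RawData helper used in the diff; it only gives
// a typed handle on the first byte of an object's instance data.
internal sealed class RawData
{
    public byte Data;
}

// Hypothetical sample type used to exercise the helper.
internal sealed class Box
{
    public long Value;
}

internal static class ObjectReader
{
    // Emulation only: the object stays pinned for the duration of the read,
    // so the raw pointer cannot be invalidated by a GC in between.
    // Assumption: offset is in bytes from the first byte of instance data.
    public static unsafe T ReadAtByteOffset<T>(object o, int offset) where T : unmanaged
    {
        fixed (byte* p = &Unsafe.As<RawData>(o).Data)
        {
            return Unsafe.ReadUnaligned<T>(p + offset);
        }
    }
}
```

With a single-field class such as `Box`, `ObjectReader.ReadAtByteOffset<long>(box, 0)` should read back `box.Value` on a runtime that places the first field at the start of instance data. Negative offsets would reach the words in front of the data, which is where the method table pointer and header live in the scenario discussed here, but that layout is runtime-specific and not something this sketch guarantees.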
The spill to stack can be optimized. Nothing fundamental says that the pinned slot has to be on stack. The pinned value can be in register.
Ah, I didn't realize 👍
The tailcall optimization can be done as well
Yep, currently it's rejected with
Rejecting tail call in morph for call [000042]: Has Pinned Vars V01
NOTE: @Sergio0694 can't comment here since the thread is locked, but judging by the Discord #lowlevel channel he is excited to see where it goes 🙂
Overview
This PR adds an implementation of RuntimeHelpers.GetHashCode(object) in C# to enable inlining and skip the FCall overhead. This only applies to the happy path of that API, i.e. when the hash code is already available in the object header. If the required flags are not set, the updated GetHashCode method will just fall back to the usual implementation.
Benchmarks
Benchmark code: