
Implement RuntimeHelpers.GetHashCode() happy path in C# #55273

Closed

Conversation

Sergio0694
Contributor

Overview

This PR adds an implementation of RuntimeHelpers.GetHashCode(object) in C# to enable inlining and skip the FCall overhead. This only applies to the happy path of that API, i.e. when the hash code is already available in the object header. If the required flags are not set, the updated GetHashCode method just falls back to the usual implementation.
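
For context, the happy path described above amounts to checking whether the object header already stores a hash code and returning it inline, otherwise deferring to the existing runtime path. A minimal sketch of that shape (the RawData helper, the header bit constants, the header offset, and the public-API fallback are illustrative assumptions, not the PR's exact code):

```csharp
using System;
using System.Runtime.CompilerServices;

// Mirrors the corelib trick of reinterpreting an object to get a ref to its first data byte.
internal sealed class RawData
{
    public byte Data;
}

internal static class ObjectHashCodeSketch
{
    // Assumed header bits/mask (modeled after BIT_SBLK_IS_HASH_OR_SYNCBLKINDEX,
    // BIT_SBLK_IS_HASHCODE and MASK_HASHCODE in the runtime).
    private const int IsHashOrSyncBlockIndexBit = 0x08000000;
    private const int IsHashCodeBit = 0x04000000;
    private const int HashCodeMask = 0x03FFFFFF;

    public static int GetHashCode(object? o)
    {
        if (o is not null)
        {
            // The 4-byte syncblock value sits just before the MethodTable pointer,
            // i.e. (IntPtr.Size + 4) bytes before the first instance field.
            // NOTE: materializing a byref that points before the object's data is the
            // GC hole discussed below; it is shown here only to illustrate the happy path.
            ref byte data = ref Unsafe.As<RawData>(o).Data;
            int header = Unsafe.As<byte, int>(ref Unsafe.Add(ref data, -(IntPtr.Size + sizeof(int))));

            if ((header & (IsHashOrSyncBlockIndexBit | IsHashCodeBit)) ==
                (IsHashOrSyncBlockIndexBit | IsHashCodeBit))
            {
                return header & HashCodeMask;
            }
        }

        // Slow path: fall back to the existing implementation (the FCall in the real PR).
        return RuntimeHelpers.GetHashCode(o!);
    }
}
```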

Benchmarks

| Method                    | Job        | Toolchain | Mean      | Error     | StdDev    | Ratio | RatioSD |
|---------------------------|------------|-----------|----------:|----------:|----------:|------:|--------:|
| RuntimeHelpersGetHashCode | Job-IYQXHL | MAIN      | 0.6922 ns | 0.0059 ns | 0.0050 ns |  1.00 |    0.00 |
| RuntimeHelpersGetHashCode | Job-QUNTER | PR        | 0.2241 ns | 0.0084 ns | 0.0070 ns |  0.32 |    0.01 |
| ObjectHashCode            | Job-IYQXHL | MAIN      | 1.1849 ns | 0.0148 ns | 0.0138 ns |  1.00 |    0.00 |
| ObjectHashCode            | Job-QUNTER | PR        | 0.4839 ns | 0.0089 ns | 0.0079 ns |  0.41 |    0.01 |
| TypeGetHashCode           | Job-IYQXHL | MAIN      | 1.1943 ns | 0.0129 ns | 0.0114 ns |  1.00 |    0.00 |
| TypeGetHashCode           | Job-QUNTER | PR        | 0.4864 ns | 0.0135 ns | 0.0126 ns |  0.41 |    0.01 |
| TypeOfTGetHashCode        | Job-IYQXHL | MAIN      | 1.6589 ns | 0.0198 ns | 0.0185 ns |  1.00 |    0.00 |
| TypeOfTGetHashCode        | Job-QUNTER | PR        | 1.3573 ns | 0.0438 ns | 0.0410 ns |  0.82 |    0.03 |
Benchmark code:

```csharp
using System;
using System.Runtime.CompilerServices;
using BenchmarkDotNet.Attributes;
using BenchmarkDotNet.Running;

BenchmarkSwitcher.FromTypes(new Type[] { typeof(HashCodeBenchmark) }).Run(args);

public class HashCodeBenchmark
{
    private readonly object dummy = new();
    private readonly Type type = typeof(string);

    [Benchmark]
    public int RuntimeHelpersGetHashCode()
    {
        return RuntimeHelpers.GetHashCode(dummy);
    }

    [Benchmark]
    public int ObjectHashCode()
    {
        return dummy.GetHashCode();
    }

    [Benchmark]
    public int TypeGetHashCode()
    {
        return type.GetHashCode();
    }

    [Benchmark]
    public int TypeOfTGetHashCode()
    {
        return typeof(int).GetHashCode();
    }
}
```

@dotnet-issue-labeler

I couldn't figure out the best area label to add to this PR. If you have write-permissions please help me learn by adding exactly one area label.

```csharp
if (o is not null)
{
    ref IntPtr startOfDataRef = ref Unsafe.As<byte, IntPtr>(ref Unsafe.As<RawData>(o).Data);
    ref IntPtr objectHeaderRef = ref Unsafe.Add(ref startOfDataRef, -2);
```

Member

This is a GC hole. This byref will point to the previous object, so it won't move together with the object that you are computing the hash code for.

Contributor Author

Ouch 😅
We were just wondering whether the GC could track refs pointing to the object header. I wasn't completely sure, but figured it might work given that the data was still part of the same object; guess I know now, ahah.
I've tried using fixed there to fix that, but as expected that's pretty slow and loses virtually all the performance improvements of the current solution, so it's not really worth it anymore. Will close the PR for now then.

While on the topic - @SingleAccretion found a GT_START_NONGC node in the emitter, and together with @EgorBo we were wondering whether it might make sense and/or be doable at all to introduce a new JIT intrinsic to leverage that. Might that make things like this possible without having to pin stuff and lose all the performance gains? At the very least it sounds like a good learning opportunity, so I thought I'd ask! Thanks! 😄

Member

I do not think intrinsics for GT_START_NONGC make sense. These things are so subtle and hard to get right. I have no problem with giving up the bit of performance that we would get.

Contributor Author

Right, yeah that makes perfect sense and it'd also be extremely niche anyway. Thanks! 🙂

Member

> I've tried using fixed there to fix that but as expected that's pretty slow

Can the JIT do more optimizations around fixed to make it faster?

Member

Codegen for "fixed" not much to optimize:

The spill to stack can be optimized. Nothing fundamental says that the pinned slot has to be on stack. The pinned value can be in register.

Member

Codegen for "fixed" not much to optimize:

The tailcall optimization can be done as well, at least in theory. I guess it may be hard to do today since we do not know whether there is anything that matters pinned when we are deciding whether to tailcall. Maybe it can be helped by moving the pinning into a separate (inlineable) method to make it easier for the JIT to see that there is nothing actually pinned to block the tailcall optimization?
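
For reference, the fixed-based variant mentioned earlier roughly has the shape sketched below (an assumed reconstruction reusing the same hypothetical RawData helper and header constants as the earlier sketch, not the code that was actually benchmarked). Pinning keeps the object from moving while the header is read, but the pinned local is what currently produces the stack spill and blocks the tail call into the slow path:

```csharp
using System;
using System.Runtime.CompilerServices;

internal static unsafe class ObjectHashCodePinnedSketch
{
    // Same assumed header constants and RawData helper as in the earlier sketch.
    private const int IsHashOrSyncBlockIndexBit = 0x08000000;
    private const int IsHashCodeBit = 0x04000000;
    private const int HashCodeMask = 0x03FFFFFF;

    private sealed class RawData { public byte Data; }

    public static int GetHashCode(object? o)
    {
        if (o is not null)
        {
            // Pin the object via a pointer to its first instance byte so the GC
            // cannot move it while the header is read.
            fixed (byte* data = &Unsafe.As<RawData>(o).Data)
            {
                int header = *(int*)(data - IntPtr.Size - sizeof(int));

                if ((header & (IsHashOrSyncBlockIndexBit | IsHashCodeBit)) ==
                    (IsHashOrSyncBlockIndexBit | IsHashCodeBit))
                {
                    return header & HashCodeMask;
                }
            }
        }

        // The method contains a pinned local, which is why the JIT currently rejects
        // turning this into a tail call (see the "Has Pinned Vars" rejection quoted below).
        return RuntimeHelpers.GetHashCode(o!);
    }
}
```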

Member

We have a similar issue for RuntimeHelpers.GetMethodTable. RuntimeHelpers.GetMethodTable cuts corners, and I believe it does not compile into as efficient code as possible. I think it would be ok to introduce a `static T ReadAtByteOffset<T>(object o, int offset)` intrinsic that would read a T at the given offset, without materializing o + offset as a byref, as efficiently as possible. We can then use that intrinsic for both syncblock reading and GetMethodTable.
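
To make the shape of that proposal concrete, a hypothetical consumer could look like the sketch below. ReadAtByteOffset<T> here is only the signature proposed above with a throwing stub body; no such intrinsic exists today, and the offsets are illustrative assumptions about the object layout:

```csharp
using System;

internal static class ReadAtByteOffsetSketch
{
    // Stub with the proposed signature; a real version would be a JIT intrinsic that
    // emits a single load at (o + offset) without ever materializing a byref.
    private static T ReadAtByteOffset<T>(object o, int offset) =>
        throw new NotImplementedException("hypothetical intrinsic");

    // Syncblock reading: the 4-byte header value sits just before the MethodTable
    // pointer that the object reference points at (assumed layout).
    internal static int ReadObjectHeader(object o) =>
        ReadAtByteOffset<int>(o, -sizeof(int));

    // GetMethodTable-style reading: the MethodTable pointer is at offset 0.
    internal static IntPtr ReadMethodTablePointer(object o) =>
        ReadAtByteOffset<IntPtr>(o, 0);
}
```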

Member

> The spill to the stack can be optimized. Nothing fundamental says that the pinned slot has to be on the stack; the pinned value can be in a register.

Ah, I didn't realize 👍

> The tailcall optimization can be done as well

Yep, currently it's rejected with:

```
Rejecting tail call in morph for call [000042]: Has Pinned Vars V01
```

Member

NOTE: @Sergio0694 can't comment here since the thread is locked, but judging by the Discord #lowlevel channel he is excited to see where this goes 🙂

@Sergio0694 Sergio0694 closed this Jul 7, 2021
@ghost ghost locked as resolved and limited conversation to collaborators Aug 6, 2021