Use Fast-Mod algorithm for native EEHashTableBase #65778

EgorBo · 2022-02-23T15:21:25Z

When I profile this simple virtual generic callsite:

public class Prog
{
    static void InvokeVGM(Prog p)
    {
        while (true)
            p.VGM<string>();
    }

    virtual void VGM<T>() {}
}

in VTune I see div as the most expensive part (if I read it correctly):

It turns out to be an EEHashTable lookup:

That makes sense, since div is a very expensive operation: on some CPUs its latency can exceed 100 cycles. On my Coffee Lake it is 26/6, while mul, for example, is 3/1 (latency/reciprocal throughput).

The managed Dictionary was optimized with the "Fast Mod" algorithm: we precalculate a special multiplier whenever the number of buckets changes, and can then replace the expensive idiv with a mul. See dotnet/coreclr#27299 and #406.

Quick demo:

public static uint Test(uint x)
{
    uint result = 0;

    // Precompute the multiplier once for the divisor x.
    ulong cachedMul = GetFastModMultiplier(x);
    for (uint i = 0; i < x; i++)
    {
        // result = i % x;
        result = FastMod(i, x, cachedMul);
    }
    return result;
}

public static ulong GetFastModMultiplier(uint divisor) =>
    ulong.MaxValue / divisor + 1;

[MethodImpl(MethodImplOptions.AggressiveInlining)]
public static uint FastMod(uint value, uint divisor, ulong multiplier) => 
    (uint)(((((multiplier * value) >> 32) + 1) * divisor) >> 32);

NOTE: the algorithm adds overhead when the table expands, because the multiplier has to be recomputed for the new bucket count.
NOTE2: CompareKeys is not inlined.
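
As a sanity check on the demo above, here is a minimal sketch that compares FastMod against the plain % operator for a handful of values. The class name FastModCheck, the Agrees helper, and the sample values are illustrative only; the two arithmetic helpers are copied from the demo. The sketch assumes the divisor is non-zero and no larger than int.MaxValue.

```csharp
using System;

public static class FastModCheck
{
    // Same helper as in the demo: computed once per divisor.
    public static ulong GetFastModMultiplier(uint divisor) =>
        ulong.MaxValue / divisor + 1;

    // Same helper as in the demo: two multiplications and two shifts, no division.
    public static uint FastMod(uint value, uint divisor, ulong multiplier) =>
        (uint)(((((multiplier * value) >> 32) + 1) * divisor) >> 32);

    // Hypothetical helper: returns true when FastMod matches % for every sampled value.
    public static bool Agrees(uint divisor)
    {
        ulong multiplier = GetFastModMultiplier(divisor);
        uint[] samples = { 0, 1, divisor - 1u, divisor, divisor + 1u, 12345, int.MaxValue, uint.MaxValue };
        foreach (uint value in samples)
        {
            if (FastMod(value, divisor, multiplier) != value % divisor)
                return false;
        }
        return true;
    }
}
```

Calling FastModCheck.Agrees with a few bucket counts (for example 3 or 17) is a cheap way to confirm that a port of the formula to another code base did not drop a shift or the "+ 1".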


dotnet-issue-labeler · 2022-02-23T15:21:30Z

I couldn't figure out the best area label to add to this issue. If you have write-permissions please help me learn by adding exactly one area label.

EgorBo · 2022-02-23T16:04:02Z

cc @VSadov

jkotas · 2022-02-23T16:43:13Z

I think this can use the same or very similar hashtable as casting https://github.com/dotnet/runtime/blob/main/src/coreclr/System.Private.CoreLib/src/System/Runtime/CompilerServices/CastHelpers.cs

bartonjs · 2022-02-23T16:45:11Z

If the number of buckets is a power of 2 then mod is just a masking operation. (Though depending on the distribution of dwHash values it may create overweighted buckets)
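
To illustrate the power-of-two alternative, here is a minimal sketch. The class and method names are hypothetical and nothing here is taken from EEHashTableBase; it only shows that the modulo collapses to a bitwise AND, and why hashes that differ only in their high bits end up in the same bucket.

```csharp
using System;

public static class MaskBucketSketch
{
    // Valid only when bucketCount is a power of two:
    // hash % bucketCount == hash & (bucketCount - 1).
    public static uint BucketIndex(uint hash, uint bucketCount) =>
        hash & (bucketCount - 1);
}
```

With 16 buckets, the hashes 16, 32, 48 and 64 all map to bucket 0, because only the low four bits are used. That is the overweighting risk mentioned above when the dwHash values are not well distributed in their low bits.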

EgorBo · 2022-02-23T16:54:59Z

I applied FastMod locally and got a 22% improvement on the benchmark, but I am not sure I did it in a race-free way.

VSadov · 2022-02-23T18:51:36Z

Using FastMod could be an easy way to speed up EEHashTableBase. A few thoughts:

FastMod has two components: a divisor and a multiplier. They must match, so updating them needs a bit more care regarding races. Storing the multiplier in the table, the same way we store the length, could be sufficient.
Are we sure the real problem is not collisions?
Sometimes excessive hashing cost is what we see, but the real problem is collisions.
Whether EEHashTableBase is a good choice for caching is another good question.
If the scenario here is similar to the type-casting cache (i.e. the values can be recomputed relatively cheaply), then a solution similar to the cast cache could fit better.
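
The race concern in the first point can be sketched as follows. This is not how EEHashTableBase is implemented (that code is native C++); it is a hedged C# illustration of keeping the divisor and multiplier together so a reader never combines a new bucket count with an old multiplier. BucketLayout, TableSketch, and the initial bucket count of 11 are all hypothetical.

```csharp
using System.Threading;

// Immutable pair: the bucket count and the multiplier derived from it.
public sealed class BucketLayout
{
    public readonly uint BucketCount;
    public readonly ulong Multiplier;

    public BucketLayout(uint bucketCount)
    {
        BucketCount = bucketCount;
        Multiplier = ulong.MaxValue / bucketCount + 1;
    }

    // FastMod using the matching divisor/multiplier pair.
    public uint IndexFor(uint hash) =>
        (uint)(((((Multiplier * hash) >> 32) + 1) * BucketCount) >> 32);
}

public sealed class TableSketch
{
    private BucketLayout _layout = new BucketLayout(11);

    // Writer: build the new pair first, then publish it with a single reference write.
    public void Grow(uint newBucketCount) =>
        Volatile.Write(ref _layout, new BucketLayout(newBucketCount));

    // Reader: read the reference once, so both values come from the same layout.
    public uint BucketIndex(uint hash) =>
        Volatile.Read(ref _layout).IndexFor(hash);
}
```

The design choice is that readers never see the two fields individually; they only see whichever complete layout was published last. A real table would also have to publish the bucket array together with the layout, which this sketch leaves out.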

EgorBo · 2022-02-23T19:11:56Z

Prototype: EgorBo@c99fe02

~~the only problem that it assumes NumBuckets fits into 32bit (pretty normal) and dwHash~~ - they always do, DWORD is 32bit

public class P
{
    [Benchmark]
    public void CallVGM()
    {
        VGM<string>();
    }

    virtual void VGM<T>()
    {
    }

    static void Main(string[] args) => 
        BenchmarkSwitcher.FromAssembly(typeof(P).Assembly).Run(args);
}

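| Method  | Toolchain                 | Mean     | Error     | StdDev    | Ratio |
|---------|---------------------------|----------|-----------|-----------|-------|
| CallVGM | \Core_Root\corerun.exe    | 5.267 ns | 0.1216 ns | 0.1301 ns | 1.00  |
| CallVGM | \Core_Root_PR\corerun.exe | 4.452 ns | 0.0342 ns | 0.0319 ns | 0.85  |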

> then a solution similar to cast cache could fit better.

I agree but that sounds like a more invasive change, while FastMod could also improve other EEHashTable users?

EgorBo added the tenet-performance Performance related issue label Feb 23, 2022

dotnet-issue-labeler bot added the untriaged New issue has not been triaged by the area owner label Feb 23, 2022

EgorBo added the area-VM-coreclr label Feb 23, 2022

EgorBo mentioned this issue Feb 27, 2022

FastMod for EEHashTable (Faster virtual generics) #65926

Merged

ghost added the in-pr There is an active PR which will close this issue when it is merged label Feb 27, 2022

mangod9 removed the untriaged New issue has not been triaged by the area owner label Feb 28, 2022

mangod9 added this to the 7.0.0 milestone Feb 28, 2022

EgorBo closed this as completed in #65926 Mar 1, 2022

ghost removed the in-pr There is an active PR which will close this issue when it is merged label Mar 1, 2022

ghost locked as resolved and limited conversation to collaborators Apr 1, 2022
