-
Notifications
You must be signed in to change notification settings - Fork 4.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Use Fast-Mod algorithm for native EEHashTableBase #65778
Comments
I couldn't figure out the best area label to add to this issue. If you have write-permissions please help me learn by adding exactly one area label. |
cc @VSadov |
I think this can use the same or very similar hashtable as casting https://github.com/dotnet/runtime/blob/main/src/coreclr/System.Private.CoreLib/src/System/Runtime/CompilerServices/CastHelpers.cs |
If the number of buckets is a power of 2 then mod is just a masking operation. (Though depending on the distribution of |
I applied the FastMod locally and got 22% improvement for the benchmark - but I am not sure I do it in a race-condition free fashion. |
Using FastMod could be an easy way to speed up EEHashTableBase. A few thoughts:
|
Prototype: EgorBo@c99fe02
public class P
{
[Benchmark]
public void CallVGM()
{
VGM<string>();
}
virtual void VGM<T>()
{
}
static void Main(string[] args) =>
BenchmarkSwitcher.FromAssembly(typeof(P).Assembly).Run(args);
}
I agree but that sounds like a more invasive change, while FastMod could also improve other EEHashTable users? |
When I profile this simple virtual generic callsite:
in VTune I see
div
as the most expensive part (if I read it correctly):It turns out to be a EEHashTable lookup:
And It does make sense since div is a very expensive operation, on some CPU its latency can be bigger than 100 cycles. On my Coffee Lake it's 26/6 while e.g. mul is 3/1 (Latency/rec.trput)
The managed Dictionary was optimized with "Fast Mod" algorithm where we pre-calculate a special multiplier every time a new bucket is added and then can avoid expensive idiv operation by using mul see dotnet/coreclr#27299 and #406
Quick demo:
NOTE: the algorithm adds an overhead on dictionary expansion
NOTE2:
CompareKeys
is not inlinedThe text was updated successfully, but these errors were encountered: