-
Notifications
You must be signed in to change notification settings - Fork 4.9k
HashCode based on xxHash32 #25013
HashCode based on xxHash32 #25013
Changes from 1 commit
559d006
e9a5e0a
9690fa0
32b83d7
07879b4
4c45da5
20913b7
a9a6d7f
638b55b
4b0ca46
07aa81d
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,250 @@ | ||
// Licensed to the .NET Foundation under one or more agreements. | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Please follow the rules from the "Copying Files from Other Projects" section of https://github.com/dotnet/coreclr/blob/master/Documentation/project-docs/contributing.md. In particular, I think that means including the xxHash license here and adding it to the 3rd party notices file. @richlander, please confirm we don't need to do anything else to make this file follow the rules. |
||
// The .NET Foundation licenses this file to you under the MIT license. | ||
// See the LICENSE file in the project root for more information. | ||
|
||
using System.Collections.Generic; | ||
using System.ComponentModel; | ||
using System.Runtime; | ||
using System.Runtime.CompilerServices; | ||
|
||
namespace System | ||
{ | ||
public unsafe struct HashCode | ||
{ | ||
private const uint Queue = 4; | ||
private const uint Length = 7; | ||
|
||
private const uint Prime1 = 2654435761U; | ||
private const uint Prime2 = 2246822519U; | ||
private const uint Prime3 = 3266489917U; | ||
private const uint Prime4 = 0668265263U; | ||
private const uint Prime5 = 0374761393U; | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
Together, to me, these points say "please remove the leading 0". (My first thought (given the knowledge that it had compiled) was a quirky behavior that since an 8 and 9 were present that it knew it couldn't be octal, but apparently C# simply removed that confusing part of C, but without removing the "confusing" flag from my head.) |
||
|
||
// 8x8 = 64 bytes = 1 x86 cache line | ||
// +----+----+----+----+----+----+----+----+ | ||
// | 00 | 01 | 02 | 03 | 04 | 05 | 06 | 07 | | ||
// | xxHash State | Queue | Le | | ||
// +----+----+----+----+----+----+----+----+ | ||
private fixed uint _state[8]; | ||
|
||
[MethodImpl(MethodImplOptions.AggressiveInlining)] | ||
[TargetedPatchingOptOut("Performance critical for inlining across NGen images.")] | ||
private void Add(int value) | ||
{ | ||
unchecked | ||
{ | ||
var val = (uint)value; | ||
|
||
fixed (uint* state = _state) | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. As a convention we called such fixed pointers pXxx (pState in this example). It makes it easier to see in following code where there is a danger of overruns. |
||
{ | ||
var queue = state + Queue; | ||
var position = state[Length] & 0x3; | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. So this is not really Length but rather LastIndex. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Actually, it seems like this should be LengthIdx, as the array is used as a packed struct (which would be nicer). |
||
|
||
if (position < 3) | ||
{ | ||
queue[position] = val; | ||
} | ||
else | ||
{ | ||
|
||
var sentinel = | ||
state[0] | state[1] | state[2] | state[3] | // any of accumulators set? | ||
(state[Length] ^ position); // length greater than 3? | ||
|
||
if (sentinel == 0) | ||
{ | ||
// Initialize | ||
|
||
state[0] = Prime1 + Prime2; | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. To avoid HashCode consumers to confuse the hash-code to be suitable for persistence, the initialization should make use of a random, per-app-domain, seed. For XX32, The seed is added to the first three state-vars ( There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. No problem. I still have the seed code in my polyfill. This would break the test vector test (which the build server is having problems with anyway). I can't think of any other way to test the output of this function. Suggestions? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. At my prototype project Haschisch.Kastriert I implemented a version where the seed could be specified explicitly. The tests would use a known seed, and the overloads for the default seed would use random numbers (from the security.cryptography randomnumbergenerator). |
||
state[1] = Prime2; | ||
state[2] = 0; | ||
state[3] = (uint)-Prime1; | ||
} | ||
|
||
for (var i = 0; i < 3; i++) | ||
state[i] = Round(state[i], queue[i]); | ||
state[3] = Round(state[3], val); | ||
} | ||
|
||
state[Length]++; | ||
} | ||
} | ||
} | ||
|
||
[MethodImpl(MethodImplOptions.AggressiveInlining)] | ||
[TargetedPatchingOptOut("Performance critical for inlining across NGen images.")] | ||
public int ToHashCode() | ||
{ | ||
unchecked | ||
{ | ||
fixed (uint* state = _state) | ||
{ | ||
var queue = state + Queue; | ||
var position = state[Length] & 0x3; | ||
|
||
// Mix the accumulators (if they contain anything) | ||
|
||
var hash = state[Length] > 3 | ||
? Rol(state[0], 1) + Rol(state[1], 7) + Rol(state[2], 12) + Rol(state[3], 18) | ||
: Prime5; | ||
|
||
hash += state[Length] * 4; | ||
|
||
// Mix what remains in the queue | ||
|
||
for (var i = 0; i < position; i++) | ||
{ | ||
hash += queue[i] * Prime3; | ||
hash = Rol(hash, 17) * Prime4; | ||
} | ||
|
||
// Final mix | ||
|
||
hash ^= hash >> 15; | ||
hash *= Prime2; | ||
hash ^= hash >> 13; | ||
hash *= Prime3; | ||
hash ^= hash >> 16; | ||
|
||
return (int)hash; | ||
} | ||
} | ||
} | ||
|
||
[MethodImpl(MethodImplOptions.AggressiveInlining)] | ||
[TargetedPatchingOptOut("Performance critical for inlining across NGen images.")] | ||
private uint Round(uint seed, uint input) | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Can be static. |
||
{ | ||
unchecked | ||
{ | ||
seed += input * Prime2; | ||
seed = Rol(seed, 13); | ||
seed *= Prime1; | ||
return seed; | ||
} | ||
} | ||
|
||
# pragma warning disable 0809 | ||
[Obsolete("Use ToHashCode to retrieve the computed hash code.", error: true)] | ||
[EditorBrowsable(EditorBrowsableState.Never)] | ||
public override int GetHashCode() => ToHashCode(); | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. My (and also @tannergooding 's) interpretation of the proposed API was to There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Happy to do this, just need a final say on which Exception and what message. |
||
# pragma warning restore 0809 | ||
|
||
[MethodImpl(MethodImplOptions.AggressiveInlining)] | ||
[TargetedPatchingOptOut("Performance critical for inlining across NGen images.")] | ||
private static uint Rol(uint value, int count) | ||
=> (value << count) | (value >> (32 - count)); | ||
|
||
[MethodImpl(MethodImplOptions.AggressiveInlining)] | ||
[TargetedPatchingOptOut("Performance critical for inlining across NGen images.")] | ||
public void Add<T>(T value) => Add(value?.GetHashCode() ?? 0); | ||
|
||
[MethodImpl(MethodImplOptions.AggressiveInlining)] | ||
[TargetedPatchingOptOut("Performance critical for inlining across NGen images.")] | ||
public void Add<T>(T value, IEqualityComparer<T> comparer) | ||
{ | ||
if (comparer is null) throw new ArgumentNullException(nameof(comparer)); | ||
Add(comparer.GetHashCode(value)); | ||
} | ||
|
||
[MethodImpl(MethodImplOptions.AggressiveInlining)] | ||
[TargetedPatchingOptOut("Performance critical for inlining across NGen images.")] | ||
public static int Combine<T1>(T1 value1) | ||
{ | ||
var hc = new HashCode(); | ||
hc.Add(value1); | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I've once tried combining hash-codes with generic code, but due to performance (factor 2 to 8) ended up manually unrolling all There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I'll benchmark post the results in a comment below. |
||
return hc.ToHashCode(); | ||
} | ||
|
||
[MethodImpl(MethodImplOptions.AggressiveInlining)] | ||
[TargetedPatchingOptOut("Performance critical for inlining across NGen images.")] | ||
public static int Combine<T1, T2>(T1 value1, T2 value2) | ||
{ | ||
var hc = new HashCode(); | ||
hc.Add(value1); | ||
hc.Add(value2); | ||
return hc.ToHashCode(); | ||
} | ||
|
||
[MethodImpl(MethodImplOptions.AggressiveInlining)] | ||
[TargetedPatchingOptOut("Performance critical for inlining across NGen images.")] | ||
public static int Combine<T1, T2, T3>(T1 value1, T2 value2, T3 value3) | ||
{ | ||
var hc = new HashCode(); | ||
hc.Add(value1); | ||
hc.Add(value2); | ||
hc.Add(value3); | ||
return hc.ToHashCode(); | ||
} | ||
|
||
[MethodImpl(MethodImplOptions.AggressiveInlining)] | ||
[TargetedPatchingOptOut("Performance critical for inlining across NGen images.")] | ||
public static int Combine<T1, T2, T3, T4>(T1 value1, T2 value2, T3 value3, T4 value4) | ||
{ | ||
var hc = new HashCode(); | ||
hc.Add(value1); | ||
hc.Add(value2); | ||
hc.Add(value3); | ||
hc.Add(value4); | ||
return hc.ToHashCode(); | ||
} | ||
|
||
[MethodImpl(MethodImplOptions.AggressiveInlining)] | ||
[TargetedPatchingOptOut("Performance critical for inlining across NGen images.")] | ||
public static int Combine<T1, T2, T3, T4, T5>(T1 value1, T2 value2, T3 value3, T4 value4, T5 value5) | ||
{ | ||
var hc = new HashCode(); | ||
hc.Add(value1); | ||
hc.Add(value2); | ||
hc.Add(value3); | ||
hc.Add(value4); | ||
hc.Add(value5); | ||
return hc.ToHashCode(); | ||
} | ||
|
||
[MethodImpl(MethodImplOptions.AggressiveInlining)] | ||
[TargetedPatchingOptOut("Performance critical for inlining across NGen images.")] | ||
public static int Combine<T1, T2, T3, T4, T5, T6>(T1 value1, T2 value2, T3 value3, T4 value4, T5 value5, T6 value6) | ||
{ | ||
var hc = new HashCode(); | ||
hc.Add(value1); | ||
hc.Add(value2); | ||
hc.Add(value3); | ||
hc.Add(value4); | ||
hc.Add(value5); | ||
hc.Add(value6); | ||
return hc.ToHashCode(); | ||
} | ||
|
||
[MethodImpl(MethodImplOptions.AggressiveInlining)] | ||
[TargetedPatchingOptOut("Performance critical for inlining across NGen images.")] | ||
public static int Combine<T1, T2, T3, T4, T5, T6, T7>(T1 value1, T2 value2, T3 value3, T4 value4, T5 value5, T6 value6, T7 value7) | ||
{ | ||
var hc = new HashCode(); | ||
hc.Add(value1); | ||
hc.Add(value2); | ||
hc.Add(value3); | ||
hc.Add(value4); | ||
hc.Add(value5); | ||
hc.Add(value6); | ||
hc.Add(value7); | ||
return hc.ToHashCode(); | ||
} | ||
|
||
[MethodImpl(MethodImplOptions.AggressiveInlining)] | ||
[TargetedPatchingOptOut("Performance critical for inlining across NGen images.")] | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. @jkotas is "targetedpatchingoptout" still needed in corfx? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. if so when do we apply it... There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
No. These attributes can be deleted. |
||
public static int Combine<T1, T2, T3, T4, T5, T6, T7, T8>(T1 value1, T2 value2, T3 value3, T4 value4, T5 value5, T6 value6, T7 value7, T8 value8) | ||
{ | ||
var hc = new HashCode(); | ||
hc.Add(value1); | ||
hc.Add(value2); | ||
hc.Add(value3); | ||
hc.Add(value4); | ||
hc.Add(value5); | ||
hc.Add(value6); | ||
hc.Add(value7); | ||
hc.Add(value8); | ||
return hc.ToHashCode(); | ||
} | ||
} | ||
} |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,28 @@ | ||
// Licensed to the .NET Foundation under one or more agreements. | ||
// The .NET Foundation licenses this file to you under the MIT license. | ||
// See the LICENSE file in the project root for more information. | ||
|
||
using Microsoft.Xunit.Performance; | ||
|
||
namespace System.Tests | ||
{ | ||
public class Perf_HashCode | ||
{ | ||
[Benchmark] | ||
public void Add_() | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Why the trailing underscore? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Copy/paste (have a look at Perf.Int32). I thought it might be part of some convention. Removed. |
||
{ | ||
foreach (var iteration in Benchmark.Iterations) | ||
using (iteration.StartMeasurement()) | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I think we'd prefer braces for the foreach here. |
||
{ | ||
var hc = new HashCode(); | ||
for (int i = 0; i < 10000; i++) | ||
{ | ||
hc.Add(i); hc.Add(i); hc.Add(i); | ||
hc.Add(i); hc.Add(i); hc.Add(i); | ||
hc.Add(i); hc.Add(i); hc.Add(i); | ||
} | ||
hc.ToHashCode(); | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I wonder whether the JIT might notice that nothing useful happens here and remove all of it. You might need to use the result in some way to ensure it doesn't get eliminated. |
||
} | ||
} | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. It might be good to have benchmarks for a couple of the Combine methods since those are hopefully the ones that get used most and you've already done some performance tuning on them. I don't think you need every combine, but maybe Combine with 2, 4 and 8 inputs. |
||
} | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have realized that this is no longer needed, I'll put it into the next commit that solves the build problem and any additional feedback.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Removed.