-
-
Notifications
You must be signed in to change notification settings - Fork 1.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
hashes: implement murmur3 #12022
hashes: implement murmur3 #12022
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Cool. I assume the aim here is to offer a JS/C cross-compatible hashing function?
It would be nice to create a test that verifies that the hashes match on JS and C backends.
AFAIK it's also to fix perf regression #11581 after #11203. Can you rerun the same benchmarks as those in #11203 (comment). I think we also need either a tables benchmark or a randomness analysis to make sure we don't reintroduce bucket imbalance in hash tables like with #11203 |
Here are the results of the benchmark proposed in #11581 (direct link to the benchmark), as it tests more things. Here are presented three values: time to run the benchmark, entropy value, maximal number of collisions. Four different hashing algorithms were tested:
|
lib/pure/hashes.nim
Outdated
proc rotl32(x: uint32, r: int): uint32 {.inline.} = | ||
(x shl r) or (x shr (32 - r)) | ||
|
||
proc murmurHash[T: char|int8|byte](x: openArray[T]): Hash = |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it easy to make this non-generic? Only offer the char
or byte
variant. Should remove some code bloat.
@narimiran here are my conclusions:
I've made a more comprehensive comparison here to compare the 2 contenders: # requires building nim on top of this PR #12022
# murmur64_128
nim c -r -d:danger -d:case_with_murmur $vitanim_D/testcases/tests/t0695.nim
# murmur32
nim c -r -d:danger $vitanim_D/testcases/tests/t0695.nim it measures performance for different aspects:
here are the results:
Conclusion: compared to
Also note that Based on this, this PR is a strict improvement over current situation so it's good enough for now in terms of performance; if needed, a future PR can achieve "best of both worlds" as follows:
|
...while taking into an account that the solution must be available for:
|
depending on whether we want I just checked, after your PR, https://github.com/nim-lang/Nim/issues/11989 is still broken, ie
one nasty detail is
relevant issues:
possible solution for
|
Just a note - currently overall "best" known tests are to be found in https://github.com/rurban/smhasher (and even better with rurban/smhasher#62 ). As can be seen, murmur3 is moderately slow though very portable. Feel free to reconsider the choice. |
@dumblob => see reply #11581 (comment) where you posted the same question. |
What is the use-case for having the same hashes in the VM, in C and in JS? Is that really necessary? We can just say in the hashes module that the "actual hash is implementation dependent and may differ depending on the execution backend (compile-time, C, C++, Javascript).". |
lib/pure/hashes.nim
Outdated
murmurHash(toOpenArray(aBuf, sPos, ePos)) | ||
when A is byte: | ||
when nimvm: | ||
result = hashVmImplByte(aBuf, 0, aBuf.high) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
hashVmImplByte(aBuf, sPos, ePos)
?
ditto below.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
good point. :-)
@@ -371,8 +388,16 @@ proc hash*[A](aBuf: openArray[A], sPos, ePos: int): Hash = | |||
let a = [1, 2, 5, 1, 2, 6] | |||
doAssert hash(a, 0, 1) == hash(a, 3, 4) | |||
|
|||
when A is char|int8|byte: | |||
murmurHash(toOpenArray(aBuf, sPos, ePos)) | |||
when A is byte: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- what about
int8/uint8
? at leastint8
was handled before this change IIRC - ditto above
maybe:when sizeof(A)==1 and A isnot char: ...
vm supports casting integers of same size so everything could be cast to 1 type (eg byte) without having to add overloads. Ideally (but out of scope for this PR) there are more things that VM should allow to cast safely
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I didn't consider it important enough but fair enough.
prevent gotchas like this: import hashes
import tables
proc hash*(x: int): Hash {.inline.} =
when nimvm: result = x*2
else: result = x
proc fun2[T](t: T) =
for k,v in t: echo (k,v)
proc fun(): auto =
var t = {1:2}.toTable
for i in 0..<3: t[i] = 3*i
return t
let t1 = fun()
const t2 = fun()
echo t1 == t2 # false
echo "VM"
fun2(t1)
echo "RT"
fun2(t2)
likewise with I think we should just guarantee that |
fixes #11581