Replace std::hash with our custom hash function #2952

benjaminwinger · 2024-02-26T19:54:50Z

std::hash is implementation-dependent and not guaranteed to produce the same values across different standard libraries, which means our current hash index layout is not always portable.

std::hash is also not required to produce the same result for the same input across different runs of the same program. We haven't seen any evidence of that happening yet, but it could cause issues in the future.

From https://en.cppreference.com/w/cpp/utility/hash:

Hash functions are only required to produce the same result for the same input within a single execution of a program; this allows salted hashes that prevent collision denial-of-service attacks.
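For illustration, here is a minimal sketch of the kind of fixed hash that avoids this problem. It assumes a 64-bit integer key and uses the well-known fmix64 finalizer from MurmurHash3 as a stand-in; the name `fixedHash64` is hypothetical, and this is not necessarily the function this PR introduces. Because it uses only fixed-width integer arithmetic, the result depends only on the input, not on the standard library or the run.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for a fixed integer hash: the 64-bit finalizer
// (fmix64) from MurmurHash3. Not necessarily the function used in this PR.
inline uint64_t fixedHash64(uint64_t x) {
    x ^= x >> 33;
    x *= 0xff51afd7ed558ccdULL;
    x ^= x >> 33;
    x *= 0xc4ceb9fe1a85ec53ULL;
    x ^= x >> 33;
    return x;
}
```

Unlike `std::hash<uint64_t>`, whose output may differ between standard library implementations, a function like this can safely determine an on-disk layout.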

Fixes #2943.

I did a quick benchmark on a long string (the first five paragraphs of lorem ipsum): the new hash took 123ns/iteration on average, compared to 117ns/iteration with std::hash. That is about 5% slower, which seems reasonable.
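A comparison like this could be sketched roughly as follows. This is not the benchmark that was actually run: `nsPerIteration` is a hypothetical helper, and 64-bit FNV-1a (`fnv1a64`) is only a stand-in for the custom string hash, chosen because it is short and deterministic.

```cpp
#include <cassert>
#include <chrono>
#include <cstdint>
#include <functional>
#include <string>

// Hypothetical stand-in string hash (64-bit FNV-1a); not the function from this PR.
inline uint64_t fnv1a64(const std::string& s) {
    uint64_t h = 14695981039346656037ULL; // FNV offset basis
    for (unsigned char c : s) {
        h ^= c;
        h *= 1099511628211ULL; // FNV prime
    }
    return h;
}

// Average nanoseconds per call of hashFn on the given input.
template<typename HashFn>
double nsPerIteration(HashFn hashFn, const std::string& input, int iterations) {
    volatile uint64_t sink = 0; // keeps the calls from being optimized away
    auto start = std::chrono::steady_clock::now();
    for (int i = 0; i < iterations; ++i) {
        sink = sink ^ hashFn(input);
    }
    auto end = std::chrono::steady_clock::now();
    std::chrono::duration<double, std::nano> elapsed = end - start;
    return elapsed.count() / iterations;
}
```

Calling `nsPerIteration(std::hash<std::string>{}, text, n)` and `nsPerIteration(fnv1a64, text, n)` on the same `text` gives two averages to compare. Results vary with compiler, optimization level, and machine.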

codecov · 2024-02-26T20:12:36Z

Codecov Report

Attention: Patch coverage is 84.21053%, with 3 lines in your changes missing coverage. Please review.

Project coverage is 93.51%. Comparing base (d2cfc65) to head (f127718).

Files	Patch %	Lines
src/include/function/hash/hash_functions.h	84.61%	2 Missing ⚠️
src/include/storage/index/hash_index_builder.h	66.66%	1 Missing ⚠️

Additional details and impacted files

@@           Coverage Diff           @@
##           master    #2952   +/-   ##
=======================================
  Coverage   93.51%   93.51%           
=======================================
  Files        1121     1121           
  Lines       42913    42919    +6     
=======================================
+ Hits        40131    40137    +6     
  Misses       2782     2782


ray6080

LGTM

ray6080 · 2024-02-27T02:19:26Z

src/include/function/hash/hash_functions.h

@@ -133,31 +134,52 @@ inline void Hash::operation(
 template<>
 inline void Hash::operation(
    const double& key, common::hash_t& result, common::ValueVector* /*keyVector*/) {
-    result = std::hash<double>()(key);
+    // 0 and -0 are not byte-equivalent, but should have the same hash
+    if (key == 0) {


I wonder if adding 0.0 can also solve the "0 and -0 are not byte-equivalent" problem without an if/else branch.

auto newKey = key + 0.0;
result = murmurhash64(*reinterpret_cast<const uint64_t*>(&newKey));

benjaminwinger · 2024-02-27

That does appear to also work.
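A small sketch of why the `+ 0.0` trick works, assuming IEEE 754 doubles and the default round-to-nearest mode, where `-0.0 + 0.0` evaluates to `+0.0`. The helper names `bitsOf` and `normalizedBits` are hypothetical, and `std::memcpy` is used here in place of `reinterpret_cast` only to keep the example free of aliasing concerns.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Return the raw bit pattern of a double.
inline uint64_t bitsOf(double d) {
    static_assert(sizeof(double) == sizeof(uint64_t), "expects a 64-bit double");
    uint64_t bits;
    std::memcpy(&bits, &d, sizeof(bits));
    return bits;
}

// Adding +0.0 maps -0.0 to +0.0 and leaves non-zero finite values unchanged,
// so both zeros end up with the same bit pattern before hashing.
inline uint64_t normalizedBits(double key) {
    double newKey = key + 0.0;
    return bitsOf(newKey);
}
```

Note that this relies on the compiler preserving signed-zero semantics. Options such as `-ffast-math` may allow `key + 0.0` to be simplified to `key`, in which case an explicit branch would be the safer choice.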

ray6080 · 2024-02-27T06:05:48Z

@benjaminwinger I merged this, so it can be in our dev build to be verified by the bug reporter.

benjaminwinger mentioned this pull request Feb 26, 2024

Hash function improvements #2290

Open

Replace std::hash with our custom hash function

f127718

benjaminwinger force-pushed the hash-fix branch from 14b5d32 to f127718 Compare February 26, 2024 20:32

ray6080 approved these changes Feb 27, 2024


ray6080 merged commit 87b4348 into master Feb 27, 2024
15 checks passed

ray6080 deleted the hash-fix branch February 27, 2024 06:04

mewim mentioned this pull request Feb 27, 2024

[DO NOT MERGE] 0.3.1 base branch #2961

Closed

mewim mentioned this pull request Apr 1, 2024

Inconsistent behaviour between CLI and Explorer when specifying equality predicate on WHERE clause kuzudb/explorer#129

Closed
