Let us talk about replacing the LRU cache with a cuckoo hash cache. #2264

xuguruogu · 2020-07-31T12:51:34Z

Use a new kind of cache instead of LRU:

LRU is not efficient for small objects.
A TTL is needed unless cache coherence is guaranteed.
As in the cuckoo hash map, use 64K spinlocks for concurrency control.
Rejecting an insert when the cache is full is good enough for most cases.

The cuckoo hash cache is copied from the cuckoo hash map, with the rehash and cuckoo_run procedures removed, and acts as a cache.
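The design points above (two candidate buckets per key, no rehash and no cuckoo displacement, a TTL per entry, striped spinlocks, and rejecting inserts when both buckets are full) can be illustrated with a small C++17 sketch. It is hypothetical and not the code in CuckooHashCache.h: the class name CuckooCacheSketch, the bucket-index mixing, the slot layout, and passing the current time explicitly as now are all assumptions made to keep it short and testable.

```cpp
#include <array>
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <initializer_list>
#include <optional>
#include <utility>

// Hypothetical sketch: fixed-size cuckoo-style cache with a per-entry TTL.
// No rehash and no cuckoo displacement: if both candidate buckets are full
// of live entries, the insert is rejected instead of evicting anything.
template <typename Key, typename Value, std::size_t kBucketPower = 10,
          std::size_t kSlotsPerBucket = 4>
class CuckooCacheSketch {
 public:
  explicit CuckooCacheSketch(uint64_t ttl) : ttl_(ttl) { assert(ttl_ > 0); }

  // `now` comes from the caller's clock; an entry lives for ttl_ time units.
  bool insert(const Key& key, const Value& value, uint64_t now) {
    auto [b1, b2] = bucketsFor(key);
    LockPair guard(lockFor(b1), lockFor(b2));
    Slot* freeSlot = nullptr;
    for (std::size_t b : {b1, b2}) {
      for (Slot& s : buckets_[b]) {
        if (s.occupied && s.expireAt <= now) s.occupied = false;  // reclaim expired
        if (s.occupied && s.key == key) {  // refresh an existing entry
          s.value = value;
          s.expireAt = now + ttl_;
          return true;
        }
        if (!s.occupied && freeSlot == nullptr) freeSlot = &s;
      }
    }
    if (freeSlot == nullptr) return false;  // both buckets full: reject
    *freeSlot = Slot{key, value, now + ttl_, true};
    return true;
  }

  std::optional<Value> get(const Key& key, uint64_t now) {
    auto [b1, b2] = bucketsFor(key);
    LockPair guard(lockFor(b1), lockFor(b2));
    for (std::size_t b : {b1, b2}) {
      for (Slot& s : buckets_[b]) {
        if (!s.occupied || !(s.key == key)) continue;
        if (s.expireAt <= now) {  // expired: reclaim and report a miss
          s.occupied = false;
          return std::nullopt;
        }
        return s.value;
      }
    }
    return std::nullopt;
  }

 private:
  static constexpr std::size_t kBuckets = std::size_t{1} << kBucketPower;
  static constexpr std::size_t kNumLocks = std::size_t{1} << 16;  // 64K stripes

  struct Slot { Key key{}; Value value{}; uint64_t expireAt = 0; bool occupied = false; };

  struct SpinLock {
    std::atomic<bool> locked{false};
    void lock() { while (locked.exchange(true, std::memory_order_acquire)) {} }
    void unlock() { locked.store(false, std::memory_order_release); }
  };

  // Takes two stripes in address order (once if they coincide) to avoid deadlock.
  struct LockPair {
    SpinLock* lo;
    SpinLock* hi;
    LockPair(SpinLock& x, SpinLock& y) : lo(&x < &y ? &x : &y), hi(&x < &y ? &y : &x) {
      lo->lock();
      if (hi != lo) hi->lock();
    }
    ~LockPair() { if (hi != lo) hi->unlock(); lo->unlock(); }
  };

  std::pair<std::size_t, std::size_t> bucketsFor(const Key& key) const {
    const uint64_t h = std::hash<Key>{}(key);
    const std::size_t b1 = static_cast<std::size_t>(h & (kBuckets - 1));
    // The mix value is odd, so with kBuckets >= 2 the alternate bucket differs from b1.
    const uint64_t mix = ((h >> kBucketPower) | 1) * 0x9E3779B97F4A7C15ULL;
    return {b1, static_cast<std::size_t>((b1 ^ mix) & (kBuckets - 1))};
  }

  SpinLock& lockFor(std::size_t bucket) { return locks_[bucket & (kNumLocks - 1)]; }

  uint64_t ttl_;
  std::array<std::array<Slot, kSlotsPerBucket>, kBuckets> buckets_{};
  std::array<SpinLock, kNumLocks> locks_{};
};
```

Because nothing is ever displaced or evicted, each operation touches at most 2 * kSlotsPerBucket slots under at most two stripe locks. The trade-off is that an insert can be refused while both of its buckets hold live entries, until their TTL runs out.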

xuguruogu · 2020-08-01T07:43:59Z

With the cuckoo cache plus caching of non-existent tags, I got an even larger performance improvement; I have not even been able to work out the exact speedup factor.

Lock contention in the LRU cache and the cache miss rate (including misses on non-existent tags) are the main performance bottlenecks for now.
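Caching non-existent tags is a form of negative caching: a lookup that the storage layer answers with "not found" is remembered, so repeated lookups for the same missing tag stop reaching storage. A minimal sketch, assuming a hypothetical NegativeCachingReader that wraps a fetch callback and uses std::unordered_map in place of the cuckoo cache (with no TTL or size bound, which a real cache would need):

```cpp
#include <cassert>
#include <functional>
#include <optional>
#include <string>
#include <unordered_map>
#include <utility>

// Hypothetical sketch of negative caching: the cache stores std::optional<V>,
// and std::nullopt records "the backend reported that this key does not exist".
template <typename K, typename V>
class NegativeCachingReader {
 public:
  using Fetch = std::function<std::optional<V>(const K&)>;
  explicit NegativeCachingReader(Fetch fetch) : fetch_(std::move(fetch)) {}

  std::optional<V> get(const K& key) {
    auto it = cache_.find(key);
    if (it != cache_.end()) return it->second;  // hit: a value or a cached "absent"
    std::optional<V> result = fetch_(key);       // miss: ask the backend once
    cache_.emplace(key, result);                 // remember absence as well
    return result;
  }

 private:
  Fetch fetch_;
  std::unordered_map<K, std::optional<V>> cache_;  // unbounded, no TTL: sketch only
};
```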

xuguruogu · 2020-08-01T12:13:33Z

With only the cuckoo hash cache change, I reduced the cost mentioned in #2249 from 5 min to under 2 min. Discounting the Spark task management and preprocessing costs, I believe it contributes more than a 3X improvement.

dangleptr · 2020-08-03T04:21:14Z

Very good feature.
We should add an option to control which kind of cache is used.
I will review it later. By the way, please check the code style; for example, variable names should be lowerCamelCase.

dangleptr · 2020-08-03T05:34:33Z

The cuckoo hash cache is copied from the cuckoo hash map, with the rehash and cuckoo_run procedures removed, and acts as a cache.

We should add some comments about it.

dangleptr · 2020-08-03T06:09:25Z

src/common/base/CuckooHashCache.h

+    // the bucket indices associated with the hash value and the current
+    // hashpower.
+    TwoBuckets snapshot_and_lock_two(const hash_value &hv) const {
+        while (true) {


Why is there a while (true) here?
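In libcuckoo's cuckoohash_map, which these names (snapshot_and_lock_two, cuckoo_run, hashpower) appear to come from, the loop exists because a concurrent resize can change hashpower between computing the two bucket indices and acquiring their locks; the locks are then checked against the current hashpower and, if it changed, released and the snapshot retried. With rehash removed in the cache, hashpower never changes, so the loop would always exit on its first pass. A simplified, hypothetical sketch of the pattern (StripedTableSketch, the std::mutex stripes, and the index mixing are assumptions; the resize path itself is omitted):

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <mutex>
#include <utility>
#include <vector>

// Hypothetical sketch of the snapshot-and-lock retry loop.
class StripedTableSketch {
 public:
  explicit StripedTableSketch(std::size_t hashpower, std::size_t numLocks = 16)
      : hashpower_(hashpower), locks_(numLocks) {}

  // Returns the two candidate buckets for `hash` with their lock stripes held.
  std::pair<std::size_t, std::size_t> snapshotAndLockTwo(std::size_t hash) {
    while (true) {
      const std::size_t hp = hashpower_.load(std::memory_order_acquire);
      const std::size_t mask = (std::size_t{1} << hp) - 1;
      const std::size_t i1 = hash & mask;
      const std::size_t i2 = (i1 ^ ((hash >> hp) | 1)) & mask;  // hp >= 1 => i2 != i1
      lockStripes(i1, i2);
      // If a resize changed hashpower meanwhile, i1/i2 are stale: release and retry.
      if (hashpower_.load(std::memory_order_acquire) == hp) return {i1, i2};
      unlockTwo(i1, i2);
    }
  }

  void unlockTwo(std::size_t i1, std::size_t i2) {
    auto [l1, l2] = stripes(i1, i2);
    if (l2 != l1) locks_[l2].unlock();
    locks_[l1].unlock();
  }

 private:
  std::pair<std::size_t, std::size_t> stripes(std::size_t i1, std::size_t i2) const {
    std::size_t l1 = i1 % locks_.size(), l2 = i2 % locks_.size();
    if (l1 > l2) std::swap(l1, l2);  // always lock the lower stripe first
    return {l1, l2};
  }

  void lockStripes(std::size_t i1, std::size_t i2) {
    auto [l1, l2] = stripes(i1, i2);
    locks_[l1].lock();
    if (l2 != l1) locks_[l2].lock();
  }

  std::atomic<std::size_t> hashpower_;  // would change only during a resize
  std::vector<std::mutex> locks_;
};
```

In a table where nothing ever stores a new hashpower, the re-check always succeeds, so the loop could be reduced to a single pass.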

dangleptr · 2020-08-03T06:49:31Z

src/common/base/CuckooHashCache.h

+        slot = -1;
+        for (int i = 0; i < static_cast<int>(slot_per_bucket()); ++i) {
+            if (try_reclaim(bucket_ind, i)) {
+                slot = i;


Why not return true here?

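The reason for recording slot = i instead of returning appears to be that finding a reusable slot does not yet prove the key is absent: a later slot in the same bucket may already hold it, and inserting again would create a duplicate. A hypothetical sketch of that scan (the Slot layout, findOrReclaim, and the expiry rule are assumptions, not the PR's code):

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical slot layout used only for this illustration.
struct Slot {
  int key;
  bool occupied;
  int64_t expireAt;
};

// Returns true if `key` is already in the bucket (its index goes to `slot`).
// Otherwise returns false and sets `slot` to the first reusable slot, or -1.
template <std::size_t N>
bool findOrReclaim(std::array<Slot, N>& bucket, int key, int64_t now, int& slot) {
  slot = -1;
  for (int i = 0; i < static_cast<int>(N); ++i) {
    Slot& s = bucket[i];
    if (s.occupied && s.expireAt <= now) s.occupied = false;  // reclaim expired
    if (!s.occupied) {
      if (slot == -1) slot = i;  // remember it, but keep scanning for `key`
    } else if (s.key == key) {
      slot = i;
      return true;
    }
  }
  return false;
}
```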

dangleptr · 2020-08-03T06:56:20Z

src/common/base/CuckooHashCache.h

+    public:
+        bucket() noexcept : occupied_() {}
+
+        const value_type &kvpair(size_t ind) const {


What's the difference from const storage_value_type &storage_kvpair(size_t ind) const?

storage_value_type is a std::pair with a const key. It should not be accessed from outside.

dangleptr · 2020-08-03T07:04:01Z

With only the cuckoo hash cache change, I reduced the cost mentioned in #2249 from 5 min to under 2 min. Discounting the Spark task management and preprocessing costs, I believe it contributes more than a 3X improvement.

Ha, I think the performance gain comes from the TTL feature, not just the cuckoo hash cache.

xuguruogu · 2020-08-03T08:18:41Z

It works like Google's sparse_hash_map: it uses contiguous memory and probes only a limited range forward on insert and lookup. No memory allocation is needed if the key/value types are simple, and it has good spatial locality.

dangleptr · 2020-08-03T08:37:53Z

It works like Google's sparse_hash_map: it uses contiguous memory and probes only a limited range forward on insert and lookup. No memory allocation is needed if the key/value types are simple, and it has good spatial locality.

Yes, we should use different kinds of cache for different purposes. Let's discuss it offline.

dangleptr · 2020-08-05T10:26:46Z

src/common/base/CuckooHashCache.h

+        slot = -1;
+        for (int i = 0; i < static_cast<int>(slot_per_bucket()); ++i) {
+            if (try_reclaim(bucket_ind, i)) {
+                slot = i;


I understand the purpose now.

dangleptr · 2020-08-05T10:31:00Z

src/common/base/CuckooHashCache.h

+            if (try_reclaim(bucket_ind, i)) {
+                slot = i;
+            } else if (b.occupied(i)) {
+                if (!is_simple() && partial != b.partial(i)) {


We could eliminate this if, because is_simple can be determined at compile time.
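Since the key type is a template parameter, is_simple() is a constant expression, so the branch can be resolved at compile time, for example with C++17 if constexpr (tag dispatch would do the same on older toolchains). A hypothetical sketch: the isSimple definition (trivially copyable and at most 8 bytes), the uint8_t partial-hash type, and the slotMatches helper are assumptions for illustration; the partial-hash pre-check is kept only for non-simple keys, mirroring the condition in the snippet above.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <type_traits>

// Hypothetical definition of "simple": trivially copyable and at most 8 bytes.
template <typename Key>
constexpr bool isSimple() {
  return std::is_trivially_copyable<Key>::value && sizeof(Key) <= 8;
}

// For non-simple keys, compare the partial hash first to skip costly key
// comparisons; for simple keys that branch is discarded at compile time.
template <typename Key>
bool slotMatches(const Key& stored, [[maybe_unused]] uint8_t storedPartial,
                 const Key& key, [[maybe_unused]] uint8_t partial) {
  if constexpr (!isSimple<Key>()) {
    if (partial != storedPartial) return false;
  }
  return stored == key;
}
```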

dangleptr · 2020-08-05T10:59:41Z

src/common/base/CuckooHashCache.h

@@ -0,0 +1,875 @@
+#pragma once


#pragma once is not standard; please use an include guard, as the other headers in our project do.

bright-starry-sky

I see that all the functions and variables follow a different naming rule from the general one, right?

bright-starry-sky · 2020-08-13T07:59:14Z

src/common/base/CuckooHashCache.h

+        std::array<bool, SLOT_PER_BUCKET> occupied_;
+    };
+
+    bucket_container(size_t hp) : hashpower_(hp), buckets_(size()) { }


Would it be better to mark this constructor explicit?
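Marking a single-argument constructor explicit stops a bare size_t from being converted implicitly into a container, for example bucket_container c = 16; or passing 16 where a bucket_container is expected. A hypothetical stand-in (BucketContainerSketch, with a vector of int standing in for the buckets) to show the effect:

```cpp
#include <cassert>
#include <cstddef>
#include <type_traits>
#include <vector>

// Hypothetical stand-in for bucket_container; `int` stands in for a bucket.
class BucketContainerSketch {
 public:
  // explicit: "BucketContainerSketch c = 16;" no longer compiles.
  explicit BucketContainerSketch(std::size_t hp)
      : hashpower_(hp), buckets_(std::size_t{1} << hp) {}

  std::size_t hashpower() const { return hashpower_; }
  std::size_t size() const { return buckets_.size(); }

 private:
  std::size_t hashpower_;  // declared before buckets_, so it is initialized first
  std::vector<int> buckets_;
};

static_assert(!std::is_convertible<std::size_t, BucketContainerSketch>::value,
              "no implicit conversion from size_t");
static_assert(std::is_constructible<BucketContainerSketch, std::size_t>::value,
              "direct initialization still works");
```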

bright-starry-sky · 2020-08-13T08:22:47Z

src/common/base/CuckooHashCache.h

+class bucket_container {
+public:
+    using key_type = Key;
+    using mapped_type = T;


Could you explain the usage of T?

CLAassistant · 2020-09-02T08:57:39Z

Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you sign our Contributor License Agreement before we can accept your contribution.
You have signed the CLA already but the status is still pending? Let us recheck it.

monadbobo · 2020-09-24T02:02:56Z

src/common/base/CuckooHashCache.h

+                del_from_bucket(pos.index, pos.slot);
+            }
+            return true;
+        } else {


No need for an else here.
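The suggestion is the usual early-return style: when the if branch ends in return, the else only adds nesting. A hypothetical illustration on a std::unordered_map rather than the cache's buckets (eraseKey is an invented name):

```cpp
#include <cassert>
#include <unordered_map>

// Hypothetical illustration: the branch returns, so no else is needed after it.
bool eraseKey(std::unordered_map<int, int>& m, int key) {
  auto it = m.find(key);
  if (it != m.end()) {
    m.erase(it);
    return true;
  }
  return false;  // this path would otherwise sit inside an else block
}
```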

Sophie-Xie · 2021-12-21T09:23:26Z

This is an early PR based on v1, so I will close it for now. If it is needed in the future, please submit a new PR based on master.

dangleptr requested review from dangleptr, bright-starry-sky, critical27, darionyaphet and panda-sheep August 3, 2020 06:00

dangleptr reviewed Aug 3, 2020

dangleptr mentioned this pull request Aug 5, 2020 in Support TTL cache inside ConcurrentLRUCache #2272 (Closed)

dangleptr reviewed Aug 5, 2020

Commit 0e46f3f: cuckoo hash cache.

xuguruogu force-pushed the cuckoo-cache branch from 9216c0c to 0e46f3f on August 5, 2020 12:21

bright-starry-sky reviewed Aug 13, 2020

monadbobo reviewed Sep 24, 2020

Sophie-Xie added the type/enhancement label (Type: make the code neat or more efficient) Sep 13, 2021

Sophie-Xie added the community label (Source: who proposed the issue) Sep 27, 2021

Sophie-Xie closed this Dec 21, 2021
