-
Notifications
You must be signed in to change notification settings - Fork 1.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Let us talk about replace the LRU cache with cuckoo hash cache. #2264
Conversation
With the help of cuckoo cache and cache non-exist tags, I achieved a larger performance improvement. I can not even count out the improvement factors. Lock collision of LRU cache and cache miss(including non exist tag cache miss) rate are the main performance bottlenecks for now. |
With the cuckoo hash cache difference only, I achieved to reduce the costs mentioned in #2249 from 5min to less than 2min. Regarding the spark task management and preprocessing costs, I believe it can contribute more than 3X improvement. |
Very good feature. |
We should add some comments about it. |
src/common/base/CuckooHashCache.h
Outdated
// the bucket indices associated with the hash value and the current | ||
// hashpower. | ||
TwoBuckets snapshot_and_lock_two(const hash_value &hv) const { | ||
while (true) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why while(true) here?
slot = -1; | ||
for (int i = 0; i < static_cast<int>(slot_per_bucket()); ++i) { | ||
if (try_reclaim(bucket_ind, i)) { | ||
slot = i; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why not return true here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I got your purpose.
public: | ||
bucket() noexcept : occupied_() {} | ||
|
||
const value_type &kvpair(size_t ind) const { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what's the difference withconst storage_value_type &storage_kvpair(size_t ind) const
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
storage_value_type is a std::pair with key const. It should not be visited outside.
Ha, i think the performance comes from TTL feature, not just cuckoo hash cache. |
It acts like google sparse_hash_map, using continuous memory, go forward a limited space for insert and lookup. With no memory malloc if key/value is simple. With good spatial locality features. |
Yes, we should satisfy different purposes with different kind cache. Let's talk about it offline. |
slot = -1; | ||
for (int i = 0; i < static_cast<int>(slot_per_bucket()); ++i) { | ||
if (try_reclaim(bucket_ind, i)) { | ||
slot = i; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I got your purpose.
if (try_reclaim(bucket_ind, i)) { | ||
slot = i; | ||
} else if (b.occupied(i)) { | ||
if (!is_simple() && partial != b.partial(i)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We could eliminate the if
here. Because we could ensure is_simple
at compile time.
src/common/base/CuckooHashCache.h
Outdated
@@ -0,0 +1,875 @@ | |||
#pragma once |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
#pragma once
is not standard, please use include guard as other headers in our project.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see that all functions and variables are naming rule differently than the general . right?
std::array<bool, SLOT_PER_BUCKET> occupied_; | ||
}; | ||
|
||
bucket_container(size_t hp) : hashpower_(hp), buckets_(size()) { } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it better to use explicit?
class bucket_container { | ||
public: | ||
using key_type = Key; | ||
using mapped_type = T; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you explain the usage of T?
|
del_from_bucket(pos.index, pos.slot); | ||
} | ||
return true; | ||
} else { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No need for an else here.
This is an early PR based on v1, I will close it first. If it's necessary in the future, need to submit a new PR based on the master. |
Use a new kind of cache instead of LRU. :
The cuckoo hash cache is copied from the cuckoo hash map, removing rehash and cuckoo_run procedures, and act as a cache.