Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Let us talk about replace the LRU cache with cuckoo hash cache. #2264

Closed
wants to merge 1 commit into from

Conversation

xuguruogu
Copy link
Collaborator

@xuguruogu xuguruogu commented Jul 31, 2020

Use a new kind of cache instead of LRU. :

  1. LRU is not efficient for small objects.
  2. TTL is needed except cache coherence is guaranteed.
  3. Similar to cuckoo hash, use 64K spinlock as concurrency control.
  4. Reject to insert to the cache if full is enough for most cases.

The cuckoo hash cache is copied from the cuckoo hash map, removing rehash and cuckoo_run procedures, and act as a cache.

@xuguruogu
Copy link
Collaborator Author

With the help of cuckoo cache and cache non-exist tags, I achieved a larger performance improvement. I can not even count out the improvement factors.

Lock collision of LRU cache and cache miss(including non exist tag cache miss) rate are the main performance bottlenecks for now.

@xuguruogu
Copy link
Collaborator Author

With the cuckoo hash cache difference only, I achieved to reduce the costs mentioned in #2249 from 5min to less than 2min. Regarding the spark task management and preprocessing costs, I believe it can contribute more than 3X improvement.

@dangleptr
Copy link
Contributor

dangleptr commented Aug 3, 2020

Very good feature.
We'd better use an option to control which kind of cache used.
I will review it later. Please check the code style by the way. For example, the variable style is single camel.

@dangleptr
Copy link
Contributor

The cuckoo hash cache is copied from the cuckoo hash map, removing rehash and cuckoo_run procedures, and act as a cache.

We should add some comments about it.

// the bucket indices associated with the hash value and the current
// hashpower.
TwoBuckets snapshot_and_lock_two(const hash_value &hv) const {
while (true) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why while(true) here?

slot = -1;
for (int i = 0; i < static_cast<int>(slot_per_bucket()); ++i) {
if (try_reclaim(bucket_ind, i)) {
slot = i;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why not return true here?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I got your purpose.

public:
bucket() noexcept : occupied_() {}

const value_type &kvpair(size_t ind) const {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what's the difference withconst storage_value_type &storage_kvpair(size_t ind) const

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

storage_value_type is a std::pair with key const. It should not be visited outside.

@dangleptr
Copy link
Contributor

With the cuckoo hash cache difference only, I achieved to reduce the costs mentioned in #2249 from 5min to less than 2min. Regarding the spark task management and preprocessing costs, I believe it can contribute more than 3X improvement.

Ha, i think the performance comes from TTL feature, not just cuckoo hash cache.

@xuguruogu
Copy link
Collaborator Author

It acts like google sparse_hash_map, using continuous memory, go forward a limited space for insert and lookup. With no memory malloc if key/value is simple. With good spatial locality features.

@dangleptr
Copy link
Contributor

It acts like google sparse_hash_map, using continuous memory, go forward a limited space for insert and lookup. With no memory malloc if key/value is simple. With good spatial locality features.

Yes, we should satisfy different purposes with different kind cache. Let's talk about it offline.

slot = -1;
for (int i = 0; i < static_cast<int>(slot_per_bucket()); ++i) {
if (try_reclaim(bucket_ind, i)) {
slot = i;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I got your purpose.

if (try_reclaim(bucket_ind, i)) {
slot = i;
} else if (b.occupied(i)) {
if (!is_simple() && partial != b.partial(i)) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could eliminate the if here. Because we could ensure is_simple at compile time.

@@ -0,0 +1,875 @@
#pragma once
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

#pragma once is not standard, please use include guard as other headers in our project.

Copy link
Contributor

@bright-starry-sky bright-starry-sky left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see that all functions and variables are naming rule differently than the general . right?

std::array<bool, SLOT_PER_BUCKET> occupied_;
};

bucket_container(size_t hp) : hashpower_(hp), buckets_(size()) { }
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it better to use explicit?

class bucket_container {
public:
using key_type = Key;
using mapped_type = T;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you explain the usage of T?

@CLAassistant
Copy link

CLAassistant commented Sep 2, 2020

CLA assistant check
Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you sign our Contributor License Agreement before we can accept your contribution.
You have signed the CLA already but the status is still pending? Let us recheck it.

del_from_bucket(pos.index, pos.slot);
}
return true;
} else {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No need for an else here.

@Sophie-Xie Sophie-Xie added the type/enhancement Type: make the code neat or more efficient label Sep 13, 2021
@Sophie-Xie Sophie-Xie added the community Source: who proposed the issue label Sep 27, 2021
@Sophie-Xie
Copy link
Contributor

This is an early PR based on v1, I will close it first. If it's necessary in the future, need to submit a new PR based on the master.

@Sophie-Xie Sophie-Xie closed this Dec 21, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
community Source: who proposed the issue type/enhancement Type: make the code neat or more efficient
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants