Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: fix the ghost queue bug with s3fifo #432

Merged
merged 5 commits into from
Apr 27, 2024
Merged

Conversation

xiaguan
Copy link
Contributor

@xiaguan xiaguan commented Apr 26, 2024

What's changed and what's your intention?

Fixed up the ghost queue issue with s3fifo and also added some real workload tests. Another thing I wanna point out is that the algorithms that perform really well on zipf usually have lower robustness, meaning they're a bit more limited in where they can be applied.

Checklist

  • I have written the necessary rustdoc comments
  • I have added the necessary unit tests and integration tests
  • I have passed make all (or make fast instead if the old tests are not modified) in my local environment.

Related issues or PRs (optional)

@xiaguan xiaguan changed the title Fix s3fifo fix: fix the ghost queue bug with s3fifo Apr 26, 2024
@xiaguan
Copy link
Contributor Author

xiaguan commented Apr 26, 2024

Master

  cache_size                  fifo            lru             lfu             s3fifo (0g)     s3fifo (1g)     moka            
50000                     67.50%          70.51%          74.99%          70.88%          71.28%          64.70%
zif_exp, cache_size           fifo            lru             lfu             s3fifo (0g)     s3fifo (1g)     moka            
  0.90,  0.005                16.24%          19.21%          32.34%          32.14%          20.93%          33.43%
  0.90,   0.01                22.56%          26.23%          38.56%          39.24%          27.90%          37.91%
  0.90,   0.05                41.10%          45.61%          55.42%          56.68%          47.55%          55.31%
  0.90,    0.1                51.06%          55.70%          63.82%          65.24%          57.67%          64.17%
  0.90,   0.25                66.81%          71.18%          76.23%          77.62%          73.27%          77.15%
  1.00,  0.005                26.61%          31.06%          44.13%          44.49%          33.33%          45.64%
  1.00,   0.01                34.41%          39.18%          50.67%          51.26%          41.14%          50.79%
  1.00,   0.05                54.05%          58.77%          66.81%          67.96%          60.56%          66.75%
  1.00,    0.1                63.16%          67.63%          73.94%          75.00%          69.41%          74.42%
  1.00,   0.25                76.15%          79.92%          83.59%          84.53%          81.50%          84.27%
  1.05,  0.005                32.65%          37.71%          50.27%          50.25%          39.80%          51.78%
  1.05,   0.01                40.97%          46.10%          56.76%          57.48%          47.91%          57.13%
  1.05,   0.05                60.45%          65.07%          72.04%          73.07%          66.82%          72.36%
  1.05,    0.1                68.95%          73.13%          78.51%          79.45%          74.61%          78.98%
  1.05,   0.25                80.43%          83.79%          86.84%          87.53%          85.09%          87.48%
  1.10,  0.005                38.98%          44.46%          56.24%          56.38%          46.54%          57.86%
  1.10,   0.01                47.61%          52.93%          62.62%          63.55%          54.66%          63.11%
  1.10,   0.05                66.56%          70.91%          76.91%          77.83%          72.42%          77.18%
  1.10,    0.1                74.23%          78.05%          82.52%          83.24%          79.35%          82.99%
  1.10,   0.25                84.17%          87.08%          89.56%          90.09%          88.16%          90.08%
  1.50,  0.005                81.18%          85.29%          88.90%          89.30%          86.02%          89.92%
  1.50,   0.01                86.93%          89.89%          92.26%          92.69%          90.42%          92.79%
  1.50,   0.05                94.77%          96.05%          96.96%          97.08%          96.30%          97.10%
  1.50,    0.1                96.66%          97.52%          98.07%          98.09%          97.69%          98.16%
  1.50,   0.25                98.36%          98.82%          99.04%          99.02%          98.91%          99.09%

After fix

  cache_size                  fifo            lru             lfu             s3fifo (0g)     s3fifo (1g)     moka            
50000                     67.50%          70.51%          74.99%          70.88%          72.33%          64.70%
zif_exp, cache_size           fifo            lru             lfu             s3fifo (0g)     s3fifo (1g)     moka            
  0.90,  0.005                16.24%          19.20%          32.38%          32.06%          31.94%          33.44%
  0.90,   0.01                22.55%          26.21%          38.56%          39.27%          38.46%          37.86%
  0.90,   0.05                41.10%          45.61%          55.41%          56.64%          55.37%          55.19%
  0.90,    0.1                51.05%          55.69%          63.81%          65.27%          63.61%          64.16%
  0.90,   0.25                66.76%          71.15%          76.17%          77.53%          75.68%          77.11%
  1.00,  0.005                26.59%          31.05%          44.11%          44.37%          43.54%          45.54%
  1.00,   0.01                34.36%          39.13%          50.64%          51.40%          50.59%          50.69%
  1.00,   0.05                54.03%          58.75%          66.80%          67.94%          66.81%          66.91%
  1.00,    0.1                63.16%          67.62%          73.93%          75.01%          73.83%          74.38%
  1.00,   0.25                76.17%          79.93%          83.61%          84.53%          83.27%          84.36%
  1.05,  0.005                32.64%          37.67%          50.23%          50.31%          49.63%          51.83%
  1.05,   0.01                40.90%          46.02%          56.71%          57.59%          56.67%          56.99%
  1.05,   0.05                60.44%          65.03%          72.05%          73.02%          72.09%          72.31%
  1.05,    0.1                68.90%          73.10%          78.50%          79.36%          78.39%          78.98%
  1.05,   0.25                80.34%          83.71%          86.77%          87.47%          86.49%          87.40%
  1.10,  0.005                38.98%          44.45%          56.25%          56.62%          55.62%          57.89%
  1.10,   0.01                47.66%          52.98%          62.65%          63.55%          62.62%          63.22%
  1.10,   0.05                66.55%          70.91%          76.90%          77.77%          76.96%          77.22%
  1.10,    0.1                74.27%          78.07%          82.56%          83.38%          82.51%          82.97%
  1.10,   0.25                84.19%          87.10%          89.56%          90.08%          89.34%          90.08%
  1.50,  0.005                81.20%          85.30%          88.90%          89.32%          88.79%          89.92%
  1.50,   0.01                86.91%          89.87%          92.25%          92.66%          92.29%          92.76%
  1.50,   0.05                94.77%          96.05%          96.96%          97.08%          96.96%          97.10%
  1.50,    0.1                96.64%          97.50%          98.05%          98.10%          98.04%          98.12%
  1.50,   0.25                98.37%          98.82%          99.05%          99.03%          99.03%          99.10%

Copy link

codecov bot commented Apr 26, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 80.82%. Comparing base (e612b5a) to head (3a561d6).

Additional details and impacted files
@@            Coverage Diff             @@
##             main     #432      +/-   ##
==========================================
+ Coverage   80.63%   80.82%   +0.19%     
==========================================
  Files          53       53              
  Lines        7173     7166       -7     
==========================================
+ Hits         5784     5792       +8     
+ Misses       1389     1374      -15     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@xiaguan xiaguan mentioned this pull request Apr 26, 2024
3 tasks
Copy link
Collaborator

@MrCroxx MrCroxx left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the efforts. 🥰

This PR generally LGTM. The following comments should be resolved.

Could you please provide where the idea comes from?

I rechecked the original paper of S3Fifo, and found its idea is not like either of our implements. 🥲

image

foyer-memory/Cargo.toml Outdated Show resolved Hide resolved
Comment on lines +246 to +250
// Try to read the csv file path by environment variable.
let path = std::env::var("TWITTER_TRACE_PATH").ok();
if let Some(path) = path {
// Limit the number of keys to read.
// MAX means read all keys, which may take a really long time.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add an interactive command to download the dataset?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Download this is a little bit complex.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Download this is a little bit complex.

Hi @xiaguan , where can I find the dataset? Let me handle the download part.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

https://github.com/twitter/cache-trace, I think I used miss ratio related cluster's data.
I use SNIA to download. In this way, I could just download part of it(6GB).

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep, it is really big. 🤣

foyer-memory/src/eviction/s3fifo.rs Outdated Show resolved Hide resolved
foyer-memory/src/eviction/s3fifo.rs Outdated Show resolved Hide resolved
@MrCroxx
Copy link
Collaborator

MrCroxx commented Apr 27, 2024

Just attach some discussion of foyer with S3FIFO.

#306

@MrCroxx
Copy link
Collaborator

MrCroxx commented Apr 27, 2024

Master

  cache_size                  fifo            lru             lfu             s3fifo (0g)     s3fifo (1g)     moka            
50000                     67.50%          70.51%          74.99%          70.88%          71.28%          64.70%
zif_exp, cache_size           fifo            lru             lfu             s3fifo (0g)     s3fifo (1g)     moka            
  0.90,  0.005                16.24%          19.21%          32.34%          32.14%          20.93%          33.43%
  0.90,   0.01                22.56%          26.23%          38.56%          39.24%          27.90%          37.91%
  0.90,   0.05                41.10%          45.61%          55.42%          56.68%          47.55%          55.31%
  0.90,    0.1                51.06%          55.70%          63.82%          65.24%          57.67%          64.17%
  0.90,   0.25                66.81%          71.18%          76.23%          77.62%          73.27%          77.15%
  1.00,  0.005                26.61%          31.06%          44.13%          44.49%          33.33%          45.64%
  1.00,   0.01                34.41%          39.18%          50.67%          51.26%          41.14%          50.79%
  1.00,   0.05                54.05%          58.77%          66.81%          67.96%          60.56%          66.75%
  1.00,    0.1                63.16%          67.63%          73.94%          75.00%          69.41%          74.42%
  1.00,   0.25                76.15%          79.92%          83.59%          84.53%          81.50%          84.27%
  1.05,  0.005                32.65%          37.71%          50.27%          50.25%          39.80%          51.78%
  1.05,   0.01                40.97%          46.10%          56.76%          57.48%          47.91%          57.13%
  1.05,   0.05                60.45%          65.07%          72.04%          73.07%          66.82%          72.36%
  1.05,    0.1                68.95%          73.13%          78.51%          79.45%          74.61%          78.98%
  1.05,   0.25                80.43%          83.79%          86.84%          87.53%          85.09%          87.48%
  1.10,  0.005                38.98%          44.46%          56.24%          56.38%          46.54%          57.86%
  1.10,   0.01                47.61%          52.93%          62.62%          63.55%          54.66%          63.11%
  1.10,   0.05                66.56%          70.91%          76.91%          77.83%          72.42%          77.18%
  1.10,    0.1                74.23%          78.05%          82.52%          83.24%          79.35%          82.99%
  1.10,   0.25                84.17%          87.08%          89.56%          90.09%          88.16%          90.08%
  1.50,  0.005                81.18%          85.29%          88.90%          89.30%          86.02%          89.92%
  1.50,   0.01                86.93%          89.89%          92.26%          92.69%          90.42%          92.79%
  1.50,   0.05                94.77%          96.05%          96.96%          97.08%          96.30%          97.10%
  1.50,    0.1                96.66%          97.52%          98.07%          98.09%          97.69%          98.16%
  1.50,   0.25                98.36%          98.82%          99.04%          99.02%          98.91%          99.09%

After fix

  cache_size                  fifo            lru             lfu             s3fifo (0g)     s3fifo (1g)     moka            
50000                     67.50%          70.51%          74.99%          70.88%          72.33%          64.70%
zif_exp, cache_size           fifo            lru             lfu             s3fifo (0g)     s3fifo (1g)     moka            
  0.90,  0.005                16.24%          19.20%          32.38%          32.06%          31.94%          33.44%
  0.90,   0.01                22.55%          26.21%          38.56%          39.27%          38.46%          37.86%
  0.90,   0.05                41.10%          45.61%          55.41%          56.64%          55.37%          55.19%
  0.90,    0.1                51.05%          55.69%          63.81%          65.27%          63.61%          64.16%
  0.90,   0.25                66.76%          71.15%          76.17%          77.53%          75.68%          77.11%
  1.00,  0.005                26.59%          31.05%          44.11%          44.37%          43.54%          45.54%
  1.00,   0.01                34.36%          39.13%          50.64%          51.40%          50.59%          50.69%
  1.00,   0.05                54.03%          58.75%          66.80%          67.94%          66.81%          66.91%
  1.00,    0.1                63.16%          67.62%          73.93%          75.01%          73.83%          74.38%
  1.00,   0.25                76.17%          79.93%          83.61%          84.53%          83.27%          84.36%
  1.05,  0.005                32.64%          37.67%          50.23%          50.31%          49.63%          51.83%
  1.05,   0.01                40.90%          46.02%          56.71%          57.59%          56.67%          56.99%
  1.05,   0.05                60.44%          65.03%          72.05%          73.02%          72.09%          72.31%
  1.05,    0.1                68.90%          73.10%          78.50%          79.36%          78.39%          78.98%
  1.05,   0.25                80.34%          83.71%          86.77%          87.47%          86.49%          87.40%
  1.10,  0.005                38.98%          44.45%          56.25%          56.62%          55.62%          57.89%
  1.10,   0.01                47.66%          52.98%          62.65%          63.55%          62.62%          63.22%
  1.10,   0.05                66.55%          70.91%          76.90%          77.77%          76.96%          77.22%
  1.10,    0.1                74.27%          78.07%          82.56%          83.38%          82.51%          82.97%
  1.10,   0.25                84.19%          87.10%          89.56%          90.08%          89.34%          90.08%
  1.50,  0.005                81.20%          85.30%          88.90%          89.32%          88.79%          89.92%
  1.50,   0.01                86.91%          89.87%          92.25%          92.66%          92.29%          92.76%
  1.50,   0.05                94.77%          96.05%          96.96%          97.08%          96.96%          97.10%
  1.50,    0.1                96.64%          97.50%          98.05%          98.10%          98.04%          98.12%
  1.50,   0.25                98.37%          98.82%          99.05%          99.03%          99.03%          99.10%

And could you please sync this result to the comments in the bench_hit_ratio.rs file? 🥰

@xiaguan
Copy link
Contributor Author

xiaguan commented Apr 27, 2024

Could you please provide where the idea comes from?

The implementation on the master branch doesn't evict objects from the ghost queue. I mainly focused on fixing this bug.

@MrCroxx
Copy link
Collaborator

MrCroxx commented Apr 27, 2024

Could you please provide where the idea comes from?

The implementation on the master branch doesn't evict objects from the ghost queue. I mainly focused on fixing this bug.

Oh, hack. You are right. 🥲 I just decreased the reference count. BTW, how do you like the reference count? I see you removed it.

@MrCroxx MrCroxx added the bug Something isn't working label Apr 27, 2024
@MrCroxx MrCroxx added this to the v0.8 milestone Apr 27, 2024
@xiaguan
Copy link
Contributor Author

xiaguan commented Apr 27, 2024

BTW, how do you like the reference count? I see you removed it.

I just follow the paper. The paper doesn't maintain the reference count in ghost queue?

@MrCroxx
Copy link
Collaborator

MrCroxx commented Apr 27, 2024

Could you please provide where the idea comes from?

The implementation on the master branch doesn't evict objects from the ghost queue. I mainly focused on fixing this bug.

Okay. Let me check then. I will update this branch directly if necessary and merge it.

@MrCroxx
Copy link
Collaborator

MrCroxx commented Apr 27, 2024

Could you please provide where the idea comes from?

The implementation on the master branch doesn't evict objects from the ghost queue. I mainly focused on fixing this bug.

Emmm, wait again. The implement on the main branch does evict objects from the ghost queue. It evicts only when the reference count hits 0.

This PR, on the other hand, removes all keys with the same no matter if there is a hash collision. This behavior will make the admission less aggressive.

I got you.

@MrCroxx
Copy link
Collaborator

MrCroxx commented Apr 27, 2024

Hi @PsiACE , do you have any idea about the reference count? Glad to hear from you as another reference.

@PsiACE
Copy link

PsiACE commented Apr 27, 2024

Hi @PsiACE , do you have any idea about the reference count? Glad to hear from you as another reference.

s3fifo does not need to keep reference counting for ghost, as reference counting will cause Quick Demotion failure, that is, incorrect estimation of popularity.

After I noticed this and made the corresponding modifications, my algorithm also approached the state-of-the-art level, but it still needs further optimization.

@MrCroxx
Copy link
Collaborator

MrCroxx commented Apr 27, 2024

s3fifo does not need to keep reference counting for ghost, as reference counting will cause Quick Demotion failure, that is, incorrect estimation of popularity.

Even when the ghost queue uses key hash not key itself? 🤔 This looks a bit counterintuitive.

@MrCroxx
Copy link
Collaborator

MrCroxx commented Apr 27, 2024

I plan to rewrite a cache benchmark tool like mokabench or enhance it to make benchmarking more automatic.

This PR fixed that the ghost queue doesn't add weights when pushing, but removes the hash reference count. I'm kind of concerned about removing the hash reference count, but for most workloads, I think it is okay. So let's merge it first.

@MrCroxx
Copy link
Collaborator

MrCroxx commented Apr 27, 2024

Thanks again for your efforts, @xiaguan and @PsiACE . I've learnt a lot in the journey.

@MrCroxx MrCroxx merged commit 4033b00 into foyer-rs:main Apr 27, 2024
16 checks passed
@PsiACE
Copy link

PsiACE commented Apr 27, 2024

I plan to rewrite a cache benchmark tool like mokabench or enhance it to make benchmarking more automatic.

It can be almost directly reused.

#[derive(Clone)]
pub struct S3FIFOCache {
    config: Arc<Config>,
    cache: Arc<foyer::Cache<Key, Value>>,
}

impl S3FIFOCache {
    pub fn new(config: &Config, capacity: usize) -> Self {
        if let Some(_ttl) = config.ttl {
            todo!()
        }
        if let Some(_tti) = config.tti {
            todo!()
        }
        if config.size_aware {
            todo!()
        }

        // TinyUFO does not support custom hasher. Use its default hasher.
        Self {
            config: Arc::new(config.clone()),
            cache: Arc::new(foyer::CacheBuilder::new(capacity)
            .with_shards(1)
            .with_eviction_config(foyer::S3FifoConfig {
                small_queue_capacity_ratio: 0.1,
                ghost_queue_capacity_ratio: 0.9,
                small_to_main_freq_threshold: 2,
            })
            .with_object_pool_capacity(16)
            .build()),
        }
    }

    fn get(&self, key: &usize) -> bool {
        self.cache.get(key).is_some()
    }

    fn insert(&self, key: usize, req_id: usize) {
        let value = super::make_value(&self.config, key, req_id);
        super::sleep_thread_for_insertion(&self.config);
        self.cache.insert(key, value);
    }
}

@MrCroxx
Copy link
Collaborator

MrCroxx commented Apr 27, 2024

I plan to rewrite a cache benchmark tool like mokabench or enhance it to make benchmarking more automatic.

It can be almost directly reused.

#[derive(Clone)]
pub struct S3FIFOCache {
    config: Arc<Config>,
    cache: Arc<foyer::Cache<Key, Value>>,
}

impl S3FIFOCache {
    pub fn new(config: &Config, capacity: usize) -> Self {
        if let Some(_ttl) = config.ttl {
            todo!()
        }
        if let Some(_tti) = config.tti {
            todo!()
        }
        if config.size_aware {
            todo!()
        }

        // TinyUFO does not support custom hasher. Use its default hasher.
        Self {
            config: Arc::new(config.clone()),
            cache: Arc::new(foyer::CacheBuilder::new(capacity)
            .with_shards(1)
            .with_eviction_config(foyer::S3FifoConfig {
                small_queue_capacity_ratio: 0.1,
                ghost_queue_capacity_ratio: 0.9,
                small_to_main_freq_threshold: 2,
            })
            .with_object_pool_capacity(16)
            .build()),
        }
    }

    fn get(&self, key: &usize) -> bool {
        self.cache.get(key).is_some()
    }

    fn insert(&self, key: usize, req_id: usize) {
        let value = super::make_value(&self.config, key, req_id);
        super::sleep_thread_for_insertion(&self.config);
        self.cache.insert(key, value);
    }
}

Yep. I've already sent a PR 🥰

@xiaguan xiaguan deleted the fix_s3fifo branch April 29, 2024 11:16
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
Status: No status
Development

Successfully merging this pull request may close these issues.

3 participants