Ristretto has these requirements:

Concurrent.
Memory-bounded (limit to configurable max memory usage).
Scale well as the number of cores and goroutines increase.
Scale well under non-random key access distribution (e.g. Zipf).
High cache hit ratio

There also an additional nice-to-have which is to minimize Go GC. Other caches like BigCache, FreeCache, etc. do this already.

See the blog post for more details.

Ristretto is divided up into a few parts:

Tiny-LFU
Memory Allocator
Cache using BP8 and Tiny-LFU.

Tiny-LFU

This can be adapted from Damian Gryski's implementation here. Work with Damian and Ben Manes to ensure that this implementation is correct and contains most low-hanging fruits.

Memory Allocator

Memory Allocator can be inspired by Google's TC Malloc. We allocate memory based on the size of the value in classes. Each class has pre-determined size and the value would take up a full slot in that class. The class keeps track of empty slots, potentially using bits and atomics to ensure threads are not acquiring locks to allocate or deallocate a slot. Note that Go doesn't expose thread-locality like C, so this is important.

There's plenty of material online about TCMalloc. See this.

Cache using BP-Wrapper

Initial version of the cache can just have a RW mutex lock. It can be based on a basic LRU algorithm initially. The main innovation would be to use BP-Wrapper to ensure that the cache performs great under load.

We can optimize this and do benchmarks. Once those show clear benefits, we can then integrate Tiny-LFU to increase the hit ratios and memory allocator to decrease Go GC time spent on the cache.

As per Ben:

I use striped & lossy ring buffer for reads, a bounded ring buffer for writes, schedule with a state machine, and guard executions with a try-lock. Since the LRU operations are performed under a non-blocking lock, the policy operations are simple and the buffers are optimized as multi-producer / single-consumer.

To implement these kind of buffers, an easy way in Go would be to use channels. We can further stripe them to divide up the accesses to avoid contention.

A lossy ring buffer can be a channel with a non-blocking push. A bounded ring buffer would be a channel with buffer and so on.

This would allow our reads to acquire read-only lock on the cache, while doing a best-effort push into one of the channels.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

design.md

design.md

Tiny-LFU

Memory Allocator

Cache using BP-Wrapper

Files

design.md

Latest commit

History

design.md

File metadata and controls

Tiny-LFU

Memory Allocator

Cache using BP-Wrapper