Ristretto has these requirements:
- Concurrent.
- Memory-bounded (limit to configurable max memory usage).
- Scale well as the number of cores and goroutines increase.
- Scale well under non-random key access distribution (e.g. Zipf).
- High cache hit ratio
There also an additional nice-to-have which is to minimize Go GC. Other caches like BigCache, FreeCache, etc. do this already.
See the blog post for more details.
Ristretto is divided up into a few parts:
- Tiny-LFU
- Memory Allocator
- Cache using BP8 and Tiny-LFU.
This can be adapted from Damian Gryski's implementation here. Work with Damian and Ben Manes to ensure that this implementation is correct and contains most low-hanging fruits.
Memory Allocator can be inspired by Google's TC Malloc. We allocate memory based on the size of the value in classes. Each class has pre-determined size and the value would take up a full slot in that class. The class keeps track of empty slots, potentially using bits and atomics to ensure threads are not acquiring locks to allocate or deallocate a slot. Note that Go doesn't expose thread-locality like C, so this is important.
There's plenty of material online about TCMalloc. See this.
Initial version of the cache can just have a RW mutex lock. It can be based on a basic LRU algorithm initially. The main innovation would be to use BP-Wrapper to ensure that the cache performs great under load.
We can optimize this and do benchmarks. Once those show clear benefits, we can then integrate Tiny-LFU to increase the hit ratios and memory allocator to decrease Go GC time spent on the cache.
As per Ben:
I use striped & lossy ring buffer for reads, a bounded ring buffer for writes, schedule with a state machine, and guard executions with a try-lock. Since the LRU operations are performed under a non-blocking lock, the policy operations are simple and the buffers are optimized as multi-producer / single-consumer.
To implement these kind of buffers, an easy way in Go would be to use channels. We can further stripe them to divide up the accesses to avoid contention.
A lossy ring buffer can be a channel with a non-blocking push. A bounded ring buffer would be a channel with buffer and so on.
This would allow our reads to acquire read-only lock on the cache, while doing a best-effort push into one of the channels.