[runtime] Add an in-memory cache for Benchmark protos. #263
Conversation
809ab76 to aa5fe5a
Codecov Report
@@ Coverage Diff @@
## development #263 +/- ##
===============================================
+ Coverage 83.32% 83.46% +0.13%
===============================================
Files 76 78 +2
Lines 4353 4426 +73
===============================================
+ Hits 3627 3694 +67
- Misses 726 732 +6
Continue to review full report at Codecov.
LGTM
```cpp
      maxSizeInBytes_(maxSizeInBytes),
      sizeInBytes_(0) {}

const Benchmark* BenchmarkCache::get(const std::string& uri) const {
```
Consider (but not for long, you know what I'm like):
```cpp
class BenchmarkCache {
 public:
  BenchmarkCache(std::function<Benchmark(std::string url)> benchmarkCreatorThingie, ...);
  ...
  Benchmark& operator[](std::string url) {
    if (/* url in cache */) return cache[url];
    auto b = benchmarkCreatorThingie(url);  // May throw std::out_of_range, like std::vector::at.
    if (/* doesn't have capacity */) {
      evictUntilCapacity(b.size());
    }
    cache[url] = b;
    return b;
  }
};
```
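For concreteness, the suggested get-or-create design can be sketched in Python (the other implementation language in this PR). The names `benchmark_creator` and `max_size_in_bytes` are hypothetical stand-ins for the names above, not the PR's actual API:

```python
import random


class GetOrCreateCache:
    """Sketch of the suggested get-or-create cache with eviction on insert."""

    def __init__(self, benchmark_creator, max_size_in_bytes):
        self.benchmark_creator = benchmark_creator  # May raise for an unknown URI.
        self.max_size_in_bytes = max_size_in_bytes
        self.size_in_bytes = 0
        self.cache = {}  # uri -> (benchmark, size)

    def __getitem__(self, uri):
        if uri in self.cache:
            return self.cache[uri][0]  # Cache hit.
        benchmark = self.benchmark_creator(uri)  # May throw, like std::vector::at.
        size = len(benchmark)
        if self.size_in_bytes + size > self.max_size_in_bytes:
            self.evict_until_capacity(size)
        self.cache[uri] = (benchmark, size)
        self.size_in_bytes += size
        return benchmark

    def evict_until_capacity(self, required):
        # Evict random entries until the new item fits within the byte budget.
        while self.cache and self.size_in_bytes + required > self.max_size_in_bytes:
            victim = random.choice(list(self.cache))
            self.size_in_bytes -= self.cache.pop(victim)[1]
```

Note that lookups never fail with a "not found" error in this design; a miss always invokes the creator callback, which is the point of contention in the reply below.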
I don't know if that makes sense for this use case, because there is no `benchmarkCreatorThingie`-style `std::function<Benchmark(string url)>` callback that would make sense. If the benchmark can't be found, we return an error to the frontend. Pseudo-code:
```python
class CompilerGymRuntime:
    def session_rpc_endpoint(self, benchmark_uri):
        if benchmark_uri not in self.benchmark_cache:
            return ErrorCode.NOT_FOUND
        benchmark = self.benchmark_cache[benchmark_uri]
        ...
```
```cpp
targetSize = targetSize.has_value() ? targetSize : maxSizeInBytes() / 2;

while (size() && sizeInBytes() > targetSize) {
  // Select a benchmark randomly.
```
Ignorable: I always thought the advantage of random over LRU was that random doesn't need to keep the recently-used list. If the data items are small, that makes sense. Otherwise, isn't LRU pretty much always better?
I figured that there are two common use cases for this cache: (1) a tight loop over a couple hundred benchmarks that will all fit in cache; (2) a massive set of training programs where the chance of a cache hit is negligible.
Given that, it didn't seem to me like LRU would be much of an advantage. I also found this interesting: "A random eviction policy degrades gracefully as the loop gets too big." from https://danluu.com/2choices-eviction/
Disclaimer: I know nothing about cache eviction policies and have utterly no idea what I'm talking about :)
Incorporate reviewer feedback on facebookresearch#263.
be24e9b to d8715f6
Incorporate reviewer feedback on facebookresearch#263.
This will be used by the CompilationSession runtime to keep track of
the Benchmark protobufs that have been sent by the user to the
service, so that CompilationSession::init() can be passed a benchmark
proto.

This is a generalization of the BenchmarkFactory class that is used by
the LLVM service to keep a bunch of llvm::Modules loaded in memory.
The same class is implemented twice, in C++ and in Python, with the
same semantics and the same tests.

The cache has a target maximum size based on the number of bytes of
its elements. When this size is reached, benchmarks are evicted using
a random policy. The idea behind random cache eviction is that this
cache will be large enough by default to store a good number of
benchmarks, so exceeding the max cache size implies a training loop in
which random programs are selected from a very large pool, rather than
a smaller pool where an LRU policy would be better.
Issue #254.
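The semantics described in the commit message can be sketched as a minimal Python class (this is an illustrative reconstruction, not the code from this PR; the dict-of-bytes representation and names are assumptions): entries are stored until the byte budget is exceeded, at which point random entries are evicted down to half the maximum size, matching the `maxSizeInBytes() / 2` default target above.

```python
import random


class BenchmarkCache:
    """Sketch of an in-memory benchmark cache with random eviction."""

    def __init__(self, max_size_in_bytes):
        self.max_size_in_bytes = max_size_in_bytes
        self.size_in_bytes = 0
        self._benchmarks = {}  # uri -> serialized benchmark proto (bytes)

    def __contains__(self, uri):
        return uri in self._benchmarks

    def __getitem__(self, uri):
        # Raises KeyError on a miss; the RPC endpoint maps this to NOT_FOUND.
        return self._benchmarks[uri]

    def __setitem__(self, uri, benchmark):
        size = len(benchmark)
        if self.size_in_bytes + size > self.max_size_in_bytes:
            # Evict down to half of the maximum size by default.
            self.evict_to_capacity(target_size_in_bytes=self.max_size_in_bytes // 2)
        self._benchmarks[uri] = benchmark
        self.size_in_bytes += size

    def evict_to_capacity(self, target_size_in_bytes):
        # Randomly evict entries until the cache fits the target size.
        while self._benchmarks and self.size_in_bytes > target_size_in_bytes:
            victim = random.choice(list(self._benchmarks))
            self.size_in_bytes -= len(self._benchmarks.pop(victim))
```

Evicting to half capacity, rather than just enough to fit the new entry, amortizes the eviction cost when the cache is under sustained pressure from a large benchmark pool.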