Cache an entire repository to increase cache hit rate #1098

ikhoon · 2025-02-12T14:41:08Z

Motivation:

Repositories in Central Dogma are relatively small, so caching an entire repository rather than caching each file separately would result in a higher cache hit rate. Additionally, an internal review of repository access patterns found that clients tend to send multiple queries to scan all repository files instead of using the all-path pattern /**.

Modifications:

- Changed `CachingRepository` to cache all files in a repository when finding files with a path pattern. The path pattern is then applied as a filter to the cached entries.
- Doubled the default repository cache spec to approximately 256 megachars.
- Strip the leading `/` from a matching path, if present, so that `PathPatternFilter` can be applied to `Entry.path()`.
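The idea behind these modifications can be sketched roughly as below. This is a minimal, hypothetical illustration, not Central Dogma's actual `CachingRepository`: a plain `ConcurrentHashMap` stands in for the real (size-bounded) cache, the made-up `loadAll` function stands in for reading every file at a revision, and a tiny hand-written glob matcher replaces `PathPatternFilter`. The key point is that the cache key is (repository, revision) rather than (repository, revision, path pattern), so every pattern query at the same revision reuses one loaded snapshot.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;
import java.util.regex.Pattern;

// Hypothetical sketch of a repository-level cache; not Central Dogma's implementation.
public final class RepoLevelCacheSketch {

    // Cache key: one entry per (repository, revision), never per path pattern.
    record RepoKey(String repo, int revision) {}

    // Unbounded for simplicity; a real cache would limit its total weight.
    private final Map<RepoKey, Map<String, String>> cache = new ConcurrentHashMap<>();
    // Hypothetical loader that reads every file (path -> content) at a revision.
    private final Function<RepoKey, Map<String, String>> loadAll;
    // Counts how many times the whole repository was actually loaded.
    final AtomicInteger loads = new AtomicInteger();

    RepoLevelCacheSketch(Function<RepoKey, Map<String, String>> loadAll) {
        this.loadAll = loadAll;
    }

    Map<String, String> find(String repo, int revision, String pathPattern) {
        // Every pattern shares the same cached snapshot of the whole repository.
        Map<String, String> all = cache.computeIfAbsent(new RepoKey(repo, revision), key -> {
            loads.incrementAndGet();
            return loadAll.apply(key);
        });
        // Filter the cached entries; both sides are compared without a leading '/'.
        Pattern regex = globToRegex(stripLeadingSlash(pathPattern));
        Map<String, String> result = new LinkedHashMap<>();
        all.forEach((path, content) -> {
            if (regex.matcher(stripLeadingSlash(path)).matches()) {
                result.put(path, content);
            }
        });
        return result;
    }

    private static String stripLeadingSlash(String s) {
        return s.startsWith("/") ? s.substring(1) : s;
    }

    // Minimal glob: "**" matches across directories, "*" matches within one segment.
    private static Pattern globToRegex(String glob) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < glob.length(); i++) {
            char c = glob.charAt(i);
            if (c == '*') {
                if (i + 1 < glob.length() && glob.charAt(i + 1) == '*') {
                    sb.append(".*");
                    i++;
                } else {
                    sb.append("[^/]*");
                }
            } else {
                sb.append(Pattern.quote(String.valueOf(c)));
            }
        }
        return Pattern.compile(sb.toString());
    }

    public static void main(String[] args) {
        Map<String, String> files = new LinkedHashMap<>();
        files.put("/a.json", "{}");
        files.put("/dir/b.json", "{}");
        files.put("/dir/c.txt", "c");
        RepoLevelCacheSketch sketch = new RepoLevelCacheSketch(key -> files);
        System.out.println(sketch.find("repo", 1, "/dir/*.json").keySet()); // [/dir/b.json]
        System.out.println(sketch.find("repo", 1, "/**").size());           // 3
        System.out.println(sketch.loads.get());                             // 1
    }
}
```

Two different patterns at the same revision trigger only one full load, which is where the higher hit rate comes from; the trade-off is that each cached value holds the whole repository rather than just the matching files.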

Result:

It increases the cache hit rate through a repository-level cache and allows the server to fill the cache quickly on startup.


jrhee17 · 2025-02-20T08:14:55Z

...va/com/linecorp/centraldogma/server/internal/storage/repository/cache/CachingRepository.java

+            cacheableOptions = newOptions.build();
+        }
+
+        return cache.get(new CacheableFindCall(repo, normalizedRevision, ALL_PATH, cacheableOptions))


If I understand correctly, in the worst case the cached size can be (number of revisions) * (size of directory).

It's possible. Based on the usage patterns I observed, most clients were only interested in the HEAD revision, and old revisions were accessed only occasionally. The practical cached size would therefore be (1 or 2 revisions) * (size of directory).

jrhee17 · 2025-02-20T08:15:25Z

...va/com/linecorp/centraldogma/server/internal/storage/repository/cache/CachingRepository.java

+
+                        // Use LinkedHashMap to 1) keep the order and 2) allow callers to mutate it.
+                        return stream.collect(Collectors.toMap(Map.Entry::getKey, Map.Entry::getValue,
+                                                               (oldV, newV) -> oldV,


Question: Is this just a precaution, or is it actually possible for multiple values (files) to exist for a single key (path)?

`(oldV, newV) -> oldV` was added only because of the signature of `Collectors.toMap()`.
https://docs.oracle.com/javase/8/docs/api/java/util/stream/Collectors.html#toMap-java.util.function.Function-java.util.function.Function-java.util.function.BinaryOperator-java.util.function.Supplier-

To supply a `LinkedHashMap`, the overload that takes a `mapSupplier` is required, and that overload also requires a `mergeFunction`.
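For reference, `Collectors.toMap` accepts a map supplier only in its four-argument overload `toMap(keyMapper, valueMapper, mergeFunction, mapSupplier)`; the two- and three-argument overloads make no guarantees about the type or mutability of the returned map. The stand-alone snippet below illustrates that pattern; the class and method names are hypothetical and not taken from the PR.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical example class illustrating the four-argument Collectors.toMap().
public final class ToMapLinkedHashMapExample {

    // Collects path -> content pairs into a mutable, insertion-ordered map.
    static Map<String, String> collect(List<Map.Entry<String, String>> entries) {
        return entries.stream()
                      .collect(Collectors.toMap(Map.Entry::getKey, Map.Entry::getValue,
                                                // Needed only to reach the overload that takes
                                                // a map supplier; keeps the first value if a
                                                // key ever repeats.
                                                (oldV, newV) -> oldV,
                                                LinkedHashMap::new));
    }

    public static void main(String[] args) {
        Map<String, String> map = collect(List.of(Map.entry("/b.json", "2"),
                                                  Map.entry("/a.json", "1")));
        System.out.println(map.keySet()); // [/b.json, /a.json]
        map.put("/c.json", "3");          // the LinkedHashMap is mutable
        System.out.println(map.size());   // 3
    }
}
```

When paths are unique, as they are within a single repository revision, the merge function is never invoked.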

server/src/main/java/com/linecorp/centraldogma/server/storage/repository/Repository.java

…repository/Repository.java Co-authored-by: jrhee17 <guins_j@guins.org>

ikhoon added the improvement label Feb 12, 2025

ikhoon added this to the 0.74.0 milestone Feb 12, 2025

ikhoon requested review from jrhee17, minwoox and trustin as code owners February 12, 2025 14:41

ikhoon marked this pull request as draft February 12, 2025 14:41

ikhoon added 2 commits February 13, 2025 14:35

Merge branch 'main' into improve-cache-hit

952f840

fix bugs

0fd8da9

ikhoon marked this pull request as ready for review February 18, 2025 02:23

jrhee17 approved these changes Feb 20, 2025


Update server/src/main/java/com/linecorp/centraldogma/server/storage/…

13fc450

…repository/Repository.java Co-authored-by: jrhee17 <guins_j@guins.org>
