Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Cache an entire repository to increase cache hit rate #1098

Open
wants to merge 4 commits into
base: main
Choose a base branch
from

Conversation

ikhoon
Copy link
Contributor

@ikhoon ikhoon commented Feb 12, 2025

Motivation:

The size of the repositories in Central Dogma are relatively small. Therefore, caching the entire repo rather than caching each file separately would result in a higher cache hit rate. Additionally, when checking repository access patterns internally, it was found that a client tends to send multiple queries to fully scan repository files instead of using the all path pattern /**.

Modifications:

  • Changed CachingRepository to cache all files in a repository when finding files with a path pattern. Filtering is performed to the cached entries.
  • Doubled the default repository cache spec to approximately 256-megachars.
  • Strip / if a matching path starts with it to apply PathPatternFilter to Entry.path()

Result:

It increases cache hits through repo-level cache and allows the server to quickly fill the cache when starting up.

Motivation:

The size of the repositories in Central Dogma are relatively small.
Therefore, caching the entire repo rather than caching each file
separately would result in a higher cache hit rate.
Additionally, when checking repository access patterns internally, it
was found that a client tends to send multiple queries to fully scan
repository files instead of using the all path pattern `/**`.

Modifications:

- Changed `CachingRepository` to cache all files in a repository when
  finding files with a path pattern. Filtering is performed to the cached
  entries.
- Doubled the default repository cache spec to approximately 256-megachars.
- Strip `/` if a matching path starts with it to apply
  `PathPatternFilter` to `Entry.path()`

Result:

It increases cache hits through repo-level cache and allows the server
to quickly fill the cache when starting up.
@ikhoon ikhoon added this to the 0.74.0 milestone Feb 12, 2025
@ikhoon ikhoon marked this pull request as draft February 12, 2025 14:41
@ikhoon ikhoon marked this pull request as ready for review February 18, 2025 02:23
cacheableOptions = newOptions.build();
}

return cache.get(new CacheableFindCall(repo, normalizedRevision, ALL_PATH, cacheableOptions))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I understood worst case, the cached size can be (number of revisions) * (size of directory).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Possible. Based on the user pattern I observed, most clients were only interested in HEAD revision. Old revisions were accessed occasionally. The practical cached size would be (1 or 2 revisions) * (size of directory).


// Use LinkedHashMap to 1) keep the order and 2) allow callers to mutate it.
return stream.collect(Collectors.toMap(Map.Entry::getKey, Map.Entry::getValue,
(oldV, newV) -> oldV,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Question) Is this just a precaution? or is it actually possible that multiple values (files) exist for a single key (path)?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(oldV, newV) -> oldV was just added because of the signature of Collectors.toMap().
https://docs.oracle.com/javase/8/docs/api/java/util/stream/Collectors.html#toMap-java.util.function.Function-java.util.function.Function-java.util.function.BinaryOperator-java.util.function.Supplier-

To use LinkedHashMap, mergeFunction is necessary.

…repository/Repository.java

Co-authored-by: jrhee17 <guins_j@guins.org>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants