Cache an entire repository to increase cache hit rate #1098
base: main
Conversation
Motivation:

The repositories in Central Dogma are relatively small, so caching an entire repository rather than caching each file separately results in a higher cache hit rate. Additionally, when checking repository access patterns internally, it was found that clients tend to send multiple queries to fully scan repository files instead of using the all-path pattern `/**`.

Modifications:

- Changed `CachingRepository` to cache all files in a repository when finding files with a path pattern. Filtering is performed on the cached entries.
- Doubled the default repository cache spec to approximately 256 megachars.
- Strip the leading `/` from a matching path so that `PathPatternFilter` can be applied to `Entry.path()`.

Result:

- Increases cache hits through the repo-level cache and allows the server to fill the cache quickly at startup.
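The overall idea can be sketched as follows. All names here are illustrative, not the actual `CachingRepository` API: one cache entry holds every file of a revision, and narrower path-pattern queries are answered by filtering the cached entries locally (including the leading-`/` strip mentioned above).

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

// Rough sketch of repo-level caching: cache ALL files of a revision once,
// then answer narrower path-pattern queries by filtering the cached entries.
class RepoLevelCacheSketch {
    // revision -> (path -> content); the real cache is weight-bounded.
    private final Map<Integer, Map<String, String>> cache = new ConcurrentHashMap<>();
    private final Function<Integer, Map<String, String>> loadAllFiles;

    RepoLevelCacheSketch(Function<Integer, Map<String, String>> loadAllFiles) {
        this.loadAllFiles = loadAllFiles;
    }

    Map<String, String> find(int revision, String pathPattern) {
        // A single cache entry per revision serves every subsequent query.
        Map<String, String> all = cache.computeIfAbsent(revision, loadAllFiles);
        Map<String, String> result = new LinkedHashMap<>();
        for (Map.Entry<String, String> e : all.entrySet()) {
            if (matches(pathPattern, e.getKey())) {
                result.put(e.getKey(), e.getValue());
            }
        }
        return result;
    }

    // Greatly simplified matcher standing in for PathPatternFilter: the
    // leading '/' is stripped so patterns apply to Entry.path()-style paths.
    private static boolean matches(String pattern, String path) {
        String p = path.startsWith("/") ? path.substring(1) : path;
        if ("/**".equals(pattern)) {
            return true;
        }
        if (pattern.startsWith("*")) {
            return p.endsWith(pattern.substring(1));
        }
        return p.equals(pattern);
    }
}
```

The key property is that a repository scan consisting of many per-file queries now costs a single cache load instead of one load per query.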
```java
cacheableOptions = newOptions.build();
}

return cache.get(new CacheableFindCall(repo, normalizedRevision, ALL_PATH, cacheableOptions))
```
I understood that, in the worst case, the cached size can be (number of revisions) * (size of directory).
Possible. Based on the user patterns I observed, most clients were only interested in the HEAD revision; old revisions were accessed only occasionally. The practical cached size would be (1 or 2 revisions) * (size of directory).
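Since the cache spec is expressed in characters ("megachars") rather than entry count, the bound above is a weight bound. A toy LRU cache bounded by total cached characters, purely to illustrate what such a weight limit means (this is not the project's actual cache implementation), could look like:

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Toy weight-bounded cache: evicts least-recently-used entries once the
// total number of cached characters exceeds maxChars.
class CharWeightedCache extends LinkedHashMap<String, String> {
    private final long maxChars;
    private long totalChars;

    CharWeightedCache(long maxChars) {
        super(16, 0.75f, true); // access-order iteration for LRU behavior
        this.maxChars = maxChars;
    }

    @Override
    public String put(String key, String value) {
        String old = super.put(key, value);
        totalChars += value.length() - (old == null ? 0 : old.length());
        // Evict eldest entries until the total weight is under the limit,
        // but never evict the entry that was just inserted.
        while (totalChars > maxChars && size() > 1) {
            Map.Entry<String, String> eldest = entrySet().iterator().next();
            totalChars -= eldest.getValue().length();
            remove(eldest.getKey());
        }
        return old;
    }
}
```

With a weight bound, a few large revision snapshots can evict many small entries, so "(1 or 2 revisions) * (size of directory)" is the relevant sizing estimate rather than an entry count.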
```java
// Use LinkedHashMap to 1) keep the order and 2) allow callers to mutate it.
return stream.collect(Collectors.toMap(Map.Entry::getKey, Map.Entry::getValue,
                                       (oldV, newV) -> oldV,
```
Question) Is this just a precaution, or is it actually possible for multiple values (files) to exist for a single key (path)?
`(oldV, newV) -> oldV` was added only because of the signature of `Collectors.toMap()`:
https://docs.oracle.com/javase/8/docs/api/java/util/stream/Collectors.html#toMap-java.util.function.Function-java.util.function.Function-java.util.function.BinaryOperator-java.util.function.Supplier-
To use `LinkedHashMap` as the map factory, a `mergeFunction` argument is necessary.
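A minimal standalone example of the point above (not the PR's actual code): the only `Collectors.toMap()` overload that accepts a map factory also requires a merge function, even when keys are known to be unique.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class ToMapOrderDemo {
    // (oldV, newV) -> oldV keeps the first value if a key collision ever
    // happens; here it only exists to reach the mapFactory parameter.
    static Map<String, Integer> collect() {
        return Stream.of(Map.entry("b", 2), Map.entry("a", 1), Map.entry("b", 3))
                     .collect(Collectors.toMap(Map.Entry::getKey, Map.Entry::getValue,
                                               (oldV, newV) -> oldV,    // first value wins
                                               LinkedHashMap::new));    // keeps encounter order
    }
}
```

Without the merge function, the two-argument `toMap()` overload would throw `IllegalStateException` on a duplicate key, and there is no overload taking only the key/value mappers plus a map factory.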
server/src/main/java/com/linecorp/centraldogma/server/storage/repository/Repository.java
…repository/Repository.java Co-authored-by: jrhee17 <guins_j@guins.org>