Pack pipeline cache contents using tar/7z #10925
I've tried to manually pack and unpack cache files to avoid #10841, but strangely the … but its file attributes say that it is a file.
@jneira - to make it easier to test the performance of tar/zip, I created pipeline step templates that handle tarring/untarring files cached with the CacheBeta task. Feel free to give it a try. If you run into a problem, please report it at willsmythe/caching-templates. Disclaimer: this is not an official solution from Microsoft; it simply wraps the Microsoft-provided task. Alternatively, point me to your repo and I can take a look ...
@willsmythe thanks! Actually I am implementing the template steps manually (AFAIU: tar the original folder, put the tar file in another folder, and cache the latter), so maybe I'll give it a try.
Finally I've been able to cache the tar files with bash script steps. As I only need packing to temporarily work around #10841 on Linux and macOS, I will keep the manual hack for now. Thanks anyway @willsmythe

```yaml
# .....
variables:
  STACK_ROOT: /home/vsts/.stack
steps:
- task: CacheBeta@0
  inputs:
    key: |
      "cache"
      $(Agent.OS)
      $(Build.SourcesDirectory)/$(YAML_FILE)
    path: .azure-cache
    cacheHitVar: CACHE_RESTORED
  displayName: "Download cache"
- bash: |
    mkdir -p $STACK_ROOT
    tar -xzf .azure-cache/stack-root.tar.gz -C /
    mkdir -p .stack-work
    tar -xzf .azure-cache/stack-work.tar.gz
  displayName: "Unpack cache"
  condition: eq(variables.CACHE_RESTORED, 'true')
# ....
- bash: |
    mkdir .azure-cache
    tar -czf .azure-cache/stack-root.tar.gz $STACK_ROOT
    tar -czf .azure-cache/stack-work.tar.gz .stack-work
  displayName: "Pack cache"
```

The final cached build is https://dev.azure.com/jneira/haskell-ide-engine/_build/results?buildId=179
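The pack/unpack steps from that workaround can be exercised locally. A minimal sketch (all paths and file contents here are illustrative stand-ins for `$STACK_ROOT` and `.stack-work`, not the actual build layout):

```shell
#!/bin/sh
set -e
# Stand-ins for the two cached folders
mkdir -p demo/stack-root demo/work/.stack-work
echo "pkg-db" > demo/stack-root/db.txt
echo "build-artifact" > demo/work/.stack-work/out.txt

# "Pack cache": one tarball per cached folder, stored under .azure-cache
mkdir -p demo/work/.azure-cache
tar -czf demo/work/.azure-cache/stack-root.tar.gz -C demo stack-root
( cd demo/work && tar -czf .azure-cache/stack-work.tar.gz .stack-work )

# "Unpack cache": extract into a fresh location, as the restore step would
mkdir -p restore
tar -xzf demo/work/.azure-cache/stack-root.tar.gz -C restore
( cd restore && tar -xzf ../demo/work/.azure-cache/stack-work.tar.gz )
cat restore/stack-root/db.txt
cat restore/.stack-work/out.txt
```

Note the `-C` flags keep archive entries relative, so the tarballs restore cleanly regardless of where they are extracted.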
I tried out 7z, zip, and tar using the Archive task on node_modules. For zip and tar the performance is worse; for 7zip it is marginally better. [Timing table comparing: without extra task, with tar (no compression), with 7zip, with zip — values not recovered]
This feature is merged and will be available in the v2.157.0 agent, which should be rolling out everywhere this week. The functionality is currently "opt in": you need to set the AZDEVOPS_PIPELINECACHE_PACK variable. IMPORTANT: this variable is only checked on "cache save", which only runs if needed (i.e. a cache entry with the same key doesn't already exist) and the build status is successful. On "cache restore", regardless of this variable's value, the cache's contents are untarred whenever the cache entry metadata indicates the contents are packed.
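In pipeline YAML, opting in would look something like the fragment below. The variable name comes from the proposal in this issue; the exact value the agent expects is assumed here:

```yaml
# Opt in to cache-content packing (agent v2.157.0+).
# Only consulted on "cache save"; restore always unpacks when the
# entry metadata says the contents are packed.
variables:
  AZDEVOPS_PIPELINECACHE_PACK: 'true'   # value assumed; any truthy setting may differ
```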
We've been using 7z compression (as above) since it has the best timings. However, running the 7z task to compress after a cache miss is expensive, and the problem is that we get cache misses all the time: we might get a cache hit and then 20 minutes later a cache miss, and we end up compressing again and again on different pipelines for the exact same cache key, which in the end makes the builds a lot slower than if we weren't 7z-ing. Is it worth me trying the built-in tar? Is it likely quicker than my custom setup above, where I have two separate jobs to untar/fetch from Azure?
@fadnavistanmay Close this out when we've deployed to all rings |
We're rolling out tarring as the default with agent 2.160. If this is what you want, you can just remove this env var.
This was released as part of agent 2.160.0. Thanks!
Basic information
Question, bug, or feature? : Feature
Task name: CacheBeta/Cache
Environment
Hosted
Description
To improve cache restore/save performance, especially for caches with a large number of small files (like node_modules), the Cache task should have built-in support for "packing" the cache contents, meaning all files under the specified "path" are consolidated into a single file, and only this file is stored in the cache on the server.
Why?
For performance reasons, "tar" should be used on Linux and macOS, and "7z" on Windows.
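A quick local illustration of the consolidation idea: a node_modules-like tree of many small files becomes a single archive, so the cache service handles one upload/download instead of hundreds of per-file transfers. (The directory layout here is invented for the demo.)

```shell
#!/bin/sh
set -e
# Simulate a cache path with many small files
mkdir -p cache-demo/node_modules
i=0
while [ "$i" -lt 200 ]; do
  echo "module $i" > "cache-demo/node_modules/file$i.js"
  i=$((i + 1))
done

# Unpacked: 200 individual objects to transfer.
# Packed: one archive to transfer.
tar -czf cache-demo/contents.tar.gz -C cache-demo node_modules
ls cache-demo/node_modules | wc -l
ls -l cache-demo/contents.tar.gz
```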
Turning on cache content packing
For now, the option for packing a cache's contents should be controlled via an environment variable (e.g. AZDEVOPS_PIPELINECACHE_PACK), with a decision coming later about whether to always pack or give developers the option (likely via an input on the task).
Changes to the generated cache fingerprint
Since packing changes the actual contents of the cache (i.e. a single tar or 7z file versus many individual files), the task (technically the agent plugin) needs to append an appropriate segment to the developer-provided key to ensure a different fingerprint is produced (which logically makes sense, since the cache's contents on the server are different from the "same" cache whose contents weren't packed). We should establish a "namespace" for these key segments injected by the task, and then define different key segments for the different pack formats, for example:

- microsoft.azure.pipelines.caching.pack=tar (on POSIX)
- microsoft.azure.pipelines.caching.pack=7z (on Windows)

The naming convention for key segments follows the convention for Docker labels and gives us room to support other key segments in the future. Developers should be blocked from specifying key segments in this namespace.
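The injection step can be sketched as follows. This is a hedged approximation of the proposed behavior, not the agent plugin's real code; the segment names come from the issue, while the key format and helper function are invented for illustration:

```shell
#!/bin/sh
# Pick the pack key segment for a given OS name (as from `uname -s`
# on POSIX, or a non-matching value on Windows).
pack_segment_for() {
  case "$1" in
    Linux|Darwin) echo 'microsoft.azure.pipelines.caching.pack=tar' ;;
    *)            echo 'microsoft.azure.pipelines.caching.pack=7z'  ;;
  esac
}

# Hypothetical developer-provided key, with the injected segment appended
# before the fingerprint is computed.
user_key='"cache" | Linux | yaml-file-hash'
echo "$user_key | $(pack_segment_for "$(uname -s)")"
```

Because the injected segment participates in the fingerprint, a packed cache and an unpacked cache with the same developer key resolve to different entries, exactly as the paragraph above requires.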
All of this should be somewhat transparent to the developer (but should still be reported in the logs so developers understand why turning pack on or off changes the cache's identifier). Developers should continue to use variables like $(Agent.OS) in their cache key when they know the cache's contents are different for different OSes (and not just rely on the auto-injected pack key segment creating this differentiation).
Runtime behavior
When cache packing is enabled:
On restore
The task (technically the agent plugin) should append an appropriate key segment to the developer-provided key (and optional "restore keys") based on the preferred pack technology for the environment (tar on posix, 7z on Windows).
This generated fingerprint will then be looked up on the server as usual. If there is a cache hit, the downloaded contents will be appropriately unpacked and dropped into the developer-specified path.
On save
Like during restore, the task should append an appropriate key segment based on the preferred pack technology. If a cache with this key doesn't already exist on the server, the task should pack the files in the specified path and upload this single file as the contents of the new cache.
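The restore and save flows together can be sketched end to end. Here a local directory stands in for the cache server, and a plain file-existence check stands in for the server-side key lookup; all names and the key format are illustrative, not the agent's real implementation:

```shell
#!/bin/sh
set -e
# Local stand-in for the cache server and a fingerprint with the
# injected pack segment already appended.
SERVER=./fake-server
KEY='demo.pack=tar'
mkdir -p "$SERVER" workdir/path
echo "artifact" > workdir/path/a.txt

# On restore: on a hit, unpack the single archive into the target path.
if [ -f "$SERVER/$KEY.tar.gz" ]; then
  mkdir -p restored
  tar -xzf "$SERVER/$KEY.tar.gz" -C restored
  echo "restored $KEY"
else
  echo "cache miss for $KEY"
fi

# On save: only if no entry exists for this key, pack the path and
# upload the single archive as the new cache's contents.
if [ ! -f "$SERVER/$KEY.tar.gz" ]; then
  tar -czf "$SERVER/$KEY.tar.gz" -C workdir/path .
  echo "saved $KEY"
fi
```

On the first run this misses and saves; a second run of the same script would hit and skip the save, mirroring the "only runs if needed" behavior described for cache save.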