Build Cache Task #9190

Closed
kapilt opened this issue Dec 27, 2018 · 14 comments
Labels
Area: ArtifactsPackages Azure Artifacts Packaging Team enhancement

Comments

@kapilt

kapilt commented Dec 27, 2018

It's very common for npm, pip, and others to use a build cache to speed up installations and reduce the time to first test execution. On many hosted CI platforms this is accomplished via a plugin that stores and retrieves cache artifacts in a cloud provider's object storage. Ideally an Azure Pipelines integration would work the same way, so that it works regardless of hosted/BYOA execution runtimes.

There have been numerous requests over the years for the same.

https://visualstudio.uservoice.com/forums/330519-azure-devops-formerly-visual-studio-team-services/suggestions/32044321-improve-hosted-build-agent-performance-with-build

https://feedback.azure.com/forums/169382-cache/suggestions/35604928-support-caching-in-azure-pipelines

#7254

#553
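
As a rough illustration of the store/retrieve pattern described above, here is a minimal sketch in bash. It is not how any particular CI platform implements caching: the function names (cache_key, cache_restore, cache_save) are hypothetical, and CACHE_STORE is a local directory standing in for cloud object storage.

```shell
#!/usr/bin/env bash
# Sketch only: CACHE_STORE is a local directory standing in for object storage
# (an assumption for illustration, not a real Azure Pipelines setting).
: "${CACHE_STORE:=/tmp/pipeline-cache}"

# cache_key FILE
# Derive a cache key from the content of a file such as a lockfile.
cache_key() {
  sha1sum "$1" | cut -d' ' -f1
}

# cache_restore KEY DIR
# Unpack the archive stored under KEY into DIR; return 1 on a cache miss.
cache_restore() {
  local archive="${CACHE_STORE}/$1.tar.gz"
  [ -f "$archive" ] || return 1
  mkdir -p "$2"
  tar -xzf "$archive" -C "$2"
}

# cache_save KEY DIR
# Archive the content of DIR and store it under KEY.
cache_save() {
  mkdir -p "$CACHE_STORE"
  tar -czf "${CACHE_STORE}/$1.tar.gz" -C "$2" .
}
```

A build step would call cache_restore first, run the expensive install only on a miss, and then call cache_save so the next run can skip it.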

@mitchdenny
Member

Thanks for raising this issue @kapilt. We are actively looking into a build caching solution for Azure Pipelines. We have done some initial design work and hope to start work on it in the new year.

davidebeatrici added a commit to davidebeatrici/mumble that referenced this issue Jan 15, 2019
Azure Pipelines offers many advantages over AppVeyor, the biggest being a maximum of 10 parallel builds for open-source projects, which drastically reduces build completion time.

A pipeline takes about the same time to build on the two CI infrastructures. However, the caching feature is currently only available on AppVeyor, so on Azure Pipelines the compressed environment is downloaded for each build, which takes about 10 minutes.

This means that we'll save at least 10 minutes per pipeline compared to AppVeyor once the feature is added to Azure Pipelines: microsoft/azure-pipelines-tasks#9190
@mitchdenny mitchdenny added the Area: ArtifactsPackages Azure Artifacts Packaging Team label Jan 24, 2019
@infin8x infin8x added Area: ArtifactsPackages Azure Artifacts Packaging Team and removed Area: ArtifactsPackages Azure Artifacts Packaging Team Area: Packaging labels Jan 25, 2019
@infin8x

infin8x commented Jan 29, 2019

Thanks for all the feedback and interest. Since @mitchdenny’s last post, we’ve completed an initial design for Pipeline Caching, and we’re starting the implementation now. You can see how it’ll work and can leave comments in the PR. We look forward to your feedback and are excited to have you try it out once it’s ready.

@jjpepper

I'd like to suggest native support for vcpkg, or at least an easy way to make vcpkg work with the cache. Vcpkg builds are a perfect candidate for the build cache. Try vcpkg install cgal :-) For some reason the Ubuntu images have vcpkg installed but the Windows ones don't. Does anyone from MSFT know why?

@mitchdenny
Member

Adding @chrisrpatterson to comment on having vcpkg pre-baked on the Windows agent. For caching, we see native builds as a scenario we want to improve. The initial steps will be building blocks and we'll tackle serving individual ecosystems as we go along.

We've already had conversations with the vcpkg team :)

@brad-jones

This is what I have built for our pipeline; it uses an npm Artifacts feed. It's a bit of a hack, but it works for now.

# Yaml Spec: https://aka.ms/yaml

# AzureDevOps Pipeline Caching
#
# Right now there is no out of box solution.
# see: https://github.com/Microsoft/azure-pipelines-tasks/issues/9190
#
# This home grown solution uses an npm registry as a central cache

parameters:

  # The name of artifact that you wish to retrieve from cache.
  # This becomes the NPM package name, therefore it must be
  # a valid npm package name.
  name: ''

  # A command to generate a hash key.
  # If the hash does not exist nothing will be downloaded.
  # By default we hash the provided gencmd.
  hashcmd: 'echo "${GENCMD}" | sha1sum'

  # A command to execute in the event of a cache miss
  gencmd: 'echo "Failed to provide gencmd" && exit 1;'

  # The directory to extract the cache into.
  # This can be any folder, if it does not exist we will create it.
  directory: $(Build.ArtifactStagingDirectory)

  # NPM Registry to connect to for storing cache artifacts
  npmRegistry: $(build.cache.npm.registry)
  npmToken: $(build.cache.npm.token)

steps:
  - bash: |
      set -Eeuo pipefail;
      OLD_CWD="$PWD";
      function installRc {
        echo "Installing build cache .npmrc config file";
        if [ -f ~/.npmrc ];
        then
          mv ~/.npmrc ~/.npmrc.bk;
        fi;
        echo "always-auth=true" >> ~/.npmrc;
        echo "registry=https://${NPM_REGISTRY}/" >> ~/.npmrc;
        echo "//${NPM_REGISTRY}/:_authToken=${NPM_TOKEN}" >> ~/.npmrc;
      }
      function restoreRc {
        echo "Restoring original .npmrc config file";
        if [ -f ~/.npmrc.bk ];
        then
          mv ~/.npmrc.bk ~/.npmrc;
        else
          rm -f ~/.npmrc;
        fi;
        cd "$OLD_CWD";
        rm -rf /tmp/cacheprep;
      }
      trap restoreRc EXIT;
      installRc;
      GENCMD="${{parameters.gencmd}}";
      HASH=$(${{parameters.hashcmd}});
      # sha1sum prints "<digest>  -" for stdin; keep only the digest.
      HASH="${HASH%% *}";
      echo "Calculated hash key - $HASH";
      BEARER=$(cat ~/.npmrc | grep -m 1 _authToken | sed -e 's/.*_authToken=//g');
      REGISTRY=$(cat ~/.npmrc | grep -m 1 _authToken | sed -e 's/:_authToken=.*//g');
      RESPONSE=$(curl -s -H "Authorization: Bearer $BEARER" https:${REGISTRY}${{parameters.name}});
      if [[ $RESPONSE != *"$HASH"* ]];
      then
        echo "Cache miss, running gencmd";
        restoreRc;
        eval "${{parameters.gencmd}}";
        installRc;
        rm -rf /tmp/cacheprep;
        mkdir -p /tmp/cacheprep;
        tar -czf /tmp/cacheprep/data.tar.gz -C ${{parameters.directory}} .;
        cd /tmp/cacheprep;
        echo "{\"name\":\"${{parameters.name}}\",\"version\":\"0.0.0-$HASH\"}" > package.json;
        cat package.json;
        npm --registry "https:${REGISTRY}" publish;
        exit 0;
      fi
      echo "Ensure target directory exists - ${{parameters.directory}}";
      # mkdir -p is a no-op when the directory already exists.
      mkdir -p "${{parameters.directory}}";
      cd "${{parameters.directory}}";
      echo "Downloading cached artifacts";
      curl -L -O -H "Authorization: Bearer $BEARER" https:${REGISTRY}${{parameters.name}}/-/${{parameters.name}}-0.0.0-$HASH.tgz;
      tar -xzf ${{parameters.name}}-0.0.0-${HASH}.tgz;
      rm -f ${{parameters.name}}-0.0.0-${HASH}.tgz;
      tar -xzf ./package/data.tar.gz;
      rm -rf ./package;
    env:
      NPM_REGISTRY: ${{parameters.npmRegistry}}
      NPM_TOKEN: ${{parameters.npmToken}}
    displayName: Pipeline cache (${{parameters.name}})
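
The template's default key is the SHA-1 of the gencmd text, and a hit is decided by a substring match on the registry's metadata response. The sketch below isolates those two steps so they can be tried locally. The function names key_from_command and is_cache_hit are hypothetical and are not part of the template above.

```shell
#!/usr/bin/env bash
# Hypothetical helpers mirroring two steps of the template above.

# key_from_command CMD
# Hash the command text and print only the 40-character hex digest.
# sha1sum prints "<digest>  -" when reading stdin, so the first field is kept.
key_from_command() {
  printf '%s\n' "$1" | sha1sum | cut -d' ' -f1
}

# is_cache_hit METADATA KEY
# Succeed when the registry metadata text mentions KEY anywhere,
# the same substring test the template applies to the curl response.
is_cache_hit() {
  [[ "$1" == *"$2"* ]]
}
```

Because this default key depends only on the command text, a changed input such as a lockfile will not invalidate the cache unless hashcmd is overridden to hash that input as well.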

@mitchdenny
Member

You might find that Universal Packages gives you better performance here (the protocol that Universal Packages uses will be what our build cache uses).

@brad-jones

I did start out by trying to use Universal Packages, but the performance was not amazing. That is not necessarily due to the underlying protocol, but because you always have to install ArtifactTool.exe first.

If you follow the instructions from the Feed Connection page you need to have the VSTS CLI installed, which then goes and downloads ArtifactTool.exe.

If you use the Universal Packages task it also has to install ArtifactTool.exe, and I imagine the Azure CLI is no different.

I guess if this additional executable was baked into the VM images it would speed things up a lot.

I tried looking for docs/specs on an HTTP API for Universal Packages but couldn't find anything useful.

So that's when I changed over to npm. The registry spec is openly known and no additional tools need to be downloaded to use it.

Downloading a cached artifact with the above pipeline step normally takes less than a second.
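
The reason no extra tooling is needed is that the tarball location can be computed from the package name and version alone, which is what the template's curl download relies on. A minimal sketch, assuming an unscoped package name and the template's 0.0.0-<hash> version scheme, follows. The function name tarball_url and the example registry host are hypothetical.

```shell
#!/usr/bin/env bash
# tarball_url REGISTRY NAME KEY
# Print the tarball URL in the layout the template downloads from:
#   <registry>/<name>/-/<name>-0.0.0-<key>.tgz
# Assumes an unscoped package name; scoped names use a different path.
tarball_url() {
  local registry="${1%/}"   # drop one trailing slash, if present
  printf '%s/%s/-/%s-0.0.0-%s.tgz\n' "$registry" "$2" "$2" "$3"
}

# Example with a placeholder registry host:
tarball_url "https://registry.example.test/" "gomake-runner" "abc123"
```

The resulting URL can be passed straight to curl -L -O with the bearer token header, as the template does.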

@mitchdenny
Member

For Pipeline Caching, the tooling will be baked into the agent, so there will be nothing to download. I am interested in how large your dependencies are, given that they come down in under a second: how many files on disk, total volume on disk, and size of the archive/package?

@brad-jones

This is the one that runs sub-second. Admittedly it's not a massive payload at only 8 MB, but when I was using Universal Packages it was faster to just build this artifact (a Go binary) than to cache it.

Anyway, hopefully the built-in cache mechanism being built at the moment will be nice and performant.

##[section]Starting: Pipeline cache (gomake-runner)
==============================================================================
Task         : Bash
Description  : Run a Bash script on macOS, Linux, or Windows
Version      : 3.142.2
Author       : Microsoft Corporation
Help         : [More Information](https://go.microsoft.com/fwlink/?LinkID=613738)
==============================================================================
Generating script.
[command]/bin/bash --noprofile --norc /home/vsts/work/_temp/e780d664-1c4f-4721-a651-05689a441f52.sh
Installing build cache .npmrc config file
Calculated hash key - e6fff909a5fff4530089174c95c183398180deae
Ensure target directory exists - /home/vsts/work/1/s/.gomake
Downloading cached artifacts
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed

  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0
  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0

  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0
100 8335k  100 8335k    0     0  21.8M      0 --:--:-- --:--:-- --:--:-- 70.7M
Restoring original .npmrc config file
##[section]Finishing: Pipeline cache (gomake-runner)

@mitchdenny
Member

Closing this issue as Azure Pipelines now has pipeline caching.

@lietu

lietu commented Oct 2, 2019

@mitchdenny yet this says it's WIP? https://github.com/microsoft/azure-pipelines-yaml/blob/master/design/pipeline-caching.md

@mitchdenny
Member

Yeah - that was a design note - the ** Work In Progress ** thing wasn't removed before it was merged. Here is where the feature is described right now:

https://devblogs.microsoft.com/devops/caching-and-faster-artifacts-in-azure-pipelines/

I believe there is still more work to do - but it is in a usable state right now.

@asottile

asottile commented Oct 4, 2019

Is there a place to leave feedback? I have some... thoughts

@altendky

altendky commented Oct 4, 2019

the ** Work In Progress ** thing wasn't removed before it was merged

https://github.com/apps/wip ?
