Support wildcard/glob file patterns in Pipeline Caching keys #10859

Closed
willsmythe opened this issue Jul 9, 2019 · 2 comments

Comments

@willsmythe
Contributor

Required Information

Question, Bug, or Feature?
Type: Feature

Enter Task Name: Cache

Environment

Hosted

Issue Description

Net-net: allow pipeline caching keys to be composed from the hash of files matching a pattern, for example **/yarn.lock or **/pom.xml.

Current behavior

The key for a pipeline cache can be composed from one or more strings or file paths. For each file path, the Cache task generates a hash from the contents of the file. This basically allows a cache to be tied to one or more specific files whose contents directly impact the contents of the cache. When any of these files change, a new cache key / cache is produced.
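As a rough illustration of the behavior described above (not the Cache task's actual implementation), the sketch below treats each key segment as a file path if it names an existing file and as a literal string otherwise. The helper names `resolve_segment` and `compose_key`, the use of SHA-256, and the `|` separator are all assumptions made for this example.

```python
import hashlib
import os


def resolve_segment(segment: str, root: str) -> str:
    """Return a content hash if `segment` names a file under `root`, else the literal string."""
    candidate = os.path.join(root, segment)
    if os.path.isfile(candidate):
        with open(candidate, "rb") as fh:
            # SHA-256 is an assumption; the real task may use a different hash.
            return hashlib.sha256(fh.read()).hexdigest()
    return segment


def compose_key(segments: list[str], root: str) -> str:
    """Join the resolved segments; the '|' separator is an arbitrary choice for this sketch."""
    return "|".join(resolve_segment(s, root) for s in segments)
```

Under these assumptions, `compose_key(["npm", "package-lock.json"], repo_root)` yields a different key whenever the contents of `package-lock.json` change, which is what forces a new cache to be produced.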

For simple projects (e.g. a simple Node.js app with one package-lock.json), configuring and maintaining the cache key is trivial (might just be a single line: key: package-lock.json), but for complex projects that need to base the key on multiple files, setting the initial cache key and, more importantly, ensuring the key stays up to date (as new files are added, removed, etc) is time-consuming and is likely something that gets forgotten down the road.

As an example, iluwatar/java-design-patterns has 100+ pom.xml files that should be part of the cache key, but it would be ridiculous to list them all and would be difficult to keep up to date. The only option right now is to write a script that produces a hash of all pom.xml files, sets this value as a variable, and then uses this variable in the cache key:

steps:
- bash: |
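    # Hash every pom.xml, sort by path for a stable order, then hash the combined list.
    h=$(find . -name pom.xml -type f -exec md5sum '{}' + | sort -k 2 | md5sum | cut -d ' ' -f 1)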
    echo "##vso[task.setvariable variable=POM_FILES_HASH]$h"
  displayName: 'Calculate hash of all pom.xml files'

- task: Cache@0
  inputs:
    path: $(MAVEN_CACHE_FOLDER)
    key: $(POM_FILES_HASH)

Not great.

Monorepo-style projects, including projects like facebook/Jest that use Yarn Workspaces, have the same problem. Currently, the Cache step for Jest would look something like this:

steps:
- task: Cache@0
  inputs:
    path: $(YARN_CACHE_FOLDER)
    key: |
      yarn
      $(Agent.OS)
      yarn.lock
      e2e/async-regenerator/yarn.lock
      e2e/babel-plugin-jest-hoist/yarn.lock
      e2e/chai-assertion-library-errors/yarn.lock
      e2e/console-winston/yarn.lock
      e2e/coverage-remapping/yarn.lock
      e2e/coverage-transform-instrumented/yarn.lock
      e2e/global-setup-node-modules/yarn.lock
      e2e/native-async-mock/yarn.lock
      e2e/pnp/yarn.lock
      e2e/stack-trace-source-maps/yarn.lock
      e2e/transform/babel-jest/yarn.lock
      e2e/transform/babel-jest-manual/yarn.lock
      e2e/transform/multiple-transformers/yarn.lock
      e2e/transform/transformer-config/yarn.lock
      e2e/typescript-coverage/yarn.lock

Again, not great.

Ideal behavior

Ideally developers are able to specify wildcard patterns (glob-style) that would match one or more files in the repo. This would simplify the initial configuration experience and avoid errors down the road when other users forget to update the cache when a file is added or deleted.

The solution might look like this:

steps:
- task: Cache@0
  inputs:
    path: $(YARN_CACHE_FOLDER)
    key: |
      yarn
      $(Agent.OS)
      **/yarn.lock

This sorta drives the need for a more precise way to indicate that a key part is a file or file pattern versus a regular string, so something like this would be better:

steps:
- task: Cache@0
  inputs:
    path: $(YARN_CACHE_FOLDER)
    key: |
      yarn
      $(Agent.OS)
      $[hash('**/yarn.lock')]

This example uses runtime expression syntax $[func(..)], which is currently only used for setting a variable to a counter.
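To make the proposed semantics concrete, here is a minimal sketch of what a `hash('**/yarn.lock')` key part could compute. It is an assumption about one reasonable behavior, not the shipped implementation: the function name `hash_matching_files`, the choice of SHA-256, and the decision to mix relative paths into the digest are all hypothetical.

```python
import glob
import hashlib
import os


def hash_matching_files(pattern: str, root: str) -> str:
    """Hash the relative path and contents of every file under `root` matching `pattern`."""
    # recursive=True lets '**' match zero or more directories.
    # Note: Python's glob skips dot-directories by default.
    matches = sorted(
        p for p in glob.glob(os.path.join(root, pattern), recursive=True)
        if os.path.isfile(p)
    )
    digest = hashlib.sha256()
    for path in matches:
        # Include the path so that moving or renaming a file also changes the key.
        rel = os.path.relpath(path, root).replace(os.sep, "/")
        digest.update(rel.encode("utf-8"))
        digest.update(b"\0")
        with open(path, "rb") as fh:
            digest.update(fh.read())
        digest.update(b"\0")
    return digest.hexdigest()
```

Sorting the matches keeps the result independent of filesystem enumeration order, so the same set of files always produces the same key part. Adding, removing, or editing any matching file changes the digest, which is the property the manual file list above was trying to maintain by hand.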

Task logs

NA

Troubleshooting

NA

Error logs

NA

johnterickson added a commit to johnterickson/azure-pipelines-agent that referenced this issue Jul 24, 2019
TingluoHuang pushed a commit to microsoft/azure-pipelines-agent that referenced this issue Jul 25, 2019
@fadnavistanmay
Contributor

This is released. Closing the issue.

@sten-rosendahl

The cache task explicitly forbids multi-segment key wildcards, even though shells like bash or even the ls tool in Git for Windows allow them. Is this by accident or by design? Should I raise a new issue for it?
