Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add fallback keys to caching #2340

Closed

Conversation

johnterickson
Copy link
Contributor

@johnterickson johnterickson commented Jul 5, 2019

@johnterickson johnterickson force-pushed the users/jerick/fallback branch from ac105f7 to cb26a38 Compare July 11, 2019 18:12
@johnterickson johnterickson marked this pull request as ready for review July 11, 2019 18:14
context.Verbose($"This fingerprint: `{fingerprint.ToString()}`");

if(fingerprint == result.Fingerprint)
{
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

so we're only going output inexact on wildcard related fingerprints and not exclusively fallback keys?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A primary key can't be a wildcard because that's the name it saves as. Did that answer your question?

@johnterickson johnterickson force-pushed the users/jerick/fallback branch from 66f9250 to 6e474d4 Compare July 11, 2019 21:53
@willsmythe
Copy link
Contributor

@johnterickson / @owenhuynMSFT / @fadnavistanmay - because of the change from multi-line keys to single line (and pipe-delimited) keys, deploying these changes (as-is) would break existing pipelines using the CacheBeta@0 task, correct? If so ...

I think we should go ahead and introduce new versions of the restore and save plugins and essentially freeze the V0 plugins.

We need to introduce a new major version of the task (.json) anyway because of the change from a multi-line key input to a single line, so it seems logical for this "new" task to reference the new plugin version (and have aligned major versions) ...

.,. and yah, I know we discussed introducing a completely new task with a new GUID (Cache@0), but to keep the task's version aligned with its plugin's major version, could we just rev the major version of the existing task (so Cache@1) -- note the new name --- and have it reference the rev/new major versions of the plugin (i.e. Agent.Plugins.PipelineCache.SavePipelineCacheV1)?

Cache@1 would remain in preview for now, and might eventually become "released" (or we would introduce a Cache@2 and release it if there are any breaking changes between V1 and V2).

src/Agent.Plugins/PipelineCache/FingerprintCreator.cs Outdated Show resolved Hide resolved
src/Agent.Plugins/PipelineCache/FingerprintCreator.cs Outdated Show resolved Hide resolved
src/Agent.Plugins/PipelineCache/FingerprintCreator.cs Outdated Show resolved Hide resolved
src/Agent.Plugins/PipelineCache/FingerprintCreator.cs Outdated Show resolved Hide resolved
src/Agent.Plugins/PipelineCache/FingerprintCreator.cs Outdated Show resolved Hide resolved

string absoluteRootRule = MakePathAbsolute(workingDirectory, rootRule);
context.Verbose($"Expanded include rule is `{absoluteRootRule}`.");
IEnumerable<string> absoluteExcludeRules = pathRules.Skip(1).Select(r => MakePathAbsolute(workingDirectory, r.Substring(1)));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why the r.Substring(1)? Are we expecting everything but the first "path rule" in a key segment to be "not" (exclude)? If so, something like yarn.lock, */yarn.lock, !node_modules/** wouldn't work ...

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's think about whether we should be stricter (for now) and not allow absolute file paths in "pathy" key segments that use wildcards, but still allow them in non-wildcard segments. This reduces the surface area of where we have to look for matching files. It also lets the user specify multiple "inclusions" rules in the same key segment, which can be followed by zero or more "exclusions" rules. Every rule would be rooted at $(System.DefaultWorkingDirectory).

The enumeration path would always be at least $(System.DefaultWorkingDirectory), but if all the inclusion rules were, for example, under a different path (e.g. packages/*.lock, packages/package.json), the enumeration path could be just $(System.DefaultWorkingDirectory)/packages.

Copy link
Contributor Author

@johnterickson johnterickson Jul 17, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Haha! I had cut that because I though you didn't want it!

We could allow multiple includes and multiple excludes within a given key segment, but it could get ugly :)

Right now what is support is exactly one [include rule] followed by zero to many [exclude rule].


context.Verbose($"Enumerating starting at root `{enumerateRootPath}` with pattern `{enumeratePattern}`.");
IEnumerable<string> files = Directory.EnumerateFiles(enumerateRootPath, enumeratePattern, enumerateDepth);
files = files.Where(f => filter(f)).Distinct();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Like we discussed, I think there's the potential for a lot of perf problems here. A segment like **/yarn.lock, !node_modules will end up iterating thousands of times over files that will never match. Maybe we'll just need to build in more optimizations for scenarios like this over time.

Might be nice to show (in telemetry and logs, but maybe just in "debug mode") the # of files enumerated, # of files hashed, and total time. In extreme cases (e.g. 35 seconds to calculate the key), we output a warning in the log, otherwise the user will just think upload/download time is slow.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My guess is that **/yarn.lock will be expensive query whether we say dir /s *' or dir /s yarn.lock` because the file system has to enumerate all the directories either way, but yes - definite room for improvement.

src/Agent.Plugins/PipelineCache/FingerprintCreator.cs Outdated Show resolved Hide resolved
src/Agent.Plugins/PipelineCache/FingerprintCreator.cs Outdated Show resolved Hide resolved
src/Agent.Plugins/PipelineCache/FingerprintCreator.cs Outdated Show resolved Hide resolved
@johnterickson
Copy link
Contributor Author

@johnterickson johnterickson force-pushed the users/jerick/fallback branch from 4121da4 to 44de426 Compare July 19, 2019 23:11
@johnterickson
Copy link
Contributor Author

superceded by #2358

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants