Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add Persistent Task Invalidation #6

Merged
merged 7 commits into from
Nov 6, 2018

Conversation

berfarah
Copy link
Contributor

With this feature, tasks can remain valid between taskrunner executions if the task's dependent files have not changed.

An example use-case of this is yarn install, which we can safely assume shouldn't run again unless package.json/yarn.lock have changed.

This requires a git repository to be in place to work.

@berfarah berfarah requested a review from stephen October 31, 2018 22:15
@berfarah berfarah force-pushed the bf/persistent-invalidation-cache branch from a0e379f to 9ed212f Compare October 31, 2018 22:19
@berfarah berfarah requested a review from willhug October 31, 2018 22:19
@berfarah berfarah force-pushed the bf/persistent-invalidation-cache branch 9 times, most recently from 7827992 to 1966616 Compare November 1, 2018 17:10
@@ -83,7 +83,7 @@ func Run(tasks []*Task, options ...RunOption) {
}

if len(tasks) == 0 {
panic(fmt.Errorf("no tasks specified"))
log.Fatalln("No tasks specified")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: consistent lowercasing?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm for consistently upcasing?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i wish we didn't change it. either way, these aren't consistent now

cache/cache.go Outdated
}

func getCacheFilePath(dir string) string {
hashedName := strings.Replace(dir, "/", "%", -1)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

how about replacing any non-alphanum with -?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

regexp.MustCompile(`[^A-Za-z\-_]`)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could do that - I figured slashes would be the only dangerous thing though. NSO

We will need the output of these functions in order to get the
beginnings of our cache going, since our cache is based on doing a git
diff against a hash + manual hashing of uncommitted files.
@berfarah berfarah force-pushed the bf/persistent-invalidation-cache branch 2 times, most recently from 1fc4a09 to d672483 Compare November 5, 2018 18:57
cache/md5sum.go Outdated
"path/filepath"
)

// hashSum takes in a file or directory and hashes based on filename + modified time.
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This hashes every directory on the path as well right?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorta - it does a hashsum of all the files within a directory OR a hash on a single filename

@berfarah berfarah force-pushed the bf/persistent-invalidation-cache branch from d672483 to 629e136 Compare November 5, 2018 19:22
@berfarah
Copy link
Contributor Author

berfarah commented Nov 5, 2018

@stephen @willhug moved taskrunner changes to #7

Copy link

@willhug willhug left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Took a look but some stuff may have changed under the hood as I was looking. Overall good!

return &snapshot, nil
}

func (s snapshot) hashFor(path string) string {
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

pointer reference to snapshot?


// Only mark committed files as dirty (uncommitted files are handled below).
for _, f := range diffFiles {
if previous.hashFor(f) == "" {
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hashForUncommitedFile?

// If the file isn't currently uncommitted, rehash.
md5 := current.hashFor(file.Path)
if md5 == "" {
md5, _ = c.HashFunc(file.Path)
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we print out this error? Or log it somewhere?

}

// write records the current state to a file.
func (c *snapshotter) Write(ctx context.Context, path string) error {
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we rename this to cacheFilePath

}
}

func (c *Cache) Start(ctx context.Context, opt shell.RunOption) error {
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

make this a list of options?


const CachePath = ".cache/taskrunner"

var CacheDir = path.Join(os.Getenv("HOME"), CachePath)
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do we want this cached globally? Or do we want one per repo?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I moved the logic for that out - it will be one per repo

return false
}
for _, f := range c.dirtyFiles {
if taskrunner.IsTaskSource(task, f) {
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IsTaskSource doesn't have a comment. Can we add one?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

@@ -7,28 +7,30 @@ import (
"github.com/samsarahq/taskrunner/watcher"
)

func IsTaskSource(task *Task, path string) (matches bool) {
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

comment?

Copy link

@willhug willhug left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually it was just removed commits, looks good!

This function works for both files and directories. We are choosing to
start with a hash off of filename and modified time for now, but can
make this include contents if we need to make the cache more resilient.
These snapshots are comparable across time and will determine
invalidated files, allowing Taskrunner to determine whether a task needs
to re-run on boot.

A diff against a previous commit is preferred. For uncommitted (and new)
files, the hash is compared to a hash in the previous snapshot. If there
was no hash before, the file is marked as changed.
Since we will be using this functionality to determine whether a task
should be invalidated between task runs, it makes sense to expose this
as public API.

Leaving this off of the Task struct for now, but could easily move this
to be a method on Task.
With this feature, tasks can remain valid between taskrunner executions
if the task's dependent files have not changed.

An example use-case of this is `yarn install`, which we can safely
assume shouldn't run again unless `package.json`/`yarn.lock` have
changed.
Many of our panics aren't that exceptional and should be normal errors
to a user. This should help reduce noise when looking at Taskrunner
errors.
@berfarah berfarah force-pushed the bf/persistent-invalidation-cache branch from 629e136 to 58dd6f1 Compare November 6, 2018 00:33
@berfarah berfarah merged commit 7fe220e into master Nov 6, 2018
@berfarah berfarah deleted the bf/persistent-invalidation-cache branch November 6, 2018 15:59
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants