
Selective cache invalidation #700

Merged: 59 commits, May 3, 2017

Conversation

@smashwilson (Contributor) commented Apr 24, 2017

Selectively invalidate the cached Repository state when (a) performing write operations ourselves and (b) receiving filesystem events. The goal is to reduce the number of times we need to shell out to git at all, avoiding the performance penalty of launching a new process each time.

Fixes #671. Related to #201 and #627.

Approach

The setup for the actual caching of operations was done back in #654, where both the Cache object and @invalidate() decoration were introduced. This pull request expands on that foundation by:

  • Adding an argument to @invalidate(): a function that returns the array of cache keys to evict once the Promise returned by the decorated operation settles, whether it resolves or rejects (a sketch of the decorator follows this list).
  • Reworking the Cache to track "groups" of keyed items for easy eviction of subsets of the cache without needing an O(n) key scan.
  • Passing filesystem events to the Repository rather than just a "hey, something changed somewhere" event, and using the paths of the modified files to invalidate the corresponding parts of the cache.
  • Writing a shit-ton of tests to guard against any operations caching data that we shouldn't. More on that later.
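
For illustration, here's a hedged sketch of roughly what the @invalidate(spec) plumbing described above does; the exact decorator and method signatures in the package differ in detail.

```js
// Simplified sketch: run the decorated operation, then evict whatever keys
// spec(...args) names, whether the operation resolves or rejects.
function invalidate(spec) {
  return function(target, name, descriptor) {
    const original = descriptor.value;
    descriptor.value = function(...args) {
      const evict = () => this.cache.invalidate(spec(...args));
      return original.apply(this, args).then(
        result => { evict(); return result; },
        err => { evict(); return Promise.reject(err); },
      );
    };
    return descriptor;
  };
}

// Illustrative use on a write operation (the key functions chosen per method vary):
// @invalidate(paths => paths.map(fileName => Keys.index.oneWith(fileName)))
// stageFiles(paths) { return this.git().stageFiles(paths); }
```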

Along the way:

  • I tinkered with getCurrentBranch(), because the filesystem event tests revealed a race condition: if an external process changed HEAD between the file content check and the git symbolic-ref call, the method would fail with an error.
  • I moved .isPartiallyStaged() from Present to Repository and rewrote it in terms of .getStatusesForChangedFiles(), so that it takes advantage of cached status results (a sketch follows this list). Also, it was annoying to test with the test harness.
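
As an example of the second point, isPartiallyStaged() can be phrased against the cached status results roughly like this. The stagedFiles/unstagedFiles property names are assumptions and the real method distinguishes specific status combinations; this is only a sketch of the shape.

```js
// Hedged sketch: a file is "partially staged" when it shows up with both
// staged and unstaged changes at the same time.
async function isPartiallyStaged(repository, fileName) {
  const {stagedFiles, unstagedFiles} = await repository.getStatusesForChangedFiles();
  return Boolean(stagedFiles[fileName]) && Boolean(unstagedFiles[fileName]);
}
```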

Cache implementation

All of the actual Cache implementation lives in lib/models/repository-states/present.js. It uses a pair of Maps: one that maps a unique string to the Promise generated the last time an operation was invoked, and one that maps a group name to a set of CacheKey objects that belong to that group.

Externally, callers populate the cache by calling .getOrSet() with a CacheKey. Each CacheKey contains a primary (globally-unique) key, often generated from a filename, and zero or more groups to which that key can belong. The primary key is used to do the lookup for an existing result; if none is present, the actual operation proceeds.
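
For example, an accessor wraps its git call in .getOrSet() roughly like this. The method shape is simplified (the real getFilePatchForPath takes additional options), so treat it as a sketch rather than the actual implementation.

```js
// Hedged sketch of an accessor populating the cache.
getFilePatchForPath(filePath, {staged} = {staged: false}) {
  return this.cache.getOrSet(Keys.filePatch.oneWith(filePath, {staged}), () => {
    // Only shell out to git when there's no cached Promise for this key.
    return this.git().getFilePatchForPath(filePath, {staged});
  });
}
```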

To evict items from the cache, @invalidate() passes it a collection of either CacheKey or GroupKey instances. A CacheKey evicts an item that matches its primary key exactly; a GroupKey evicts all keys that belong to the group it names.
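
Put together, a stripped-down sketch of this structure (simplified from the real implementation in present.js; error handling and some bookkeeping omitted):

```js
class CacheKey {
  constructor(primary, groups = []) {
    this.primary = primary;   // globally-unique key string
    this.groups = groups;     // names of groups this key belongs to
  }

  removeFromCache(cache) {
    cache.removePrimary(this.primary);
  }
}

class GroupKey {
  constructor(group) {
    this.group = group;
  }

  removeFromCache(cache) {
    cache.removeFromGroup(this.group);
  }
}

class Cache {
  constructor() {
    this.storage = new Map();  // primary key string => Promise from the last invocation
    this.byGroup = new Map();  // group name => Set of CacheKeys registered under it
  }

  getOrSet(key, operation) {
    const existing = this.storage.get(key.primary);
    if (existing !== undefined) {
      return existing;
    }

    const created = operation();
    this.storage.set(key.primary, created);
    for (const group of key.groups) {
      if (!this.byGroup.has(group)) {
        this.byGroup.set(group, new Set());
      }
      this.byGroup.get(group).add(key);
    }
    return created;
  }

  invalidate(keys) {
    for (const key of keys) {
      key.removeFromCache(this);
    }
  }

  removePrimary(primary) {
    this.storage.delete(primary);
  }

  removeFromGroup(group) {
    const keys = this.byGroup.get(group) || new Set();
    for (const key of keys) {
      this.storage.delete(key.primary);
    }
    this.byGroup.delete(group);
  }
}
```

Tracking group membership in the second Map is what makes group eviction cheap: evicting a group walks only the keys registered under that group rather than scanning every cached entry.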

To avoid leaving "magic strings" around (and having things fall out of sync), cache keys and groups are generated by functions and constants on the Keys object. The key helpers come in four flavors:

  • a single constant (Keys.changedFiles, Keys.filePatch.all);
  • a function that generates a single key from a parameter (naming convention: .oneWith(); e.g. Keys.index.oneWith(fileName));
  • a function that generates many keys (naming convention: .eachWithXyz(); e.g. Keys.filePatch.eachWithOpts(...));
  • a function that returns an array of related keys that are commonly evicted together (Keys.headOperationKeys()).
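
Here's an illustrative shape for a few of those helpers, reusing the CacheKey and GroupKey classes from the sketch above. The specific key strings, group names, and which keys headOperationKeys() bundles together are assumptions, not the exact implementation.

```js
const Keys = {
  // (a) single constants
  changedFiles: new CacheKey('changed-files', ['changed-files']),

  index: {
    all: new GroupKey('index'),
    // (b) one key per parameter
    oneWith: fileName => new CacheKey(`index:${fileName}`, ['index']),
  },

  filePatch: {
    all: new GroupKey('file-patch'),
    oneWith: (fileName, {staged}) => new CacheKey(
      `file-patch:${staged ? 'staged' : 'unstaged'}:${fileName}`,
      ['file-patch', `file-patch:${fileName}`],
    ),
    // (c) many keys at once
    eachWithOpts: (fileName, ...optsList) =>
      optsList.map(opts => Keys.filePatch.oneWith(fileName, opts)),
  },

  // (d) related keys that are commonly evicted together
  headOperationKeys: () => [Keys.changedFiles, Keys.filePatch.all, Keys.index.all],
};
```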

Test harness

The highest risk is failing to evict a cache key when the result of the underlying operation has changed, while what we want to optimize is evicting the smallest set of cache keys that could change as the result of a specific operation. To make these cases easier to capture, I wrote the assertCorrectInvalidation harness, which:

  • Uses a Map that gives friendly names to example calls of every Repository accessor that contains a this.cache.getOrSet() call.
  • Executes every call in the map once and remembers the Promises returned by each (before).
  • Executes the code under test. Ideally, this executes a single repository method with arguments and pre-existing state set up to result in as many changed Repository accessors as possible.
  • Executes every accessor in the map again and stores the Promises returned this time. If a call was not evicted from the cache by the operation, the Promise it returns will be === to the one collected before. (cached)
  • Clears the Repository's cache.
  • Executes the accessors one more time to determine what accessors actually changed output during the operation. (after)

To perform the actual assertion, the harness diffs (a) after against before to find the set of accessors whose results were actually modified by the operation and (b) cached against before to find the set of accessors that were evicted from the cache.

If an accessor should have been evicted but was not, the harness throws an error and fails the test case; that's the bad case. If an accessor should not have been evicted but was, that's okay: it just means we might have been able to keep something in the cache, so it only causes a failure when {strict: true} is specified. If {verbose: true} is specified, both accessor sets are dumped to the console to make it easier to see what's going on and fine-tune the eviction.
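
Condensed, the flow looks roughly like this. It's a sketch under a few assumptions: accessors arrive keyed by friendly name in a Map, the repository's cache can be cleared directly (clearCache() is a hypothetical name), and the value comparison shown is cruder than the real harness.

```js
async function assertCorrectInvalidation(repository, accessorsByName, operation, options = {}) {
  const {strict = false, verbose = false} = options;

  // Invoke every cached accessor and collect the Promises it returns.
  const getPromises = () => {
    const promises = new Map();
    for (const [name, accessor] of accessorsByName) {
      promises.set(name, accessor(repository));
    }
    return promises;
  };

  // Crude value snapshot used to detect real output changes.
  const resolveAll = async promises => {
    const values = new Map();
    for (const [name, promise] of promises) {
      values.set(name, JSON.stringify(await promise));
    }
    return values;
  };

  const before = getPromises();                 // populate the cache
  const beforeValues = await resolveAll(before);

  await operation(repository);                  // code under test

  const cached = getPromises();                 // identical Promises => still cached
  repository.clearCache();                      // assumed API for wiping the cache

  const after = getPromises();                  // ground truth with a cold cache
  const afterValues = await resolveAll(after);

  const changed = new Set();                    // accessors whose output changed
  const evicted = new Set();                    // accessors evicted by the operation
  for (const name of before.keys()) {
    if (afterValues.get(name) !== beforeValues.get(name)) { changed.add(name); }
    if (cached.get(name) !== before.get(name)) { evicted.add(name); }
  }

  if (verbose) {
    console.log('changed:', [...changed], 'evicted:', [...evicted]);
  }

  for (const name of changed) {
    if (!evicted.has(name)) {
      throw new Error(`Stale cache: ${name} changed but was not evicted`);
    }
  }
  if (strict) {
    for (const name of evicted) {
      if (!changed.has(name)) {
        throw new Error(`Over-eviction: ${name} was evicted but did not change`);
      }
    }
  }
}
```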

This PR includes test cases covering every Repository action method that is marked with the @invalidate() decorator, as well as the filesystem events generated by the underlying git operations they use.

Left to do

  • Add arguments to the @invalidate() decorator
  • Invalidate state based on filesystem events
  • Use cached status results for isPartiallyStaged()
  • More efficient wildcard eviction from the cache
  • Rename Keys. properties to clarify at a glance which are constants, which are functions that generate single keys, which are functions that generate multiple keys...
  • Fill in a lot of tests
    • Pending tests for writing a merge conflict to the index.
  • Docs and writeup

@smashwilson (Contributor, Author)

For reference, here's a waterfall I collected before starting this work:

[Screenshot: waterfall data, 2017-04-24 10:58 AM]

@smashwilson (Contributor, Author)

Here's how it looks with some fairly aggressive caching in place:

[Screenshot: waterfall data, 2017-04-24 4:18 PM]

I still have a bunch left to do -- mostly related to putting tests in place that ensure we're invalidating and not invalidating the correct state for each operation. But I think that's a promising start.

@smashwilson requested a review from kuychaco on April 27, 2017
@smashwilson (Contributor, Author)

Haha, yep. The render from the filesystem event does trample the one from the autorefresh, but both finish right around the 100ms mark:

[Screenshot: waterfall data, 2017-04-27 3:46 PM]

I might actually implement the "expected filesystem update" feature I talked about with @BinaryMuse back when we were first discussing autorefresh. In another PR, though; this one is already under the 100ms mark 😁

@smashwilson (Contributor, Author)

On reflection: expected updates might be more important to this than I'd thought, because git status touches the index, which creates an index filesystem event which invalidates the status cache... you can see the uncached git status call in the second waterfall. 🤔
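
That feature is not implemented in this PR; purely as a sketch of the idea, an "expected event" filter might let the Repository register paths it is about to touch itself so the resulting filesystem events don't needlessly invalidate the cache. All names below are hypothetical.

```js
// Hypothetical sketch only: suppress filesystem events that we caused ourselves.
class ExpectedEvents {
  constructor(ttlMs = 500) {
    this.ttlMs = ttlMs;
    this.expected = new Map(); // path => expiration timestamp
  }

  // Call before running a git command that will touch these paths.
  expect(paths) {
    const expiry = Date.now() + this.ttlMs;
    for (const path of paths) {
      this.expected.set(path, expiry);
    }
  }

  // Returns true when an incoming event was anticipated and can be ignored.
  shouldIgnore(path) {
    const expiry = this.expected.get(path);
    if (expiry === undefined) { return false; }
    this.expected.delete(path);
    return Date.now() <= expiry;
  }
}
```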
