Finer-grained InMemoryCache result caching. #5617

Merged 4 commits from finer-grained-result-caching into release-3.0 on Nov 25, 2019

Conversation

@benjamn (Member) commented on Nov 25, 2019:

I encourage reading all the commit messages included here, but the final commit is the capstone:

Use ID+field dependencies for result caching, rather than just ID

Until now, the InMemoryCache result caching system (introduced by #3394) registered dependencies on specific entity objects by ID, which meant that changes to any of the fields of an entity in the cache would invalidate every query result that previously involved that entity, even if some of those queries did not use any of the fields that changed.

While this system was logically correct, it resulted in unnecessary recomputation of cached query results. Perhaps most dramatically, any changes to the fields of the ROOT_QUERY object would necessitate the recomputation of every query that involved any root query fields, because those queries depended on the ROOT_QUERY ID rather than the specific fields they consumed.

With this commit, the StoreReader reads from the EntityCache using the new method getFieldValue(dataId: string, fieldName: string): StoreValue, which registers dependencies on the combination of the entity ID and the name of the requested field, so cached query results will remain valid as long as the specific fields they used have not changed.

Furthermore, the EntityCache#set method has been replaced by a method called merge, which intelligently updates the normalized data to incorporate a set of new fields, automatically invalidating dependencies for any fields whose values are modified by the merge.

For backwards compatibility, if a consumer chooses to continue using the EntityCache#get(id) method instead of getFieldValue(id, fieldName), ID-based dependencies will be registered (and later invalidated), exactly as before.
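
For illustration, here is a minimal, self-contained sketch of both ideas; the class name and the dependency-tracking callbacks are invented for this example (the real EntityCache builds on the optimism library and performs deeper merging of nested store objects), so treat it as a sketch rather than the actual implementation.

```ts
type StoreValue = unknown;
type StoreObject = Record<string, StoreValue>;

class SketchEntityCache {
  private data: Record<string, StoreObject> = {};

  constructor(
    private depend: (key: string) => void, // record a dependency of the current read
    private dirty: (key: string) => void,  // invalidate results that depend on a key
  ) {}

  // Registers a dependency on the ID+field combination, not just the ID, so
  // cached results stay valid as long as the specific fields they read are
  // unchanged.
  getFieldValue(dataId: string, fieldName: string): StoreValue {
    this.depend(`${dataId}.${fieldName}`);
    const storeObject = this.data[dataId];
    return storeObject && storeObject[fieldName];
  }

  // Replaces set(id, object): only ID+field combinations whose values actually
  // change are dirtied, leaving unrelated cached results alone.
  merge(dataId: string, incoming: StoreObject): void {
    const existing = this.data[dataId] || (this.data[dataId] = {});
    Object.keys(incoming).forEach(fieldName => {
      if (existing[fieldName] !== incoming[fieldName]) {
        existing[fieldName] = incoming[fieldName];
        this.dirty(`${dataId}.${fieldName}`);
      }
    });
  }
}
```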

Although functional decomposition is generally a good idea, it is neither
required nor beneficial in the case of StoreReader#executeField. We do not
need to cache the value of every individual field (as we do for
executeSelectionSet and executeSubSelectedArray), and executeField was a
private method with only one caller (executeSelectionSet), so we can
inline it without duplicating its implementation in multiple places.

By moving the executeField implementation into executeSelectionSet, we can
avoid generating an ExecResult object for every field value that we read
from the cache, and we have direct access to the StoreObject whose fields
we are reading, which will prove convenient in upcoming commits.

Given how often executeField was called, even a tiny reduction in the
overhead of calling it as a separate method (and handling its results)
adds up in a big way.
…entObject.

Exposing the entire parent StoreObject to custom read and merge functions
was risky, even if we made it read-only, because it permitted unobserved
access to sibling fields of the parentObject, which would have prevented
effective result caching for read functions that compute their results
using sibling fields as input.

Passing in a function that takes a field name and returns the value of
that field (namely, getFieldValue) enables the same sibling field use
cases, without exposing unrelated StoreObject fields. In the future, we
can use this function to register dependencies on the fields that were
requested, which would not have been possible with just an object.
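
For illustration, here is a hedged sketch of a custom read function using a getFieldValue callback of this shape; the options type and the field names are assumptions made for the example, not the exact API.

```ts
// Hypothetical field: derives fullName from sibling fields via getFieldValue,
// without ever seeing the whole parent StoreObject.
const fullNameField = {
  read(
    _existing: string | undefined,
    options: { getFieldValue(fieldName: string): unknown },
  ): string | undefined {
    // Each call can register a dependency on exactly the sibling field it reads.
    const first = options.getFieldValue("firstName");
    const last = options.getFieldValue("lastName");
    return first != null && last != null ? `${first} ${last}` : undefined;
  },
};
```
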
@hwillson (Member) left a comment:

Looks awesome @benjamn! Great idea with getFieldValue - that definitely seems safer.

@benjamn merged commit 109b38b into release-3.0 on Nov 25, 2019.
benjamn added a commit that referenced this pull request Dec 2, 2019
In PR #5617, we shifted from using just IDs for caching results read from
the EntityStore to using IDs plus field names, which made the result
caching system significantly more precise and less likely to trigger
unnecessary rerenders.

However, the field names we used in that PR were storeFieldName strings,
instead of the actual field names you'd find in your GraphQL schema.
Within the InMemoryCache implementation, a storeFieldName is the full
string produced by Policies#getStoreFieldName, which must begin with the
original field.name.value, but may include an additional suffix that
captures field identity based on arguments (and also, but less frequently,
directives).
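
To make the distinction concrete, here is a sketch of how the schema field name can be recovered from a storeFieldName; the regex-based helper and the suffix formats shown in the comment are illustrative, not necessarily the exact implementation.

```ts
// e.g. 'books({"genre":"fiction"})' -> 'books'
//      'books:{"genre":"fiction"}'  -> 'books'
// (the precise suffix format is an implementation detail)
function fieldNameFromStoreName(storeFieldName: string): string {
  const match = storeFieldName.match(/^[_A-Za-z][_0-9A-Za-z]*/);
  return match ? match[0] : storeFieldName;
}
```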

For the purposes of result caching, it's important to depend on the entity
ID plus the actual field name (field.name.value), rather than the ID plus
the storeFieldName, for two reasons. First, there can be many different
storeFieldName strings for a single field.name.value, and we don't want to
waste memory on dependency tracking. Second, it's safer (more conservative)
to dirty all fields with a particular field.name.value whenever any field
with a matching storeFieldName is updated or evicted, since field values
with the same field.name.value often overlap in subtle ways (for example,
multiple pages of a paginated list), and it seems dangerous to encourage
deleting some but not all of them.

Perhaps more importantly, application developers should never have to
worry about the concept of a storeFieldName, since it is very much an
implementation detail of the InMemoryCache, and the precise format of
storeFieldName strings is subject to change.
benjamn added a commit that referenced this pull request Dec 2, 2019
The previousResult option was originally a way to ensure referential
identity of structurally equivalent cache results, before the result
caching system was introduced in #3394. It worked by returning
previousResult whenever it was deeply equal to the new result.
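
In effect (a rough sketch of the described behavior, not the removed implementation itself):

```ts
import { equal } from "@wry/equality"; // deep-equality helper

function applyPreviousResult<T>(newResult: T, previousResult?: T): T {
  // Hand back the old object whenever the new result is deeply equal to it,
  // preserving referential identity at the cost of a deep comparison.
  return previousResult !== undefined && equal(previousResult, newResult)
    ? previousResult
    : newResult;
}
```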

The result caching system works a bit differently, and in particular never
needs to do a deep comparison of results. However, there were still a few
(test) cases where previousResult seemed to have a positive effect, and
removing it seemed like a breaking change, so we kept it around.

In the meantime, the equality check has continued to waste CPU cycles, and
the behavior of previousResult has undermined other improvements, such as
freezing cache results (#4514). Even worse, previousResult effectively
disabled an optimization that allowed InMemoryCache#broadcastWatches to
skip unchanged queries (see comments I removed if curious). This commit
restores that optimization.

I realized eliminating previousResult might finally be possible while
working on PR #5617, which made the result caching system more precise by
depending on IDs+fields rather than just IDs. This additional precision
seems to have eliminated the few remaining cases where previousResult had
any meaningful benefit, as evidenced by the lack of any test changes in
this commit... even among the many direct tests of previousResult in
__tests__/diffAgainstStore.ts!

The removal of previousResult is definitely a breaking change (appropriate
for Apollo Client 3.0), because you can still contrive cases where some
never-before-seen previousResult object just happens to be deeply equal to
the new result. Also, it's fair to say that this removal will strongly
discourage disabling the result caching system (which is still possible
for diagnostic purposes), since we rely on result caching to get the
benefits that previousResult provided.
@benjamn deleted the finer-grained-result-caching branch on December 2, 2019 at 23:43.
benjamn added a commit that referenced this pull request Dec 3, 2019
The getFieldValue(fieldName) helper function was introduced in #5617 for
reading fields from the current StoreObject during read functions.

This commit adds a second parameter to getFieldValue, foreignRef, which is
an optional Reference. When foreignRef is provided, getFieldValue will
read the specified field from the StoreObject identified by the
foreignRef, instead of reading from the current StoreObject.

In either case, getFieldValue reads an existing value from the cache,
without invoking any read functions, so you cannot use getFieldValue to
set up expensive (and potentially cyclic) chains of read functions.

With this new ability to read fields from arbitrary Reference objects,
read functions can explore the entire reachable cache, without having to
call cache.readQuery. The beauty of this system is that every field read
operation requires a function call (getFieldValue), which allows the
result caching system to know which fields were read from which entities,
so future changes to those fields can properly invalidate any cached
results that involved the original read function.
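
As a hedged sketch of the capability described above (only the getFieldValue(fieldName, foreignRef) signature comes from the commit message; the field names, types, and options shape are assumptions):

```ts
type Reference = { __ref: string }; // normalized reference to another entity

// Hypothetical read function: follow the Reference stored in this object's
// "author" field, then read "name" from the referenced entity.
const authorNameField = {
  read(
    _existing: unknown,
    options: { getFieldValue(fieldName: string, foreignRef?: Reference): unknown },
  ) {
    const authorRef = options.getFieldValue("author") as Reference | undefined;
    return authorRef ? options.getFieldValue("name", authorRef) : undefined;
  },
};
```
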
benjamn added a commit that referenced this pull request Jan 23, 2020
One of the most important internal changes in Apollo Client 3.0 is that
the result caching system now tracks its dependencies at the field level,
rather than at the level of whole entity objects: #5617

As proof that this transformation is complete, we can finally remove the
ability to read entire entity objects from the EntityStore by ID, so that
the only way to read from the store is by providing both an ID and the
name of a field. The logic I improved in #5772 is now even simpler,
without the need for overloading multiple EntityStore#get signatures.

In the process, I noticed that the EntityStore#has(ID) method was
accidentally depending on any changes to the contents of the identified
entity, even though store.has only cares about the existence of the ID in
the store. This was only a problem if we called store.has during a larger
read operation, which currently only happens when InMemoryCache#read is
called with options.rootId. The symptoms of this oversight went unnoticed
because the caching of readFragment results was just a little more
sensitive to changes to the given entity. The results themselves were
still logically correct, but their caching was not as effective as it
could be. That's the beauty and the curse of caching: you might not notice
when it isn't working, because all your tests still pass, and nobody
complains that their application is broken.

Fortunately, we can model this dependency with an ID plus a fake field
name called "__exists", which is only dirtied when the ID is newly added
to or removed from the store.
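
A small sketch of the idea (the class and dependency callbacks are invented for this example): has(ID) depends only on the "__exists" pseudo-field, which is dirtied when the ID is added or removed, not when its fields change.

```ts
class SketchStore {
  private data: Record<string, Record<string, unknown>> = {};

  constructor(
    private depend: (key: string) => void,
    private dirty: (key: string) => void,
  ) {}

  has(dataId: string): boolean {
    // Depend only on the presence or absence of the ID, not on its fields.
    this.depend(`${dataId}.__exists`);
    return dataId in this.data;
  }

  delete(dataId: string): void {
    if (dataId in this.data) {
      delete this.data[dataId];
      // The ID was removed, so dirty the existence pseudo-field (dirtying the
      // entity's real fields is omitted here for brevity).
      this.dirty(`${dataId}.__exists`);
    }
  }
}
```
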
benjamn added a commit that referenced this pull request Jun 8, 2020
When an object is evicted from the cache, common intuition says that any
dangling references to that object should be proactively removed from
elsewhere in the cache. Thankfully, this intuition is misguided, because a
much simpler and more efficient approach to handling dangling references
is already possible, without requiring any new cache features.

As the tests added in this commit demonstrate, the cleanup of dangling
references can be postponed until the next time the affected fields are
read from the cache, simply by defining a custom read function that
performs any necessary cleanup, in whatever way makes sense for the logic
of the particular field. This lazy approach is vastly more efficient than
scanning the entire cache for dangling references would be, because it
kicks in only for fields you actually care about, the next time you ask
for their values.

For example, you might have a list of references that should be filtered
to exclude the dangling ones, or you might want the dangling references to
be nullified in place (without filtering), or you might have a single
reference that should default to something else if it becomes invalid. All
of these options are matters of application-level logic, so the cache
cannot choose the right default strategy in all cases.
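
As a concrete sketch of the filtering strategy (the Query field name here is hypothetical; canRead is the helper passed to read functions for checking whether a reference still points at data in the cache):

```ts
import { InMemoryCache, Reference } from "@apollo/client";

const cache = new InMemoryCache({
  typePolicies: {
    Query: {
      fields: {
        books: {
          // Lazily drop references whose target entities have been evicted,
          // the next time this list is read from the cache.
          read(existing: readonly Reference[] | undefined, { canRead }) {
            return existing ? existing.filter(canRead) : existing;
          },
        },
      },
    },
  },
});
```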

By default, references are left untouched unless you define custom logic
to do something else. It may actually be unwise/destructive to remove
dangling references from the cache, because the evicted data could always
be written back into the cache at some later time, restoring the validity
of the references. Since eviction is not necessarily final, dangling
references represent useful information that should be preserved by
default after eviction, but filtered out just in time to keep them from
causing problems. Even if you ultimately decide to prune the dangling
references, proactively finding and removing them is way more work than
letting a read function handle them on-demand.

This system works because the result caching system (#3394, #5617) tracks
hierarchical field dependencies in a way that causes read functions to be
reinvoked any time the field in question is affected by updates to the
cache, even if the changes are nested many layers deep within the
field. It also helps that custom read functions are consistently invoked
for a given field any time that field is read from the cache, so you don't
have to worry about dangling references leaking out by other means.