Add support for cache tag invalidation #1169
Conversation
@martijnwalraven I've implemented basic support for Memcached/Redis for the KeyValueCache interface; these are currently just simple wrappers over existing clients. Supports get/set with a ttl.
This is a great start! Really happy with the code quality and attention to testing. It seems we should be able to get a basic version in sooner than expected!
"jest": "^23.1.0", | ||
"jest-each": "^23.1.0", |
I didn't know about jest-each, but that's a neat solution! It seems support for this has (very) recently become part of core Jest though, so maybe we don't even need the dependency (see README).
great catch!
@@ -0,0 +1,6 @@
export const servers = {
Having to deal with external dependencies means these are not really unit tests and we probably don't want to run them as part of CI for example. I think they are great as integration tests, so I'd like to keep them (especially if we add cache tags on top of the basic features), but I'm not sure how we should organize them to separate them from the unit tests and what the best practices around integration testing are (@jbaxleyiii may have some ideas there).
A script to start a Memcached and Redis server ready for these tests would be useful, I think. And maybe we could run it as part of an npm run test:integration command.
I'm a lot less familiar with Redis (though I suspect it should be similarly easy), but Memcached should be incredibly easy to mock. In fact, a quick search turned up memcached-mock and redis-mock.
If it's at all feasible to use those, rather than implementing a script to start one (which would need to struggle with cross-platform network/process implementation details, not to mention installation), that seems preferable.
Agreed! These integration tests probably don't need to be run in CI; they're just for me to validate that the dependencies on external libraries work as expected. I'm running the databases with Docker, and I'll try to set it up as an npm script!
@abernix that's awesome! I will use proxyquire to mock memcached and node-redis for testing (:
Update: turns out I can't get proxyquire to work with TypeScript. Using jest.mock instead and it works great!
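A rough sketch of what that jest.mock setup can look like in TypeScript. The import path, the constructor argument, and the assumption that memcached-mock is API-compatible with the real memcached client are illustrative here, not taken from the PR:

```typescript
// Swap the real client for an in-memory mock before the module under test
// loads it; Jest hoists jest.mock calls above the imports.
jest.mock('memcached', () => require('memcached-mock'));

import MemcachedKeyValueCache from '../memcached'; // hypothetical path

describe('MemcachedKeyValueCache (mocked client)', () => {
  const keyValueCache = new MemcachedKeyValueCache('localhost:11211');

  it('can do a basic get and set', async () => {
    await keyValueCache.set('hello', 'world');
    expect(await keyValueCache.get('hello')).toBe('world');
  });
});
```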
Even better :)
it('is able to expire keys based on ttl', async () => {
await keyValueCache.set('short', 's', { ttl: 1 });
await keyValueCache.set('long', 'l', { ttl: 5 });
await delay(1500);
Having to depend on actual time passing (as opposed to mocking timers) is another reason this isn't really suitable as a unit test.
Is it acceptable to leave out tests for key expiry?
@clarencenpy This may add complexity to the tests, but I can recommend lolex for handling time. You can see an example here: https://github.com/n1ru4l/react-time-ago/blob/29006c7a2080626ec4206560fe4ecf69c3ca079b/src/index.test.js#L52
For sure! I will test it out with the mocked databases; it would be great if those play well together! Thanks 👍
Using @martijnwalraven's mockDate() (for calls to the global Date) and jest.useFakeTimers() (for calls to setTimeout). Seems to work great with the mock libs.
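For reference, roughly the shape of that setup, sketched inline rather than importing the repo's actual helper (whose exact location and API aren't reproduced here):

```typescript
// Inline stand-ins for mockDate()/unmockDate(); the real helpers live elsewhere.
let currentTime = Date.now();
const realDateNow = Date.now;

const mockDate = () => {
  Date.now = jest.fn(() => currentTime);
};
const unmockDate = () => {
  Date.now = realDateNow;
};
// Advance both the mocked Date.now() and Jest's fake setTimeout/setInterval clock.
const advanceTimeBy = (ms: number) => {
  currentTime += ms;
  jest.advanceTimersByTime(ms);
};

beforeAll(() => {
  mockDate();
  jest.useFakeTimers();
});

afterAll(() => {
  unmockDate();
  jest.useRealTimers();
});
```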
Great! Yeah, as long as we're using mocks for the databases we don't need actual delays.
private client;
private defaultSetOptions = {
ttl: 300,
tags: [],
I'm not sure if default tags make sense. But a default ttl seems really useful, so maybe we could even expose that as a public property / configuration option.
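One possible shape for that, purely as a sketch (the constructor signature and option names are assumptions, not the final API):

```typescript
interface KeyValueCacheSetOptions {
  ttl?: number;
  tags?: string[];
}

class MemcachedKeyValueCacheSketch {
  // The default ttl becomes overridable per instance; tags still default to none.
  private defaultSetOptions: Required<KeyValueCacheSetOptions> = {
    ttl: 300,
    tags: [],
  };

  constructor(serverLocation: string, defaults?: KeyValueCacheSetOptions) {
    this.defaultSetOptions = { ...this.defaultSetOptions, ...defaults };
    // real client construction elided in this sketch
  }
}
```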
close(): Promise<void> {
this.client.end();
return Promise.resolve();
If you make the function async, I think you may not need the explicit Promise.resolve() here.
(Assuming client.end() doesn't take a callback like the Redis equivalent.)
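In other words, a minimal sketch of the suggestion, assuming a synchronous client.end():

```typescript
class CloseSketch {
  constructor(private client: { end(): void }) {}

  // An async method always returns a promise, so no explicit Promise.resolve().
  async close(): Promise<void> {
    this.client.end();
  }
}
```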
Force-pushed edbb2c8 to 395da0f
Great stuff, just a few comments about async functions and one about the test, and then we can merge!
}
}

async flush(): Promise<void> {
With an async function, thrown errors are automatically wrapped in a promise rejection and returned values are wrapped in a promise resolution, so there's no need to wrap the this.client.flush call.
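A minimal illustration of that point, assuming this.client.flush() is already promisified:

```typescript
class FlushSketch {
  constructor(private client: { flush(): Promise<void> }) {}

  async flush(): Promise<void> {
    // A rejection from the awaited call rejects flush() itself, and falling
    // off the end of the function resolves it, so no manual wrapping is needed.
    await this.client.flush();
  }
}
```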
I see! This simplifies the code significantly!
it('is able to expire keys based on ttl', async () => {
await keyValueCache.set('short', 's', { ttl: 1 });
await keyValueCache.set('long', 'l', { ttl: 5 });
Let's make sure we can read these values right after they have been set, and then ensure that both are invalidated after advancing the time past 5 seconds.
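So the test might end up looking roughly like this, reusing the advanceTimeBy helper from the fake-timer sketch above and assuming get() resolves to undefined for missing keys:

```typescript
it('is able to expire keys based on ttl', async () => {
  await keyValueCache.set('short', 's', { ttl: 1 });
  await keyValueCache.set('long', 'l', { ttl: 5 });

  // Both values are readable right after being set.
  expect(await keyValueCache.get('short')).toBe('s');
  expect(await keyValueCache.get('long')).toBe('l');

  // After 1.5s the short-lived entry has expired but the long-lived one has not.
  advanceTimeBy(1500);
  expect(await keyValueCache.get('short')).toBeUndefined();
  expect(await keyValueCache.get('long')).toBe('l');

  // Past the 5 second mark, both entries are gone.
  advanceTimeBy(4000);
  expect(await keyValueCache.get('short')).toBeUndefined();
  expect(await keyValueCache.get('long')).toBeUndefined();
});
```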
Force-pushed 88c3a8b to 408edba
t: tags,
};

await this.client.set(key, JSON.stringify(payload), ttl);
Instead of serializing tag metadata together with the key, we could also store metadata in separate namespaced keys:
set('key1', 'value1', ['tag1', 'tag2'])
gets stored like this:
key1: value1
meta-v-key1: <version number>
meta-t-key1: 'tag1||tag2'
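A sketch of what writing under that layout could look like; the store interface and the hard-coded version are placeholders, with the key prefixes and '||' separator taken from the example above:

```typescript
interface StringStore {
  set(key: string, value: string, ttl: number): Promise<void>;
}

async function setWithNamespacedMetadata(
  store: StringStore,
  key: string,
  value: string,
  tags: string[],
  ttl: number,
): Promise<void> {
  const version = 1; // placeholder; the versioning scheme is discussed below
  await Promise.all([
    store.set(key, value, ttl),                       // key1: value1
    store.set(`meta-v-${key}`, String(version), ttl), // meta-v-key1: <version number>
    store.set(`meta-t-${key}`, tags.join('||'), ttl), // meta-t-key1: 'tag1||tag2'
  ]);
}
```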
// augment data with tags and version
const version = ++this.logicalClock;
const payload = {
v: version,
Hmmm, this wasn't really what I had in mind, and I don't think this will work as is. If we keep a logical clock in memory, that will be per process. So it seems there is no way to keep this working correctly across multiple server instances talking to the same store.
My idea was that we would store <tag>: <version> pairs in cache entries, and also keep separate <tag>: <version> entries. We could then use store primitives to safely increment the version for a tag entry without the need for a global clock.
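Sketched out, invalidation under that per-tag scheme might look like this (the promisified incr/set signatures and the key prefix are assumptions):

```typescript
interface TagVersionStore {
  // Resolves to the new value, or false if the key does not exist yet.
  incr(key: string, amount: number): Promise<number | false>;
  set(key: string, value: number, ttl: number): Promise<void>;
}

async function invalidateTags(store: TagVersionStore, tags: string[]): Promise<void> {
  await Promise.all(
    tags.map(async tag => {
      // Atomically bump the tag's version entry; create it if it is missing.
      const bumped = await store.incr(`tag-v-${tag}`, 1);
      if (bumped === false) {
        await store.set(`tag-v-${tag}`, 1, 0); // ttl 0: never expire
      }
    }),
  );
}
```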
I was thinking that it would be easy to move the logical clock into the key-value store as an entry of its own, once we think about multiple server processes. It seems to me like an easier thing to keep track of? Refer to this commit
I think this boils down to a tradeoff between managing versions for every tag (more complex code, and reading all tag versions for each set operation) vs managing a global version number (will we pay in performance?). Would be happy to learn about other alternatives that I am missing!
Hmmm, interesting! I'm worried keeping a global version instead of a version per tag will become a bottleneck and/or source of concurrency issues, but other people may have a better idea of the impact. I suspect that will depend on the characteristics of the store.
On Memcache, one thing to take into consideration is that the global version key will only be stored on a single node, so that will receive a lot of requests. And if it goes down, you basically lose the ability to validate all data (although the same is true for a subset of the data for per-tag version keys).
types/fetch/index.d.ts
@@ -0,0 +1,111 @@
declare function fetch(
We shouldn't need declarations for fetch and related types here; we'll get those from apollo-server-env.
removed
types/url/index.d.ts
@@ -0,0 +1,41 @@
declare class URL {
Similarly, this file shouldn't be needed either.
const version = ++this.logicalClock;
const operations: any[] = [];
for (const tag of tags) {
// what should be a good ttl to set here?
Not sure how Memcache/Redis deal with this, but ideally we wouldn't want tags to ever expire at all.
Setting it to 0 seems to do it: https://github.com/memcached/memcached/wiki/Commands#standard-protocol
we should move these to their own packages in the future, e.g. `apollo-server-cache-redis`
Force-pushed ca4d9bb to 92bd19a
Force-pushed 92bd19a to 1c92359
version = 1;
await this.client.set(VERSION_KEY, version + 1, 0);
} else {
await this.client.incr(VERSION_KEY, 1);
I'm a little confused by the need to increment here, I would expect that to happen only on invalidation. What is the reasoning behind this?
@@ -27,8 +29,17 @@ export default class MemcachedKeyValueCache implements KeyValueCache {
): Promise<void> {
const { ttl, tags } = Object.assign({}, this.defaultSetOptions, options);

// get and incr version number
One thing I'd like to avoid is paying the price for keeping a version (or versions) when a particular entry isn't using cache tags. In that case, I don't think we'll want to read (let alone increment) the version at all.
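Inside set(), that suggestion could look roughly like this (the payload shape and client interface are placeholders):

```typescript
interface SimpleClient {
  set(key: string, value: string, ttl: number): Promise<void>;
}

async function setSketch(
  client: SimpleClient,
  key: string,
  value: string,
  { ttl, tags }: { ttl: number; tags: string[] },
): Promise<void> {
  if (tags.length === 0) {
    // Fast path: untagged entries never read or increment any version.
    await client.set(key, value, ttl);
    return;
  }
  // Tagged path only: read the relevant versions and embed them in the payload.
  const payload = { d: value, t: tags }; // version snapshot elided in this sketch
  await client.set(key, JSON.stringify(payload), ttl);
}
```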
if (tags.length !== 0) {
const versions = await this.client.getMulti(tags);
for (const tag in versions) {
if (versions[tag] !== undefined && versions[tag] > payload.v) {
We'd need a way to deal with overflow here, because currently, once we wrap around, entries will never be valid again.
One benefit of using per-tag versions is that we don't have to depend on ordering. All we need to check is if versions match, and if they do not the entry is invalid. So (I think?) that gets around having to deal with overflow.
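A sketch of that equality-based check on the read path (the payload shape, key prefix, and getMulti signature are assumptions):

```typescript
interface TaggedPayload {
  d: string; // the cached value
  t: Record<string, number>; // tag -> version snapshot taken at write time
}

interface VersionReader {
  getMulti(keys: string[]): Promise<Record<string, number | undefined>>;
}

async function isStillValid(store: VersionReader, payload: TaggedPayload): Promise<boolean> {
  const tags = Object.keys(payload.t);
  if (tags.length === 0) return true;

  const current = await store.getMulti(tags.map(tag => `tag-v-${tag}`));
  // No ordering involved: any mismatch (including a missing tag entry) means
  // the entry has been invalidated, so counter overflow never matters.
  return tags.every(tag => current[`tag-v-${tag}`] === payload.t[tag]);
}
```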
@@ -60,11 +73,18 @@ export default class MemcachedKeyValueCache implements KeyValueCache {

async invalidate(tags: string[]): Promise<void> {
// set the invalidation "timestamp" using logical clock for every tag
const version = ++this.logicalClock;
let version = await this.client.get(VERSION_KEY);
I wonder if we can do this in a more efficient way, because checking whether the key exists first on every increment seems expensive (and may also run into concurrency issues). Not sure if this is part of the underlying protocol and whether our client supports it, but it seems libmemcached has a memcached_increment_with_initial function.
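If only the basic primitives are available from the client, one way to approximate increment-with-initial is sketched below. It assumes promisified incr/add wrappers where incr resolves to false for a missing key and add only succeeds when the key is absent; whether the wrapped client actually behaves this way would need checking:

```typescript
interface CounterClient {
  incr(key: string, amount: number): Promise<number | false>;
  add(key: string, value: number, ttl: number): Promise<boolean>;
}

async function incrementWithInitial(
  client: CounterClient,
  key: string,
  initial = 1,
): Promise<number> {
  const result = await client.incr(key, 1);
  if (result !== false) return result;

  // The key did not exist: try to create it. If a concurrent writer won the
  // race, add() fails and a second incr() picks up their value instead.
  if (await client.add(key, initial, 0)) return initial;

  const retried = await client.incr(key, 1);
  return retried === false ? initial : retried;
}
```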
@abernix this PR seems very interesting for imperative purging by cache tags; I can't find an explanation of why this code wasn't merged, or whether there's a plan to support cache tags.
TODO: