Add support for cache tag invalidation #1169

Closed · wants to merge 14 commits

Conversation

@clarencenpy (Contributor) opened this pull request:

TODO:

  • Update CHANGELOG.md with your change (include reference to issue & this PR)
  • Make sure all of the significant new logic is covered by tests
  • Rebase your changes on master so that they can be merged easily
  • Make sure all tests and linter rules pass

@ghost added the ⛲️ feature (New addition or enhancement to existing solutions) label on Jun 13, 2018
@clarencenpy (Contributor Author) commented:

@martijnwalraven I've implemented basic support for Memcached/Redis behind the KeyValueCache interface; the implementations are currently just thin wrappers over existing clients and support get/set with a TTL.
I've written some tests against external databases just to make sure things work for now, but they will probably be unnecessary going forward (we just need to test the cache-tag invalidation logic).
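For context, a minimal sketch of the kind of interface being described (the exact signatures in this PR may differ):

```ts
// Sketch only: the shape of a KeyValueCache backed by Memcached or Redis.
export interface KeyValueCacheSetOptions {
  ttl?: number; // time to live, in seconds
  tags?: string[]; // cache tags used later for invalidation
}

export interface KeyValueCache {
  get(key: string): Promise<string | undefined>;
  set(key: string, data: string, options?: KeyValueCacheSetOptions): Promise<void>;
  invalidate(tags: string[]): Promise<void>;
  flush(): Promise<void>;
  close(): Promise<void>;
}
```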

@martijnwalraven (Contributor) left a comment:

This is a great start! Really happy with the code quality and attention to testing. It seems we should be able to get a basic version in sooner than expected!

"jest": "^23.1.0",
"jest-each": "^23.1.0",
@martijnwalraven (Contributor), Jun 13, 2018:

I didn't know about jest-each, but that's a neat solution! It seems support for this has (very) recently become part of core Jest though, so maybe we don't even need the dependency (see the README).
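For reference, a sketch of the built-in equivalent in Jest 23+ (describe.each/test.each), which is what makes the separate jest-each dependency unnecessary; the cache constructors shown here are hypothetical:

```ts
// Table-driven tests are built into Jest 23+, so jest-each is not needed.
describe.each([
  ['Memcached', () => new MemcachedKeyValueCache('localhost:11211')], // hypothetical constructor
  ['Redis', () => new RedisKeyValueCache({ host: 'localhost' })], // hypothetical constructor
])('%s KeyValueCache', (_name, createCache) => {
  test('can set and get a value', async () => {
    const cache = createCache();
    await cache.set('hello', 'world');
    expect(await cache.get('hello')).toBe('world');
  });
});
```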

Contributor Author:

great catch!

@@ -0,0 +1,6 @@
export const servers = {
@martijnwalraven (Contributor), Jun 13, 2018:

Having to deal with external dependencies means these are not really unit tests and we probably don't want to run them as part of CI for example. I think they are great as integration tests, so I'd like to keep them (especially if we add cache tags on top of the basic features), but I'm not sure how we should organize them to separate them from the unit tests and what the best practices around integration testing are (@jbaxleyiii may have some ideas there).

A script to start a Memcached and Redis server ready for these tests would be useful, I think. Maybe we could run it as part of an npm run test:integration command.

Member:

I'm a lot less familiar with Redis (though I suspect it should be similarly easy) but Memcached should be incredibly easy to mock. In fact, a quick search turned up memcached-mock and redis-mock.

If it's at all feasible to use those, rather than implementing a script to start one (which would need to struggle with cross-platform network/process implementation details, not to mention installation), that seems preferable.

Contributor Author:

Agreed! These integration tests probably don't need to run in CI; it's just for me to validate that the dependencies on external libraries work as expected. I'm running the databases with Docker and will try to set it up as an npm script!

Contributor Author:

@abernix that's awesome! I will use proxyquire to mock memcached and node-redis for testing (:

Contributor Author:

Update: it turns out I can't get proxyquire to work with TypeScript. Using jest.mock instead, and it works great!
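A rough sketch of that approach, assuming the memcached-mock package mentioned above (the import path and constructor argument are assumptions):

```ts
// Every `require('memcached')` in the code under test resolves to the
// in-memory mock, so no real Memcached server is needed.
jest.mock('memcached', () => require('memcached-mock'));

import MemcachedKeyValueCache from '../connectors/memcached'; // hypothetical path

test('stores and retrieves values without a real server', async () => {
  const cache = new MemcachedKeyValueCache('localhost:11211');
  await cache.set('hello', 'world');
  expect(await cache.get('hello')).toBe('world');
});
```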

Contributor:

Even better :)

it('is able to expire keys based on ttl', async () => {
await keyValueCache.set('short', 's', { ttl: 1 });
await keyValueCache.set('long', 'l', { ttl: 5 });
await delay(1500);
Contributor:

Having to depend on actual time passing (as opposed to mocking timers) is another reason this isn't really suitable as a unit test.

Contributor Author:

Is it acceptable to leave out tests for key expiry?

@n1ru4l (Contributor), Jun 13, 2018:

@clarencenpy This may add some complexity to the tests, but I can recommend lolex for mocking time. You can see an example here: https://github.com/n1ru4l/react-time-ago/blob/29006c7a2080626ec4206560fe4ecf69c3ca079b/src/index.test.js#L52

@clarencenpy (Contributor Author), Jun 13, 2018:

For sure! I will test it out with the mocked databases and it will be great if that plays well together! Thanks 👍

Contributor Author:

Using @martijnwalraven's mockDate() (for calls to the global Date) and jest.useFakeTimers() (for calls to setTimeout). Seems to work great with the mock libs.
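A sketch of a mocked-time TTL test (mockDate/advanceTimeBy stand in for whichever date-mocking helper is used; the helper names and import path are assumptions):

```ts
import { mockDate, unmockDate, advanceTimeBy } from '../__mocks__/date'; // hypothetical helper

beforeAll(() => {
  mockDate(); // intercept calls to the global Date
  jest.useFakeTimers(); // intercept setTimeout and friends
});

afterAll(() => {
  unmockDate();
  jest.useRealTimers();
});

it('expires keys without real delays', async () => {
  await keyValueCache.set('short', 's', { ttl: 1 });
  // Advance the mocked clock past the 1-second TTL instead of sleeping.
  advanceTimeBy(1500);
  jest.advanceTimersByTime(1500);
  expect(await keyValueCache.get('short')).toBeUndefined();
});
```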

Contributor:

Great! Yeah, as long as we're using mocks for the databases we don't need actual delays.

private client;
private defaultSetOptions = {
ttl: 300,
tags: [],
Contributor:

I'm not sure if default tags make sense. But a default ttl seems really useful, so maybe we could even expose that as a public property / configuration option.
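One possible shape for that configuration option, sketched here (the option name and the 300-second default are assumptions, not necessarily what the PR settles on):

```ts
export interface MemcachedCacheOptions {
  defaultTtl?: number; // seconds; used when set() is called without an explicit ttl
}

export default class MemcachedKeyValueCache implements KeyValueCache {
  private client;
  // No default tags: tags only matter when explicitly requested.
  private defaultSetOptions = { ttl: 300 };

  constructor(serverLocation: string, options: MemcachedCacheOptions = {}) {
    if (options.defaultTtl !== undefined) {
      this.defaultSetOptions.ttl = options.defaultTtl;
    }
    // ...create the underlying memcached client here
  }
}
```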


close(): Promise<void> {
this.client.end();
return Promise.resolve();
Contributor:

If you make the function async I think you may not need the explicit Promise.resolve() here.

Contributor:

(Assuming client.end() doesn't take a callback like the Redis equivalent.)
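Concretely, the suggestion amounts to something like this sketch (whether the Memcached client's end() is synchronous is an assumption worth checking):

```ts
// If client.end() is synchronous, marking the method async is enough;
// it already returns Promise<void> with no explicit Promise.resolve().
async close(): Promise<void> {
  this.client.end();
}

// Alternative for a client whose shutdown takes a callback (like Redis quit()):
close(): Promise<void> {
  return new Promise(resolve => this.client.quit(() => resolve()));
}
```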

@martijnwalraven force-pushed the server-2.0/caching branch 2 times, most recently from edbb2c8 to 395da0f, on June 14, 2018
@evans (Contributor) left a comment:

Great stuff! Just a few comments about async functions and one about the test, and then we can merge!

}
}

async flush(): Promise<void> {
Contributor:

With an async function, thrown errors are automatically wrapped in a promise rejection and returned values are wrapped in a promise resolution, so there's no need to wrap the this.client.flush call.
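In other words, roughly this (the "before" shape is a guess at what explicit wrapping looks like, not the exact code in the diff, and it assumes a promise-returning client):

```ts
// Explicit wrapping, which an async function makes unnecessary:
flush(): Promise<void> {
  return new Promise((resolve, reject) => {
    try {
      resolve(this.client.flush());
    } catch (err) {
      reject(err);
    }
  });
}

// With async, returned values become resolutions and thrown errors become
// rejections automatically:
async flush(): Promise<void> {
  await this.client.flush();
}
```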

Contributor Author:

I see! This simplifies the code significantly!


it('is able to expire keys based on ttl', async () => {
await keyValueCache.set('short', 's', { ttl: 1 });
await keyValueCache.set('long', 'l', { ttl: 5 });
Contributor:

Let's make sure we can read these values right after they have been set, and then ensure that both are invalidated after advancing the time past 5 seconds.
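For instance, something along these lines (a sketch reusing the mocked-time helpers discussed above; advanceTimeBy is an assumed helper name):

```ts
it('is able to expire keys based on ttl', async () => {
  await keyValueCache.set('short', 's', { ttl: 1 });
  await keyValueCache.set('long', 'l', { ttl: 5 });

  // Both values are readable right after being set.
  expect(await keyValueCache.get('short')).toBe('s');
  expect(await keyValueCache.get('long')).toBe('l');

  // Past the short TTL but not the long one.
  advanceTimeBy(1500);
  expect(await keyValueCache.get('short')).toBeUndefined();
  expect(await keyValueCache.get('long')).toBe('l');

  // Past both TTLs.
  advanceTimeBy(4000);
  expect(await keyValueCache.get('short')).toBeUndefined();
  expect(await keyValueCache.get('long')).toBeUndefined();
});
```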

@clarencenpy force-pushed the server-2.0/caching-connectors branch 2 times, most recently from 88c3a8b to 408edba, on June 15, 2018
t: tags,
};

await this.client.set(key, JSON.stringify(payload), ttl);
@clarencenpy (Contributor Author), Jun 15, 2018:

Instead of serializing tag metadata together with the key, we could also store metadata in separate namespaced keys:
set('key1', 'value1', ['tag1', 'tag2']) gets stored like this:

key1: value1
meta-v-key1: <version number> 
meta-t-key1: 'tag1||tag2'
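A sketch of what that layout could look like in the set path (assuming a promisified Memcached-style client; the nextVersion() helper and key prefixes are made up for illustration):

```ts
// Hypothetical "separate namespaced metadata keys" layout: the value, its
// version, and its tags each live under their own key.
async set(
  key: string,
  data: string,
  options?: { ttl?: number; tags?: string[] },
): Promise<void> {
  const { ttl, tags } = { ...this.defaultSetOptions, ...options };
  const version = await this.nextVersion(); // however versions are obtained

  await Promise.all([
    this.client.set(key, data, ttl),
    this.client.set(`meta-v-${key}`, String(version), ttl),
    this.client.set(`meta-t-${key}`, tags.join('||'), ttl),
  ]);
}
```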

// augment data with tags and version
const version = ++this.logicalClock;
const payload = {
v: version,
Contributor:

Hmmm, this wasn't really what I had in mind, and I don't think this will work as is. If we keep a logical clock in memory, that will be per process. So it seems there is no way to keep this working correctly across multiple server instances talking to the same store.

Contributor:

My idea was that we would store <tag>: <version> pairs in cache entries, and also keep separate <tag>: <version> entries. We could then use store primitives to safely increment the version for a tag entry without the need for a global clock.
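A sketch of that per-tag version scheme (the key naming, client API, and payload shape here are assumptions, not the final design in this PR):

```ts
// Each tag has its own version entry in the store. A cache entry remembers the
// tag versions it was written with; if any tag's current version has moved on,
// the entry is treated as invalid.
async set(key: string, data: string, tags: string[] = [], ttl = 300): Promise<void> {
  const tagVersions: Record<string, number> = {};
  for (const tag of tags) {
    tagVersions[tag] = Number(await this.client.get(`tag:${tag}`)) || 0;
  }
  await this.client.set(key, JSON.stringify({ d: data, t: tagVersions }), ttl);
}

async get(key: string): Promise<string | undefined> {
  const raw = await this.client.get(key);
  if (raw === undefined) return undefined;
  const { d, t } = JSON.parse(raw);
  for (const tag of Object.keys(t)) {
    const current = Number(await this.client.get(`tag:${tag}`)) || 0;
    if (current !== t[tag]) return undefined; // this tag was invalidated after the write
  }
  return d;
}

async invalidate(tags: string[]): Promise<void> {
  for (const tag of tags) {
    // Store primitives (Memcached/Redis INCR) bump the version atomically,
    // so no in-process global clock is needed. Note that Memcached's incr
    // fails on a missing key; see the increment-with-initial discussion below.
    await this.client.incr(`tag:${tag}`, 1);
  }
}
```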

@clarencenpy (Contributor Author), Jun 15, 2018:

I was thinking that it would be easy to move the logical clock into the key-value store as an entry of its own, once we think about multiple server processes. It seems to me like an easier thing to keep track of? Refer to this commit

@clarencenpy (Contributor Author), Jun 15, 2018:

I think this boils down to a tradeoff between managing versions for every tag (more complex code, and reading all tag versions for each set operation), vs managing a global version number (will we pay in performance?). Would be happy to learn about other alternatives that I am missing!

Contributor:

Hmmm, interesting! I'm worried keeping a global version instead of a version per tag will become a bottleneck and/or source of concurrency issues, but other people may have a better idea of the impact. I suspect that will depend on the characteristics of the store.

On Memcache, one thing to take into consideration is that the global version key will only be stored on a single node, so that will receive a lot of requests. And if it goes down, you basically lose the ability to validate all data (although the same is true for a subset of the data for per-tag version keys).

@@ -0,0 +1,111 @@
declare function fetch(
Contributor:

We shouldn't need declarations for fetch and related types here, we'll get that from apollo-server-env.

Contributor Author:

removed

@@ -0,0 +1,41 @@
declare class URL {
Contributor:

Similarly, this file shouldn't be needed either.

const version = ++this.logicalClock;
const operations: any[] = [];
for (const tag of tags) {
// what should be a good ttl to set here?
Contributor:

Not sure how Memcache/Redis deal with this, but ideally we wouldn't want tags to ever expire at all.

Contributor Author:

We should move these to their own packages in the future, e.g. `apollo-server-cache-redis`.
@clarencenpy force-pushed the server-2.0/caching-connectors branch from ca4d9bb to 92bd19a on June 16, 2018
@clarencenpy force-pushed the server-2.0/caching-connectors branch from 92bd19a to 1c92359 on June 16, 2018

version = 1;
await this.client.set(VERSION_KEY, version + 1, 0);
} else {
await this.client.incr(VERSION_KEY, 1);
Contributor:

I'm a little confused by the need to increment here, I would expect that to happen only on invalidation. What is the reasoning behind this?

@@ -27,8 +29,17 @@ export default class MemcachedKeyValueCache implements KeyValueCache {
): Promise<void> {
const { ttl, tags } = Object.assign({}, this.defaultSetOptions, options);

// get and incr version number
Contributor:

One thing I'd like to avoid is paying the price for keeping a version (or versions) when a particular entry isn't using cache tags. In that case, I don't think we'll want to read (let alone increment) the version at all.
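In practice that could look roughly like this (a sketch; the surrounding variable names follow the diff, data stands for the value argument to set(), and the exact control flow is an assumption):

```ts
const { ttl, tags } = Object.assign({}, this.defaultSetOptions, options);

if (tags.length === 0) {
  // Untagged entries skip version bookkeeping entirely: store the raw value
  // and pay no extra round trips for reading or incrementing versions.
  await this.client.set(key, data, ttl);
  return;
}

// Tagged path only: read tag versions, attach them to the payload, then store.
```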

if (tags.length !== 0) {
const versions = await this.client.getMulti(tags);
for (const tag in versions) {
if (versions[tag] !== undefined && versions[tag] > payload.v) {
Contributor:

We'd need a way to deal with overflow here, because currently once we wrap around entries will never be valid again.

One benefit of using per-tag versions is that we don't have to depend on ordering. All we need to check is if versions match, and if they do not the entry is invalid. So (I think?) that gets around having to deal with overflow.
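The equality-based check would look something like this sketch, assuming the entry stores the per-tag versions it was written with (e.g. a payload.t map), as in the per-tag scheme discussed earlier:

```ts
if (tags.length !== 0) {
  const versions = await this.client.getMulti(tags);
  for (const tag in versions) {
    // Compare for equality rather than ordering: any mismatch means the tag
    // was invalidated after this entry was written, so wrap-around of the
    // counter no longer matters.
    if (versions[tag] !== undefined && versions[tag] !== payload.t[tag]) {
      return undefined; // stale entry
    }
  }
}
```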

@@ -60,11 +73,18 @@ export default class MemcachedKeyValueCache implements KeyValueCache {

async invalidate(tags: string[]): Promise<void> {
// set the invalidation "timestamp" using logical clock for every tag
const version = ++this.logicalClock;
let version = await this.client.get(VERSION_KEY);
@martijnwalraven (Contributor), Jun 16, 2018:

I wonder if we can do this in a more efficient way, because checking to see whether the key exists first on every increment seems expensive (and may also run into concurrency issues). Not sure if this is part of the underlying protocol and whether our client supports it, but it seems libmemcached has a memcached_increment_with_initial function.
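If the client doesn't expose an increment-with-initial operation directly, one possible workaround is to attempt the increment and fall back to an atomic add only when the key is missing. This sketch assumes a promisified Memcached-style client where incr resolves to false for a missing key, which is worth verifying against the actual client:

```ts
async function incrementWithInitial(client: any, key: string, initial = 1): Promise<void> {
  // incr is atomic on the server and only fails when the key doesn't exist yet.
  const result = await client.incr(key, 1);
  if (result === false) {
    // add() succeeds only if the key still doesn't exist, so two racing
    // processes can't both reset the counter; the loser simply retries incr.
    const added = await client.add(key, initial, 0 /* never expire */);
    if (!added) {
      await client.incr(key, 1);
    }
  }
}
```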

@clarencenpy changed the title from "Add basic support for Memcached and Redis" to "Add support for cache tag invalidation" on Jun 18, 2018
@ghost added the ⛲️ feature (New addition or enhancement to existing solutions) label on Jun 18, 2018
@abernix deleted the server-2.0/caching-connectors branch on February 25, 2020
@sebas5384 (Contributor):

@abernix this PR seems very interesting for imperative purging by cache tags, but I can't find an explanation of why this code wasn't merged or whether there's a plan to support cache tags.
Could you please point me in the right direction? I need this feature. Thanks :)
