Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add cache option #284

Merged
merged 80 commits into from
Oct 15, 2017
Merged

Add cache option #284

merged 80 commits into from
Oct 15, 2017

Conversation

lukechilds
Copy link
Contributor

@lukechilds lukechilds commented Mar 23, 2017

Resolves #281

To keep track of what's implemented:

  • Allow user to pass in a cache store (copy Map() API)
  • Cache responses that are cacheable
  • Return cached responses immediately if still fresh
  • If expired send If-Modified-Since/If-None-Match header
  • Handle potential 304 response
  • Tests
  • Refactor into external module
  • Write some cache adapters
  • Documentation

Ok I hacked this together last night, I'm aware there are a few things that need tidying up / potentially unhandled errors etc. Just submitting this PR for some feedback.

I'm caching the url/headers/body of all responses that satisfy:

(opts.cache && statusCode === 200 && req.method === 'GET' && (expired(response.headers) === false || 'etag' in response.headers))

This is working, I can view them in the cache. The issue is to serve them up I need to create a Node.js response (or similar) object to be passed through Got's asPromise()/asStream() fns and to make sure all the usual response API methods are available on the returned object. I was hoping http-response-object was gonna do the job here but it looks like it only implements a basic subset of the Node.js response object.

When I emit the response-like object I get:

UnhandledPromiseRejectionWarning: Unhandled promise rejection (rejection id: 2): TypeError: inputStream.once is not a function

I think this is coming from getStream() because http-response-object doesn't implement some stream stuff from the Node.js response object.

Any thoughts? I'm kind of out of my depth here in regards to streams and the native response object. Is there is simpler way to do this? Or maybe a straight forward way to create a fake request object?

// @sindresorhus @reconbot @alextes

@lukechilds lukechilds force-pushed the cache branch 4 times, most recently from aadd907 to 06c0eba Compare March 23, 2017 07:12
@lukechilds
Copy link
Contributor Author

Got it working, http-response-object doesn't implement the Readable Stream interface. Works with this PR: ForbesLindesay/http-response-object#4

Is http-response-object "good enough" to return instead of a native response? If not this may need to be a Got wrapper that just exposes url/headers/body and none of the Node.js response stuff.

@lukechilds
Copy link
Contributor Author

Ok, I've done some digging and a Node.js response is an instance of http.IncomingMessage. So to make the cached version compatible with the Node.js response there's quite a few things we need to support as well as some methods/properties that we can't support on the cached response (e.g response.socket).

We'll need to either add support to all these things in http-response-object or if @ForbesLindesay thinks this is beyond the scope of his module I'm happy to create a new one.

There is also the issue that we'll need to manually keep the cached response up to date with http.IncomingMessage. For example if someone uses a new version of Node.js that has a new method on response, they'll be able to use that new method on normal responses but it won't exist on cached responses until it's implemented in our response package.

This seems like kind of a hacky solution, maybe it would be better to limit the cached responses to only basic data, although this is probably not gonna be obvious to people unless they read the docs. Or alternatively just do this as a wrapper that never exposes the Node.js response.

I'll wait for feedback before going any further with this.

@sindresorhus
Copy link
Owner

Maybe https://github.com/pornel/http-cache-semantics might be of some use. It seems to have a pretty extensive coverage of all caching possibilities and quirks in HTTP. Even if it's just for inspiration.

@sindresorhus
Copy link
Owner

maybe it would be better to limit the cached responses to only basic data

Yes, let's limit it to the basic data. statusCode, headers, body, url.

I'm happy to create a new one.

Seems like you need to do that regardless as you have yet to get a response on your PR.

index.js Outdated
@@ -58,6 +61,20 @@ function requestAsEventEmitter(opts) {
response.url = redirectUrl || requestUrl;
response.requestUrl = requestUrl;

if (opts.cache && statusCode === 200 && req.method === 'GET' && (expired(response.headers) === false || 'etag' in response.headers)) {
const stream = opts.encoding === null ? getStream.buffer(response) : getStream(response, opts);
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

const encoding = opts.encoding === null ? 'buffer' : opts.encoding;
const stream = getStream(stream, {encoding});

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👌

@lukechilds lukechilds force-pushed the cache branch 2 times, most recently from fccafe0 to df09c96 Compare May 4, 2017 05:40
@lukechilds lukechilds changed the title Handle Cache Headers Implement Caching May 4, 2017
@lukechilds
Copy link
Contributor Author

http-cache-semantics looks perfect 😍. I wish I'd seen that before writing expired...

@lukechilds
Copy link
Contributor Author

lukechilds commented May 7, 2017

Just an idea but http-response-object gives us a TTL so we know how long a cache item can be stored for. LevelUP is just get/set with no concept of a TTL. If it's stored, it's stored forever. This isn't a massive issue, we can manually delete items from the cache if we see they're stale on a future request or if LevelUP is backed by a LRU store then it'll take care of itself.

However this functionality is being written in a way that will allow someone to use any cache store with Got and I think TTL is useful information to pass through. What about if we change the cache.put signature to cache.put(key, val, ttl, cb) rather than the current LevelUP signature of cache.put(key, val, cb).

Then we could provide a LevelUP cache plugin for Got so using with LevelUP and friends is still simple:

const got = require('got');
const gotCacheLevelup = require('got-cache-levelup');
const db = gotCacheLevelup('mydb', { db: require('redisdown'), host: 'localhost', port: 6379 });

got('foo.com', {cache: db});

But then if someone wanted implement a native Redis cache plugin for Got they could make use of the TTL.

Also, I was just using callbacks so it worked OOB with a LevelUP db instance. If we are setting our own spec for cache "plugins" then we could require a Promise interface which would be a bit neater.

In my mind I see a cache plugin looking like this:

module.exports = {
  set: (key, val, ttl) => new Promise(),
  get: key => new Promise(),
  del: key => new Promise()
};

Thoughts?

@sindresorhus
Copy link
Owner

Instead of adding an additional parameter to .set(). Couldn't we just create additional keys for meta data. So if there's a key called https://sindresorhus.com, we could also create a __ttl__$__https://sindresorhus.com (Or something), with the TTL value. I would really prefer to keep a simple glue API, so user could just input for example Map() directly without anything extra. Alternatively we could store the TTL in the data, so instead of putting the data {foo: true}, we put {ttl: 123123, data: {foo: true}.

then we could require a Promise interface which would be a bit neater.

We can just run the input through Promise.resolve(), so the user can submit any value or a Promise, and it's awaited if it's a Promise.

@lukechilds
Copy link
Contributor Author

We can still pass the TTL as the third argument to set and match the Map() API. map.set will just ignore the TTL:

const map = new Map();
map.set('foo', 'bar', 100);
map.get('foo');
// "bar"

So got('foo.com', {cache: new Map()}); would work fine. I'm not too sure what you mean regarding creating a new cache entry for the TTL, I don't think I explained myself very well. I mean it's useful for other people to use the TTL in storage adapters, not that it's useful to store.

e.g: If some-cache-store had native TTL functionality to clear old values we could do:

const cache = require('some-cache-store');

const gotCachePlugin = {
  set: (key, val, ttl) => cache.put(key, val, ttl),
  get: key => cache.get(key),
};

module.exports = gotCachePlugin;

We can just run the input through Promise.resolve(), so the user can submit any value or a Promise, and it's awaited if it's a Promise.

Sounds good 👌

@sindresorhus
Copy link
Owner

I'm not too sure what you mean regarding creating a new cache entry for the TTL, I don't think I explained myself very well. I mean it's useful for other people to use the TTL in storage adapters, not that it's useful to store.

I was proposing we just support TTL natively, so users can use any kind of "dumb" storage and get TTL support without doing anything extra.

@lukechilds
Copy link
Contributor Author

I see. But theres no benefit to checking the TTL when we retrieve a cached item, everything is already checked inside policy.satisfiesWithoutRevalidation().

The only use for storing the TTL would be to, on a regular interval, loop over all items from the cache and delete each item which has exceeded it's TTL. IMHO that's not a good idea and would be a job better handled by the underlying database.

@lukechilds
Copy link
Contributor Author

@sindresorhus thanks for taking the time to go over this!

All latest requested changes have been implemented.

Two things I'm not quite sure about:

New Properties

.fromCache and any other extra properties on the response needs to be documented.
#284 (comment)

Is this ok: 4d9ab59

Pulled straight from cacheable-request docs:
https://github.com/lukechilds/cacheable-request#cb

Cache link

There are currently two headings titled "cache", which means the href #cache goes to the api docs, and #cache-1 goes to the main cache section.

Is this ok? Obviously works fine as is but wondering if re-ordering or something in the future could break the other link.

https://github.com/lukechilds/got/tree/cache#cache
https://github.com/lukechilds/got/tree/cache#cache-1

@sindresorhus
Copy link
Owner

There are currently two headings titled "cache", which means the href #cache goes to the api docs, and #cache-1 goes to the main cache section.

I think you can use an HTML link to force it:

<a name="cache"></a>
### Cache

@lukechilds
Copy link
Contributor Author

lukechilds commented Oct 3, 2017

Is this ok?

0a75785

Rather say it will be responselike when it's from the cache, but it should behave exactly like the original

It's worth noting, it won't behave exactly like the original, because the original has references to the socket and stuff which I obviously can't do with a cached response. However none of those core Node.js HTTP response stream methods are actually explicitly mentioned in the Got docs so you could consider them undocumented properties from Got's perspective and argue the behaviour is exactly the same.

Anyway, I linked to the responselike module so they can view more information on exactly what properties it supports

@lukechilds lukechilds force-pushed the cache branch 3 times, most recently from acc9557 to aacb9c9 Compare October 3, 2017 09:30
@lukechilds
Copy link
Contributor Author

lukechilds commented Oct 3, 2017

Hey, just incase you're about to merge this, can I just make a quick tweak to Keyv first. It won't change any current behaviour but it will store the data in a way that will make it much easier for me to make changes to Keyv in the future without affecting 3rd party users like Got.

I want to get this in before Keyv gets more usage, it's literally only a tiny change, just not backwards compatible, working on it now.

@lukechilds
Copy link
Contributor Author

lukechilds commented Oct 3, 2017

Sorted!

Details on why the change was needed here: jaredwray/keyv#29

@sindresorhus sindresorhus merged commit 3c79205 into sindresorhus:master Oct 15, 2017
@sindresorhus
Copy link
Owner

Landed! Thank you so much for working on this 🙌

superhighfive

@lukechilds
Copy link
Contributor Author

My pleasure!

lukechilds added a commit to lukechilds/got that referenced this pull request Nov 14, 2017
lukechilds added a commit to lukechilds/got that referenced this pull request Nov 14, 2017
jotto added a commit to jotto/got-lite that referenced this pull request Nov 30, 2017
brings npm pkg from 0.75MB to 0.45MB

after yarn autoclean --force, down to 0.25MB

from this PR: sindresorhus#284

next big savings is removing progress, from sindresorhus#322
jotto added a commit to jotto/got-lite that referenced this pull request Nov 30, 2017
brings npm pkg from 0.75MB to 0.45MB

after yarn autoclean --force, down to 0.25MB

from this PR: sindresorhus#284

next big savings is removing progress, from sindresorhus#322
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants