-
Notifications
You must be signed in to change notification settings - Fork 10.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
perf(gatsby-source-contentful): prevent redundant fs/remote fetches for base64 #28438
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we clear the promise cache at some point?
Without it, long-running processes like previews/inc.builds will never re-request images even if they change?
We could revalidate them by storing the etag. This is what createRemoteFileNode does. |
In short, I think we don't need to worry about this
Is my analysis accurate? |
Oh, important point as well; this is not a generic solution. This is specifically for Contentful and their CDN. So we don't need to worry about the "what if"s on that side. We can verify this with them if we doubt it. |
🤣 |
Yeah this is (probably?) true so we don't need to worry about invalidation. You could verify this trivially by trying to modify a photo on a node in Contentful and seeing if the ID is the same. |
(Verified this with Contentful) |
Published in |
Currently the base64 fetch will prefer to read from local cache if the url is known, or do a remote fetch if the url is unknown. It does not do an in-flight check which means the same url could be scheduled and fetched multiple times, as long as one has not completed. This applies to remote fetch and local cache fetches.
This PR introduces two caches (separate commits):
The function will now prefer to return the data sync, from the resolved cache. If the url is not known there then it will check the in flight cache. This cache may contain the promise if it is still in flight, or if the request rejected (it will not retry). If that cache doesn't have it then the disk cache is checked. If the disk cache has it then read it and return it. Otherwise do a remote fetch and return it. In both fetch cases, the read promise is stored in the in flight cache and when it resolves, it is removed from the in flight cache and its payload added to the resolved cache.
In the real world I stumbled on a customer where the site was triggering 245.000 base64 image requests (these are only debounced by the nodejs default limits and, to some degree, by the query concurrency count). It turned out that there were actually only 5.000 unique urls, so 240.000 requests were redundant.
While testing that same site we encountered curious inconsistent delays that tracked back to this function.