Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

refactor(gatsby-plugin-sharp): split single file into more maintainable chunks #10964

Merged
merged 40 commits into from
Mar 11, 2019

Conversation

wardpeet
Copy link
Contributor

@wardpeet wardpeet commented Jan 10, 2019

Description

This PR changes how the gatsby-plugin-sharp plugin handles images. Instead of processing our image queue right away we wait processing depending on your environment.

Process

Inside queueImageResizing we are pushing every job into a memory queue and we don't care
about processing here at all. Inside gatsby-node we're distinguishing develop and build.

onPostBootstrap we're going to save our jobs into a cache. This cache is needed when we switch from develop to build mode. If no data changed we won't execute any queries which means no images are scheduled.

When we run gatsby develop we will use the onCreateDevServer hook to add an extra middelware to the express instance. Each file that doesn't exist yet gets fowarded to this route. We look into our queue if we know this url, if we do we process it and send the file over. If not we ignore and let express handle it.

gatsby build looks at our cached queue to see if it needs to do any processing. It loops over the queue and waits until all files are processed. We do this inside onPostBuild to make sure it's only ran in build mode.

Updates

  • Added jest-serializer-path to fix snapshots absolute paths
  • Updated snapshots as sharps arguments got changed to reduce extension & toFormat checks
  • Added integration tests to see if caching works between develop & build
  • Added unit test to test image on demand.

Related Issues

#10815

@wardpeet
Copy link
Contributor Author

wardpeet commented Jan 10, 2019

I've got an issue that my cache directory is not being made which is problematic for this feature. I'll give it a go on linux/mac but we probably need to fix this. Not sure if this is a known issue

Copy link
Contributor

@DSchau DSchau left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Left some preliminary comments. This is looking great. I did test this out with examples/using-gatsby-image, and I occasionally received the following error, which could be related to your comment about the cache directory?

Error: Invariant Violation: Encountered an error trying to infer a GraphQL type for: "localFile___NODE". There is no corresponding node with the id field matching: "3a6b85557cfac6217812183fb1c7cc5e"

packages/gatsby-plugin-sharp/.gitignore Outdated Show resolved Hide resolved
packages/gatsby-plugin-sharp/src/gatsby-node.js Outdated Show resolved Hide resolved
packages/gatsby-plugin-sharp/src/gatsby-node.js Outdated Show resolved Hide resolved
@wardpeet
Copy link
Contributor Author

So what I haven't fixed yet is removing images that should not be processed because a page or query changed.

@wardpeet wardpeet requested a review from a team as a code owner January 25, 2019 15:33
@wardpeet
Copy link
Contributor Author

should be good to go

Copy link
Contributor

@pieh pieh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall this looks pretty nice!

There are couple of low-hanging fixes and improvements we can quickly make (inline comments).

But there is also one unsolved issue:
some images can be scheduled to be processed, but later on should be invalidated that currently doesn't happen - for most part this would just mean extra unneded work, but there is edge cases where this would cause straight errors - for example image was scheduled, but later on source image was deleted and now this scheduled task have no way of finishing and we would actually throw errors about trying to read not-existing files? Initially we can just add some sanity checks to ensure input file exists before trying to process it I think, but later on I think we need to keep track of queries to remove stale tasks

packages/gatsby-plugin-sharp/src/processFile.js Outdated Show resolved Hide resolved
packages/gatsby-plugin-sharp/src/gatsby-node.js Outdated Show resolved Hide resolved
packages/gatsby-plugin-sharp/src/index.js Outdated Show resolved Hide resolved
packages/gatsby-plugin-sharp/src/processFile.js Outdated Show resolved Hide resolved
packages/gatsby-plugin-sharp/src/processFile.js Outdated Show resolved Hide resolved
packages/gatsby-plugin-sharp/src/index.js Outdated Show resolved Hide resolved
@wardpeet
Copy link
Contributor Author

But there is also one unsolved issue:
some images can be scheduled to be processed, but later on should be invalidated that currently doesn't happen - for most part this would just mean extra unneded work, but there is edge cases where this would cause straight errors - for example image was scheduled, but later on source image was deleted and now this scheduled task have no way of finishing and we would actually throw errors about trying to read not-existing files? Initially we can just add some sanity checks to ensure input file exists before trying to process it I think, but later on I think we need to keep track of queries to remove stale tasks

Is it ok if I do file test in this PR and do the query stuff in a follow up PR.

@pieh
Copy link
Contributor

pieh commented Jan 30, 2019

Is it ok if I do file test in this PR and do the query stuff in a follow up PR.

👍 from me - it will be much easier to review

@wardpeet wardpeet requested review from a team February 11, 2019 12:32
@wardpeet wardpeet force-pushed the feat/lazy-sharp-dev branch 2 times, most recently from f444f71 to e40b829 Compare February 11, 2019 13:22
Copy link
Contributor

@DSchau DSchau left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@wardpeet excellent job with this PR! Very clean and well-tested, awesome stuff.

Generally - this looks great to me. There are clear and obvious performance improvements with the develop lifecycle with this change applied (tested on www/).

(bootstrap from a clean slate was around ~72 seconds with this change, and ~267.217s before this change--and it actually threw a SIGSEGV error the first time I tried)

Even if this doesn't introduce anything breaking, since it is a fairly significant refactor of some core stuff, I'd like to call this a minor bump for gatsby-plugin-sharp. Thoughts @pieh?

@wardpeet
Copy link
Contributor Author

@DSchau thanks! I hope we can get more of these integration tests running to test some core pieces.

@pieh if it helps I can omit default values from the args to make the cache smaller. I could do the same for inputPath & outputPath (strip the program root, where possible). I added an escape hatch called lazyImageGeneration that you can set to false if we want to opt out. I should add it to the README.md

@pieh
Copy link
Contributor

pieh commented Feb 15, 2019

@pieh if it helps I can omit default values from the args to make the cache smaller.

If it's relatively straight forward to do - let's strip as much as we can. But with changes you did to not persist entire File node earlier we are pretty good already (so this is nice to have, not a blocker)

I'm still checking and validating this, but initial checks redux-state.json for gatsbyjs.org (with screenshot placeholders - so not trully production build, but still ~2400 image thumbnails) increased from 26314316 to 30325343 bytes so ~15% (that's after running gatsby develop and not visiting site, so all images processing tasks are in cache).

@Moocar
Copy link
Contributor

Moocar commented Feb 17, 2019

@wardpeet Thanks for the heads up on this PR. I'm working on making the graphql resolvers run in other processes. To do this, those resolvers can't rely on accessing the redux store directly, since it runs in another process. This was fine, except that plugin-sharp manually requires the redux store in order to call the setJob bound action creators. When running in the main process, that's fine, but when it's running in another process, it actually ends up creating a redux store per process, which means the setJob calls are entirely uncoordinated.

I've been teasing that out so that the job API functions are provided to plugin-sharp instead of it having to require them. This PR makes it even harder to run plugin-sharp in another process because it introduces more globals and dependence on the gatbsy-node APIs. But the benefits are obviously massive so I recommend merging this PR, and I'll have a think about whether it's possible to make this run in parallel resolvers land.

@Moocar
Copy link
Contributor

Moocar commented Feb 17, 2019

I think it might be possible to make parallel resolvers work here. Calls to fluid, fixed etc don't cause side effects (apart from reading image size from disk and generating base64) until they add the processing job to the queue. So theoretically, we could change queueImageResizing so that if it's running in another process, it instead makes a call to the main process where it puts the job into the in memory queue. Otherwise it runs as normal.

I'll wait till this is merged, and then have more of a think about it.

@wardpeet
Copy link
Contributor Author

@pieh I could run this to remove default ones:
https://repl.it/@wardpeet/omit-defaults

I could even remove lodash if needed. WDYT? should I sneak it in and than we're good to go?

@wardpeet
Copy link
Contributor Author

@pieh found an issue where we could remove images from disk between develop & build. At the moment we fail hard 😄. The idea is to do a simple file_exists on all files that are inside the cached version and ignore those errors. We will still fail hard if our current queue has errors.

Any issues with this approach? I would love to get this merged 😄

@pieh
Copy link
Contributor

pieh commented Feb 21, 2019

I was thinking yesterday and today about this and this approach also have problems:

With silencing error build succeed, but we have missing assets for that image. I think build should fail.

I don't have good approach yet. We might need for gatsby to expose more information to gatsby-plugin-sharp to properly invalidate stale jobs

@wardpeet wardpeet added the status: blocked This issue/PR can't be solved at the moment and shouldn't be closed/merged label Feb 26, 2019
@wardpeet wardpeet removed the status: blocked This issue/PR can't be solved at the moment and shouldn't be closed/merged label Feb 27, 2019
@wardpeet
Copy link
Contributor Author

Turned lazy loading off so we can merge this without having impact 👍

cc @pieh

@pieh pieh changed the title feat(sharp): image creation on demand (develop) refactor(gatsby-plugin-sharp): split single file into more maintainable chunks Mar 8, 2019
Copy link
Contributor

@pieh pieh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great, let's get that in!

@pieh pieh merged commit 7aef5ca into master Mar 11, 2019
@sidharthachatterjee sidharthachatterjee deleted the feat/lazy-sharp-dev branch March 11, 2019 13:52
@sidharthachatterjee
Copy link
Contributor

Published in gatsby-plugin-sharp@2.0.26

@KyleAMathews
Copy link
Contributor

Just a thought — if image processing jobs are marked as a dependency of a query — then when we re-run the query, we would delete all the old jobs associated with it (and recreate them of course when the query runs). This would solve I believe the stale job problem because deleting a file between dev and build would dirty the query making it re-run.

@muescha
Copy link
Contributor

muescha commented Feb 19, 2020

any plans to turn the lazyImageGeneration on?

@wesbos
Copy link
Contributor

wesbos commented Mar 5, 2020

Please turn this on :) My build takes ~8 mins on a really fast macbook pro before I can even see my site.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
status: awaiting reviewer response A pull request that is currently awaiting a reviewer's response
Projects
None yet
Development

Successfully merging this pull request may close these issues.

10 participants