-
Notifications
You must be signed in to change notification settings - Fork 10.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat(gatsby-source-filesystem): lazy file download #20741
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good :)
@@ -7,7 +7,7 @@ const GraphQLRunner = require(`./graphql-runner`) | |||
|
|||
const createBaseOptions = () => { | |||
return { | |||
concurrent: 4, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This impacts more than just file downloads. There are many internal Gatsby things that use this queue (like run queries for example).
I'm not sure who's best to judge on whether this change is a problem overall but it shouldn't be taken lightly. After all, there's a reason we're using a queue :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, this is certainly the biggest deal. It's actually the query concurrency that I'm needing to icnrease here, as I think that's the bottleneck for the file downloading when it's happening in the resolver
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe it makes sense for this to be configurable? I know the default for concurrent file downloads is 200 right now, so 100 might still be a bottleneck in some situations
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I just chose 100 arbitrarily. I guess it would make sense to have them configurable in the same way
5333b40
to
44578d5
Compare
@pvdz I've done a refactor that should address all of your comments |
cf16df4
to
9d22f10
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks :)
Or well. I guess some tests are still failing. |
And this is a wip. Sigh :) |
9d22f10
to
ea46fbb
Compare
…-remote-file-loading
@ascorbic , I was trying my own implementation of this on Gatsby cloud and it was failing the build there even though it worked locally. Thought I should give you a heads up about that as that may mean the jobs API needs to be modified to account for this. |
@TylerBarnes This is now passing in Cloud tests, but I've not tried deploying a site that actually uses it yet |
This is failing in Cloud. Was your failure similar, @TylerBarnes ?
|
That is caused by a known race condition in Sharp (#6291). Only recourse right now is to try again if it fatals, and ignore it if it doesn't fatal. |
@ascorbic I just checked my old cloud logs, but unfortunately it looks like I deleted and recreated the site I was having that issue on so I don't have the error messages anymore. |
FYI: Sharp as released a new version (0.25.1 right now) which should solve the segfaults / warning spam. This requires node 10+ (which is why we can't ship this with Gatsby just yet) but we expect to ship this in one or two weeks, once we cut over to node 10 as well. |
(I've converted this PR into a Draft PR. I hope this gets merged soon. The Sharp issue should be resolved. Any status update?) |
Just a note that with Drupal and
In the above example, for the "Album" content type it will download the "album_cover" field (which is a file field). However, it won't download all of the files on the site by default (only the ones attached to the filtered albums). |
The problem defined in this PR is totally valuable, though we're reconsidering how to approach (not via opt-in via plugin etc.) Closing, and will revisit later this year. |
Description
This PR adds support for lazy downloads of remote files. It includes an example implementation in gatsby-source-drupal and the using-drupal example site.
The current behavior is to download all remote files at the sourcing stage. This can cause problems if there are large numbers of files in the CMS that are not used in the site. This PR adds a new node type RemoteFile, which includes some file metadata but does not download the image itself. The resolver adds a
localFile
field to this, which is aFile
that downloads the actual file at resolve time.This also lifts the query concurrency limit from 4 to 100, as otherwise this would limit the number of files that could download.
Documentation
This adds a new method
createLazyFileNode
, which is a drop-in replacement forcreateRemoteFileNode
. A source plugin can use this to opt-in to lazy file downloads. As this method makes no attempt to ascertain the filetype beyond the file extension, it is recommended to pass in the file extension if this is available from the CMS.Because this adds resolvers and updates the schema, it means gatsby-source-filesystem needs adding to the plugins array. This should be done by adding a gatsby-config to the source plugin which includes
gatsby-source-filesystem
. There is a new option for thisaddResolversOnly
which stops the plugin trying to add files from a path.This PR adds support to gatsby-source-drupal. It is disabled by default, so is not a breaking change. To enable it, pass
lazyFileDownloads: true
in the config.To use these files in a site, the queries need to change, as the file itself is a child of RemoteFile. For example, in this Drupal implementation, a query that included the following:
..would need to change to:
You cannot query on the
localFile
fields, so you can't for example query on filesize or other metadata. TheRemoteFile
includes metadata that can be derived form the URL, including the local path and mediaType if the path includes a file extension.Related Issues
#13909