
This issue was moved to a discussion.

It takes 8 hours to run Gatsby Develop + Drupal 8 store (40 000 products) #24257

Closed
KarolKier opened this issue May 20, 2020 · 15 comments
Labels
topic: performance Related to runtime & build performance topic: source-drupal Related to Gatsby's integration with Drupal type: feature or enhancement Issue that is not a bug and requests the addition of a new feature or enhancement.

Comments

@KarolKier

KarolKier commented May 20, 2020

Summary

Hello,

I have a website built on headless Drupal 8 with around 30 000 products (+10 000 out of stock). I started by building the frontend in Gatsby without connecting it to my Drupal 8 backend; when I tried to connect it using the gatsby-source-drupal plugin, a lot of different problems started to show up.

Each product has 5 images, so in total there should be around 200 000 images.

Basic example

  1. JavaScript "heap out of memory" error when reaching 2 GB (solved by adding a new command in package.json):
    "heavyload": "node --max-old-space-size=14192 ./node_modules/gatsby/dist/bin/gatsby.js develop"
    and running: npm run heavyload

  2. Fetching: info Starting to fetch data from Drupal
    success Fetch data from Drupal - 2623.108s (around 43 minutes)

  3. info Downloading remote files from Drupal: this is where the real problem begins. To run gatsby develop, the script downloads all files into the cache (around 50 GB). Unfortunately, if it fails while downloading those files and I run gatsby develop again, I have to download all 50 GB from scratch (this has already happened to me twice), and it takes around 8 hours to download 80 GB of files.

  4. After running gatsby develop over the past 20 hours, I have not been able to finish the process even once while connected to my Drupal website. I'm attaching the log of a failed attempt:

0 info it worked if it ends with ok
1 verbose cli [
1 verbose cli   'C:\\Program Files\\nodejs\\node.exe',
1 verbose cli   'C:\\Program Files\\nodejs\\node_modules\\npm\\bin\\npm-cli.js',
1 verbose cli   'run',
1 verbose cli   'heavyload'
1 verbose cli ]
2 info using npm@6.14.4
3 info using node@v12.16.3
4 verbose run-script [ 'preheavyload', 'heavyload', 'postheavyload' ]
5 info lifecycle gatsby-starter-hello-world@0.1.0~preheavyload: gatsby-starter-hello-world@0.1.0
6 info lifecycle gatsby-starter-hello-world@0.1.0~heavyload: gatsby-starter-hello-world@0.1.0
7 verbose lifecycle gatsby-starter-hello-world@0.1.0~heavyload: unsafe-perm in lifecycle true
8 verbose lifecycle gatsby-starter-hello-world@0.1.0~heavyload: PATH: C:\Program Files\nodejs\node_modules\npm\node_modules\npm-lifecycle\node-gyp-bin;C:\Users\Karol\nauka-gatsby\mywebsite\node_modules\.bin;C:\Python38\Scripts\;C:\Python38\;C:\Python27\;C:\Python27\Scripts;C:\WINDOWS\system32; ##commented out some of things from here ### C:\Users\Karol\AppData\Roaming\npm;C:\Users\Karol\AppData\Local\Programs\Microsoft VS Code
9 verbose lifecycle gatsby-starter-hello-world@0.1.0~heavyload: CWD: C:\Users\Karol\nauka-gatsby\mywebsite
10 silly lifecycle gatsby-starter-hello-world@0.1.0~heavyload: Args: [
10 silly lifecycle   '/d /s /c',
10 silly lifecycle   'node --max-old-space-size=8192 ./node_modules/gatsby/dist/bin/gatsby.js develop'
10 silly lifecycle ]
11 silly lifecycle gatsby-starter-hello-world@0.1.0~heavyload: Returned: code: 1  signal: null
12 info lifecycle gatsby-starter-hello-world@0.1.0~heavyload: Failed to exec heavyload script
13 verbose stack Error: gatsby-starter-hello-world@0.1.0 heavyload: `node --max-old-space-size=8192 ./node_modules/gatsby/dist/bin/gatsby.js develop`
13 verbose stack Exit status 1
13 verbose stack     at EventEmitter.<anonymous> (C:\Program Files\nodejs\node_modules\npm\node_modules\npm-lifecycle\index.js:332:16)
13 verbose stack     at EventEmitter.emit (events.js:310:20)
13 verbose stack     at ChildProcess.<anonymous> (C:\Program Files\nodejs\node_modules\npm\node_modules\npm-lifecycle\lib\spawn.js:55:14)
13 verbose stack     at ChildProcess.emit (events.js:310:20)
13 verbose stack     at maybeClose (internal/child_process.js:1021:16)
13 verbose stack     at Process.ChildProcess._handle.onexit (internal/child_process.js:286:5)
14 verbose pkgid gatsby-starter-hello-world@0.1.0
15 verbose cwd C:\Users\Karol\nauka-gatsby\mywebsite
16 verbose Windows_NT 10.0.18362
17 verbose argv "C:\\Program Files\\nodejs\\node.exe" "C:\\Program Files\\nodejs\\node_modules\\npm\\bin\\npm-cli.js" "run" "heavyload"
18 verbose node v12.16.3
19 verbose npm  v6.14.4
20 error code ELIFECYCLE
21 error errno 1
22 error gatsby-starter-hello-world@0.1.0 heavyload: `node --max-old-space-size=8192 ./node_modules/gatsby/dist/bin/gatsby.js develop`
22 error Exit status 1
23 error Failed at the gatsby-starter-hello-world@0.1.0 heavyload script.
23 error This is probably not a problem with npm. There is likely additional logging output above.
24 verbose exit [ 1, true ]
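The log above runs the script with `--max-old-space-size=8192`, while step 1 lists 14192, so it may be worth confirming which limit Node actually applied. A minimal sketch using Node's built-in `v8.getHeapStatistics()`; the file name `check-heap.js` is only an example. The reported limit may come out slightly above the flag value, since V8 counts more than the old generation in it.

```javascript
// check-heap.js: print the V8 heap limit of the current Node process.
// Run it with the same flag the Gatsby script uses, for example:
//   node --max-old-space-size=14192 check-heap.js
const v8 = require("v8");

// heap_size_limit is reported in bytes; convert to megabytes.
function heapLimitMb() {
  return Math.round(v8.getHeapStatistics().heap_size_limit / (1024 * 1024));
}

console.log(`V8 heap limit: ${heapLimitMb()} MB`);
```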

Motivation

I'm not sure what I'm doing wrong, and honestly I don't know how to deal with it. I want to connect my Gatsby website with my Drupal 8 website, but if it takes around 8 hours to connect, and after 8 hours I get an error and have to start from scratch, I really don't know what to do.

Perhaps it is my API's fault, or the Gatsby Drupal plugin is missing some configuration. I can share my API, but I wouldn't want to make it public.

@KarolKier KarolKier added the type: feature or enhancement Issue that is not a bug and requests the addition of a new feature or enhancement. label May 20, 2020
@gatsbot gatsbot bot added the status: triage needed Issue or pull request that need to be triaged and assigned to a reviewer label May 20, 2020
@vladar vladar added topic: performance Related to runtime & build performance and removed status: triage needed Issue or pull request that need to be triaged and assigned to a reviewer labels May 20, 2020
@vladar vladar added the status: needs core review Currently awaiting review from Core team member label May 20, 2020
@pvdz
Contributor

pvdz commented May 20, 2020

Ok so this will qualify as a big site :)

I would ask for a repro but that's gonna be difficult. I suspect we don't have a mechanism (yet) for a local cache of all remote artifacts so that repeated builds do not need to redownload everything. Not very useful for you but helpful for repro/testing.

I'm very interested in seeing the node count. This would be printed when you use --verbose but your build never gets beyond the sourcing step, unfortunately. Although this is a develop run. Could you try to do something like node --max-old-space-size=14192 node_modules/.bin/gatsby build --verbose and see if you ever get past the sourcing step? In that case it should print the internal node count stats. I'm very curious to see those. But it sounds like it'll take a long time for you to get there...

@KarolKier
Author

Yes sir, I'm on it. I will report back as soon as possible, but 10 hours from now I will probably be sleeping, so it may take up to 18 hours.

Before creating this issue, I tested it a couple of times to rule out problems such as a network outage during the 8 hours or a bad connection. This morning I gave it another run, but it failed after 50 minutes. It didn't download any new files, and when it stopped it "kind of finished": it failed to connect to Drupal, the Drupal source was not visible in GraphQL, and the bottom right of the page displayed "5 pages" (created from static .js pages).

One more thing that surprised me regarding the .cache folder: all those GB are being downloaded into .cache\caches\gatsby-source-filesystem

I was expecting images downloaded from Drupal to be in .cache\caches\gatsby-source-drupal

The website may actually be bigger when it comes to the total number of "nodes" in Drupal terminology: there should be more nodes than products, because the website is in 8 languages. Only some products were translated; in general there are around 40 000 products in 3 translations, which gives around 120 000 nodes. In Gatsby I don't generate anything programmatically from GraphQL yet, as I haven't been able to fetch any data into GraphQL. I plan to display only nodes from one specific language.

@KarolKier
Author

I'm sorry, but I was unable to test it with this command:

node --max-old-space-size=14192 node_modules/.bin/gatsby build --verbose

It gave the following error:

C:\Users\Karol\nauka-gatsby\website\node_modules\.bin\gatsby:2
basedir=$(dirname "$(echo "$0" | sed -e 's,\\,/,g')")
          ^^^^^^^

C:\Users\Karol\nauka-gatsby\ciucholando\node_modules\.bin\gatsby:2
basedir=$(dirname "$(echo "$0" | sed -e 's,\\,/,g')")
          ^^^^^^^

SyntaxError: missing ) after argument list
    at wrapSafe (internal/modules/cjs/loader.js:1047:16)
    at Module._compile (internal/modules/cjs/loader.js:1097:27)
    at Object.Module._extensions..js (internal/modules/cjs/loader.js:1153:10)
    at Module.load (internal/modules/cjs/loader.js:977:32)
    at Function.Module._load (internal/modules/cjs/loader.js:877:14)
    at Function.executeUserEntryPoint [as runMain] (internal/modules/run_main.js:74:12)
    at internal/main/run_main_module.js:18:47
npm ERR! code ELIFECYCLE
npm ERR! errno 1
npm ERR! gatsby-starter-hello-world@0.1.0 magic: `node --max-old-space-size=14192 node_modules/.bin/gatsby build --verbose`
npm ERR! Exit status 1
npm ERR!
npm ERR! Failed at the gatsby-starter-hello-world@0.1.0 magic script.
npm ERR! This is probably not a problem with npm. There is likely additional logging output above.

npm ERR! A complete log of this run can be found in:
npm ERR!     C:\Users\Karol\AppData\Roaming\npm-cache\_logs\2020-05-21T11_53_13_039Z-debug.log
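The SyntaxError comes from pointing `node` at `node_modules/.bin/gatsby`, which here is a shell shim (its second line is `basedir=$(dirname ...)`), so Node tries to parse shell syntax as JavaScript. Pointing Node at the package's JavaScript entry, as the original `heavyload` script already does, avoids that. A minimal sketch of a cross-platform launcher, assuming the same `node_modules/gatsby/dist/bin/gatsby.js` path used in step 1; the file name `run-gatsby.js` and the 14192 MB value are only examples.

```javascript
// run-gatsby.js: start the Gatsby CLI through its JavaScript entry point
// with a larger V8 heap, so it behaves the same on Windows and Unix shells.
// Usage (hypothetical script): node run-gatsby.js build --verbose
const { spawn } = require("child_process");
const path = require("path");

// Same entry point the "heavyload" script uses.
const GATSBY_CLI = path.join("node_modules", "gatsby", "dist", "bin", "gatsby.js");

// Arguments for `node`: the V8 flag first, then the CLI and its arguments.
function buildGatsbyArgs(command, heapMb, extraArgs = []) {
  return [`--max-old-space-size=${heapMb}`, GATSBY_CLI, command, ...extraArgs];
}

function launch(command, heapMb, extraArgs) {
  const child = spawn(process.execPath, buildGatsbyArgs(command, heapMb, extraArgs), {
    stdio: "inherit",
  });
  // Propagate the child's exit code (null means it was killed by a signal).
  child.on("exit", (code) => process.exit(code === null ? 1 : code));
}

// Only launch when a Gatsby command is passed on the command line.
const [, , command, ...rest] = process.argv;
if (command) {
  launch(command, 14192, rest);
}
```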

I modified my older command by adding --verbose, and it got somewhat further, but the connection was lost during the night and it produced the following error with some verbose information:

Error: connect ETIMEDOUT MyIP:443

warn The gatsby-source-drupal plugin has generated no Gatsby nodes. Do you need it?
verbose Now have 33 nodes with 7 types: [SitePage:1, SitePlugin:25, Site:1, SiteBuildMetadata:1, Directory:1, File:2, ImageSharp:2]
success source and transform nodes - 2266.676s
success building schema - 0.227s
verbose Now have 33 nodes with 7 types, and 1 SitePage nodes

@smthomas
Contributor

I can try to provide some context on using gatsby-source-drupal on a site this large:

  1. There is currently work in progress that will allow incremental downloads. This means if you do get it to build once locally you will not need to re-download and re-process all the images/content on subsequent runs. This is not quite ready yet and I don't have a specific timeline on when it will be ready for testing.
  2. I have heard from other community members that there are multilingual issues with using JSON:API and gatsby-source-drupal. You mention you are only planning on using one language for the content of the Gatsby site, so this should work. However, I would be interested to know if you run into any issues specifically regarding the multilingual setup on your Drupal site.
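Until incremental downloads land, the underlying idea can be illustrated independently of the plugin. A minimal sketch, assuming a hypothetical cache layout with one file per remote URL named by the URL's SHA-1 hash (this is not how gatsby-source-filesystem actually lays out its cache): each file is written under a temporary name and renamed only once complete, so an interrupted run leaves no truncated files behind, and the next run only needs to fetch what is missing.

```javascript
const crypto = require("crypto");
const fs = require("fs");
const path = require("path");

// Hypothetical layout: one cache file per remote URL, named by its SHA-1 hash.
function cachePathFor(url, cacheDir) {
  const hash = crypto.createHash("sha1").update(url).digest("hex");
  return path.join(cacheDir, hash);
}

// Write to a ".part" file first and rename it when complete, so a crash
// mid-write never leaves a truncated file that looks like a finished one.
function saveAtomically(url, data, cacheDir) {
  const finalPath = cachePathFor(url, cacheDir);
  const tempPath = `${finalPath}.part`;
  fs.writeFileSync(tempPath, data);
  fs.renameSync(tempPath, finalPath);
  return finalPath;
}

// Split URLs into those already cached and those that still need downloading.
function partitionByCache(urls, cacheDir) {
  const cached = [];
  const missing = [];
  for (const url of urls) {
    (fs.existsSync(cachePathFor(url, cacheDir)) ? cached : missing).push(url);
  }
  return { cached, missing };
}
```

A download loop would then fetch only the `missing` list and pass each response body to `saveAtomically`.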

Now for some more in-depth testing...

The gatsby-source-drupal plugin scans the JSON:API of your Drupal site and uses that to pull down all the data. There are a number of potential things we can do to get data into your Gatsby site. These might not fix the underlying scaling problem, but might help us diagnose where the issues are (my guess is image processing, but we can test that):

  • You might consider using something like the JSON:API Extras module on your Drupal site. On its admin page you can turn off any entities you don't want downloaded to your Gatsby site. I would recommend turning off file and media entities for now, which should let you start working with all the data without the extremely long download times. We will need to revisit this later, but it will help us confirm whether image download/processing is the bottleneck.
  • You can filter the data pulled back from Drupal in your gatsby-config.js file. Here you can add JSON:API filters which might help limit the data you are pulling in just to get things running with gatsby develop.
  • Gatsby is likely trying to pull down all the entities from Drupal (including all translations). This PR and attached issue helped add support for JSON:API includes which allows you to be even more selective on what entities and relationships you follow when pulling down data into Gatsby.

Can you try a few of the above methods to see if you can get data pulled into gatsby develop and from there we can work on determining/fixing the biggest bottlenecks?

@smthomas smthomas added the status: awaiting author response Additional information has been requested from the author label May 21, 2020
@KarolKier
Author

Thank you so much for getting back to me, I really appreciate it. I have also been following a discussion I noticed on Twitter today, and I've come up with some ideas (most likely not very good ones).

  1. If I find any issues I will of course report them as soon as I can, although first I need access to GraphQL before I can even filter the content by the language I want.

  2. I will disable entities like file / image, but I'm a bit worried that if I fetch products that have images assigned to them, the images may still be imported.

  3. Is there a way to use filters to limit the total number of nodes that are downloaded, as well as their language, to speed up the testing process? Should it be done via filters or in the generated JSON?

The general plan is to at least make this website work without any images. I will do my best to test it and report back to you.

Hopefully this topic will help someone else in the future as well.

@smthomas
Contributor

I confirmed that if you disable the file and media entities on the JSON:API Resource Overrides admin page, it will not download those files (even if your content entities reference them). That step alone should get your local development site building. Let me know if that works to at least get the data without the images.

@KarolKier
Author

Thank you so much!

Just wanted to confirm: after disabling FILE and IMAGE STYLES, the time for "npm run heavyload" (gatsby develop) dropped from around 34603.075s (about 9.6 hours) to just 416.814s (about 7 minutes).

But no pages are generated dynamically from queries yet, so the time will definitely increase once I set up generation of the product pages. (I will report back when I'm able to test it further.)

I was trying to at least access the absolute path of the missing images, so I could print them out directly as a temporary solution. However, after disabling FILE this is of course no longer possible, since the path is stored in that entity.

I'm trying to think of a way to display the images without actually downloading them all.

@LekoArts LekoArts removed the status: awaiting author response Additional information has been requested from the author label May 25, 2020
@KarolKier
Copy link
Author

Hello,

A small update: unfortunately, I think I need to use Commerce API instead of the standard JSON:API. With standard JSON:API I can use JSON:API Extras, which lets me disable "file--file", but Commerce API conflicts with JSON:API Extras, so I can no longer disable file--file in the JSON generated by Drupal. Any tips on how to solve this?

Even if the development server loads in 10 minutes, which would be great, I would still need to find a way to display the images from somewhere.

I found this Gatsby issue, which seems related to mine, and it looks like it was almost done. I'm not sure whether it would help me, but perhaps it could. Unfortunately I don't know how to test it; I guess I would need to wait for it to be merged into the main Gatsby branch.

#20741

While trying to find a way to deal with the images, I was reading this Lullabot post. I'm not sure it is up to date, but I'm running out of ideas :(

Link to module:
https://www.drupal.org/project/consumer_image_styles

Link to blogpost:
https://www.lullabot.com/articles/decoupled-drupal-hard-problems-image-styles

@smthomas
Contributor

It looks like we have a few problems here. Downloading all of the images is taking a long time. The gatsby-source-drupal plugin is going to download these images as long as they are referenced in the JSON provided by JSON:API.

The issue you linked is one potential solution to help with that. I will follow up on that issue to see what is blocking it from getting merged but it appears it does require some additional work.

It looks like the commerce_api module will not work with jsonapi_extras. This is going to cause other problems down the line, because the jsonapi_extras module is required if you want things like live preview or incremental builds to work. It might be worth following up in this thread to see what is causing the conflicts and whether it is something commerce_api will fix in the future - https://www.drupal.org/project/commerce_api/issues/3121480

The idea of third-party image hosting is a good one. There are options such as Cloudinary, or you could host the images on your Drupal site and potentially use the consumer_image_styles module. Other third-party hosted CMSs such as Contentful and DatoCMS use this approach (which is why their build times are much faster). The images are never pulled down locally, and the correct image is displayed at runtime based on the browser size. This would definitely speed up your builds, but we would still need to figure out how to make sure the files are not downloaded (which is why the jsonapi_extras module would be helpful).
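Displaying "the correct image at runtime based on the browser size" usually comes down to handing the browser several pre-sized derivatives through the standard HTML `srcset` attribute and letting it choose. A minimal sketch, assuming each image arrives with a hypothetical list of `{ url, width }` derivatives (for example, one per Drupal image style, with numeric widths); whatever shape consumer_image_styles or Cloudinary actually returns would need to be mapped into it first.

```javascript
// Build a srcset string such as "/a.jpg 480w, /b.jpg 960w" from a list of
// hypothetical { url, width } derivatives, skipping incomplete entries.
function buildSrcSet(derivatives) {
  return derivatives
    .filter((d) => d && d.url && Number.isFinite(d.width))
    .sort((a, b) => a.width - b.width)
    .map((d) => `${d.url} ${d.width}w`)
    .join(", ");
}

const srcSet = buildSrcSet([
  { url: "/styles/large/shoe.jpg", width: 960 },
  { url: "/styles/medium/shoe.jpg", width: 480 },
]);
console.log(srcSet);
```

The string goes into an `<img srcset="..." sizes="..." loading="lazy">` tag, so nothing is downloaded at build time.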

@LekoArts LekoArts added topic: source-drupal Related to Gatsby's integration with Drupal and removed status: needs core review Currently awaiting review from Core team member labels May 26, 2020
@ascorbic
Contributor

ascorbic commented May 26, 2020

What I did in a similar situation was to rename file--file to something else with jsonapi_extras, which meant it still came through to Gatsby but didn't download anything. On the Drupal side we set up s3fs for the images and then linked the bucket to ImageKit. I then wrote a custom resolver to generate gatsby-image-compatible types, using ImageKit URLs generated from the S3 URLs coming from Drupal.

@KarolKier
Author

KarolKier commented May 26, 2020

Thank you for feedback.

It is possible to prevent Gatsby from downloading images from "Commerce API" without changing the JSONAPI output (the load time is 30 minutes without images) by adding:

    {
      resolve: `gatsby-source-drupal`,
      options: {
        baseUrl: `https://dev.domain.com`,
        apiBase: `/jsonapi/`, // optional, defaults to `jsonapi`
        concurrentFileRequests: 60, // optional, defaults to `20`
        disallowedLinkTypes: [`self`, `describedby`, `file--file`],
      },
    },

(Unfortunately, the href for the images is no longer available, so I can't even print the images from my website directly or through a third-party app.)
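A simplified model of why the href disappears (a sketch, not the plugin's actual source): as we understand it, the plugin reads the links listed in the /jsonapi index and fetches each collection, and `disallowedLinkTypes` removes entries from that list. With `file--file` excluded, the file entities are never fetched, so a product's image relationship has nothing to resolve to. The index below is trimmed to its `links` object and uses example URLs.

```javascript
// Simplified model: pick the collection URLs to fetch from a JSON:API index
// document, skipping the disallowed link types.
function collectionUrls(indexDoc, disallowedLinkTypes) {
  const skip = new Set(disallowedLinkTypes);
  return Object.entries(indexDoc.links || {})
    .filter(([type]) => !skip.has(type))
    // Drupal's index lists links as { href }; accept plain strings too.
    .map(([, link]) => (typeof link === "string" ? link : link.href));
}

const index = {
  links: {
    self: { href: "https://dev.domain.com/jsonapi" },
    "file--file": { href: "https://dev.domain.com/jsonapi/file/file" },
    "node--article": { href: "https://dev.domain.com/jsonapi/node/article" },
  },
};
console.log(collectionUrls(index, ["self", "describedby", "file--file"]));
```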

However, gatsby develop takes only 7 minutes to load when the file--file resource is disabled in JSON:API Extras (possibly even less with JSON:API Boost, a cache warmer that also doesn't work with Commerce API).

Incremental builds seem like a must-have in my case, but I thought Commerce API was the successor of Commerce Cart API (so it seemed logical to use the newer API on a new project).

The tricky part is preventing Gatsby from downloading images (as an option) when using gatsby-source-drupal, while still having access to the "href" value of those images in GraphQL. I tried to find the part of the plugin responsible for caching images locally, but failed :(

Hosting the images on my Drupal website and serving them through Consumer Image Styles, lazy-loaded with a height placeholder, seems like the best option. Thank you!

-- edit

@ascorbic, thank you so much for this great idea. I tried renaming the file--file resource using JSON:API Extras, but unfortunately Gatsby gets an Error 500 when connecting, and so does domain.com/jsonapi (this error may be caused by my hosting, but after reverting to the original file--file name the error no longer appears):

      "title": "Internal Server Error",
      "status": "500",
      "detail": "Route \"jsonapi.photos.collection\" does not exist.",

Could you also, by any chance, provide more information about how you wrote this custom resolver?

@smthomas
Contributor

As for the 500 error: JSON:API does have some caching (especially if you are pulling the data through JSON:API as an anonymous user). Just to be sure, it might be worth clearing the Drupal cache and running gatsby clean after changing the name of file--file. It might not fix it, but it is worth testing if you haven't already.

@smthomas
Contributor

One other option you might consider is trying gatsby-source-graphql with Drupal's GraphQL module. This will not download the images by default and might get you to the point where you can start pulling down content. This would still give you information about the image files that could then be used to display the images.

There are definitely some improvements needed to gatsby-source-drupal to work around some of these image scaling issues.

@apmsooner
Contributor

apmsooner commented Jun 10, 2020

For a custom entity with a multi-valued image field, I wrote a custom computed base field like so:

// In the entity class's baseFieldDefinitions():
$fields['banner_images'] = BaseFieldDefinition::create('string')
  ->setLabel(t('Banners JSON'))
  ->setDescription(t('The JSON version of banner image files'))
  ->setComputed(TRUE)
  ->setClass(ComputedBanners::class);

And the class:

<?php

namespace Drupal\rlh_hotel\Plugin\Field;

use Drupal\Component\Serialization\Json;
use Drupal\Core\Field\FieldItemList;
use Drupal\Core\TypedData\ComputedItemListTrait;

/**
 * Computes a JSON string describing the entity's banner images.
 */
class ComputedBanners extends FieldItemList {

  use ComputedItemListTrait;

  /**
   * {@inheritdoc}
   */
  protected function computeValue() {
    $entity = $this->getEntity();
    $banner_images = [];
    foreach ($entity->get('field_banner_images') as $banner_image) {
      $file = $banner_image->entity;
      // Skip references to files that no longer exist.
      if (!$file) {
        continue;
      }
      $banner_images[] = [
        'url' => $file->url(),
        'alt' => $banner_image->alt,
        'width' => $banner_image->width,
        'height' => $banner_image->height,
      ];
    }
    // Store the list as a single JSON string item (NULL when there are no images).
    $encoded = !empty($banner_images) ? Json::encode($banner_images) : NULL;
    $this->list[0] = $this->createItem(0, $encoded);
  }

}

This new field is then available in the JSON:API Extras settings, and I set the enhancer for that field to "JSON Field". The end result is a URL plus attributes to link to on the Gatsby side. It doesn't get all the Gatsby image goodies, because the images stay stored on the Drupal site, but if you have a ton of images this might be an option for you. If you insist on downloading files, the ideas above are pretty good. I'd urge you to disable all the links you don't need in the JSON:API settings, particularly the file--file one. You can use JSON:API includes to pull in the files with the entity you need and avoid pulling all files that may not be necessary. Still, it sounds like you have too many files to download, as you described, so referencing them remotely might be the way to go.
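On the Gatsby side, a small helper can turn that field into props for a lazily loaded image with a reserved-height placeholder, as suggested earlier in the thread. A sketch, assuming the field arrives either as the JSON string produced by `computeValue()` or, with the "JSON Field" enhancer, as an already-parsed array of `{ url, alt, width, height }`; the helper name `toImagePlaceholders` is hypothetical.

```javascript
// Normalize banner image data and compute a padding-bottom percentage that
// reserves the image's aspect-ratio box before the image itself loads.
function toImagePlaceholders(bannerImages) {
  const list =
    typeof bannerImages === "string" ? JSON.parse(bannerImages) : bannerImages || [];
  return list.map((img) => ({
    src: img.url,
    alt: img.alt || "",
    // width/height may arrive as strings; Number() handles both cases.
    paddingBottom:
      Number(img.width) > 0 && Number(img.height) > 0
        ? `${(Number(img.height) / Number(img.width)) * 100}%`
        : null,
  }));
}

const [first] = toImagePlaceholders(
  '[{"url":"/sites/default/files/banner.jpg","alt":"Lobby","width":800,"height":600}]'
);
console.log(first.paddingBottom);
```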

@pvdz
Contributor

pvdz commented Jun 19, 2020

@apmsooner the changes to skip certain downloads have been merged now. Should help you a bit, hopefully :)

@LekoArts LekoArts closed this as completed Mar 1, 2021
@gatsbyjs gatsbyjs locked and limited conversation to collaborators Mar 1, 2021


7 participants