-
Notifications
You must be signed in to change notification settings - Fork 10.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
New Gatsby type inference is slow on 60k pages #12692
Comments
Thanks for the report and for providing a testing repo! We are indeed now checking every string if it is a date string -- we did not do this before but only checked on a randomly picked field value which unfortunately made type inference non-deterministic. We are aware though that this does not scale very well. I'll test with the provided repo soon -- in the meantime: if it hangs after |
Me too! I am using a single 60 MiB JSON file for my data. Was taking about 10-15 minutes to build with gatsby < 2.2.0 but with the most recent, it's taking 2h30. |
PR #12700 is a quickfix to make it a little bit faster again. Do you mind trying it out? |
Thanks @wardpeet I'm giving it a shot, hopefully with an answer in less than 2 hours :-) UPDATE: Looking good! |
What did schema creation take before? |
They mentioned in an earlier comment that it was taking 10-15 minutes. I'm going to try this on our codebase as we were taking 15 seconds before, but >30 minutes after. |
I also tried with swapping |
We will merge Ward's PR as a temporary optimization. Sadly As a long term solution to this, we will provide an opt-out from inference that will considerably speed up sites with lots of nodes. This will be done by specifying the GraphQL types for the nodes that you don't want to be inferred. We've added the "specifying the type" part already, but for historical reasons inference always happens. We will fix this in the next couple of weeks. |
## Description Checks if strings start with 4 numbers as all ISO 8601 date-time are using YYYY syntax ## Related Issues #12692
Closing (it's merged); further steps will require more specific issues. |
Just to throw an idea out there. What about if the source-plugins would be able to specify the data type when they create the nodes? Something like (as a proposal for a future improvement) |
@danechitoaie that's something we're going to move to. @millette @bennetthardwick I did some more perf improvements #12722 could you guys give it a spin and report if something is wonky |
@wardpeet I didn't have a chance to try #12722 but on first sight it looks like a lot of code compared to first patch. I'm not used to "multirepos" so I have to patch the dist module manually. Also, my project should be redesigned somewhat since it's already pretty slow and doesn't make the best usage of gatsby, so it shouldn't be too significant to proper gatsby usage. (reopening the issue since we're not done, it seems) |
@wardpeet I finally tried the new patch with gatsby 2.2.8. gatsby 2.2.8:
gatsby 2.2.8 with new patch:
|
@millette could you try with a gatsby@~2.1.0 version? The initial issue here was with the new Schema Customization API changes, so I'd be curious to see if it's still slower! Thanks! |
@millette the central issue here was with a regression with 2.2.0. Specifically, it seemed like the bottleneck was with Date inference. That change was released in gatsby@2.2.10 so if we want to compare performance, we would ideally test this change:
|
@DSchau Sorry in advance for the very long response, but here we go. I've included results while also running mplayer which took some cpu. If you compare with runs without mplayer, I concluded that the 2.2 branch takes a bit more cpu (vs IO) since it was more impacted by mplayer running. So there's a regression (540s vs 320s with mplayer; 485s vs 326s without mplayer). When 2.2 came out, the same build took almost 3 hours to complete so I'm not complaining :-) I didn't patch anything in these tests.
|
@millette not slow at all! Thank you for doing this--we appreciate it! |
FYI, I'm not generating 60k pages, but I'm using a 60 MiB JSON file as a source. It's built with https://github.com/millette/gatsby-starter-location-github and the source data is generated with https://github.com/millette/ghraphql. I need to put some time into it soon since it's getting rather slow and bulky: http://dev.rollodeqc.com/en/ but that's all on me. Thanks to all Gatsby contributors :-) |
Time to close this issue? |
The changes have fixed my original issue at least. Thanks! |
Description
After the
onPreExtractQueries
step of the build process, Gatsby gets when looking through 60k or so pages. Specifically, theisDate
method (which is run on every string to check whether it's a string or a date string), is taking up the most time. These methods are all being run from the example-value.js.Steps to reproduce
chrome://inspect
and start the debugger.onPreExtractQueries
, wait for a few minutes and then pause the debugger (chances are it will be in the right spot)The build will be stuck in this state for 30+ minutes (I haven't had a successful build).
Expected result
Build completes in a reasonable time.
Actual result
Build never completes.
Environment
System:
OS: Linux 4.15 Ubuntu 18.04.2 LTS (Bionic Beaver)
CPU: (4) x64 Intel(R) Core(TM) i5-4300U CPU @ 1.90GHz
Shell: 5.4.2 - /usr/bin/zsh
Binaries:
Node: 10.15.0 - ~/.nvm/versions/node/v10.15.0/bin/node
Yarn: 1.13.0 - ~/.nvm/versions/node/v10.15.0/bin/yarn
npm: 6.4.1 - ~/.nvm/versions/node/v10.15.0/bin/npm
Languages:
Python: 2.7.15 - /usr/bin/python
npmPackages:
gatsby: ^2.2.1 => 2.2.1
gatsby-image: ^2.0.34 => 2.0.34
gatsby-plugin-manifest: ^2.0.24 => 2.0.24
gatsby-plugin-offline: ^2.0.25 => 2.0.25
gatsby-plugin-react-helmet: ^3.0.10 => 3.0.10
gatsby-plugin-sharp: ^2.0.29 => 2.0.29
gatsby-source-filesystem: ^2.0.27 => 2.0.27
gatsby-transformer-sharp: ^2.1.17 => 2.1.17
npmGlobalPackages:
gatsby-dev-cli: 2.4.12
gatsby: 2.2.2
The text was updated successfully, but these errors were encountered: