Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

perf(gatsby): do not call and iterate getAllNodes(File) for each file #28891

Merged
merged 7 commits into from
Jan 8, 2021

Conversation

pvdz
Copy link
Contributor

@pvdz pvdz commented Jan 6, 2021

Before it would call context.nodeModel.getAllNodes({ type: `File` }) (already a non-trivial operation on its own) and then filter that down to the one node it was looking for; the one that represents the file. This approach is not cached, super slow, and simply does not scale.

This PR changes that approach to leverage the fast filters, which will generate a O(1) lookup cache on the first call. The rest of the calls will be O(1) lookups.

Tested this on a modification of benchmarks/gabe-fs-markdown which adds one image per page (randomly generated before the benchmark).

  • At 1000 pages both finish in about 20s
  • At 16k pages, from 310s down to 195s
  • At 32k pages. from 1057s down to 382s
  • At 64k pages, from 3474s down to 754s
  • At 128k pages, from 5.5h down to 1652s

This effect scales as the number of nodes goes up. This should apply to most things that relate to local files, and maybe even beyond that.

This is a massive win for local file usage

just sayin'

@gatsbot gatsbot bot added the status: triage needed Issue or pull request that need to be triaged and assigned to a reviewer label Jan 6, 2021
@LekoArts LekoArts added topic: performance Related to runtime & build performance and removed status: triage needed Issue or pull request that need to be triaged and assigned to a reviewer labels Jan 7, 2021
…ed set of arrays containing nodes, not a flat array. Then me and TS had another argument and we seem cool now.
Comment on lines +31 to +32
type nestedListOfStrings = Array<string | nestedListOfStrings>
type nestedListOfNodes = Array<IGatsbyNode | nestedListOfNodes>
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Funsies. But turns out the resulting nesting structure must match the input nesting structure. So if you have [a, [b, [c, d]]] as input, then your output should be the same except the input strings are replaced with nodes.

This is a tad annoying since it means we can't "just" pass on a flat array of promises to Promise.all() but so be it.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I realise this is already merged, but types should be PascalCased

Copy link
Contributor

@sidharthachatterjee sidharthachatterjee left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You should change your middle name to Makes it faster

@sidharthachatterjee sidharthachatterjee merged commit a455a23 into master Jan 8, 2021
@sidharthachatterjee sidharthachatterjee deleted the fasterlocalfiles branch January 8, 2021 15:46
@ascorbic
Copy link
Contributor

ascorbic commented Jan 8, 2021

Awesome work, @pvdz

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
topic: performance Related to runtime & build performance
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants