perf(gatsby): do not call and iterate getAllNodes(File) for each file #28891

pvdz · 2021-01-06T14:26:41Z

Before, it would call context.nodeModel.getAllNodes({ type: `File` }) (already a non-trivial operation on its own) for each file and then filter the result down to the one node it was looking for: the one that represents the file. This approach is not cached, is super slow, and simply does not scale.

This PR changes that approach to leverage the fast filters, which generate an O(1) lookup cache on the first call. All subsequent calls are then O(1) lookups.
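
To illustrate the difference in shape between the two approaches, here is a minimal sketch. It is not Gatsby's fast-filter implementation; `FileNode`, `findByScan`, and `createPathLookup` are hypothetical names, and the node shape is reduced to two fields for the example.

```typescript
// Hypothetical, simplified node shape; real File nodes carry many more fields.
interface FileNode {
  id: string
  absolutePath: string
}

// Scan approach: walks the whole list on every call, so O(n) per lookup.
function findByScan(
  nodes: FileNode[],
  absolutePath: string
): FileNode | undefined {
  return nodes.find(node => node.absolutePath === absolutePath)
}

// Cached approach: the first call builds a Map once (O(n)),
// every later call is a single Map lookup (O(1)).
function createPathLookup(
  nodes: FileNode[]
): (absolutePath: string) => FileNode | undefined {
  let byPath: Map<string, FileNode> | undefined
  return absolutePath => {
    if (byPath === undefined) {
      const built = new Map<string, FileNode>()
      for (const node of nodes) {
        // Keep the first node per path, matching what the scan would return.
        if (!built.has(node.absolutePath)) built.set(node.absolutePath, node)
      }
      byPath = built
    }
    return byPath.get(absolutePath)
  }
}

const exampleNodes: FileNode[] = [
  { id: `1`, absolutePath: `/site/a.png` },
  { id: `2`, absolutePath: `/site/b.png` },
]
const lookup = createPathLookup(exampleNodes)
console.log(lookup(`/site/b.png`) === findByScan(exampleNodes, `/site/b.png`))
```

With n files each needing one lookup, the scan does O(n²) work in total, while the cached version does O(n), which is consistent with the gap widening as the page count grows.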

Tested this on a modification of benchmarks/gabe-fs-markdown which adds one image per page (randomly generated before the benchmark).

At 1000 pages both finish in about 20s
At 16k pages, from 310s down to 195s
At 32k pages, from 1057s down to 382s
At 64k pages, from 3474s down to 754s
At 128k pages, from 5.5h down to 1652s

This effect scales as the number of nodes goes up. This should apply to most things that relate to local files, and maybe even beyond that.
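
As a quick check on how the effect scales, the timings quoted above can be turned into speedup factors. This is only arithmetic on the numbers in this description; "5.5h" is taken as exactly 5.5 × 3600 seconds, which is an assumption since the original figure is rounded, and the 1000-page row is omitted because both runs took about 20s.

```typescript
interface BenchmarkRow {
  pages: string
  beforeSeconds: number
  afterSeconds: number
}

// Timings copied from the benchmark list above.
const rows: BenchmarkRow[] = [
  { pages: `16k`, beforeSeconds: 310, afterSeconds: 195 },
  { pages: `32k`, beforeSeconds: 1057, afterSeconds: 382 },
  { pages: `64k`, beforeSeconds: 3474, afterSeconds: 754 },
  // Assumption: "5.5h" treated as exactly 5.5 hours.
  { pages: `128k`, beforeSeconds: 5.5 * 3600, afterSeconds: 1652 },
]

const speedups = rows.map(row => ({
  pages: row.pages,
  speedup: (row.beforeSeconds / row.afterSeconds).toFixed(2),
}))

for (const { pages, speedup } of speedups) {
  console.log(`${pages} pages: ${speedup}x faster`)
}
// 16k pages: 1.59x faster
// 32k pages: 2.77x faster
// 64k pages: 4.61x faster
// 128k pages: 11.99x faster
```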

This is a massive win for local file usage

just sayin'

pvdz · 2021-01-08T12:53:57Z

packages/gatsby/src/schema/resolvers.ts

+type nestedListOfStrings = Array<string | nestedListOfStrings>
+type nestedListOfNodes = Array<IGatsbyNode | nestedListOfNodes>


Funsies. But turns out the resulting nesting structure must match the input nesting structure. So if you have [a, [b, [c, d]]] as input, then your output should be the same except the input strings are replaced with nodes.

This is a tad annoying since it means we can't "just" pass on a flat array of promises to Promise.all() but so be it.
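
One way to keep the nesting intact is to recurse and call Promise.all() at each level instead of once on a flattened list. The sketch below is an illustration under assumptions, not the code from this PR: `ExampleNode` stands in for IGatsbyNode, and `resolveNested` and `resolveOne` are hypothetical helpers.

```typescript
// Hypothetical stand-in for IGatsbyNode.
interface ExampleNode {
  id: string
}

type NestedListOfStrings = Array<string | NestedListOfStrings>
type NestedListOfNodes = Array<ExampleNode | NestedListOfNodes>

// Resolve every string to a node while preserving the input's nesting:
// strings become nodes, nested arrays are resolved recursively.
async function resolveNested(
  input: NestedListOfStrings,
  resolveOne: (value: string) => Promise<ExampleNode>
): Promise<NestedListOfNodes> {
  return Promise.all(
    input.map(
      (entry): Promise<ExampleNode | NestedListOfNodes> =>
        typeof entry === `string`
          ? resolveOne(entry)
          : resolveNested(entry, resolveOne)
    )
  )
}

// Hypothetical resolver that just wraps the string in a node.
const resolveOne = async (value: string): Promise<ExampleNode> => ({
  id: `node:${value}`,
})

resolveNested([`a`, [`b`, [`c`, `d`]]], resolveOne).then(result => {
  console.log(JSON.stringify(result))
})
```

Each level awaits its own Promise.all(), so the output has the same shape as the input, with each string replaced by the node it resolved to.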

I realise this is already merged, but types should be PascalCased

sidharthachatterjee

You should change your middle name to Makes it faster

ascorbic · 2021-01-08T16:02:25Z

Awesome work, @pvdz

perf(gatsby): do not call and iterate getAllNodes(File) for every local file

6149a75

gatsbot added the "status: triage needed" label Jan 6, 2021

LekoArts added the "topic: performance" label and removed the "status: triage needed" label Jan 7, 2021

pvdz added 5 commits January 7, 2021 10:02

The fieldValue can be an array of arrays

692d820

prettier

cff0532

guess not

dbd2e6c

Merge branch 'master' into fasterlocalfiles

a1e24c5

TS and I had a good long chat and decided that we hate each other and this is what we ended up with and I do not wish to discuss this any further

adb2543

pvdz mentioned this pull request Jan 7, 2021

fix(gatsby): rewrite a spread that would break at scale #28910

Merged

Test caught that the returned data structure is expected to be a nested set of arrays containing nodes, not a flat array. Then me and TS had another argument and we seem cool now.

0369e39

pvdz commented Jan 8, 2021

sidharthachatterjee approved these changes Jan 8, 2021

sidharthachatterjee merged commit a455a23 into master Jan 8, 2021

sidharthachatterjee deleted the fasterlocalfiles branch January 8, 2021 15:46

pragmaticpat pushed a commit to pragmaticpat/gatsby that referenced this pull request Apr 28, 2022

perf(gatsby): do not call and iterate getAllNodes(File) for each file (gatsbyjs#28891)

2909e9a
