
Is there a hard limit on maximum number of pages that Gatsby can build? #20338

Closed
tsriram opened this issue Dec 30, 2019 · 23 comments
Labels
type: question or discussion Issue discussing or asking a question about Gatsby

Comments

@tsriram
Contributor

tsriram commented Dec 30, 2019

I'm trying to build a site with ~150k pages (probably more by the time I finish) using Gatsby with a CSV file as the data source. I initially had a sample dataset of about 100 rows in a CSV file, developed my initial pages, and it worked. When I tried running gatsby build with all 150k rows, the build got stuck in the "source and transform nodes" step.

As suggested by @KyleAMathews, I split the large CSV into multiple files (with a varying number of rows based on the data), and the build now finishes "source and transform nodes" in about 100s, but fails with a heap out of memory error.

I also tried running the create pages benchmark site with 125k pages and it fails with the same error, though it builds the site in less than 2 minutes for 100k pages.

I tried figuring out the underlying issue myself. From page creation docs, I reached pages reducer and found that we use JavaScript Map for the state.

I was wondering if there's a hard limit on the number of items that can be set in a Map. From this StackOverflow answer, it looks like we can set only up to 2^24 (roughly 167k) items in a Map. I'm not sure what else this redux state holds, but if it's storing only the pages, does ~167k become a hard limit for the number of pages that Gatsby can build?

There are a lot of places where Map is used in the Gatsby source code. It's probably one of them causing this out of memory error?
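As a quick sanity check of that concern (plain Node, no Gatsby; the value shape here is made up for illustration, not Gatsby's actual state), a Map holding one entry per page at this scale builds without trouble:

```javascript
// Build a Map with one entry per page at the ~150k scale in question.
// The value objects are stand-ins, not Gatsby's real page records.
const pages = new Map();
for (let i = 0; i < 150000; i++) {
  pages.set(`/page/${i}/`, { path: `/page/${i}/` });
}
console.log(pages.size); // 150000
```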

@gatsbot gatsbot bot added the type: question or discussion Issue discussing or asking a question about Gatsby label Dec 30, 2019
@pvdz
Contributor

pvdz commented Dec 30, 2019

Can you try increasing the allowed memory for nodejs?

The flag is --max-old-space-size=8192 and you can specify it either via a global flag or by calling gatsby through node explicitly.

The first applies the memory limit to all child processes (gatsby will spawn a few):

NODE_OPTIONS="--max-old-space-size=8192" gatsby build

The second applies the memory limit only to the top-level process:

node --max-old-space-size=8192 node_modules/.bin/gatsby build

Assuming you have enough memory (my example is 8gb, adjust accordingly), does that work? :)

@pvdz
Contributor

pvdz commented Dec 30, 2019

The limit suggested in that SO link is 16m, not 160k (2 ^ 24 = 16_777_216). I'm pretty sure the OOM happens because nodejs runs out of the default amount of memory (1gb). Which is not unexpected with that many pages (I'm a little surprised it doesn't oom at 100k, actually 😅 ).

@tsriram
Contributor Author

tsriram commented Dec 30, 2019

Yeah, I can get the benchmark site to work with --max-old-space-size=8192 and it creates 150k pages in less than 3 minutes.

But my project is very slow. It had created about 5700 pages in 11 minutes before I killed it. At this rate it's going to take more than 4 hours to create 150k pages. Is this expected? I guess the create pages benchmark is so fast because it doesn't have any data / queries to run.


Also, where do you see the 2^28 for max size? It's mentioned in the first statement that 2^24 is the implementation-defined limit, no? What am I missing? 🤔


I'm not sure if this really affects gatsby build as I'm able to run the create pages benchmark even with 200k pages.

@tsriram
Contributor Author

tsriram commented Dec 31, 2019

I tried running the markdown benchmark with 150k pages and it's slower too. The "run queries" step processes about 7k queries in 10 minutes and it's probably going to take hours to finish building all the pages.


@pvdz
Contributor

pvdz commented Jan 1, 2020

Short answer: not really, certainly not for local data without images, but who knows.

Long answer: Ok. I'll have a look at it next week (most of us in core are on holiday last week and this week, please bear with us). I'm sure we can sort this out.

In general there are four aspects at play for the perf:

  • overhead per page, noticeable at scale
  • the graphql queries (in particular query type inference and/or inefficient query design with many individual pages); one such problem is noticeable at scale in the bootstrap stage
  • the sourcing (highly dependent on the speed of the source), and
  • image processing (always an expensive step, even at linear growth)

In your case I'm guessing it's all about the graphql.

If I'm reading it right it took 16 minutes just to bootstrap.

  • If this is based on a local csv file then I'd expect we can get huge wins with the sourcing step
  • The schema building step takes 14 minutes and I would bet this is type inference. Type inference is probably the biggest culprit here. I think we have docs that explain how to use typed schemas which disables the graphql query inference. Even without (or additionally), we can probably improve the situation if we analyze how the query data is passed on to each page. It wouldn't surprise me if we can eliminate 95% of this time :)
  • After that we can look at the run query step because for a local csv that much time is weird to me :)
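Gatsby's schema customization docs describe the typed-schema approach from the second bullet; a minimal sketch using the createTypes action with the @dontInfer directive might look like this (the IfscCsv field names here are assumptions, not the project's actual schema):

```javascript
// gatsby-node.js — sketch only; adjust the field names to the real CSV columns.
exports.createSchemaCustomization = ({ actions }) => {
  actions.createTypes(`
    type IfscCsv implements Node @dontInfer {
      ifsc: String
      bank: String
      branch: String
    }
  `);
};
```

With @dontInfer in place, Gatsby skips inferring the node type from the data, which is exactly the step suspected of eating the schema-building time here.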

That said, you did surface an interesting future problem with the Map limitation. I still don't expect you to be hitting that (but who knows), regardless we'll be needing to take that into account as we're going to scale up. So thank you for that.

Is there any way for me to access this repro so we can investigate? Private or anonymized / generated data is fine for me. I mostly need the real world scale.

@pvdz
Contributor

pvdz commented Jan 1, 2020

(Sorry, 2^24=16m, 2^28 is 268m :p Not sure if typo or whatever)

@tsriram
Contributor Author

tsriram commented Jan 3, 2020

Thank you @pvdz for the detailed response. I know I'm not great at math, but I wouldn't have imagined mistaking 16m for 160k 🤦‍♂ Sorry about that 😛

I tried doing schema customization, but was getting an error with that. Not sure if using gatsby-plugin-schema-snapshot helps in this regard (I use it).

So the dataset that I'm using is Indian Financial System Code (used in banks in India for money transfer etc.) and it's a public dataset. You can find my repo here -- https://github.com/tsriram/ifsc/.

I'm wondering if it's actually a good idea to build this as a static website, or whether I should have client-only routes in Gatsby & build an API to get the data 🤔

@pvdz
Contributor

pvdz commented Jan 7, 2020

I can repro. The run queries step is dog slow. It was already going to be my first point of investigation; this just confirms it even more :) Query throughput drops as the page count grows, so something's going on that shouldn't be affected by absolute page count.

Thanks for the repro, very helpful! One immediate tip to speed things up: there's an implicit index on id, not on slug. You'll find a significant improvement if you query by id. (Your total build time will still be huge, I'm working on it! 😅 )

Another is that the inference may still be running. That's an unintended bug we're currently already investigating. So if that's the case a fix may also improve your build times.

@pvdz
Contributor

pvdz commented Jan 7, 2020

Fwiw, it's about 4 hours and 45 minutes :p On my benchmark machine, anyway. I'm looking into the details. I'm convinced the time for run queries can be improved. It should not slow down as the number of pages scales up, yet at 150k pages there are 10 queries/second while at 2k pages it's about 150 queries/second. So that's very fishy :p

Anyways, just wanted to let you know I'm looking into it. Will report back soon.

info Deleting .cache, public
info Successfully deleted directories
success open and validate gatsby-configs - 0.022s
success load plugins - 0.044s
success onPreInit - 0.002s
success delete html and css files from previous builds - 0.009s
success initialize cache - 0.006s
success copy gatsby files - 0.017s
success onPreBootstrap - 0.010s
info Reading GraphQL type definitions from /home/peter/gatsby/schema.gql
success createSchemaCustomization - 0.008s
success source and transform nodes - 132.358s
success building schema - 104.801s
success createPages - 99.838s
success createPagesStatefully - 0.598s
success onPreExtractQueries - 0.001s
success update schema - 1.369s
success extract queries from components - 0.447s
success write out requires - 0.944s
success write out redirect data - 0.002s
success onPostBootstrap - 0.001s
⠀
info bootstrap finished - 343.548 s
⠀
success Building production JavaScript and CSS bundles - 13.122s
success Rewriting compilation hashes - 0.004s
success run queries - 15457.460s - 145178/145178 9.39/s
success Building static HTML for pages - 113.692s - 145178/145178 1276.94/s
info Done building in 15919.447 sec

@tsriram
Contributor Author

tsriram commented Jan 8, 2020

Wow, thanks much @pvdz 👍

Looking forward to what you come up with :)

@pvdz
Contributor

pvdz commented Jan 9, 2020

Still working on this. I'm very convinced fixing this bottleneck will have a huge impact on any large site.

  • Ruled out disk i/o (basically very little going on there until the "run queries" step, which writes to disk for caching)
  • Ruled out node's scheduler itself (can schedule 1.5M promises no sweat), which doesn't mean there's not a problem with scheduling, just that it is able to cope with such loads (as expected)
  • We can generate 200k pages in under 5 minutes without graphql, meaning a large chunk of the infra works/scales as intended
  • So now we're eyeing the graphql part, which can still be anything (and still relate to how it's being queued)
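The scheduler check from the second bullet can be reproduced with a toy script (a sketch along the same lines, not the actual test pvdz ran):

```javascript
// Create 1.5M already-resolved promises and await them all. Node's
// scheduler copes with this without trouble, which is why raw promise
// scheduling was ruled out as the bottleneck.
const N = 1500000;
const promises = Array.from({ length: N }, (_, i) => Promise.resolve(i));
Promise.all(promises).then(results => {
  console.log(results.length); // 1500000
});
```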

@pvdz
Contributor

pvdz commented Jan 14, 2020

You might think I have forgotten about this.

But you'd be wrong.

And happy.

Debugging the problem in this build turned out to be a deep rabbit hole and it took me some time to get into it, and out of it. But I'm happy to report I can build your site in ~10 minutes now.

success Building production JavaScript and CSS bundles - 6.687s
success Rewriting compilation hashes - 0.007s
success run queries - 239.751s - 145178/145178 605.54/s
success Building static HTML for pages - 113.521s - 145178/145178 1278.87/s
info Done building in 654.383649061 sec

You'll have to wait a bit before you can do this but there are some fixes / workarounds upcoming.

The basic gist is that the way nodes are looked up has a shortcut for querying by id. Unfortunately this heuristic is not optimal and fails to hit the mark in your case. That led to a bunch of other things and will need to be fixed on Gatsby's side.

After that, the run queries step drops to ~10 minutes (down from 257 minutes, or 4.2 hours, as you can see above). Which makes me very happy :d

The wait for you is now for me to polish this fix, make sure the generic assumptions hold (is your site a one-off or are most sites like yours?) and then we should be good to go.

@tsriram
Contributor Author

tsriram commented Jan 15, 2020

No, I was pretty sure you'd be working on this :)

Improvement in the run query step sounds like a great success.

I actually wanted to spend some time on this and try to figure out what's going on, but you can only do so much with a toddler 👶 around.

Looking forward to the fix. Thank you again, @pvdz :)

@pvdz
Contributor

pvdz commented Jan 15, 2020

Now #20609 has landed in master. This is the part from us that you'll need to see improvements. (It still needs to be published, so if you're not comfortable building from source, it usually doesn't take long to get published.)

The other change is to your repo. It's changing the index from slug to id:

src/templates/ifsc.tsx:

export const query = graphql`
-  query($slug: String!) {
-    allIfscCsv(filter: { fields: { slug: { eq: $slug } } }) {
+  query($id: String!) {
+    allIfscCsv(filter: { id: { eq: $id } }) {
       edges {
         node {
           ifsc

gatsby-node.js

const result = await graphql(`
     query {
       allIfscCsv {
         edges {
           node {
+            id
             fields {
               slug
             }

and later in that file

   result.data.allIfscCsv.edges.forEach(({ node }) => {
     createPage({
       path: node.fields.slug,
       component: path.resolve(`./src/templates/ifsc.tsx`),
       context: {
-        slug: node.fields.slug
+        id: node.id,
       }
     });
   });

I think that should suffice.

With that, the run queries step should take roughly 5 minutes on Gatsby master.


If you want counting stats while building your pages (hey, that's 60 seconds less of looking at an idle screen) you can copy paste my whole change, which adds a progress bar for the createPages step (this is gatsby-node.js again):

-exports.createPages = async ({ graphql, actions }) => {
+exports.createPages = async ({ graphql, actions, reporter }) => {
+  const progress = reporter.createProgress(`ifsc/gatsby-node.js`);
+  console.time("(ifsc) total exports.createPages");
+  console.time("(ifsc) initial graphql query");
+  progress.setStatus("initial graphql query");
+
   const { createPage } = actions;
   const result = await graphql(`
     query {
       allIfscCsv {
         edges {
           node {
+            id  
             fields {
               slug
             }
@@ -36,13 +42,38 @@ exports.createPages = async ({ graphql, actions }) => {
       }
     }
   `);
+  console.timeEnd("(ifsc) initial graphql query");
+
+  console.time("(ifsc) created pages");
+
+  progress.start();
+  progress.total = result.data.allIfscCsv.edges.length - 1;
+  let start = Date.now();
+  progress.setStatus(
+    "Calling createPage for " + result.data.allIfscCsv.edges.length + " pages"
+  );
   result.data.allIfscCsv.edges.forEach(({ node }) => {
     createPage({
       path: node.fields.slug,
       component: path.resolve(`./src/templates/ifsc.tsx`),
       context: {
-        slug: node.fields.slug
+        id: node.id,
+        // slug: node.fields.slug
       }
     });
+    progress.tick(1);
   });
+  progress.setStatus(
+    "Called createPage for " +
+      (result.data.allIfscCsv.edges.length - 1) +
+      " pages at " +
+      (result.data.allIfscCsv.edges.length - 1) /
+        ((Date.now() - start) / 1000) +
+      " pages/s"
+  );
+  progress.done();
+  console.timeEnd("(ifsc) created pages");
+  console.timeEnd("(ifsc) total exports.createPages");
+  progress.setStatus("createPages finished");
 };

@pvdz
Contributor

pvdz commented Jan 15, 2020

Note that the createPages step runs at roughly 4k pages/s here. The "run queries" step was running at 10 q/s before. Changing it to id (properly) raised that to roughly 70 q/s. Applying a shortcut in Gatsby raised that to roughly 600~800 q/s.

It's all about loops :) And these show themselves pretty quickly at scale.
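That loop effect is easy to see in miniature (illustrative only, not Gatsby's actual internals): a Map index makes an id lookup O(1), while filtering on a non-indexed field like slug scans every node for every query, so total work grows quadratically with page count:

```javascript
// 100k fake nodes, mirroring the id-vs-slug situation above.
const nodes = Array.from({ length: 100000 }, (_, i) => ({
  id: String(i),
  slug: `s${i}`,
}));
const byId = new Map(nodes.map(n => [n.id, n]));

const findById = id => byId.get(id); // O(1) per query
const findBySlug = slug => nodes.find(n => n.slug === slug); // O(n) per query

console.log(findById("99999").slug); // "s99999"
console.log(findBySlug("s99999").id); // "99999"
```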

Thank you for your example!

@tsriram
Contributor Author

tsriram commented Jan 17, 2020

This is awesome, @pvdz 🎉

I just upgraded the Gatsby version locally and it started building very fast, but crashed after about 70% with an out of memory error. Now if I run the build with NODE_OPTIONS="--max-old-space-size=8192", it gets stuck at the building schema step and just doesn't go forward 😞

It's the same result every time I run the build: it fails at ~70% with default memory and gets stuck in building schema with --max-old-space-size=8192. There's no other change in my repo except for what you've suggested. I'll try to figure out what's going on.

I'll close this issue as you've fixed the build time. Thank you again for your help 👍

@tsriram tsriram closed this as completed Jan 17, 2020
@pvdz
Contributor

pvdz commented Jan 17, 2020

Hm ok. Fwiw the build completes here (indeed with increased memory). Node v8.16.2, I don't think there's anything else of significance to the env right now.

If you can't push it forward please post a new issue about it :) Although perhaps waiting for an actual release with this fix will help. Dunno.

@tsriram
Contributor Author

tsriram commented Jan 17, 2020

Okay, for the curious: I ran the build with --max-old-space-size=4096. It gets past the building schema step but fails with this error when running queries:

UNHANDLED REJECTION The argument 'path' must be a string or Uint8Array without null bytes. Received '/Users/sriram/code/personal/ifsc/public/page-data/\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u000...



  Error: TypeError [ERR_INVALID_ARG_VALUE]: The argument 'path' must be a string or Uint8Array without null bytes. Received '/Users/sriram/code/personal/ifsc/public/page-data/\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u000...

This is weird 🙄

@pvdz
Contributor

pvdz commented Jan 17, 2020

FYI, the gift that keeps on giving.

It seems #20477 by @vladar adds another 66% speed boost to your site :D (or at least the "run queries" step).

@vladar
Contributor

vladar commented Jan 17, 2020

@tsriram You should sanitize path here: https://github.com/tsriram/ifsc/blob/master/gatsby-node.js#L58

It is highly likely that your data contains characters that are not allowed in a file name
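A minimal sketch of such sanitization (not Gatsby's or the repo's actual fix; the allowed-character set here is an assumption):

```javascript
// Strip NUL bytes (the culprit in the error above) and collapse any other
// characters that are unsafe in file names into a dash.
const sanitizeSlug = s =>
  s.replace(/\u0000/g, "").replace(/[^A-Za-z0-9/_-]+/g, "-");

console.log(sanitizeSlug("AXIS\u0000-BANK/KURNOOL")); // "AXIS-BANK/KURNOOL"
```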

@pvdz
Contributor

pvdz commented Jan 17, 2020

ftr (more for ourselves than anything else): currently this is how long a build took me after the last changes, on a non-stabilized machine (I usually run these on a stable benchmarking machine):

 😶   19:49:43 (master) ~/ifsc $ $(which node) node_modules/.bin/gatsby clean; $(which node) --max_old_space_size=8000 $(which gatsby) build
warn Node.js v8.16.2 has reached End of Life status on 31 December, 2019.
Gatsby will only actively support 10.13.0 or higher and drop support for Node 8 soon.
Please upgrade Node.js to a currently active LTS release: https://gatsby.dev/upgrading-node-js
info Deleting .cache, public
info Successfully deleted directories
success open and validate gatsby-configs - 0.029s
success load plugins - 0.080s
success onPreInit - 0.002s
success delete html and css files from previous builds - 0.011s
success initialize cache - 0.009s
success copy gatsby files - 0.018s
success onPreBootstrap - 0.012s
info Reading GraphQL type definitions from ~/ifsc/schema.gql
success createSchemaCustomization - 0.009s
success source and transform nodes - 185.498s
success building schema - 0.836s
(ifsc) initial graphql query: 2060.071ms
success ifsc/gatsby-node.js - 79.146s - Called createPage for 148150 pages at 1872.1645836755843 pages/s
(ifsc) created pages: 79293.081ms
(ifsc) total exports.createPages: 81358.909ms
success createPages - 125.695s
success createPagesStatefully - 0.799s
success onPreExtractQueries - 0.001s
success update schema - 1.888s
success extract queries from components - 0.327s
success write out requires - 1.129s
success write out redirect data - 0.005s
success onPostBootstrap - 0.001s
⠀
info bootstrap finished - 320.976 s
⠀
success Building production JavaScript and CSS bundles - 16.812s
success Rewriting compilation hashes - 0.004s
success run queries - 182.094s - 145178/145178 797.27/s
success Building static HTML for pages - 123.662s - 145178/145178 1173.99/s
info Done building in 634.206 sec

@KyleAMathews
Contributor

I wanted to try out the new hotness and with a bit of fiddling got it working too. It built in 316 seconds on my 16 inch macbook pro 🔥

One thing we could fix here is that I had to manually truncate overly long paths to successfully write them to file (macs only allow 255 characters in file paths). We should just do that automatically. I thought I'd made an issue for this before, but I'll make a new one.

success open and validate gatsby-configs - 0.022s
success load plugins - 0.113s
success onPreInit - 0.003s
success delete html and css files from previous builds - 0.011s
success initialize cache - 0.008s
success copy gatsby files - 0.052s
success onPreBootstrap - 0.016s
info Reading GraphQL type definitions from /private/tmp/ifsc/schema.gql
success createSchemaCustomization - 0.012s
old slug ANDHRA-PRAGATHI-GRAMEENA-BANK/ANDHRA-PRADESH/KURNOOL/NANDYALbox-drawings-light-down-and-horizontala-NOONEPALLE-BRANCHbox-drawings-light-down-and-horizontalabox-drawings-light-down-and-horizontala-branch
new slug ANDHRA-PRAGATHI-GRAMEENA-BANK/ANDHRA-PRADESH/KURNOOL/NANDYALbox-drawings-light-down-and-horizontala-2583655316
old slug CORPORATION-BANK/ANDHRA-PRADESH/MEDAK/BHANURbox-drawings-light-down-and-horizontalabox-drawings-light-down-and-horizontalabox-drawings-light-down-and-horizontalabox-drawings-light-down-and-horizontalabox-drawings-light-down-and-horizontalabox-drawings-light-down-and-horizontalabox-drawings-light-down-and-horizontalabox-drawings-light-down-and-horizontala-branch
new slug CORPORATION-BANK/ANDHRA-PRADESH/MEDAK/BHANURbox-drawings-light-down-and-horizontalabox-drawings-ligh1133251587
old slug CORPORATION-BANK/KARNATAKA/SHIMOGA/SHIMOGA-SAVALANGA-ROADbox-drawings-light-vertical-and-rightbox-drawings-up-double-and-left-singlebox-drawings-light-vertical-and-rightbox-drawings-up-double-and-left-singlebox-drawings-light-vertical-and-rightbox-drawings-up-double-and-left-singlebox-drawings-light-vertical-and-rightbox-drawings-up-double-and-left-singlebox-drawings-light-vertical-and-rightbox-drawings-up-double-and-left-singlebox-drawings-light-vertical-and-rightbox-drawings-up-double-and-left-singlebox-drawings-light-vertical-and-rightbox-drawings-up-double-and-left-singlebox-drawings-light-vertical-and-rightbox-drawings-up-double-and-left-single-branch
new slug CORPORATION-BANK/KARNATAKA/SHIMOGA/SHIMOGA-SAVALANGA-ROADbox-drawings-light-vertical-and-rightbox-dr2224335678
old slug CORPORATION-BANK/UTTAR-PRADESH/BALLIAbox-drawings-light-down-and-horizontalabox-drawings-light-down-and-horizontalabox-drawings-light-down-and-horizontalabox-drawings-light-down-and-horizontalabox-drawings-light-down-and-horizontala/BALLIA-branch
new slug CORPORATION-BANK/UTTAR-PRADESH/BALLIAbox-drawings-light-down-and-horizontalabox-drawings-light-down-3764137266
old slug AXIS-BANK/UTTARAKHAND/DEHRADUN/GARHIbox-drawings-light-vertical-and-rightbox-drawings-up-double-and-left-singleCANTTbox-drawings-light-vertical-and-rightbox-drawings-up-double-and-left-singleDEHRADUN-branch
new slug AXIS-BANK/UTTARAKHAND/DEHRADUN/GARHIbox-drawings-light-vertical-and-rightbox-drawings-up-double-and-1189187145
old slug AXIS-BANK/UTTARAKHAND/DEHRADUN/PREMbox-drawings-light-vertical-and-rightbox-drawings-up-double-and-left-singleNAGARbox-drawings-light-vertical-and-rightbox-drawings-up-double-and-left-singleDEHRADUN-branch
new slug AXIS-BANK/UTTARAKHAND/DEHRADUN/PREMbox-drawings-light-vertical-and-rightbox-drawings-up-double-and-l1250329842
old slug AXIS-BANK/UTTARAKHAND/DEHRADUN/SAHASTRADHARAbox-drawings-light-vertical-and-rightbox-drawings-up-double-and-left-singleROADbox-drawings-light-vertical-and-rightbox-drawings-up-double-and-left-singleDEHRADUN-branch
new slug AXIS-BANK/UTTARAKHAND/DEHRADUN/SAHASTRADHARAbox-drawings-light-vertical-and-rightbox-drawings-up-dou3459701885
old slug STATE-BANK-OF-INDIA/JHARKHAND/PAKUR/AMRAPARAbox-drawings-light-down-and-horizontalabox-drawings-light-down-and-horizontalabox-drawings-light-down-and-horizontalabox-drawings-light-down-and-horizontalabox-drawings-light-down-and-horizontala-branch
new slug STATE-BANK-OF-INDIA/JHARKHAND/PAKUR/AMRAPARAbox-drawings-light-down-and-horizontalabox-drawings-ligh2275783056
old slug STATE-BANK-OF-INDIA/BIHAR/KATIHAR/KURSELAbox-drawings-light-down-and-horizontalabox-drawings-light-down-and-horizontalabox-drawings-light-down-and-horizontalabox-drawings-light-down-and-horizontalabox-drawings-light-down-and-horizontalabox-drawings-light-down-and-horizontala-branch
new slug STATE-BANK-OF-INDIA/BIHAR/KATIHAR/KURSELAbox-drawings-light-down-and-horizontalabox-drawings-light-d3957549118
old slug STATE-BANK-OF-INDIA/BIHAR/EAST-CHAMPARAN/MEHSIbox-drawings-light-down-and-horizontalabox-drawings-light-down-and-horizontalabox-drawings-light-down-and-horizontalabox-drawings-light-down-and-horizontalabox-drawings-light-down-and-horizontalabox-drawings-light-down-and-horizontalabox-drawings-light-down-and-horizontalabox-drawings-light-down-and-horizontala-branch
new slug STATE-BANK-OF-INDIA/BIHAR/EAST-CHAMPARAN/MEHSIbox-drawings-light-down-and-horizontalabox-drawings-li1455361292
old slug STATE-BANK-OF-INDIA/GUJARAT/RAJKOT/SME-box-drawings-light-vertical-and-rightbox-drawings-up-double-and-left-singleBRANCH-JASDANbox-drawings-light-vertical-and-rightbox-drawings-up-double-and-left-single-branch
new slug STATE-BANK-OF-INDIA/GUJARAT/RAJKOT/SME-box-drawings-light-vertical-and-rightbox-drawings-up-double-a1093670573
old slug STATE-BANK-OF-INDIA/MADHYA-PRADESH/BHOPAL/BAIRAGARH-BHOPALbox-drawings-light-down-and-horizontalabox-drawings-light-down-and-horizontalabox-drawings-light-down-and-horizontalabox-drawings-light-down-and-horizontalabox-drawings-light-down-and-horizontalabox-drawings-light-down-and-horizontalabox-drawings-light-down-and-horizontalabox-drawings-light-down-and-horizontalabox-drawings-light-down-and-horizontalabox-drawings-light-down-and-horizontalabox-drawings-light-down-and-horizontalabox-drawings-light-down-and-horizontala-branch
new slug STATE-BANK-OF-INDIA/MADHYA-PRADESH/BHOPAL/BAIRAGARH-BHOPALbox-drawings-light-down-and-horizontalabox2646973818
success source and transform nodes - 108.664s
success building schema - 0.422s
(ifsc) initial graphql query: 1332.955ms
success ifsc/gatsby-node.js - 34.247s - Called createPage for 131076 pages at 3828.377825807582 pages/s
(ifsc) created pages: 34351.537ms
(ifsc) total exports.createPages: 35688.693ms
success createPages - 62.149s
success createPagesStatefully - 0.285s
success onPreExtractQueries - 0.003s
success update schema - 0.526s
success extract queries from components - 0.128s
success write out requires - 0.465s
success write out redirect data - 0.002s
success onPostBootstrap - 0.001s
⠀
info bootstrap finished - 177.112 s
⠀
success Building production JavaScript and CSS bundles - 5.094s
success Rewriting compilation hashes - 0.004s
success run queries - 88.966s - 128081/128081 1439.67/s
success Building static HTML for pages - 46.244s - 128081/128081 2769.71/s
info Done building in 315.893595137 sec

My diff

diff --git a/gatsby-node.js b/gatsby-node.js
index 2aa38d1..e0d26c3 100644
--- a/gatsby-node.js
+++ b/gatsby-node.js
@@ -1,5 +1,6 @@
-const slugify = text => text.replace(/ /g, "-").toLowerCase();
+const slugify = require(`slug`);
 const path = require("path");
+const strhash = require(`string-hash`);
 
 exports.onCreateNode = ({ node, actions }) => {
   if (node.internal.type === "IfscCsv") {
@@ -11,12 +12,18 @@ exports.onCreateNode = ({ node, actions }) => {
     const citySlug = slugify(city);
     const branchSlug = slugify(branch);
 
-    const slug = `${bankSlug}/${stateSlug}/${citySlug}/${branchSlug}-branch`;
+    let slug = `${bankSlug}/${stateSlug}/${citySlug}/${branchSlug}-branch`;
+
+    if (slug.length > 200) {
+      console.log(`old slug`, slug);
+      slug = slug.slice(0, 100) + strhash(slug.slice(100));
+      console.log(`new slug`, slug);
+    }
 
     createNodeField({
       node,
       name: `slug`,
-      value: slug
+      value: slug,
     });
   }
 };
@@ -58,8 +65,8 @@ exports.createPages = async ({ graphql, actions, reporter }) => {
       path: node.fields.slug,
       component: path.resolve(`./src/templates/ifsc.tsx`),
       context: {
-        id: node.id
-      }
+        id: node.id,
+      },
     });
     progress.tick(1);
   });
diff --git a/package.json b/package.json
index 50601e8..88c18bd 100644
--- a/package.json
+++ b/package.json
@@ -25,7 +25,9 @@
     "gatsby-transformer-csv": "^2.1.21",
     "node-sass": "^4.13.0",
     "react": "^16.12.0",
-    "react-dom": "^16.12.0"
+    "react-dom": "^16.12.0",
+    "slug": "^2.1.0",
+    "string-hash": "^1.1.3"
   },
   "devDependencies": {
     "@aleung/csvsplit": "^2.0.0",

@muescha
Contributor

muescha commented Jan 19, 2020

macs only allow 255 characters in file paths

Just a note: there is only a limit of 255 characters per filename; the path itself has no such limit.

see https://en.wikipedia.org/wiki/Comparison_of_file_systems
