[v2] Consider different method of persisting nodes data #6656

Closed
pieh opened this issue Jul 22, 2018 · 7 comments

Comments

@pieh
Contributor

pieh commented Jul 22, 2018

Currently we serialize the entire nodes map every time we save redux state to .cache/redux-state.json. This gets problematic when we hit scenarios with a large number of nodes: it can lead to a high memory footprint and longer than necessary I/O when we save state.

A potential solution would be switching to sqlite (e.g. https://github.com/JoshuaWise/better-sqlite3 ) or some NoSQL document store that could handle incremental updates in a more performant manner. We can listen to CREATE_NODE and DELETE_NODE actions and incrementally update persisted data without serializing the entire nodes map. Because this would be used just for persistence, I think we really only need a performant key-value (id->node) store that handles incremental updates nicely.
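To make the idea concrete, here is a minimal sketch of incremental persistence driven by redux-style actions. It is not Gatsby's implementation: `createNodeStore` and `persistNodes` are hypothetical names, the action shape (`payload` holding the node) is an assumption, and a `Map` stands in for the sqlite or document-store backend so the sketch runs without dependencies.

```javascript
// Stand-in for a persistent key-value backend (for example a sqlite table
// with columns `id TEXT PRIMARY KEY, json TEXT`). Hypothetical helper.
function createNodeStore() {
  const rows = new Map()
  let writes = 0
  return {
    put(node) {
      rows.set(node.id, JSON.stringify(node))
      writes++
    },
    delete(id) {
      rows.delete(id)
      writes++
    },
    get(id) {
      const json = rows.get(id)
      return json === undefined ? undefined : JSON.parse(json)
    },
    size: () => rows.size,
    writeCount: () => writes,
  }
}

// Redux-style middleware: after each action, touch only the row for the
// node that changed instead of serializing the whole nodes map.
// Assumption: both actions carry the node in `action.payload`.
const persistNodes = nodeStore => () => next => action => {
  const result = next(action)
  if (action.type === "CREATE_NODE") nodeStore.put(action.payload)
  if (action.type === "DELETE_NODE") nodeStore.delete(action.payload.id)
  return result
}

// Usage without a real redux store: build the dispatch chain by hand.
const nodeStore = createNodeStore()
const dispatch = persistNodes(nodeStore)()(action => action)
dispatch({ type: "CREATE_NODE", payload: { id: "a", internal: { type: "Post" } } })
dispatch({ type: "CREATE_NODE", payload: { id: "b", internal: { type: "Post" } } })
dispatch({ type: "DELETE_NODE", payload: { id: "a" } })
```

Each action costs one small write, so the work per save scales with the number of changed nodes rather than with the total number of nodes.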

Initial list of things to consider:

  • performance comparisons
  • make sure that this won't cause problems with persistent data consistency - we would have 3 different methods of persisting data:
    • saving redux-state.json (same as now minus nodes data) that uses _.debounce (as of right now after bootstrap in gatsby develop)
    • nodes specific method
    • key-value cache store (saved to `.cache/cache/db.json`)

Ref: #6611

@KyleAMathews
Contributor

It'd be nice to make this pluggable. Most sites are fine with the existing serialization but a simple API could be built to allow for plugins to take over persisting changes.

It'd be nice too then to persist individual changes instead of the entire data tree each time as that's much faster obviously.
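A rough sketch of what such a pluggable API could look like follows. The adapter contract (`save(change, allNodes)` and `load()`) and every function name are hypothetical, not an existing Gatsby API; the point is only that core code depends on the contract, so a plugin could swap full serialization for per-change persistence.

```javascript
// Default behaviour: re-serialize the whole nodes map on every change.
function fullSnapshotAdapter() {
  let snapshot = "[]"
  return {
    save(_change, allNodes) {
      snapshot = JSON.stringify([...allNodes])
    },
    load: () => new Map(JSON.parse(snapshot)),
  }
}

// Alternative: append only the individual change, replay the log on load.
function changeLogAdapter() {
  const log = []
  return {
    save(change) {
      log.push(JSON.stringify(change))
    },
    load() {
      const nodes = new Map()
      for (const line of log) {
        const change = JSON.parse(line)
        if (change.op === "put") nodes.set(change.node.id, change.node)
        else nodes.delete(change.id)
      }
      return nodes
    },
  }
}

// Core code talks only to the adapter contract, so either can be plugged in.
function createNodes(adapter) {
  const nodes = new Map()
  return {
    put(node) {
      nodes.set(node.id, node)
      adapter.save({ op: "put", node }, nodes)
    },
    remove(id) {
      nodes.delete(id)
      adapter.save({ op: "delete", id }, nodes)
    },
  }
}

// Run the same changes through both adapters and restore from each.
const restored = [fullSnapshotAdapter(), changeLogAdapter()].map(adapter => {
  const nodes = createNodes(adapter)
  nodes.put({ id: "a", title: "first" })
  nodes.put({ id: "b", title: "second" })
  nodes.remove("a")
  return [...adapter.load()]
})
```

Both adapters restore the same data; they differ only in how much is written per change. A real change log would also need periodic compaction, which this sketch leaves out.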

@VindulaF

VindulaF commented Jul 24, 2018

It would be nice to have this in. I am building a prototype using Gatsby for my company's static website, and we have a Contentful space with ~9000 entries (excluding assets) and 8 active locales.

Thanks to #6611 I can build the website without hitting OOM issues, but the source and transform nodes step still takes far too long and eats up much of the build time, even when building from cache. To be precise: source and transform nodes takes 365.245 s, which is slow for a ~300 page static website.

Right now the build time is a deal breaker for us, but I love the flexibility that Gatsby offers. I'm hoping to see this land and would love to help test this PR against our static website.

@gatsbot

gatsbot bot commented Jan 12, 2019

Old issues will be closed after 30 days of inactivity. This issue has been quiet for 20 days and is being marked as stale. Reply here or add the label "not stale" to keep this issue open!

@gatsbot gatsbot bot added the stale? Issue that may be closed soon due to the original author not responding any more. label Jan 12, 2019
@gatsbot

gatsbot bot commented Jan 24, 2019

This issue is being closed due to inactivity. Is this a mistake? Please re-open this issue or create a new issue.

@gatsbot gatsbot bot closed this as completed Jan 24, 2019
@DSchau DSchau added not stale and removed stale? Issue that may be closed soon due to the original author not responding any more. labels Mar 25, 2019
@DSchau
Contributor

DSchau commented Mar 25, 2019

Let's re-open this--worth considering this again.

We have #10732, which may help, but worth considering if there are better approaches.

Some things I was thinking:

  • Persist separate parts of the redux state (and assemble on re-launch?)
    • @pieh's note of sqlite and listening to Redux would accomplish this
    • Would need to benchmark to see how much slower this is
    • Some slowdown is acceptable (e.g. 5-10%?) if it means Gatsby is scalable to sites of 50K+ pages easily
  • Stream data rather than keep it all in memory
  • Some combination of both of the above?
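The first option above (persisting separate parts of the state and assembling them on re-launch) could be sketched roughly as below. `createSlicePersister` is a hypothetical helper, not part of Gatsby. It assumes reducers return a new object for any slice that changed, so a reference comparison is enough to skip unchanged slices.

```javascript
const fs = require("fs")
const os = require("os")
const path = require("path")

// Hypothetical helper: one JSON file per top-level state slice.
function createSlicePersister(dir) {
  const lastSaved = {}
  return {
    // Writes only slices whose reference changed; returns the keys written.
    save(state) {
      const written = []
      for (const [key, slice] of Object.entries(state)) {
        if (lastSaved[key] === slice) continue
        fs.writeFileSync(path.join(dir, `${key}.json`), JSON.stringify(slice))
        lastSaved[key] = slice
        written.push(key)
      }
      return written
    },
    // Reassembles the full state from the per-slice files on re-launch.
    load() {
      const state = {}
      for (const file of fs.readdirSync(dir)) {
        if (!file.endsWith(".json")) continue
        const key = path.basename(file, ".json")
        state[key] = JSON.parse(fs.readFileSync(path.join(dir, file), "utf8"))
      }
      return state
    },
  }
}

const dir = fs.mkdtempSync(path.join(os.tmpdir(), "redux-slices-"))
const persister = createSlicePersister(dir)

const nodes = { a: { id: "a" } }
const firstWrite = persister.save({ nodes, pages: { "/": { path: "/" } } })
// `nodes` is the same object as before, so only `pages` is rewritten here.
const secondWrite = persister.save({
  nodes,
  pages: { "/": { path: "/" }, "/about": { path: "/about" } },
})
const assembled = persister.load()
```

This avoids rewriting a large, unchanged nodes slice when only a small slice changes, but a single changed node still rewrites the whole nodes file, which is where a per-node store would help further.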

As @pieh notes, most important is that we don't introduce a significant performance regression in most sites for the benefit of a few. However, we also need to make Gatsby more scalable to larger sites, and a key bottleneck currently is persisting large amounts of data in memory.

@DSchau DSchau reopened this Mar 25, 2019
@stefanprobst
Contributor

@DSchau Two more things to note wrt using LokiJS as a node store: (i) this will require breaking the nodes namespace out of redux anyway, or at least making it easy to swap backends there; (ii) Loki has its own persistence adapters.

@pieh
Contributor Author

pieh commented Apr 23, 2019

This should be mostly fixed (for now) with #10732

@pieh pieh closed this as completed Apr 23, 2019

7 participants