feat(gatsby): Page build optimisations for incremental data changes #21523

Merged
54 commits
8937307
store page data between builds
StuartRayson Feb 5, 2020
13b4071
Support removing of pages from public
StuartRayson Feb 5, 2020
6736205
build all pages if webpackCompilationHash has changed
StuartRayson Feb 5, 2020
bfd281e
Support deleting of data on processors that run once
StuartRayson Feb 5, 2020
30e041d
Add expirement flag to page performance
StuartRayson Feb 5, 2020
ad83b98
Add expirement flag to page performance
StuartRayson Feb 5, 2020
a47aa30
Add comments and types to actions
StuartRayson Feb 6, 2020
d9aaac4
Add missing page data reducer
StuartRayson Feb 6, 2020
635bbe4
Retain public between builds
StuartRayson Feb 6, 2020
3555004
Add docs to page build time enhancement
StuartRayson Feb 14, 2020
326d412
Update Page build optimisations docs
dominicfallows Feb 17, 2020
584ca37
Fix build.js conflict
StuartRayson Feb 17, 2020
3d595a4
Merge branch 'improve-page-build-on-data-change' of github.com:intera…
StuartRayson Feb 17, 2020
e9b8492
initial refactor from code review
StuartRayson Feb 17, 2020
d4d8a67
Use hash instead of whole page context
StuartRayson Feb 17, 2020
4af1936
Use hash instead of whole page context
StuartRayson Feb 17, 2020
ca51945
Remove page data in dev action deleteComponentsDependencies
StuartRayson Feb 17, 2020
967596d
Pass cache page data to processQueries function
StuartRayson Feb 17, 2020
46ff752
Remove added pageData check in component-data-dependencies.js
StuartRayson Feb 17, 2020
7060d8d
Remove pagedata if page removed in page-hot-reloader
StuartRayson Feb 17, 2020
a9b2b68
Update snapshots
StuartRayson Feb 18, 2020
697634e
Revert "Update snapshots"
StuartRayson Feb 18, 2020
01b8283
Update reducer context
StuartRayson Feb 18, 2020
83fd4fa
Update docs/docs/page-build-optimizations-for-incremental-data-change…
dominicfallows Feb 18, 2020
ce2202d
Update docs/docs/page-build-optimizations-for-incremental-data-change…
dominicfallows Feb 18, 2020
e78c9a6
Update docs/docs/page-build-optimizations-for-incremental-data-change…
dominicfallows Feb 18, 2020
ab4f93b
Merge branch 'improve-page-build-on-data-change' of github.com:intera…
dominicfallows Feb 18, 2020
eb9b9f8
Update docs/docs/page-build-optimizations-for-incremental-data-change…
dominicfallows Feb 18, 2020
623a3ad
Update docs/docs/page-build-optimizations-for-incremental-data-change…
dominicfallows Feb 18, 2020
6a73b97
Update docs/docs/page-build-optimizations-for-incremental-data-change…
dominicfallows Feb 18, 2020
d0ad8ee
Fix doc duplication error
dominicfallows Feb 18, 2020
d379cd8
Update docs
dominicfallows Feb 19, 2020
fdc65a3
Update docs/docs/page-build-optimizations-for-incremental-data-change…
dominicfallows Feb 20, 2020
6056bf4
Update www/src/data/sidebars/doc-links.yaml
dominicfallows Feb 20, 2020
637084f
refector improvements
StuartRayson Feb 20, 2020
cbdff6a
Merge branch 'improve-page-build-on-data-change' of github.com:intera…
StuartRayson Feb 20, 2020
6053297
improve delete to use promise all
StuartRayson Feb 20, 2020
4ce5c27
Block develop mode if experimental flag is used
StuartRayson Feb 20, 2020
0029ae1
Update docs with new flag name GATSBY_EXPERIMENTAL_PAGE_BUILD_ON_DATA…
StuartRayson Feb 20, 2020
6f08e53
Improvements remove pages logic
StuartRayson Feb 21, 2020
65efb64
Remove empty directory if no files
StuartRayson Feb 21, 2020
9c95671
Refactor pagePath reassign in build
StuartRayson Feb 22, 2020
0f57c47
initial attempt at removing nested folders in the correct order
StuartRayson Feb 23, 2020
300d49c
refactor delete public html and data function
StuartRayson Feb 23, 2020
0599b8f
use join in render-html.js
StuartRayson Feb 23, 2020
fa11ff6
Renaming functions and refactoring
StuartRayson Feb 24, 2020
e04bcd8
Add remove functions to page util
StuartRayson Feb 25, 2020
ded3ccc
Remove whitespace from page-data
StuartRayson Feb 25, 2020
c6f5d1c
Move new build functions to build-utils.js
StuartRayson Feb 27, 2020
2d60d5e
check html suffix function
StuartRayson Feb 28, 2020
ec61132
update docs
StuartRayson Feb 28, 2020
fef05f5
handle .html paths when removing empty directories
StuartRayson Feb 28, 2020
a31749c
Apply suggestions from code review
dominicfallows Feb 28, 2020
09cc305
fix formating issue on docs
StuartRayson Feb 29, 2020
2 changes: 2 additions & 0 deletions docs/docs/overview-of-the-gatsby-build-process.md
@@ -301,6 +301,8 @@ Page queries that were queued up earlier from query extraction are run so the da

With everything ready for the HTML pages in place, HTML is compiled and written out to files so it can be served up statically. Since HTML is being produced in a Node.js server context, [references to browser APIs like `window` can break the build](/docs/debugging-html-builds/) and must be conditionally applied.

By default, Gatsby rebuilds static HTML for all pages on each build. There is an experimental feature flag `GATSBY_EXPERIMENTAL_PAGE_BUILD_ON_DATA_CHANGES` which enables [Page Build Optimizations for Incremental Data Changes](/docs/page-build-optimizations-for-incremental-data-changes/).

## What do you get from a successful build?

When a Gatsby build is successfully completed, everything you need to deploy your site ends up in the `public` folder at the root of the site. The build includes minified files, transformed images, JSON files with information and data for each page, static HTML for each page, and more.
63 changes: 63 additions & 0 deletions docs/docs/page-build-optimizations-for-incremental-data-changes.md
@@ -0,0 +1,63 @@
---
title: Experimental Page Build Optimizations for Incremental Data Changes
---

Building sites with large amounts of content (tens of thousands of nodes or more) is relatively fast with Gatsby. However, some projects might start to experience issues when adopting CI/CD principles, that is, continuously building and deploying. Gatsby rebuilds the complete app on each `gatsby build`, which means the complete app also needs to be deployed. Doing this each time a small data change occurs unnecessarily increases demand on CPU, memory, and bandwidth.

One solution to these problems might be to use [Gatsby Cloud's Build features](https://www.gatsbyjs.com/cloud/).

For projects that require self-hosted environments, where Gatsby Cloud would not be an option, deploying only the content that has changed or is new (incremental data changes, you might say) can help reduce build times, deployment times and demand on resources.

For more information on the standard build process, see the [overview of the Gatsby build process](/docs/overview-of-the-gatsby-build-process/).

## How to use

To enable this enhancement, use the environment variable `GATSBY_EXPERIMENTAL_PAGE_BUILD_ON_DATA_CHANGES=true` in your `gatsby build` command, for example:

`GATSBY_EXPERIMENTAL_PAGE_BUILD_ON_DATA_CHANGES=true gatsby build --log-pages`

This will run the Gatsby build process, but only build pages whose data has changed since your last build. If there are any changes to code (JS, CSS), the bundling process returns a new webpack compilation hash, which causes all pages to be rebuilt.

### Reporting what has been built

You may want to retrieve a list of the pages that were built, for example to perform a sync action in your CI/CD pipeline.

To list the paths in the build assets (`public`) folder, you can use one (or both) of the following arguments in your `build` command.

- `--log-pages` parameter will output all the file paths that were updated or deleted at the end of the build stage.

```bash
success Building production JavaScript and CSS bundles - 82.198s
success run queries - 82.762s - 4/4 0.05/s
success Building static HTML for pages - 19.386s - 2/2 0.10/s
+ success Delete previous page data - 1.512s
info Done building in 152.084 sec
+ info Built pages:
+ Updated page: /about
+ Updated page: /accounts/example
+ info Deleted pages:
+ Deleted page: /test

Done in 154.501 sec
```

- `--write-to-file` creates two files in the `.cache` folder, with lists of the changed paths in the build assets (`public`) folder.

  - `newPages.txt` will contain a list of new or changed paths
  - `deletedPages.txt` will contain a list of deleted paths

If there are no changed or deleted paths, then the relevant files will not be created in the `.cache` folder.

## More information

- This enhancement works by comparing the page data from the previous build to the new page data. This creates a list of page directories that are passed to the static build process.

- To enable this build option you need to set an environment variable, so you must have permission to set environment variables in your build environment.

- This feature is not available with `gatsby develop`.

- At the end of each build, Gatsby creates a `redux.state` file in the `.cache` folder that contains the previous build's data. You will need to persist `.cache/redux.state` between builds to allow for comparison. If there is no `redux.state` file in the `.cache` folder, a full build will be triggered.

- Any code or static query changes (templates, components, source handling, new plugins, etc.) will prompt the creation of a new webpack compilation hash and trigger a full build.

Note: When using the `GATSBY_EXPERIMENTAL_PAGE_BUILD_ON_DATA_CHANGES` flag it is important to do so consistently when building your project. Otherwise, the cache will be cleared and the necessary data for comparison will no longer be available, removing the ability to check for incremental data changes.
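
The comparison described in this section can be illustrated with a small standalone sketch. It is not the PR's implementation: the function names `changedPages`, `deletedPages` and `pagesToBuild`, the page paths and the hash strings are all made up for illustration. It assumes page data is held as a `Map` from page path to a result hash, which mirrors how the diff below uses `pageData`.

```javascript
// Hypothetical page data hashes of two builds, keyed by page path.
const previousPageData = new Map([
  [`/about`, `hash-a`],
  [`/blog/first-post`, `hash-b`],
  [`/test`, `hash-c`],
])
const currentPageData = new Map([
  [`/about`, `hash-a2`], // data changed
  [`/blog/first-post`, `hash-b`], // unchanged
  [`/accounts/example`, `hash-d`], // new page
])

// Pages that are new, or whose hash differs from the previous build.
const changedPages = (previous, current) =>
  [...current.keys()].filter(key => previous.get(key) !== current.get(key))

// Pages that existed in the previous build but are gone from the current one.
const deletedPages = (previous, current) =>
  [...previous.keys()].filter(key => !current.has(key))

// No previous data, or a different webpack compilation hash, means a full rebuild.
const pagesToBuild = ({ previous, current, previousHash, currentHash }) =>
  previous && previousHash === currentHash
    ? changedPages(previous, current)
    : [...current.keys()]

console.log(changedPages(previousPageData, currentPageData))
console.log(deletedPages(previousPageData, currentPageData))
```

With this sample data, `/about` and `/accounts/example` are rebuilt, `/test` is deleted, and `/blog/first-post` is left untouched.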
6 changes: 5 additions & 1 deletion packages/gatsby/src/bootstrap/index.js
@@ -190,7 +190,10 @@ module.exports = async (args: BootstrapArgs) => {

// During builds, delete html and css files from the public directory as we don't want
// deleted pages and styles from previous builds to stick around.
- if (process.env.NODE_ENV === `production`) {
+ if (
+ !process.env.GATSBY_EXPERIMENTAL_PAGE_BUILD_ON_DATA_CHANGES &&
+ process.env.NODE_ENV === `production`
+ ) {
activity = report.activityTimer(
`delete html and css files from previous builds`,
{
@@ -221,6 +224,7 @@ module.exports = async (args: BootstrapArgs) => {
// logic in there e.g. generating slugs for custom pages.
const pluginVersions = flattenedPlugins.map(p => p.version)
const hashes = await Promise.all([
!!process.env.GATSBY_EXPERIMENTAL_PAGE_BUILD_ON_DATA_CHANGES,
md5File(`package.json`),
Promise.resolve(
md5File(`${program.directory}/gatsby-config.js`).catch(() => {})
90 changes: 90 additions & 0 deletions packages/gatsby/src/commands/build-utils.js
@@ -0,0 +1,90 @@
const fs = require(`fs-extra`)
const path = require(`path`)
const {
  remove: removePageHtmlFile,
  getPageHtmlFilePath,
} = require(`../utils/page-html`)
const {
  remove: removePageDataFile,
  fixedPagePath,
} = require(`../utils/page-data`)

const getChangedPageDataKeys = (state, cachedPageData) => {
  if (cachedPageData && state.pageData) {
    const pageKeys = []
    state.pageData.forEach((newPageDataHash, key) => {
      if (!cachedPageData.has(key)) {
        pageKeys.push(key)
      } else {
        const previousPageDataHash = cachedPageData.get(key)
        if (newPageDataHash !== previousPageDataHash) {
          pageKeys.push(key)
        }
      }
    })
    return pageKeys
  }

  return [...state.pages.keys()]
}

const collectRemovedPageData = (state, cachedPageData) => {
  if (cachedPageData && state.pageData) {
    const deletedPageKeys = []
    cachedPageData.forEach((_value, key) => {
      if (!state.pageData.has(key)) {
        deletedPageKeys.push(key)
      }
    })
    return deletedPageKeys
  }
  return []
}

const checkAndRemoveEmptyDir = (publicDir, pagePath) => {
  const pageHtmlDirectory = path.dirname(
    getPageHtmlFilePath(publicDir, pagePath)
  )
  const pageDataDirectory = path.join(
    publicDir,
    `page-data`,
    fixedPagePath(pagePath)
  )
  const hasFiles = fs.readdirSync(pageHtmlDirectory)

  // if page's html folder is empty also remove matching page-data folder
  if (!hasFiles.length) {
    fs.removeSync(pageHtmlDirectory)
    fs.removeSync(pageDataDirectory)
  }
}

const sortedPageKeysByNestedLevel = pageKeys =>
  pageKeys.sort((a, b) => {
    const currentPagePathValue = a.split(`/`).length
    const previousPagePathValue = b.split(`/`).length
    return previousPagePathValue - currentPagePathValue
  })

const removePageFiles = ({ publicDir }, pageKeys) => {
  const removePages = pageKeys.map(pagePath =>
    removePageHtmlFile({ publicDir }, pagePath)
  )

  const removePageData = pageKeys.map(pagePath =>
    removePageDataFile({ publicDir }, pagePath)
  )

  return Promise.all([...removePages, ...removePageData]).then(() => {
    // Sort removed pageKeys by nested directories and remove if empty.
    sortedPageKeysByNestedLevel(pageKeys).forEach(pagePath => {
      checkAndRemoveEmptyDir(publicDir, pagePath)
    })
  })
}

module.exports = {
  getChangedPageDataKeys,
  collectRemovedPageData,
  removePageFiles,
}
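
The ordering step in `removePageFiles` is easiest to see on sample data: deeper paths are visited first, so a nested directory can be emptied and removed before its parent is checked. The sketch below reuses the same comparator on made-up paths. `sortByDepthDescending` is a hypothetical name, and unlike `sortedPageKeysByNestedLevel` it sorts a copy rather than the input array, which is a choice made for this example only.

```javascript
// Same comparison as sortedPageKeysByNestedLevel: more path segments sorts first.
// Works on a copy so the caller's array keeps its original order.
const sortByDepthDescending = pageKeys =>
  [...pageKeys].sort((a, b) => b.split(`/`).length - a.split(`/`).length)

// Hypothetical removed pages, listed parent first.
const removedPages = [`/docs`, `/docs/guides/intro`, `/docs/guides`]

console.log(sortByDepthDescending(removedPages))
// [ '/docs/guides/intro', '/docs/guides', '/docs' ]
```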
103 changes: 101 additions & 2 deletions packages/gatsby/src/commands/build.js
@@ -2,6 +2,7 @@

const path = require(`path`)
const report = require(`gatsby-cli/lib/reporter`)
const fs = require(`fs-extra`)
import { buildHTML } from "./build-html"
const buildProductionBundle = require(`./build-javascript`)
const bootstrap = require(`../bootstrap`)
@@ -11,14 +12,25 @@ const { initTracer, stopTracer } = require(`../utils/tracer`)
const db = require(`../db`)
const signalExit = require(`signal-exit`)
const telemetry = require(`gatsby-telemetry`)
- const { store, emitter } = require(`../redux`)
+ const { store, emitter, readState } = require(`../redux`)
const queryUtil = require(`../query`)
const appDataUtil = require(`../utils/app-data`)
const WorkerPool = require(`../utils/worker/pool`)
const { structureWebpackErrors } = require(`../utils/webpack-error-utils`)
const {
waitUntilAllJobsComplete: waitUntilAllJobsV2Complete,
} = require(`../utils/jobs-manager`)
const buildUtils = require(`../commands/build-utils`)
const { boundActionCreators } = require(`../redux/actions`)

let cachedPageData
let cachedWebpackCompilationHash
if (process.env.GATSBY_EXPERIMENTAL_PAGE_BUILD_ON_DATA_CHANGES) {
const { pageData, webpackCompilationHash } = readState()
// extract only data that we need to reuse and let v8 garbage collect rest of state
cachedPageData = pageData
cachedWebpackCompilationHash = webpackCompilationHash
}

type BuildArgs = {
directory: string,
@@ -119,6 +131,19 @@ module.exports = async function build(program: BuildArgs) {

await processPageQueries()

if (process.env.GATSBY_EXPERIMENTAL_PAGE_BUILD_ON_DATA_CHANGES) {
const { pages } = store.getState()
if (cachedPageData) {
cachedPageData.forEach((_value, key) => {
if (!pages.has(key)) {
boundActionCreators.removePageData({
id: key,
})
}
})
}
}

if (telemetry.isTrackingEnabled()) {
// transform asset size to kB (from bytes) to fit 64 bit to numbers
const bundleSizes = stats
@@ -144,7 +169,20 @@ module.exports = async function build(program: BuildArgs) {
// we need to save it again to make sure our latest state has been saved
await db.saveState()

- const pagePaths = [...store.getState().pages.keys()]
+ let pagePaths = [...store.getState().pages.keys()]

// Rebuild only a subset of pages if the user opted into GATSBY_EXPERIMENTAL_PAGE_BUILD_ON_DATA_CHANGES
// and no source files (for example components, static queries, etc.) changed since the last build; otherwise rebuild all pages
if (
process.env.GATSBY_EXPERIMENTAL_PAGE_BUILD_ON_DATA_CHANGES &&
cachedWebpackCompilationHash === store.getState().webpackCompilationHash
) {
pagePaths = buildUtils.getChangedPageDataKeys(
store.getState(),
cachedPageData
)
}

activity = report.createProgress(
`Building static HTML for pages`,
pagePaths.length,
@@ -184,6 +222,19 @@ module.exports = async function build(program: BuildArgs) {
}
activity.done()

let deletedPageKeys = []
if (process.env.GATSBY_EXPERIMENTAL_PAGE_BUILD_ON_DATA_CHANGES) {
activity = report.activityTimer(`Delete previous page data`)
activity.start()
deletedPageKeys = buildUtils.collectRemovedPageData(
store.getState(),
cachedPageData
)
await buildUtils.removePageFiles({ publicDir }, deletedPageKeys)

activity.end()
}

activity = report.activityTimer(`onPostBuild`, { parentSpan: buildSpan })
activity.start()
await apiRunnerNode(`onPostBuild`, {
@@ -201,4 +252,52 @@ module.exports = async function build(program: BuildArgs) {
await stopTracer()
workerPool.end()
buildActivity.end()

if (
process.env.GATSBY_EXPERIMENTAL_PAGE_BUILD_ON_DATA_CHANGES &&
process.argv.includes(`--log-pages`)
) {
if (pagePaths.length) {
report.info(
`Built pages:\n${pagePaths
.map(path => `Updated page: ${path}`)
.join(`\n`)}`
)
}

if (deletedPageKeys.length) {
report.info(
`Deleted pages:\n${deletedPageKeys
.map(path => `Deleted page: ${path}`)
.join(`\n`)}`
)
}
}

if (
process.env.GATSBY_EXPERIMENTAL_PAGE_BUILD_ON_DATA_CHANGES &&
process.argv.includes(`--write-to-file`)
) {
const createdFilesPath = path.resolve(
`${program.directory}/.cache`,
`newPages.txt`
)
const deletedFilesPath = path.resolve(
`${program.directory}/.cache`,
`deletedPages.txt`
)

if (pagePaths.length) {
await fs.writeFile(createdFilesPath, `${pagePaths.join(`\n`)}\n`, `utf8`)
report.info(`.cache/newPages.txt created`)
}
if (deletedPageKeys.length) {
await fs.writeFile(
deletedFilesPath,
`${deletedPageKeys.join(`\n`)}\n`,
`utf8`
)
report.info(`.cache/deletedPages.txt created`)
}
}
}
10 changes: 9 additions & 1 deletion packages/gatsby/src/commands/develop.ts
@@ -350,6 +350,15 @@ async function startServer(program: IProgram): Promise<IServer> {
}

module.exports = async (program: IProgram): Promise<void> => {
if (process.env.GATSBY_EXPERIMENTAL_PAGE_BUILD_ON_DATA_CHANGES) {
report.panic(
`The flag ${chalk.yellow(
`GATSBY_EXPERIMENTAL_PAGE_BUILD_ON_DATA_CHANGES`
)} is not available with ${chalk.cyan(
`gatsby develop`
)}, please retry using ${chalk.cyan(`gatsby build`)}`
)
}
initTracer(program.openTracingConfigFile)
report.pendingActivity({ id: `webpack-develop` })
telemetry.trackCli(`DEVELOP_START`)
@@ -407,7 +416,6 @@ module.exports = async (program: IProgram): Promise<void> => {
require(`../redux/actions`).boundActionCreators.setProgramStatus(
`BOOTSTRAP_QUERY_RUNNING_FINISHED`
)

await db.saveState()

await waitUntilAllJobsComplete()
12 changes: 10 additions & 2 deletions packages/gatsby/src/query/query-runner.js
@@ -98,7 +98,6 @@ module.exports = async (graphqlRunner, queryJob: QueryJob) => {
.createHash(`sha1`)
.update(resultJSON)
.digest(`base64`)

if (resultHash !== resultHashes.get(queryJob.id)) {
resultHashes.set(queryJob.id, resultHash)

@@ -117,7 +116,6 @@ module.exports = async (graphqlRunner, queryJob: QueryJob) => {
`d`,
`${queryJob.hash}.json`
)

await fs.outputFile(resultPath, resultJSON)
}
}
@@ -128,5 +126,15 @@ module.exports = async (graphqlRunner, queryJob: QueryJob) => {
isPage: queryJob.isPage,
})

// Sets pageData to the store, here for easier access to the resultHash
if (
process.env.GATSBY_EXPERIMENTAL_PAGE_BUILD_ON_DATA_CHANGES &&
queryJob.isPage
) {
boundActionCreators.setPageData({
id: queryJob.id,
resultHash,
})
}
return result
}
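
The `resultHash` stored by `setPageData` is what later builds compare, so it helps to see how such a hash behaves. The sketch below follows the `createHash(`sha1`)` / `digest(`base64`)` calls visible in the hunk above. `hashResult` is a hypothetical helper, and hashing `JSON.stringify(result)` is an assumption about how `resultJSON` is produced, since that line is outside the shown diff.

```javascript
const crypto = require(`crypto`)

// Hypothetical helper: SHA-1 over the serialized query result, base64-encoded.
// Assumes the result is serialized with JSON.stringify before hashing.
const hashResult = result =>
  crypto
    .createHash(`sha1`)
    .update(JSON.stringify(result))
    .digest(`base64`)

const before = hashResult({ data: { title: `Hello` } })
const same = hashResult({ data: { title: `Hello` } })
const after = hashResult({ data: { title: `Hello, world` } })

console.log(before === same) // true: identical data yields an identical hash
console.log(before === after) // false: any data change yields a different hash
```

Storing only the hash keeps the persisted state small, at the cost that the build can tell a page's data changed but not what changed.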
@@ -20,6 +20,7 @@ Object {
"complete": Map {},
"incomplete": Map {},
},
"pageData": Map {},
"pageDataStats": Map {},
"staticQueryComponents": Map {},
"status": Object {