Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Build Tooling: Optimize build by spawning worker pool #15230

Merged
merged 7 commits into from
May 30, 2019

Conversation

aduth
Copy link
Member

@aduth aduth commented Apr 26, 2019

Previously: #8093

This pull request seeks to try to optimize npm run build. The previous effort to convert synchronous scripts to asynchronous equivalents from #8093 was adapted here. The changes here also include a refactoring to front-load files aggregation (i.e. discover all files to build at once, rather than by individual packages).

Ultimately, these changes had minimal impact; the larger impact is by adopting the worker-farm package, which manages creation of subprocesses and concurrency within those subprocesses.

The net result is a build time decrease of 56% (~24s to 10.5s, measured by time node ./bin/packages/build.js). The benefit will vary by machine, since worker-farm defaults to concurrent workers equal to the number of CPUs.

Testing Instructions:

There should be no regressions in the build of packages from npm run build.

There should be no regressions in the rebuild of packages from npm run dev (though there may be some separate bugginess with rebuilds as addressed in #15219).

@aduth aduth added [Type] Build Tooling Issues or PRs related to build tooling [Type] Performance Related to performance efforts labels Apr 26, 2019
@gziolo gziolo requested a review from oandregal April 29, 2019 10:41
@aduth aduth force-pushed the update/async-package-build-lets-try-this-again branch 2 times, most recently from 40ec580 to 4338f9e Compare May 20, 2019 16:05
Copy link
Contributor

@talldan talldan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is great. Some nice refactoring of the build scripts.

I think the new code needs some kind of error handling. The previous implementation seemed to handle this more gracefully.

* @type {Object<string,Function>}
*/
const BUILD_TASK_BY_EXTENSION = {
async '.scss'( file ) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

😮I'd never considered this as a way to define object properties. Very fancy!

const extension = path.extname( file );
const task = BUILD_TASK_BY_EXTENSION[ extension ];
if ( task ) {
await task( file );
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I experimented by putting some erroneous code into a file, and the result is an unhandled promise error (I suppose originating from babel.transformFileAsync). The taskbar ticked up to 100%, but the command didn't exit.

I wonder if some error handling could be added to the async function that this file exports, and the callback function provided with error details so that the parent process can handle exiting gracefully.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if some error handling could be added to the async function that this file exports, and the callback function provided with error details so that the parent process can handle exiting gracefully.

Interesting. Nice catch! We should definitely handle this gracefully. I'll take a look.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is handled better in c4f908b . It's not perfect in that I think the whole process should halt immediately when an error is encountered, but the nature of worker-farm makes this a bit difficult to implement. But at least it won't hang, nor will it show it as "Unhandled Promise exception".

Copy link
Contributor

@talldan talldan May 27, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I had a bit of a play with it, and see what you mean.

Ideally, the build.js script would return a non-zero exit code if an error occurs, as that would allow, for example, a CI job to end as early as possible.

One option might be to use process.exitCode = 1; in build.js if errors are encountered on a worker.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

worker-farm also has a maxRetries config option, which is by default infinity. I wonder if that should be capped, potentially at 0 since I wouldn't expect a build error to suddenly cure itself (I imagine this is for things like HTTP requests).

Copy link
Member Author

@aduth aduth May 28, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I had a bit of a play with it, and see what you mean.

For posterity's sake, and since I was unclear in my original comment, the issue is that other workers may still be running when the error is handled, and there is no easy way through worker-farm to end the process only once the current round of worker handlers has completed. If the process is terminated early, subsequent worker finishes will loudly complain about the lack of a parent process to whom they expect to report.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

worker-farm also has a maxRetries config option, which is by default infinity. I wonder if that should be capped, potentially at 0 since I wouldn't expect a build error to suddenly cure itself (I imagine this is for things like HTTP requests).

It's a good idea, though in practice I don't expect this would occur. From its documentation, the retries are relevant for process termination. Failed builds wouldn't fall under this classification.

Normally I'd say that 0 would be the more sensible default, but since it might risk that we appear to succeed with the build despite a worker having terminated, I'd rather it get stuck trying to build infinitely than to produce a bad build. At least then it's obvious that something is wrong.

More preferable would be to cap the retries, but also detect and fail the whole build if one of the processes terminates unexpected. It's not clear to me how this might be done though.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For posterity's sake, and since I was unclear in my original comment, the issue is that other workers may still be running when the error is handled, and there is no easy way through worker-farm to end the process only once the current round of worker handlers has completed. If the process is terminated early, subsequent worker finishes will loudly complain about the lack of a parent process to whom they expect to report.

process.exitCode = 1 won't cause the process to be terminated early. It can be set in advance of the process exiting. I can push a commit demonstrating what I mean if that helps?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

process.exitCode = 1 won't cause the process to be terminated early. It can be set in advance of the process exiting. I can push a commit demonstrating what I mean if that helps?

Ah, my quoted text was meant as a clarification and not to dismiss, though I think I focused too much on this point to absorb the rest of your comment 😓 I agree with your assessment though, certainly the process should have a non-zero exit code if any build errors occur.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

process.exitCode = 1 won't cause the process to be terminated early. It can be set in advance of the process exiting. I can push a commit demonstrating what I mean if that helps?

Pushed (with co-authorship) in cb51568. I confirmed that before this change, an error in the build would yield a status code 0, and 1 after.

Before:

node ./bin/packages/build.js && echo $?
# ...
0

After:

node ./bin/packages/build.js && echo $?
# ...
1

@aduth aduth force-pushed the update/async-package-build-lets-try-this-again branch from a9afb77 to c4f908b Compare May 25, 2019 00:14
@aduth
Copy link
Member Author

aduth commented May 25, 2019

I did a little more tweaking in 6663433 to leverage the fast-glob module's stream interface. I was seeing that glob.sync was taking ~250ms which, while not miserable, is reduced down to about 50ms for the first files to sent to the worker using streams instead.

bin/packages/build.js Outdated Show resolved Hide resolved

module.exports = async ( file, callback ) => {
const extension = path.extname( file );
const task = BUILD_TASK_BY_EXTENSION[ extension ];
Copy link
Member Author

@aduth aduth May 29, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Noting for future: I've been planning in my head how I might implement something like #13124 . I don't plan to make the changes here, since it's already a fairly significant refactor.

I expect this task could be separated out into a function buildFile, which is optionally enhanced to read from / write to a cache. I'm thinking in some cases we may want to opt out of the caching behavior, e.g. in npm run package-plugin, since it seems there could be a (very-low-probability-but-possible) chance that hash collisions could yield a bad result†. That said, we should always be testing (manually and automated) the artifact, but as of today, we're only doing the former.

let buildFile = ( file ) => { /* ... */ };

if ( hasCache ) {
	buildFile = firstMatch( [ fromCache, buildFile ] );
}

buildFile = flow( [ buildFile, toCache ] );

† This assumes that an implementation would be similar to babel-loader in checking a cache folder for a file named by the MD5 (or other fast hash) of the source file's content.

cc @noisysocks

Copy link
Member

@noisysocks noisysocks May 30, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps store the cache in a directory within the repository that is .gitignored? That way, when npm run package-plugin runs git clean -fxd it will empty the cache.

Not sure I'd worry about collisions. I'd use SHA1 as it's fast, has a low collision rate, and is good enough for git's object storage model which what inspired my approach in #13124.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps store the cache in a directory within the repository that is .gitignored? That way, when npm run package-plugin runs git clean -fxd it will empty the cache.

That's true, I'd not considered that. My plan was to do similar to what's done with babel-loader, i.e. a folder at node_modules/.cache. Seems it would work just the same as you propose (since node_modules is gitignored).

Not sure I'd worry about collisions. I'd use SHA1 as it's fast, has a low collision rate, and is good enough for git's object storage model which what inspired my approach in #13124.

I'm not overly worried, I'll just inevitably feel uncomfortable about the lack of being able to say that it's "impossible". I don't feel too strongly about the specific hash algorithm, just that it's fast. Some of what I've read seemed to imply MD5 beats out SHA1 on this front 🤷‍♂ [1][2][3]

Copy link
Member

@noisysocks noisysocks left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is awesome! 😍

I'm only seeing a modest (~20%) speedup when I compare to master, which I suspect is because my machine has less cores than yours. Still, a speed up is a speed up!

Copy link
Contributor

@talldan talldan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thumbs up from me too, thanks for addressing all the feedback 😃

@aduth
Copy link
Member Author

aduth commented May 30, 2019

Thanks you both! And thanks @talldan for identifying all of the very hard-to-spot oversights!

@aduth aduth force-pushed the update/async-package-build-lets-try-this-again branch from cb51568 to b9c1191 Compare May 30, 2019 14:24
@aduth
Copy link
Member Author

aduth commented May 30, 2019

I was hoping to see some improvement on Travis build times with these changes, but ultimately it appears it won't have much of an impact (for better or worse). As reference, Travis containers provide two cores (docs).

Control: https://travis-ci.com/WordPress/gutenberg/builds/113749307
Variable: https://travis-ci.com/WordPress/gutenberg/builds/113750147

@aduth aduth merged commit 5c38808 into master May 30, 2019
@aduth aduth deleted the update/async-package-build-lets-try-this-again branch May 30, 2019 14:50
@aduth
Copy link
Member Author

aduth commented May 30, 2019

There was a mistake here, unfortunately. Stylesheets should only be built from their entrypoints, not from the individual files. This worked fine for the full build, but for npm run dev watched file rebuilds, the wrong stylesheets were being rebuilt.

Fix at #15920

@noisysocks
Copy link
Member

I was hoping to see some improvement on Travis build times with these changes, but ultimately it appears it won't have much of an impact (for better or worse). As reference, Travis containers provide two cores (docs).

Caching, if I recall, is what made a huge difference in Travis! We'll need to do regular cleanup of the cache files though so that they don't grow into several GBs and cause Travis builds to slow down because of time spent downloading cache files from S3.

@aduth
Copy link
Member Author

aduth commented Jun 3, 2019

Caching, if I recall, is what made a huge difference in Travis! We'll need to do regular cleanup of the cache files though so that they don't grow into several GBs and cause Travis builds to slow down because of time spent downloading cache files from S3.

Yeah, I can definitely see this making the bigger difference.

Do you know anything about cleanup? I had a similar need for a side-project of mine where I ended up using a combination of touch and a crafted find command to eject old/un-touched files, but I don't know if something like this could be translated to Travis.

https://github.com/aduth/gutenberg.run/blob/e843868379121d9fc946d244c6e14a46775a53ab/images/gutenbergrun-cleaner/purge-trees.sh#L3

https://github.com/aduth/gutenberg.run/blob/e843868379121d9fc946d244c6e14a46775a53ab/web/tasks/run/03-create-container.js#L35-L38

@aduth
Copy link
Member Author

aduth commented Jun 3, 2019

I opened an issue at #15967 to ensure this task is tracked. We can continue implementation discussion there.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
[Type] Build Tooling Issues or PRs related to build tooling [Type] Performance Related to performance efforts
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants