Server build: add webpack plugin that identifies and ships node_modules used by server #45774
Conversation
This PR does not affect the size of JS and CSS bundles shipped to the user's browser. Generated by performance advisor bot at iscalypsofastyet.com.
contexts = [ modFolder + '/node_modules', ...contexts ];

// iterate its dependencies
for ( const dep of Object.keys( pkg.dependencies ) ) {
Do we need to consider `peerDependencies` and `optionalDependencies`?
I'd say we don't. `peerDependencies` don't trigger installation -- the dependency is supposed to be installed by someone else. Although I read some news that the very latest NPM will start installing them, too, in certain cases.

`optionalDependencies` are very rare, and optional. But it's possible that a missing optional dependency will disable some feature that was until now enabled, due to a conditional `require` succeeding. Worth auditing. It's not a problem for the plugin to start checking for them, too.
My point is that if a module declares `peerDependencies` or `optionalDependencies`, and they are installed, we should copy them as well, because chances are the module is importing them.
const modFolder = contexts[ 0 ] + '/' + mod;

// read module's package.json
const pkg = JSON.parse( fs.readFileSync( modFolder + '/package.json' ) );
Could we replace this logic with something more native, using `module.createRequire( filename )` and `require.resolve( request[, options] )`?
I tried to use `require.resolve` before writing my own naive resolver, but it didn't have a crucial feature I need: specifying the "context" directory from which to do the resolution. I.e., when `node_modules/foo` depends on `bar`, find the `node_modules/foo/node_modules/bar` instance rather than `node_modules/bar`.
TIL about `createRequire` -- even after reading the docs, I still don't get what it does 🙂
The plugin should probably use the webpack internal resolver rather than anything else. Something to be done as we improve this from a proof of concept into something shippable.
If I'm not mistaken, it does exactly what you need: create a `require` function that starts the resolution in a provided directory, not in the current path.
Nice, it seems then that `module.createRequire` and `require.resolve` are exactly the combo I needed here 👍
module.exports = class NodeModuler {
	apply( compiler ) {
		compiler.hooks.afterEmit.tapAsync( 'NodeModuler', ( compilation ) => {
Does this run after applying webpack aliases and module replacements?
Yes, module aliases and replacements are applied very early, when building the module dependency graph (starting with the entrypoint and descending recursively into dependencies). On the other hand, `afterEmit` happens very late, after all the JS assets have been compiled, optimized and written to disk.
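A minimal sketch of what tapping `afterEmit` looks like (the plugin name and the `ExternalModule` detection via constructor name are illustrative assumptions; the real plugin may inspect modules differently):

```javascript
// afterEmit fires after assets are written, long after aliases and
// replacements were applied while building the module graph -- so the
// modules seen here already reflect any aliasing.
class ListExternalsPlugin {
	apply( compiler ) {
		compiler.hooks.afterEmit.tapAsync( 'ListExternalsPlugin', ( compilation, callback ) => {
			const externals = [];
			for ( const module of compilation.modules ) {
				// ExternalModule instances represent raw require( 'foo' ) calls
				// left for Node.js to resolve at runtime
				if ( module.constructor.name === 'ExternalModule' ) {
					externals.push( module.userRequest );
				}
			}
			console.log( 'external modules:', externals );
			callback();
		} );
	}
}
```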
@scinos I started tracking

The server bundle directly requires 78 packages, and when we tracked only

Optional dependencies: They are very rare, and ones that were not shipped already (due to being a dependency of another package) come exclusively from

Peer dependencies: Most of them are

While working on this, I found that
There's one issue that must be resolved to make this plugin work: right now, we also externalize monorepo packages like

We should either bundle these packages (we can ensure that our own packages are bundle-able), or follow the symlinks and copy the

The latter is a bit suboptimal, because not all files in the package directory need to be shipped. There is the
I believe this PR is now ready for review and deploy. It adds a webpack plugin that finds all external modules in the (finished) compilation and writes their names into a list in

Then there is a

Notes:
bin/copy-production-modules.js

@@ -0,0 +1,124 @@
const fs = require( 'fs' ); // eslint-disable-line import/no-nodejs-modules
nit: `// eslint-disable-line import/no-nodejs-modules` should be moved to `./bin/.eslintrc.js`. Importing node modules should be OK for any file in `./bin`.
Fixed 👍
// it is not a symlink to a monorepo package. Such packages couldn't be shipped and
// must be bundled.
const pkgRealpath = path.relative( '.', fs.realpathSync( pkgFolder ) );
if ( ! pkgRealpath.startsWith( 'node_modules/' ) ) {
What if server imports something that is in `client/node_modules`, or in `client/server/node_modules`? Are those bundled?
The server bundle is in `build/server.js`, both before and after this patch, so it doesn't see modules in `client/`. Its resolution paths are `build/node_modules`, `node_modules` and parents.

Modules in `client/` are visible only to webpack at build time, when the resolution context is the source directories. Once a module is externalized, i.e., translated to a raw `require( 'foo' )` call executed by Node.js at runtime, this context is lost.

This could be a problem, because there's no guarantee that all externalized modules are always hoisted to the top `node_modules` directory. But this particular PR doesn't make it better or worse.
Agreed this is an existing (and pretty bad) problem.
There's one more little detail involved here: the `webpack-node-externals` plugin builds the list of externals by listing the root `node_modules` directory. If the sources request a `foo` package and the package is in `client/node_modules/foo` and not in `node_modules/foo`, the plugin won't identify `foo` as external and will let webpack bundle it.
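If this ever needs fixing, one possible direction is to tell the plugin about the extra directories (this assumes webpack-node-externals v2, whose `additionalModuleDirs` option lets it scan more than the root `node_modules`; treat the exact option name as an assumption to verify against the installed version):

```javascript
const nodeExternals = require( 'webpack-node-externals' );

// Sketch: also treat packages hoisted only into client/node_modules
// as externals, instead of letting webpack bundle them.
module.exports = {
	externals: [
		nodeExternals( {
			additionalModuleDirs: [ 'client/node_modules' ],
		} ),
	],
};
```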
@@ -76,6 +76,8 @@
"build-static": "npx ncp static public",
"prebuild-server": "mkdirp build",
"build-server": "BROWSERSLIST_ENV=server webpack --config client/webpack.config.node.js --display errors-only",
"build-server:copy-modules": "node bin/copy-production-modules.js",
"postbuild-server": "node -e \"process.env.CALYPSO_ENV === 'production' && process.exit(1)\" || yarn run build-server:copy-modules",
Should we do this in other envs as well, like `stage` or `wpcalypso`?
The `CALYPSO_ENV` variable is a bit weird, as it has two meanings.

During build, the values used in practice are `development`, `production` and `desktop`. There's also `desktop-development`, but I think that setup is not functional at this moment.

We never use `stage` or `wpcalypso` at build time. These values are used only in the second scenario: setting an environment variable at runtime. At runtime, we always run the same build produced by the `production` setup, and it's instructed to load the `stage` or `wpcalypso` configs by the `CALYPSO_ENV` environment variable.
client/webpack.config.node.js
// The polyfills module is transpiled by Babel and only the `core-js` modules that are
// needed by current Node.js are included instead of the whole package.
'@automattic/calypso-analytics',
Can we generate this list automatically?
Yes, that would be great. Instead of a static list, there could be a function `isMonorepoPackage` that determines the monorepo-ness of a package. But I don't know how to write such a test right now.
You could find all `package.json` files in `packages/*/package.json`, read the names and compose this list dynamically. Otherwise we risk anybody adding a new package used in client or server and having it break only in production.
Nice idea, implemented using globby
and it seems to work beautifully 👍
I was playing with this PR and realized there is potentially an easier solution:
Isn't that roughly similar to this process?
With the current

It is not strictly the same (there may be a package declared in
Would that be a modification of the existing

The current

I'm not sure how the non-hoisted install in

Another thing about monorepo packages that requires attention is: if packages used by
I think it should be a special install only to create production
If we want to optimize it further, yes. After a first pass of removing some dependencies and moving others to
I think the symlink is fine, as long as we also copy

There are a few other things I'd like to try:
True, we would need to do some manual work to ensure dependencies are not duplicated when they are the same version.

As an alternative to yarn unhoisting things, we could just delete all dependencies in the root

Which brings in another aspect of this problem: if unhoisted
@scinos Could you please push a commit that does that? I'm not very familiar with how we build Docker images these days. Do we already use the multi-stage process where the build is done in one container and then the results are copied to the production container? Or do we still do everything in one container?

We could even go a bit further -- after this PR, Calypso needs only the
I.e., when the
Can you compare the list of packages that your experiments produce against what the
After running this branch and generating the reduced

There are a few "missing" modules, but after inspecting many of them, it looks like they are native modules, or used in comments, tests, bin scripts or similar, so they are actually not required in production.

Script:

List:
A notable missing package is

But given the above list, I'm more convinced that this approach is probably copying over everything required in prod.
I've merged a change to report all unhandled exceptions and rejections to logstash in #46638; it should give us visibility if this fails.
Yes, these are all
Done. Now we only copy
To be honest, this still looks to me like a patch over the actual problem (that our dependencies are a mess) to get a benefit that's not entirely clear (given how Docker shares layers between images, it's not as simple as "this image is smaller"). But I don't want to delay this further or block it, so given that we have logs that will tell us if this breaks in prod, and all tests are passing, it is good to go for me.

Something we could consider is that we can just run the plugin once, get the list of external modules, and use that list to correctly populate
Thanks @scinos for your help making this happen 🤝 I'll merge it on Monday, as it's not a good candidate for a Friday deploy.
This list doesn't contain the entire list of dependencies for

In a sense, with SSR, the client and server are not two different packages, but rather two bundles built from the same sources. These sources have a common dependency set defined in

That's why I think that un-messing the dependencies will be a difficult mission, and that a dedicated mechanism to prune the server runtime dependencies will continue to be impactful for a long time. But I'll be happy to help make progress on that nevertheless.
I added one more commit to make the requests to
wp-desktop ci passing, closing review
I had to disable the

The Desktop app doesn't need the PWA functionality anyway: push notifications are handled differently, and it also doesn't need the

FYI @nsakaimbo
Thanks for the heads up, @jsnajdr!
This is a proof-of-concept of an alternative to full `node_modules` bundling in #45570.

Adds a webpack plugin that at the end of compilation looks up all external modules and builds a subset of `node_modules` that are used by the server bundle. The subset is composed both from modules that are `require`d directly, and also by recursively checking these direct modules' dependencies.

Then it creates a new `build/node_modules` directory with that subset copied out of the root `node_modules`. It's intended to be shipped to the runtime environment, e.g., as part of a Docker container, leaving the root `node_modules` behind in the build environment.

This is a very rough draft and a proof of concept, soliciting feedback.

The resulting `build/node_modules` is now 59MB, quite an improvement from the 1.5GB without this optimization.