Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Server build: add webpack plugin that identifies and ships node_modules used by server #45774

Merged
merged 10 commits into from
Oct 26, 2020
4 changes: 3 additions & 1 deletion Dockerfile
Original file line number Diff line number Diff line change
Expand Up @@ -47,7 +47,9 @@ ENV NODE_ENV production
WORKDIR /calypso

RUN apk add --no-cache tini
COPY --from=builder --chown=nobody:nobody /calypso/ /calypso/
COPY --from=builder --chown=nobody:nobody /calypso/build /calypso/build
COPY --from=builder --chown=nobody:nobody /calypso/public /calypso/public
COPY --from=builder --chown=nobody:nobody /calypso/config /calypso/config

USER nobody
ENTRYPOINT ["/sbin/tini", "--"]
Expand Down
124 changes: 124 additions & 0 deletions bin/copy-production-modules.js
Original file line number Diff line number Diff line change
@@ -0,0 +1,124 @@
const fs = require( 'fs' );
const rcopy = require( 'recursive-copy' );
const mkdirp = require( 'mkdirp' );
const path = require( 'path' );
const yargs = require( 'yargs' );

const args = yargs
.usage( 'Usage: $0' )
.default( 'list', 'build/modules.json' )
.default( 'dest', 'build' )
.boolean( 'debug' ).argv;

function debug( message ) {
args.debug && console.log( message );
}

try {
debug( 'reading modules from ' + args.list );
const externalModules = JSON.parse( fs.readFileSync( args.list, 'utf8' ) );
debug( 'bundle directly requests ' + externalModules.length + ' packages' );
const shippingPkgList = processPackages( externalModules );
shipDependencies( shippingPkgList );
} catch ( err ) {
console.error( err );
process.exit( 1 );
}

function processPackages( pkgList ) {
const context = {
pkgList: [],
visitedFolders: new Set(),
folderStack: [ '.' ],
requiredBy: 'the bundle',
depth: 0,
};

for ( const pkgName of pkgList ) {
processPackage( pkgName, context );
}

return context.pkgList;
}

function processPackage( pkgName, context ) {
const { pkgList, folderStack, visitedFolders, requiredBy, depth } = context;

const pkgFolder = folderStack[ 0 ] + '/node_modules/' + pkgName;

// skip if the folder was already visited
if ( visitedFolders.has( pkgFolder ) ) {
return;
}

// mark the folder as visited
visitedFolders.add( pkgFolder );

// Verify that the package resolves to a directory in `node_modules/`, and that
// it is not a symlink to a monorepo packages. Such packages couldn't be shipped and
// must be bundled.
const pkgRealpath = path.relative( '.', fs.realpathSync( pkgFolder ) );
if ( ! pkgRealpath.startsWith( 'node_modules/' ) ) {
Copy link
Contributor

@scinos scinos Oct 9, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What if server imports something that is in client/node_modules, or in client/server/node_modules? Are those bundled?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The server bundle is in build/server.js, both before and after this patch, so it doesn't see modules in client/. Its resolution paths are build/node_modules, node_modules and parents.

Modules in client/ are visible only to webpack at build-time, when the resolution context are the source directories. Once a module is externalized, i.e., translated to a raw require( 'foo' ) call executed by Node.js at runtime, this context is lost.

This could be a problem, because there's no guarantee that all externalized modules are always hoisted to the top node_modules directory. But this particular PR doesn't make it better or worse.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed this is an existing (and pretty bad) problem.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's one more little detail involved here: the webpack-node-externals plugin builds the list of externals by listing the root node_modules directory. If the sources request a foo package and the package is in client/node_modules/foo and not in node_modules/foo, the plugin won't identify foo as external and will let webpack bundle it.

throw new Error(
'package ' + pkgName + ' resolves outside node_modules/, likely a symlink: ' + pkgRealpath
);
}

// If the package is in the top-level folder, add it to the list of packages to ship.
// Subpackages are already shipped together with the parent.
if ( folderStack.length === 1 ) {
const indent = ' '.repeat( depth );
debug( indent + 'shipping ' + pkgName + ': required by ' + requiredBy );
pkgList.push( pkgName );
}

// read package's package.json
const pkgJson = JSON.parse( fs.readFileSync( pkgFolder + '/package.json' ) );

// collect dependencies from various fields
const depFields = [ 'dependencies', 'peerDependencies', 'optionalDependencies' ];
const pkgDeps = depFields.flatMap( ( type ) => Object.keys( pkgJson[ type ] || {} ) );

// bail out if package has no dependencies
if ( ! pkgDeps.length ) {
return;
}

// unshift the package's folder to the folder stack
const subFolderStack = [ pkgFolder, ...folderStack ];

// iterate its dependencies
for ( const depName of pkgDeps ) {
// Find the dependency in node_modules tree, starting with the package's `node_modules/`
// subdirectory and going up the directory tree to the root.
const foundFolderIdx = subFolderStack.findIndex( ( folder ) =>
fs.existsSync( folder + '/node_modules/' + depName )
);

if ( foundFolderIdx === -1 ) {
throw new Error( 'package not found: ' + depName + ', dependency of ' + pkgFolder );
}

// add the dependency to shipping list if eligible and recursively collect its dependencies
const subContext = {
...context,
folderStack: subFolderStack.slice( foundFolderIdx ),
requiredBy: pkgName,
depth: depth + 1,
};
processPackage( depName, subContext );
}
}

function shipDependencies( pkgList ) {
const destDir = path.join( args.dest, 'node_modules' );

debug( 'copying ' + pkgList.length + ' packages to ' + destDir );

mkdirp.sync( destDir );
for ( const pkgName of pkgList ) {
rcopy( path.join( 'node_modules', pkgName ), path.join( destDir, pkgName ), {
overwrite: true,
} );
}
}
20 changes: 11 additions & 9 deletions client/server/boot/index.js
Original file line number Diff line number Diff line change
Expand Up @@ -10,12 +10,12 @@ import userAgent from 'express-useragent';
/**
* Internal dependencies
*/
import analytics from 'server/lib/analytics';
import config from 'server/config';
import api from 'server/api';
import pages from 'server/pages';
import pwa from 'server/pwa';
import loggerMiddleware from 'server/middleware/logger';
import analytics from 'calypso/server/lib/analytics';
import config from 'calypso/server/config';
import api from 'calypso/server/api';
import pages from 'calypso/server/pages';
import pwa from 'calypso/server/pwa';
import loggerMiddleware from 'calypso/server/middleware/logger';

/**
* Returns the server HTTP request handler "app".
Expand All @@ -33,7 +33,7 @@ export default function setup() {
app.use( loggerMiddleware() );

if ( 'development' === process.env.NODE_ENV ) {
require( 'server/bundler' )( app );
require( 'calypso/server/bundler' )( app );

if ( config.isEnabled( 'wpcom-user-bootstrap' ) ) {
if ( config( 'wordpress_logged_in_cookie' ) ) {
Expand Down Expand Up @@ -71,7 +71,9 @@ export default function setup() {
}
}

app.use( pwa() );
if ( ! config.isEnabled( 'desktop' ) ) {
app.use( pwa() );
}

// attach the static file server to serve the `public` dir
app.use( '/calypso', express.static( path.resolve( __dirname, '..', '..', '..', 'public' ) ) );
Expand All @@ -90,7 +92,7 @@ export default function setup() {
} );

if ( config.isEnabled( 'devdocs' ) ) {
app.use( require( 'server/devdocs' ).default() );
app.use( require( 'calypso/server/devdocs' ).default() );
}

if ( config.isEnabled( 'desktop' ) ) {
Expand Down
58 changes: 58 additions & 0 deletions client/server/bundler/external-modules.js
Original file line number Diff line number Diff line change
@@ -0,0 +1,58 @@
/**
* External Dependencies
*/
const { builtinModules } = require( 'module' );

function getModule( request ) {
const parts = request.split( '/' );
if ( parts[ 0 ].startsWith( '@' ) ) {
return parts[ 0 ] + '/' + parts[ 1 ];
}
return parts[ 0 ];
}

module.exports = class ExternalModulesWriter {
constructor( options ) {
this.options = {
path: './build',
filename: 'modules.json',
...options,
};
}

apply( compiler ) {
compiler.hooks.emit.tapAsync( 'ExternalModulesWriter', ( compilation, callback ) => {
const externalModules = new Set();

for ( const module of compilation.modules ) {
if ( ! module.external ) {
continue;
}

const requestModule = getModule( module.userRequest );

// native Node.js module, not in node_modules
if ( builtinModules.includes( requestModule ) ) {
continue;
}

// loading local file by relative path, not in node_modules
if ( requestModule.startsWith( './' ) || requestModule.startsWith( '../' ) ) {
continue;
}

externalModules.add( requestModule );
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of generating a file with the module names that we resolve manually using our own implementation later, can we change this plugin to get the list of resolved modules by webpack? That way we guarantee they resolve to the same file.

If that's not possible (webpack is ignoring anything with node_modules in the path, so chances are they don't try to resolve them), we could at least use webpack's enhanced-resolve to resolve them here and don't pass that burden to the consumer of the list.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

webpack never really resolves these external modules. When it sees a request that matches the externals criteria, e.g., import React from 'react', it doesn't interpret the react specifier at all, and copies it verbatim to the output: require( 'react' ). It's then up to Node.js to resolve that at runtime.

we could at least use webpack's enhanced-resolve to resolve them here and don't pass that burden to the consumer of the list.

The knowledge about how would webpack resolve the request doesn't look very relevant: webpack never does that resolution, and the bundle contains simple require( 'react' ) calls. It's up to the runtime to resolve them.

It therefore seems right that the list contains a simple react item, passing the burden of interpreting that to the consumer. The consumer needs to verify and ensure that the runtime will be able to resolve react into a real package. Webpack is no longer involved at all at that moment.

Also, if I tried to resolve react in the webpack plugin, what context directory should I use? The react package has been requested by many files inside the source tree, and webpack resolved them all to a single external. Should I resolve react against client/my-sites/themes or against client/server/render? Hard to tell, it's requested from both places.

}

const json = JSON.stringify( Array.from( externalModules ), null, 2 );
const { mkdirp, writeFile, join } = compiler.outputFileSystem;
const { path, filename } = this.options;
mkdirp( path, ( err ) => {
if ( err ) {
return callback( err );
}
writeFile( join( path, filename ), json, 'utf8', callback );
} );
} );
}
};
4 changes: 1 addition & 3 deletions client/server/pwa/index.js
Original file line number Diff line number Diff line change
Expand Up @@ -18,9 +18,7 @@ export default () => {
// service-worker needs to be served from root to avoid scope issues
app.use(
'/service-worker.js',
express.static(
path.resolve( __dirname, '..', '..', 'lib', 'service-worker', 'service-worker.js' )
)
express.static( path.resolve( __dirname, '..', '..', '..', 'public', 'service-worker.js' ) )
);

return app;
Expand Down
27 changes: 21 additions & 6 deletions client/webpack.config.node.js
Original file line number Diff line number Diff line change
Expand Up @@ -7,6 +7,8 @@
/**
* External dependencies
*/
const fs = require( 'fs' );
const globby = require( 'globby' );
const path = require( 'path' );
const webpack = require( 'webpack' );

Expand All @@ -22,6 +24,7 @@ const FileConfig = require( '@automattic/calypso-build/webpack/file-loader' );
const { shouldTranspileDependency } = require( '@automattic/calypso-build/webpack/util' );
const nodeExternals = require( 'webpack-node-externals' );
const { BundleAnalyzerPlugin } = require( 'webpack-bundle-analyzer' );
const ExternalModulesWriter = require( './server/bundler/external-modules' );

/**
* Internal variables
Expand All @@ -40,6 +43,16 @@ const fileLoader = FileConfig.loader( {

const commitSha = process.env.hasOwnProperty( 'COMMIT_SHA' ) ? process.env.COMMIT_SHA : '(unknown)';

function getMonorepoPackages() {
// find package.json files in all 1st level subdirectories of packages/
return globby.sync( 'packages/*/package.json' ).map( ( pkgJsonPath ) => {
// read the package.json file
const pkgJson = JSON.parse( fs.readFileSync( pkgJsonPath ) );
// create a package name regexp that matches all requests that start with it
return new RegExp( `^${ pkgJson.name }(/|$)` );
scinos marked this conversation as resolved.
Show resolved Hide resolved
} );
}

/**
* This lists modules that must use commonJS `require()`s
* All modules listed here need to be ES5.
Expand All @@ -52,13 +65,14 @@ function getExternals() {
// with modules that are incompatible with webpack bundling.
nodeExternals( {
allowlist: [
// `@automattic/components` is forced to be webpack-ed because it has SCSS and other
// non-JS asset imports that couldn't be processed by Node.js at runtime.
'@automattic/components',
// Force all monorepo packages to be bundled. We can guarantee that they are safe
// to bundle, and can avoid shipping the entire contents of the `packages/` folder
// (there are symlinks into `packages/` from the `node_modules` folder)
...getMonorepoPackages(),

// The polyfills module is transpiled by Babel and only the `core-js` modules that are
// needed by current Node.js are included instead of the whole package.
'@automattic/calypso-polyfills',
// bundle the core-js polyfills. We pick only a very small subset of the library
// to polyfill a few things that are not supported by the latest LTS Node.js,
// and this avoids shipping the entire library which is fairly big.
/^core-js\//,

// Ensure that file-loader files imported from packages in node_modules are
Expand Down Expand Up @@ -171,6 +185,7 @@ const webpackConfig = {
'components/empty-component'
), // Depends on BOM
new webpack.IgnorePlugin( /^\.\/locale$/, /moment$/ ), // server doesn't use moment locales
! isDevelopment && new ExternalModulesWriter(),
].filter( Boolean ),
};

Expand Down
5 changes: 4 additions & 1 deletion package.json
Original file line number Diff line number Diff line change
Expand Up @@ -75,9 +75,11 @@
"build-desktop:app": "cd desktop && yarn run build",
"test-desktop:e2e": "node desktop/e2e/run.js",
"build-docker": "node bin/build-docker.js",
"build-static": "npx ncp static public",
"build-static": "npx ncp static public && npx ncp client/lib/service-worker public --filter=\"service-worker(/service-worker\\.js)?$\"",
"prebuild-server": "mkdirp build",
"build-server": "BROWSERSLIST_ENV=server webpack --config client/webpack.config.node.js --display errors-only",
"build-server:copy-modules": "node bin/copy-production-modules.js",
"postbuild-server": "node -e \"process.env.CALYPSO_ENV === 'production' && process.exit(1)\" || yarn run build-server:copy-modules",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we do this in other envs as well like stage or wpcalypso?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The CALYPSO_ENV variable is a bit weird as it has two meanings.

During build, the values used in practice are development, production and desktop. There's also desktop-development, but I think that setup is not functional at this moment.

We never use stage or wpcalypso at build time. These values are used only in the second scenario: setting an environment variable at runtime. At runtime, we always run the same build produced by the production setup, and it's instructed to load the stage or wpcalypso configs by the CALYPSO_ENV environment variable.

"build-server-stats": "NODE_ENV=production CALYPSO_ENV=production EMIT_STATS=true yarn run build-server",
"build-client": "yarn run build-client-evergreen",
"build-client-do": "node ${NODE_ARGS:---max-old-space-size=8192} ./node_modules/webpack/bin/webpack.js --config client/webpack.config.js --display errors-only",
Expand Down Expand Up @@ -361,6 +363,7 @@
"react-dom": "^16.12.0",
"react-test-renderer": "^16.12.0",
"readline-sync": "^1.4.10",
"recursive-copy": "^2.0.10",
"replace": "^1.1.5",
"request": "^2.88.2",
"rimraf": "^3.0.0",
Expand Down