-
Notifications
You must be signed in to change notification settings - Fork 2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Server build: create a self-contained JS bundle with all node_modules included #45570
Conversation
This PR does not affect the size of JS and CSS bundles shipped to the user's browser. Generated by performance advisor bot at iscalypsofastyet.com. |
Should we remove |
Good point, I believe it should be in this PR, one of the main methods how to validate it. Does it mean adding a command into the Dockerfile, after building Calypso and before launching it?
|
We could also extract the node-modules-deleting code from the
That would also remove the nested |
I think the ideal way of removing Line 49 in b3ed938
Line 52 in b3ed938
|
To remove it for good so it doesn't contribute to the image size, we need to make sure it is never present in the final image, what @sirreal said. To remove Maybe we also need some bundle analysis to see if there are calls to |
I added the
That's a good point! For every external, webpack creates a stub module that looks like this: /* 1 */
/***/ (function(module, exports) {
module.exports = require("lodash");
/***/ }), By grepping the
Except the Looking at the output of |
I check out this branch (commit 2d07ca7065) and run
When running with
Probably most of those are in comments or load a file in |
Thinking about the safety of this change... how do we know we don't have a module that does a dynamic require, something like:
I don't think that will be bundled by webpack, and therefore the module will be missing in production. How do we protect ourselves against that pattern? |
Webpack will bundle this: it tries really hard to understand the |
Oops, that's a good catch. Development mode should continue to use the externals. The I'll have a closer look at the production |
Oh, I didn't know about webpack interpreting However, it is rather weak. Based on an experiment (https://codesandbox.io/s/wizardly-shape-8lofs?file=/src/index.js), seems like it only works if the expression inside
So there is still a chance that webpack won't be able to bundle everything needed at runtime. Edit: and that's only about |
d5f2c11
to
e89f8c8
Compare
This is something real I had to fix in e89f8c8. Previously, there was a I moved the The A
and the same
|
Another bundling issue I had to fix in 7398b9a, reusing a fix I already had to do in The if (global.GENTLY) require = GENTLY.hijack(require);
var crypto = require('crypto'); Unfortunately, webpack translates this to something that doesn't make much sense: var require;if (global.GENTLY) require = GENTLY.hijack(__webpack_require__(1064));
var crypto = require('crypto'); See how We need to use if (false) {}
var crypto = __webpack_require__(47); |
webpack will at least issue a warning if it can't figure out the
It's weird that it's just a warning and the compilation will proceed anyway. I don't see any way how to enforce an error here. webpack has some options to configure these contextual requires, but still no option to make them fail the build. |
That's a good point. In practice, I don't see how a package author would be motivated to load JS files in this way, but it can be plausibly used to load and parse package-internal files that // load from disk
const template = fs.readFileSync( __dirname + '/template.html' );
// parse and use
cheerio.load( template ); webpack won't bundle that either. In the end, I think we'll need to "know our modules", even the ones we use to build the server. On the client side, we are quite familiar with what kind of packages are in our bundle and what they do. On the server, I'm afraid we've been somewhat oblivious to that. |
This change looks quite brittle and risky to me. Maybe we should take a step back and better define why we want to bundle node_modules. What do we expect to get form smaller docker images? Do we expect them to build faster? Do we expect them to deploy faster? By how much? Answering those questions would allow us to better assess risk vs benefits and explore alternative solutions. |
e89f8c8
to
6c5a802
Compare
I believe we already mitigated many of the risks, so what are the remaining ones? The
I started this PR after a Slack discussion about #21537 where you said: "I think the biggest problem is the humongous
One alternative solution is to ship a It could be done by:
One thing that's good for us is that the runtime |
Some stats about the Calypso production artifacts:
Therefore, a production build artifact of Calypso, containing everything that's needed to run it, is approx 100MB of uncompressed stuff. Shipping |
6c5a802
to
7f3e823
Compare
I've been hoping to help push this forward, but things have been hectic. I probably won't be able to get to this until next week. |
Can we envision an automated test for that? We can't rely on someone doing that audit manually every time a dependency is updated. In general, modules assume they will be installed in a node_modules, and we don't know how far each author has taken that assumption. I think breaking that assumption is inherently risky, and personally I wouldn't go that path unless the benefits are huge.
Let's say we move the server code to a separate app (and therefore a separate node_modules). We would have a regular node.js app with its package.json and a production installation of It will have other benefits for development, like separating the backend server form the webpack server. There are shared code between client and server, but after a quick review seems like some of it can be moved to independent packages (eg: |
That automated test would parse the output bundle, and audit all calls to
It's true that putting packages for Node.js into I tried to do the less risky approach in #45774 today: intercept all externals in the webpack compilation, and use them to select a subset of
It seems to me that this approach is at odds with SSR. With SSR, server and client are not really separate codebases, but it's the same code compiled for and running in two different environments. Also, most of the
I'm open to exploring that route, but at this moment it's hard to imagine for me how exactly it would work. |
Maybe a quick solution is to wipe-out |
7f3e823
to
1355bca
Compare
There's clearly some risk here, but I'm curious about something. Do we take it as a given that there are essentially three types of packages we might be using?
Clearly, some packages only make sense in a certain environment (e.g. express). I ask this because it seems odd to be more concerned about the behavior of packages when bundled in the context of the server than in the context of the client. We assume that all client-focused or agnostic packages behave well when bundled or when used in Node.js. Are we less comfortable with that assumption for server-only packages? I'm also curious about what failure would look like in some of the above cases with |
One example: in Calypso, we use the
This category should also include most React components. A very nice-to-have feature of these is being SSR compatible. I.e., touch DOM very carefully, only in lifecycle methods and
That's because in the client, NPM packages are almost always bundled -- it's rare exception to load a module directly from browser. In the server world, though, bundling And there are packages that really fail when bundled. https://unpkg.com/browse/formidable@1.2.2/lib/incoming_form.js The |
The most dangerous case is where the failing |
If |
It's the only offender we know so far. Anyway, replacing
|
1355bca
to
9465b52
Compare
9465b52
to
27ad8a7
Compare
We merged #45774 instead, closing this one. |
Helps make the Calypso server+client a well-defined artifact that can be tarballed and deployed on just about any server with a good-enough version of Node.js.
Currently, one major obstacle is the
node_modules
folder that contains 1.5GB of modules that are mostly build tools and are not needed at runtime. But we don't know who is who.The first thing that this PR adds is a
yarn run analyze-bundles:server
command that createswebpack-analyze-bundles
overview of the bundle contents.Then I enforce bundling
node_modules
by removing the use of thewebpack-node-externals
plugin.Before:
This is how the bundle looks before this patch, with
node_modules
still external to the bundle:Chunk sizes (the
parsed
sizes, asgzipped
is not relevant for server):Chunk layout:
Even the existing server bundle seems fairly bloated. It shouldn't contain the moment locales at all, as server doesn't do anything localized. Another things that are surprising to see in the bundle are:
reader
state. There's no SSR in Reader so the state is very unlikely to be used. I think this comes from the Reader social preview -- we talked about this few months ago with @getdave when he was working onsocial-preview
. It would be nice to add a streamlined Reader preview component to that package, too.inline-help
code. Probably dragged in by theLayout
component.lib/plans
andlib/purchases
.One thing that would be nice to have is creating separate chunks for isomorphic sections (like
login
orthemes
) and loading them dynamically (on first request for that section, just like in client) rather than synchronously. Then it becomes better visible whether a particular module is used by the base server, or one of the isomorphic section (and which one).After:
Chunk sizes (still
parsed
):Chunk layout:
The most important information is that the
node_modules
are 60% of the bundle and the remaining 40% is the original app code.The main sources of bloat that I see here:
caniuse-lite
data. It's used bybrowserslist-useragent
used by theisUAInBrowserslist
function that determines whether the browser deserves theevergreen
or thefallback
assets.moment-locale
data. What is new after this patch is that something is requiringmoment-locale
synchronously and it's no longer in a separate chunk.lodash
twice. The server should use thelodash-es
replacement plugin.request
library (can't find where it's used) depends on several exotic crypto librares (sshpk
,tweetnacl
) and also onmime-db