-
Notifications
You must be signed in to change notification settings - Fork 30.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
module: speed up package.json parsing #15767
Conversation
Thoughts on semver-ness? |
I'm inclined to say semver-patch. The change in behavior only affects applications that don't start up in the first place. |
@@ -112,6 +112,9 @@ function readPackage(requestPath) { | |||
return false; | |||
} | |||
|
|||
if (json === '') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does V8 make this and json.length === 0
equivalent instruction-wise?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
s.length === 0
is probably a bit slower. The empty string is a singleton so s === ''
is basically a pointer comparison, whereas s.length
needs to validate s
first (although I expect it will be a fast path protected by a guard once optimized.)
On the example project with ~250 dependencies and only a handful without a main, how did the performance copmare? Just wondering how much the slow-down might be for an application where every package.json file does have a main and how negligible this is. |
Just to give a number, of the 596 package.jsons in Node’s |
It's the same, no statistically significant speed-up or slow-down. The string search is practically free compared to deserializing the JSON object. One potential improvement is to implement our own special-purpose JSON parser that scans for just the That's a lot more work though and I'm not even sure it's a good idea per se. |
That may well be interesting to work on in future for additional performance gain. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice
Should this PR (except the first commit) be semver-major because of the change in module loading behavior (invalid/malformatted modules may no longer be detected)? |
@mscdex My opinion in case you missed it: #15767 (comment) |
It would be far more ideal to releasers if this was separate from commits that aren't that category.
If that exception bubbled up to user land I do think this (the sum of the 3 commits) is |
If the package.json does not contain the string '"main"', skip parsing it to JSON. Note that this changes the behavior of the module loader in the presence of package.json files that don't contain legal JSON. Such files used to throw an exception but now they are simply ignored unless they contain a "main" property. To me, that seems like a good trade-off: I observe a 25% reduction in start-up time on a medium-sized application[0]. [0] https://github.com/strongloop/sls-sample-app
Move the logic from the previous commit to C++ land in order to avoid creating a new string when we know we won't parse it anyway.
9dae2b3
to
b46622e
Compare
Rebase + new CI: https://ci.nodejs.org/job/node-test-pull-request/11421/ |
b46622e
to
0fdd88a
Compare
If the package.json does not contain the string '"main"', skip parsing it to JSON. Note that this changes the behavior of the module loader in the presence of package.json files that don't contain legal JSON. Such files used to throw an exception but now they are simply ignored unless they contain a "main" property. To me, that seems like a good trade-off: I observe a 25% reduction in start-up time on a medium-sized application[0]. [0] https://github.com/strongloop/sls-sample-app PR-URL: nodejs#15767 Reviewed-By: Anna Henningsen <anna@addaleax.net> Reviewed-By: Colin Ihrig <cjihrig@gmail.com> Reviewed-By: James M Snell <jasnell@gmail.com>
Move the logic from the previous commit to C++ land in order to avoid creating a new string when we know we won't parse it anyway. PR-URL: nodejs#15767 Reviewed-By: Anna Henningsen <anna@addaleax.net> Reviewed-By: Colin Ihrig <cjihrig@gmail.com> Reviewed-By: James M Snell <jasnell@gmail.com>
1132ea7...0fdd88a and tagged semver-major because I'm conservative like that. |
|
I think this should be revisited. However, now with the introduction of this bit it essentially locks it into |
Does this fail if |
@charmander I guess it would. Can you point to a module that does that? |
Was the perf win, when applicable, primarily from the first commit (the one that didn't touch the c++ helper)? If so I think that could be enough, without having to go deeper into the internals #17084. |
@jdalton The second commit avoids ~300 kB of strings being created needlessly when you invoke npm. That's pretty substantial. |
What's that mean for execution time? (e.g. does it move from 25% to 30% improvement?) What I'm getting at is, that while it is nice, it's not a guaranteed perf win since it's super scenario specific ( |
It cuts down the number and total size of package.json strings by about 35%. To put that in perspective: even a 1% or 2% improvement would have been well worth it, every bit helps. If you don't think it's a worthwhile optimization, the onus is on you to prove it with numbers (but you'd be wasting your time because it is.) |
In the past, improvements like using
Wouldn't the responsibility be to prove the perf wins before introducing something that requires a semver major? I'm all for optimizations, but I'd like to have a more clear picture on what we're giving up usability and potential user code breakage (semver major) for. I don't think asking for the percentage wins of the second commit is unreasonable. So again, what's that mean for execution time? (e.g. does it move from 25% to 30% improvement?) |
The semver-major tag is just me being ultra-conservative (and also about giving people an incentive to upgrade when v10 comes out.) |
@bnoordhuis |
@bnoordhuis I can’t, but is it maybe worth also checking for |
@jdalton See #4679, my original PR did that but it caused regressions on Windows and I never quite figured out why. @charmander I suppose so, I'll think about it. I don't mind doing that if it's necessary but neither do I like adding complexity and overhead for what is most likely a non-issue. |
@bnoordhuis might be worth trying again (with today's code base). It looks like there was other Jenkins weirdness back then too. |
@charmander I think it's mostly a non-issue as npm serializes and reserializes package.json during installation, which replaces |
It’s a small amount of code complexity and overhead in exchange for replacing “works for almost all valid JSON representations” and “mostly a non-issue” with “works for all JSON representations” and “is a non-issue”. Is the certainty worth having? |
Not if it's dead code. Then it's just a place for bugs to hide. (I don't subscribe to Postel's maxim in case you wondered.) |
That doesn’t really apply here – JSON is JSON. I’ll open a PR. |
The first commit is a bug fix and should be back-ported to the release branches but I'm including it here because the other commits build on top of it.
From the second commit log:
That 25% is going to depend on how many package.json files have a
"main"
property. With another project, only a handful out of ~250 did not have one.CI: https://ci.nodejs.org/job/node-test-pull-request/10401/