Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

url: significantly improve the performance of the url module #933

Closed
wants to merge 34 commits into from

Conversation

petkaantonov
Copy link
Contributor

See issues: #643, nodejs/node-v0.x-archive#6788

Original message:

I have rewritten the url parser module of node core as it was/is a serious bottleneck in some of the techempower benchmarks

Running node's urlparser benchmark using iojs it's still 16x faster when not retrieving properties and 11x faster when retrieving all properties (which are lazy getters in my implementation). In absolute terms the current iojs urlparser throughputs 25k parses per second vs 400k lazy/270k eager parses per second. format and resolve are also affected in similar magnitudes.

Testing this PR on my VM ubuntu I get

After

$ bleed benchmark/misc/url.js
misc/url.js parse(): 1.6529e+5
misc/url.js format(): 1.9114e+5
misc/url.js resolve("../foo/bar?baz=boom"): 24302
misc/url.js resolve("foo/bar"): 35696
misc/url.js resolve("http://nodejs.org"): 68003
misc/url.js resolve("./foo/bar?baz"): 29983

$ bleed benchmark/common.js url
url/url-parse.js
url/url-parse.js type=one n=250000: 1244025.74054
url/url-parse.js type=two n=250000: 2071492.21773
url/url-parse.js type=three n=250000: 1113916.06067
url/url-parse.js type=four n=250000: 3416688.31936
url/url-parse.js type=five n=250000: 1620766.16897
url/url-parse.js type=six n=250000: 1659564.59318

url/url-resolve.js
url/url-resolve.js type=one n=100000: 184353.89591

Before

$ git checkout v1.x && make -j4 && bleed benchmark/misc/url.js
misc/url.js parse(): 16510
misc/url.js format(): 18309
misc/url.js resolve("../foo/bar?baz=boom"): 8929.9
misc/url.js resolve("foo/bar"): 9919.5
misc/url.js resolve("http://nodejs.org"): 8177.6
misc/url.js resolve("./foo/bar?baz"): 8984.4

$ bleed benchmark/common.js url
url/url-parse.js
url/url-parse.js type=one n=250000: 82666.55318
url/url-parse.js type=two n=250000: 84669.54973
url/url-parse.js type=three n=250000: 80308.64406
url/url-parse.js type=four n=250000: 262017.65403
url/url-parse.js type=five n=250000: 203834.95043
url/url-parse.js type=six n=250000: 68911.18834

url/url-resolve.js
url/url-resolve.js type=one n=100000: 42331.42854

That's 10-24x faster depending on url.

Caveats

This implementation uses getters/setters on the prototype instead of own properties like current url. This is because often the properties are not needed and take time to calculate. Additionally @chrisdickinson introduced another reason to use setters at #893 (comment).

This patch is technically backward incompatible but to minimize any compatibility issues I made util._extend to call toJSON, because some core modules pass the Url instance directly to util._extend. The toJSON also means that serialized output of current url and the new url stay the same.

/cc @trevnorris @mikeal

@vkurchatkin vkurchatkin added the semver-major PRs that contain breaking changes and should be released in the next major version. label Feb 24, 2015
@benjamingr
Copy link
Member

I've been using Petka's url replacement in my apps for a while now and generally it works well. This can potentially speed up a lot of apps for free. The caveats don't sound too bad either. I admit that the fact this was not merged into node (or at least reviewed) when it was first written was quite a surprise for me.

@domenic
Copy link
Contributor

domenic commented Feb 24, 2015

Is this new implementation more or less compliant with the URL Standard? How does it stack up on the URL test suite?

@petkaantonov
Copy link
Contributor Author

@domenic The goal was not to make a new url parser, just to optimize the existing one. So it should give exactly the same result as the current url url parser - if that means some will fail there then so be it.

@mikeal
Copy link
Contributor

mikeal commented Feb 24, 2015

perhaps we should take this in the next major semver bump?

@petkaantonov
Copy link
Contributor Author

@mikeal that's ok with me, what branch do I pr against for major?

@mikeal
Copy link
Contributor

mikeal commented Feb 24, 2015

@petkaantonov we're still figuring that out :) we have the same problem with a PR for improving util.isObject and didn't entirely resolve it in the last TC meeting. I'll add this one to the agenda though.

@domenic
Copy link
Contributor

domenic commented Feb 24, 2015

I know we don't have any data (and @mikeal really wants a way to make some) but I'd be surprised if this was actually back-compat-breaking in any way.

@mikeal
Copy link
Contributor

mikeal commented Feb 24, 2015

@domenic this thread is pretty long but I seem to remember that there is an issue with the return value not working util.extends? That will surely cause breaks in a few people's applications.

@domenic
Copy link
Contributor

domenic commented Feb 24, 2015

@mikeal ah yeah you're right same with xtend. I take it back.

@petkaantonov
Copy link
Contributor Author

@mikeal util._extends is changed to call toJSON() - many internals pass Url object directly to util extend as well. toJSON is not an arbitrary method name, it is used by JSON.stringify as well.

https://github.com/iojs/io.js/pull/933/files#diff-3deb3f32958bb937ae05c6f3e4abbdf5R635

@mikeal
Copy link
Contributor

mikeal commented Feb 24, 2015

@petkaantonov are you saying it is changed or that it would need to change?

@petkaantonov
Copy link
Contributor Author

@mikeal It is changed in this PR as I linked

@mikeal
Copy link
Contributor

mikeal commented Feb 24, 2015

@petkaantonov sorry, in my email that link wasn't there :)

@petkaantonov
Copy link
Contributor Author

Yea I have a really bad habit of firing comments asap and then editing them later :P

@mikeal
Copy link
Contributor

mikeal commented Feb 24, 2015

Is there a perf impact elsewhere from changing ._extends?

@domenic
Copy link
Contributor

domenic commented Feb 24, 2015

I am not sure that changing _extend's semantics is a very good idea. It'd be better to change usages of it with URLs throughout the codebase to instead call toJSON() before passing to _extends.

@mikeal
Copy link
Contributor

mikeal commented Feb 24, 2015

@domenic what do you mean by "throughout the codebase?" the real issue isn't usage in core but usage by applications.

@domenic
Copy link
Contributor

domenic commented Feb 24, 2015

I'm talking purely about _extends here. I agree that in general people will use xtend or similar tools with URLs and this change will break that, which means overall its semver-major. I just am nitpicking @petkaantonov's particular mitigating strategy for handling core.

@petkaantonov
Copy link
Contributor Author

Is there a perf impact elsewhere from changing ._extends?

I cannot find a benchmark but I would be surprised if a property lookup and type-check would even show up considering the code on the very next line loads all properties of the object in an array.

@petkaantonov
Copy link
Contributor Author

@domenic I did it this way because util._extends is essentially used as public method I think.

@domenic
Copy link
Contributor

domenic commented Feb 25, 2015

@petkaantonov yes and I think modifying its behavior in the face of toJSON-able objects is not a good change to make.

@petkaantonov
Copy link
Contributor Author

Well it could always be a hardcoded check for Url instance as well, but I was thinking that if you really defined a toJSON method on an object, you would prefer to use the properties returned by that when using ._extend.

@petkaantonov petkaantonov force-pushed the faster-url-parser branch 3 times, most recently from f7de4dc to 4987dd1 Compare February 26, 2015 20:08
evanlucas and others added 2 commits April 28, 2015 01:01
The assertion made an assumption that the IPv6 address would always be
`::1`. Since the address can be different on different platforms, it
has been changed to allow multiple addresses.

Fixes: nodejs#1527
PR-URL: nodejs#1531
Reviewed-By: Rod Vagg <rod@vagg.org>
parallel tests still not working on most build slaves

PR-URL: nodejs#1544
Reviewed-By: Johan Bergström <bugs@bergstroem.nu>
@rvagg
Copy link
Member

rvagg commented Apr 28, 2015

@petkaantonov what's the status of this? Are you OK with @domenic's suggestion of putting getters & setters on all properties for consistency? If so, can we go ahead and get this merged or is there something else holding it up?

@petkaantonov
Copy link
Contributor Author

@rvagg Yes I am OK with it. I just haven't gotten around to do the work.

@rvagg
Copy link
Member

rvagg commented Apr 28, 2015

@petkaantonov cool, no problems, just let us know if timing sucks too much and we can cut it from 2.0.0 to take the pressure off.

@petkaantonov
Copy link
Contributor Author

Ill work on this today and see if I can finish this

enaqx and others added 4 commits April 28, 2015 10:12
PR-URL: nodejs#1535
Reviewed-By: Jeremiah Senkpiel <fishrock123@rocketmail.com>
Reviewed-By: Evan Lucas <evanlucas@me.com>
Reviewed-By: Julian Duque <julianduquej@gmail.com>
Some modules are monkey-patching Buffer.isEncoding, so without this
they cannot do that.

Fixes: nodejs#1547
PR-URL: nodejs#1548
Reviewed-By: Evan Lucas <evanlucas@me.com>
Reviewed-By: Ben Noordhuis <info@bnoordhuis.nl>
PR-URL: nodejs#1473
Reviewed-By: Brendan Ashworth <brendan.ashworth@me.com>
PR-URL: nodejs#1498
Reviewed-By: Chris Dickinson <christopher.s.dickinson@gmail.com>
Reviewed-By: Jeremiah Senkpiel <fishrock123@rocketmail.com>
@petkaantonov
Copy link
Contributor Author

Making everything accessors has given some unexpected trouble with the resolveObject method, it's gonna take one more day or two

tellnes and others added 9 commits April 28, 2015 14:38
This commit makes `os.tmpdir()` behave consistently on all platforms. It
changes `os.tmpdir()` to always return a path without trailing slash.

Semver: major
Fixes: nodejs#715
PR-URL: nodejs#747
Reviewed-By: Ben Noordhuis <info@bnoordhuis.nl>
This commit applies some secondary changes in order to make `make test`
pass cleanly:

* disable broken postmortem debugging in common.gypi

* drop obsolete strict mode test in parallel/test-repl

* drop obsolete test parallel/test-v8-features

PR-URL: nodejs#1232
Reviewed-By: Fedor Indutny <fedor@indutny.com>
Cherry-pick https://codereview.chromium.org/1033733003 from upstream
and re-enable postmortem debugging.

PR-URL: nodejs#1232
Reviewed-By: Fedor Indutny <fedor@indutny.com>
This includes the out-of-tree patch (but fixed in upstream HEAD) from
commit 41c00a2 ("deps: enable v8 postmortem debugging again".)

PR-URL: nodejs#1399
Reviewed-By: Fedor Indutny <fedor@indutny.com>
This commit applies a secondary change in order to make `make test`
pass cleanly, specifically re-disabling post-mortem debugging in
common.gypi.

PR-URL: nodejs#1506
Reviewed-By: Ben Noordhuis <info@bnoordhuis.nl>
Cherry-pick https://codereview.chromium.org/1033733003 from upstream
and re-enable postmortem debugging.

PR-URL: nodejs#1232
Reviewed-By: Fedor Indutny <fedor@indutny.com>
PR-URL: nodejs#1553
Reviewed-By: Colin Ihrig <cjihrig@gmail.com>
Based on tests running on original Raspberry Pi

PR-URL: nodejs#1554
Reviewed-By: Roman Reiss <me@silverwind.io>
@petkaantonov
Copy link
Contributor Author

Moved to #1561 which is rebased against master so that it's possible to review it

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
performance Issues and PRs related to the performance of Node.js. semver-major PRs that contain breaking changes and should be released in the next major version.
Projects
None yet
Development

Successfully merging this pull request may close these issues.