Skip to content
This repository has been archived by the owner on Apr 22, 2023. It is now read-only.

Add fast path for simple URL parsing #7878

Closed
wants to merge 1 commit into from

Conversation

gwicke
Copy link

@gwicke gwicke commented Jul 1, 2014

This patch adds a fast path for parsing of simple path-only URLs, as commonly
found in HTTP requests received by a server.

Benchmark results [ms], before / after patch:

/foo/bar              0.008956   0.000418 (fast path used)
http://example.com/   0.011426   0.011437 (normal slow path, no change)

In a simple ab benchmark of a single-threaded web server, this patch
increases the request rate from around 6400 to 7400 req/s.

@indutny
Copy link
Member

indutny commented Jul 1, 2014

Thank you! Looks really promising.

Could you please try running our benchmarks with and without your change?

NODE_BENCH_RUNS=10 NODE_BENCH=http node benchmarks/compare.js ./new-node-binary ./old-node-binary

Please submit the results here.

@gwicke
Copy link
Author

gwicke commented Jul 1, 2014

Can compare.js be used to compare changes in the libs only? The node binaries should not actually change in this case.

@indutny
Copy link
Member

indutny commented Jul 1, 2014

@gwicke unfortunately, no. But building binaries is easy, and you'll need to do it anyway in order to run tests.

@indutny
Copy link
Member

indutny commented Jul 1, 2014

@gwicke also the binaries do change, because js files are embedded into them

@gwicke
Copy link
Author

gwicke commented Jul 1, 2014

@indutny ah, makes sense. Running now after some futzing with the parameters.

gwicke added a commit to wikimedia/restbase that referenced this pull request Jul 2, 2014
Also submitted similar code to node core in
nodejs/node-v0.x-archive#7878
@gwicke
Copy link
Author

gwicke commented Jul 2, 2014

The results look fairly uninteresting. Slightly over 3% is the biggest change I see in any test. I suspect that url parsing is not actually tested, as it's not part of the http module?

@indutny
Copy link
Member

indutny commented Jul 2, 2014

Oh, sorry! That's right...

@indutny
Copy link
Member

indutny commented Jul 2, 2014

The question is - how it affects other cases in url.parse? Could you please provide a benchmark for it?

@gwicke
Copy link
Author

gwicke commented Jul 2, 2014

@indutny : The effect on other uses of url.parse is below the noise level in my tests. The second line in my micro-benchmark (see commit message) measures the case where the fast path is not taken.

This is not that surprising, as only one conditional and a simple regexp test are added. The regexp fails quickly if the first char is not a slash.

@gwicke
Copy link
Author

gwicke commented Jul 2, 2014

Here's another run using benchmark.js for the slow path ('http://example.com/'), best of 500 runs, 30 second test time:

master           11.8741ns
with this patch  11.8828ns

@trevnorris trevnorris added the url label Jul 2, 2014
@gwicke
Copy link
Author

gwicke commented Jul 2, 2014

This is the benchmark code:

var url = require('./url.js');
var Benchmark = require('benchmark');

var bench = new Benchmark(function() {
        return url.parse('http://example.com/bar');
    },
    {
        minSamples: 2000
    });
var samples = bench.run().stats.sample.sort();
var perc25 = samples[Math.ceil(samples.length / 4)]

console.log('slow path', perc25);

@indutny
Copy link
Member

indutny commented Jul 2, 2014

LGTM, @trevnorris your +1?

@trevnorris
Copy link

I would make the following changes:

diff --git a/lib/url.js b/lib/url.js
index fcf40c8..f0be09c 100644
--- a/lib/url.js
+++ b/lib/url.js
@@ -52,7 +52,7 @@ var protocolPattern = /^([a-z0-9.+-]+:)/i,
     portPattern = /:[0-9]*$/,

     // Special case for a simple path URL
-    simplePathPattern = /^(\/(?!\/)[^\?#\s]*)(\?[^#\s]*)?$/,
+    simplePathPattern = /^(\/\/?(?!\/)[^\?\s]*)(\?[^\s]*)?$/,

     // RFC 2396: characters reserved for delimiting URLs.
     // We actually just auto-escape these.
@@ -122,7 +122,7 @@ Url.prototype.parse = function(url, parseQueryString, slashesDenoteHost) {
   // This is to support parse stuff like "  http://foo.com  \n"
   rest = rest.trim();

-  if (!slashesDenoteHost) {
+  if (!slashesDenoteHost && hashSplit.length === 1) {
     // Try fast path regexp
     var simplePath = simplePathPattern.exec(rest);
     if (simplePath) {

This allows for the simple case that starts with // and also removes the need to check for # in the url.

Other than that it looks good to me.

This patch adds a fast path for parsing of simple path-only URLs, as commonly
found in HTTP requests received by a server.

Benchmark results [ms], before / after patch:
/foo/bar              0.008956   0.000418 (fast path used)
http://example.com/   0.011426   0.011437 (normal slow path, no change)

In a simple 'ab' benchmark of a single-threaded web server, this patch
increases the request rate from around 6400 to 7400 req/s.
@gwicke
Copy link
Author

gwicke commented Jul 3, 2014

@trevnorris : Pushed new patch with your suggested changes.

I was also considering to add support for simple lower-cased https? protocol urls, but that's slightly more complex and can wait for another patch.

@trevnorris trevnorris self-assigned this Jul 4, 2014
@trevnorris
Copy link

Cool. LGTM. I'll merge when I'm near a computer.

@gwicke
Copy link
Author

gwicke commented Jul 9, 2014

Ping!

@dougwilson
Copy link
Member

FWIW, express and friends will soon be using your patch, @gwicke :)

@indutny
Copy link
Member

indutny commented Jul 31, 2014

Landed in 4b59db0, thank you! Sorry for a delay!

@indutny indutny closed this Jul 31, 2014
@trevnorris
Copy link

Ah crap, yeah sorry about that. Thanks @indutny for the follow up.

@gwicke
Copy link
Author

gwicke commented Aug 1, 2014

Thanks @indutny & no worries @trevnorris !

if (simplePath[2]) {
this.search = simplePath[2];
if (parseQueryString) {
this.query = querystring.parse(this.search);
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @gwicke, I just found an error that I think it's caused by this line.
It seems that in te previous version, the parse function was called after removing the "?" from the this.search part, but now it's included, so the first querystring parameter has a "?" prepended.
Just changing this to this.query = querystring.parse(this.search.substr(1)); should fix this, i think.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you give a before/after example? Just curious, and also it should be added as a test. :)

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure @trevnorris,

node git:(master) node -v
v0.10.26node git:(master) node
> require('url').parse('/profile?access_token=1', true);
{ protocol: null,
  slashes: null,
  auth: null,
  host: null,
  port: null,
  hostname: null,
  hash: null,
  search: '?access_token=1',
  query: { access_token: '1' },
  pathname: '/profile',
  path: '/profile?access_token=1',
  href: '/profile?access_token=1' }
node git:(master) ./node -v
v0.13.0-prenode git:(master) ./node
> require('url').parse('/profile?access_token=1', true);
{ protocol: null,
  slashes: null,
  auth: null,
  host: null,
  port: null,
  hostname: null,
  hash: null,
  search: '?access_token=1',
  query: { '?access_token': '1' },
  pathname: '/profile',
  path: '/profile?access_token=1',
  href: '/profile?access_token=1' }

Look the query property in the parsed uri.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just made a PR for this, #8151

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@warseph: Thanks for the fix!

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants