Optimized waterfall, parallel, et al. #1395
Unless JavaScript engines have changed how they handle `arguments`, I question these results and suggest you test in a couple of different environments (different versions of Node; different versions of Chrome, Firefox, Safari, etc.). @jdalton, @jridgewell and I investigated this pretty closely about 2 years ago. Reference ticket: jashkenas/underscore#2140.
Also, we changed from […]
@megawac v8, at least with […]. From what I understand from petkaantonov's Optimization killers and what the design doc says about […], then using […]. But yes, this will (most likely) be slower in other browsers/older versions of Node.
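For background, a hedged sketch of the pattern petkaantonov's Optimization killers warns about, versus the commonly recommended safe copy (illustrative function names, not code from this PR):

```javascript
// Historically a V8 (Crankshaft) deopt: "leaking" `arguments` by passing
// it to another function.
function leaky() {
    return Array.prototype.slice.call(arguments);
}

// The safe pattern: copy `arguments` element-by-element into a fresh array
// before it escapes the function.
function safeCopy() {
    var args = new Array(arguments.length);
    for (var i = 0; i < arguments.length; i++) {
        args[i] = arguments[i];
    }
    return args;
}
```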
lib/auto.js
Outdated
```diff
@@ -4,7 +4,7 @@ import indexOf from 'lodash/_baseIndexOf';
 import isArray from 'lodash/isArray';
 import okeys from 'lodash/keys';
 import noop from 'lodash/noop';
-import rest from './internal/rest';
+import slice from 'lodash/slice';
```
Could you use `./internal/slice` here?
Good catch, looks like my auto-import plugin picked up the wrong `slice`.
lib/seq.js
Outdated
```diff
@@ -1,5 +1,6 @@
 import noop from 'lodash/noop';
 import rest from './internal/rest';
+import slice from 'lodash/slice';
```
Same as before, sorry — is it possible to use `./internal/slice` here?
I'm seeing a similar performance gain with node 0.12 (benchmarks don't work in node 0.10). Currently working on a browser benchmark. I looked at the benchmarks associated with that underscore ticket. The […]

In that case, there were also benchmarks created for lodash, which was previously using […]
lib/doDuring.js
Outdated
```diff
@@ -28,11 +28,12 @@ export default function doDuring(fn, test, callback) {
     var _fn = wrapAsync(fn);
     var _test = wrapAsync(test);

-    var next = rest(function(err, args) {
+    function next(err/*, args...*/) {
         if (err) return callback(err);
```
mind changing the comments to `/*, ...args*/`? (a bit better semantically)
I haven't looked into this in a while, but it used to be debilitating in Firefox, okay in Chrome and IE, and pretty bad in Opera. Please run some browser tests before merging this.
```diff
@@ -106,6 +105,37 @@ describe("waterfall", function () {
         }).to.throw(/already called/);
     });

+    it('multiple callback calls (trickier) @nodeonly', function(done){
```
@suguru03 I think neo-async fails on this test case.
I'll fix it, thanks. :)
We started using `rest` in v1.4.0.

Node 7 Benchmarks

```
$ node perf/benchmark.js -g 'waterfall|parallel' v1.3.0 v1.4.0
Comparing v1.3.0 with v1.4.0 on Node v7.7.4
--------------------------------------
parallel(10) v1.3.0 x 95,854 ops/sec ±0.91% (27 runs sampled), 0.0104ms per run
parallel(10) v1.4.0 x 84,751 ops/sec ±2.31% (27 runs sampled), 0.0118ms per run
v1.3.0 is faster
--------------------------------------
parallel(100) v1.3.0 x 24,968 ops/sec ±1.95% (27 runs sampled), 0.0401ms per run
parallel(100) v1.4.0 x 20,407 ops/sec ±1.51% (28 runs sampled), 0.0490ms per run
v1.3.0 is faster
--------------------------------------
parallel(1000) v1.3.0 x 3,037 ops/sec ±1.52% (28 runs sampled), 0.329ms per run
parallel(1000) v1.4.0 x 2,303 ops/sec ±1.63% (29 runs sampled), 0.434ms per run
v1.3.0 is faster
--------------------------------------
waterfall(10) v1.3.0 x 19,206 ops/sec ±3.37% (24 runs sampled), 0.0521ms per run
waterfall(10) v1.4.0 x 18,883 ops/sec ±2.88% (27 runs sampled), 0.0530ms per run
Tie
--------------------------------------
waterfall(100) v1.3.0 x 2,589 ops/sec ±2.26% (29 runs sampled), 0.386ms per run
waterfall(100) v1.4.0 x 2,427 ops/sec ±3.31% (28 runs sampled), 0.412ms per run
v1.3.0 is faster
--------------------------------------
waterfall(1000) v1.3.0 x 271 ops/sec ±3.85% (28 runs sampled), 3.69ms per run
waterfall(1000) v1.4.0 x 243 ops/sec ±2.90% (27 runs sampled), 4.11ms per run
v1.3.0 is faster
--------------------------------------
v1.3.0 faster overall (4.51ms total vs. 5.07ms total)
v1.3.0 won more benchmarks (5 vs. 0)
```

Node 0.10 Benchmarks

```
$ node perf/benchmark.js -g 'waterfall|parallel' v1.3.0 v1.4.0
Comparing v1.3.0 with v1.4.0 on Node v0.10.48
--------------------------------------
parallel(10) v1.3.0 x 36,982 ops/sec ±3.28% (26 runs sampled), 0.0270ms per run
parallel(10) v1.4.0 x 34,188 ops/sec ±2.63% (26 runs sampled), 0.0293ms per run
v1.3.0 is faster
--------------------------------------
parallel(100) v1.3.0 x 9,111 ops/sec ±2.07% (30 runs sampled), 0.110ms per run
parallel(100) v1.4.0 x 7,572 ops/sec ±2.34% (27 runs sampled), 0.132ms per run
v1.3.0 is faster
--------------------------------------
parallel(1000) v1.3.0 x 1,051 ops/sec ±3.07% (27 runs sampled), 0.952ms per run
parallel(1000) v1.4.0 x 782 ops/sec ±3.82% (28 runs sampled), 1.28ms per run
v1.3.0 is faster
--------------------------------------
waterfall(10) v1.3.0 x 20,187 ops/sec ±2.54% (27 runs sampled), 0.0495ms per run
waterfall(10) v1.4.0 x 17,901 ops/sec ±2.79% (26 runs sampled), 0.0559ms per run
v1.3.0 is faster
--------------------------------------
waterfall(100) v1.3.0 x 2,892 ops/sec ±2.27% (25 runs sampled), 0.346ms per run
waterfall(100) v1.4.0 x 2,662 ops/sec ±1.64% (27 runs sampled), 0.376ms per run
v1.3.0 is faster
--------------------------------------
waterfall(1000) v1.3.0 x 315 ops/sec ±2.22% (31 runs sampled), 3.17ms per run
waterfall(1000) v1.4.0 x 285 ops/sec ±3.53% (25 runs sampled), 3.51ms per run
v1.3.0 is faster
--------------------------------------
v1.3.0 faster overall (4.66ms total vs. 5.38ms total)
v1.3.0 won more benchmarks (6 vs. 0)
```

It was a 10-12% performance regression. This PR uses a more streamlined version of slice, so the gains are a bit more. Still working on the browser benchmark; jsperf.com is being a PITA right now....
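The "more streamlined version of slice" isn't shown in the thread; as a hedged sketch, such a helper (names and details assumed, not necessarily the PR's exact code) might look like:

```javascript
// Hypothetical streamlined arguments-slice helper: copies an array-like
// (e.g. `arguments`) into a real array starting at `start`, skipping the
// generic machinery of Array.prototype.slice.
function slice(arrayLike, start) {
    start = start | 0; // coerce undefined to 0
    var newLen = Math.max(arrayLike.length - start, 0);
    var newArr = Array(newLen);
    for (var idx = 0; idx < newLen; idx++) {
        newArr[idx] = arrayLike[start + idx];
    }
    return newArr;
}
```

Note that a helper of this shape has no third `end` parameter, which would explain keeping `args.pop()` elsewhere in the PR.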
Here's a preliminary benchmark: https://jsperf.com/async-rest-vs-slice8/1 It's really uninteresting, because the 1ms+ […]
Here's a more throughput-ey test: https://jsperf.com/async-rest-vs-slice9/1 The slice strategy seems to have a small advantage here.
lib/apply.js
Outdated
```diff
@@ -44,8 +45,10 @@
  * two
  * three
  */
-export default rest(function(fn, args) {
-    return rest(function(callArgs) {
+export default function(fn/*, args*/) {
```
`/*, ...args*/`
lib/internal/consoleFunc.js
Outdated
```diff
 return function (fn/*, ...args*/) {
     var args = slice(arguments, 1);
-    wrapAsync(fn).apply(null, args.concat(rest(function (err, args) {
+    wrapAsync(fn).apply(null, args.concat(function (err/*, ...args*/) {
```
nit: `args.push` here could avoid an extra array being created
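For illustration (hypothetical helper names, not code from the PR), the difference this nit is pointing at:

```javascript
// `concat` allocates and returns a brand-new array on every call...
function appendWithConcat(args, callback) {
    return args.concat(callback);
}

// ...while `push` appends to the existing array in place, avoiding the
// extra allocation in a hot path.
function appendWithPush(args, callback) {
    args.push(callback);
    return args;
}
```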
```diff
 export default function (fn) {
-    return rest(function (args/*..., callback*/) {
+    return function (/*...args, callback*/) {
+        var args = slice(arguments);
         var callback = args.pop();
```
nit: the `pop` can be avoided — `slice(arguments, 0, -1)`, `callback = arguments[arguments.length - 1]`
I'm gonna leave this as-is. The streamlined version of slice doesn't support the 3rd `end` param.
sounds good
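Sketched out, the two ways of splitting off a trailing callback that were discussed (illustrative names; `Array.prototype.slice` stands in for the library helper here, since it does support a negative `end`):

```javascript
// Approach kept in the PR: slice everything, then pop the callback off.
function splitWithPop(/*...args, callback*/) {
    var args = Array.prototype.slice.call(arguments);
    var callback = args.pop(); // mutates args
    return { args: args, callback: callback };
}

// Suggested alternative: slice all but the last element, and read the
// callback directly off the end of `arguments`.
function splitWithoutPop(/*...args, callback*/) {
    var args = Array.prototype.slice.call(arguments, 0, -1);
    var callback = arguments[arguments.length - 1];
    return { args: args, callback: callback };
}
```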
lib/reflect.js
Outdated
```js
            value = cbArgs[0];
        } else if (cbArgs.length > 1) {
            value = cbArgs;
        }
```
I think I prefer this condition, except removing the `else if` and replacing it with an `else`.
lib/waterfall.js
Outdated
```diff
@@ -68,24 +68,23 @@ export default function(tasks, callback) {
     if (!isArray(tasks)) return callback(new Error('First argument to waterfall must be an array of functions'));
     if (!tasks.length) return callback();
     var taskIndex = 0;
+    var args = [];
```
Instead of creating this variable, I would consider modifying `nextTask` to take args as a param (`nextTask(args)`). In `next`, we would just call `nextTask(slice(arguments, 1))`.
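A hedged sketch of that shape — a stripped-down waterfall core, not the PR's actual lib/waterfall.js (error handling and repeated-callback guards omitted; `Array.prototype.slice` stands in for the internal slice helper):

```javascript
function waterfall(tasks, callback) {
    var taskIndex = 0;

    // Each task's result args are passed straight to the next task,
    // instead of being stashed in a shared `args` variable.
    function nextTask(args) {
        if (taskIndex === tasks.length) {
            return callback.apply(null, [null].concat(args));
        }
        var task = tasks[taskIndex++];
        args.push(next); // append the continuation
        task.apply(null, args);
    }

    function next(err/*, ...args*/) {
        if (err) return callback(err);
        nextTask(Array.prototype.slice.call(arguments, 1));
    }

    nextTask([]);
}
```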
lib/whilst.js
Outdated
```diff
         if (err) return callback(err);
         if (test()) return _iteratee(next);
-        callback.apply(null, [null].concat(args));
-    });
+        callback.apply(null, [null].concat(slice(arguments, 1)));
```
I long for splats 💨
Thanks for looking into the perf
```diff
@@ -1,7 +1,7 @@
 import isArray from 'lodash/isArray';
```
Really nice cleanup of waterfall ⭐
This benchmark might shed some light on the performance numbers: https://jsperf.com/arguments-handling4/1 The inline slice strategy is the fastest (but ugliest), followed by a statically wrapped […]. We were essentially doing the dynamically wrapped version of […].
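The three strategies being compared can be sketched roughly as follows (illustrative implementations, not the jsperf code):

```javascript
// 1. Inline slice: copy `arguments` with a loop directly in the function.
function inlineSlice(/*...args*/) {
    var args = new Array(arguments.length);
    for (var i = 0; i < arguments.length; i++) args[i] = arguments[i];
    return args;
}

// 2. Statically wrapped: delegate the copy to a shared slice helper.
function slice(arrayLike) {
    var arr = new Array(arrayLike.length);
    for (var i = 0; i < arrayLike.length; i++) arr[i] = arrayLike[i];
    return arr;
}
function staticallyWrapped(/*...args*/) {
    return slice(arguments);
}

// 3. Dynamically wrapped (rest()-style): a higher-order function builds a
// new closure that gathers arguments before calling the original.
function rest(fn) {
    return function (/*...args*/) {
        return fn(slice(arguments));
    };
}
var dynamicallyWrapped = rest(function (args) { return args; });
```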
```diff
 export default function(fn/*, ...args*/) {
     var args = slice(arguments, 1);
-    return rest(function(callArgs) {
+    return function(/*callArgs*/) {
```
nit: `...callArgs`
Awesome work!
I ran some profiling, and noticed that a lot of time is being spent in `internal/rest`. It turns out `rest()` is much slower than a straightforward `arguments` `slice()`. (I also tried the "inline slice" strategy with a loop that is sometimes recommended; it made absolutely no difference.) The differences are significant, especially for `parallel`:

Benchmark Results (click to expand)

We're also competitive with Bluebird again:

Bluebird benchmarks (click to expand)

It might be worth getting rid of `rest()` completely, but the gains will be less. Nothing beats native `...rest` params in ES6, though...
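For comparison, the native form alluded to here, where the engine gathers trailing arguments itself (illustrative `next`-style continuation):

```javascript
// ES6 rest parameters: no manual copying of `arguments` at all.
function next(err, ...args) {
    if (err) throw err;
    return args;
}
```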