Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Significantly improve the performance of EventLoopFuture.{and,when}AllSucceed. #943

Merged
merged 5 commits into from
Apr 5, 2019

Conversation

MrMage
Copy link
Contributor

@MrMage MrMage commented Apr 3, 2019

~2.7x-12x speedups for EventLoopFuture.{and,when}AllSucceed.

Motivation:

Projects like Vapor (e.g. https://github.com/vapor/template-kit/blob/master/Sources/TemplateKit/Pipeline/TemplateSerializer.swift) often need to process large arrays of futures where (almost) all futures have already been fulfilled. Such frameworks will benefit tremendously from any possible performance improvement to EventLoopFuture.{and,when}AllSucceed.

Modifications:

  • Rewrote {and,when}AllSucceed with inspiration from {and,when}AllComplete.
  • Added tests for whenAllSucceed.
  • Added benchmarks for whenAllSucceed and completeAllSucceed.

Result:

Speedups:

future_whenallsucceed_100k_immediately_succeeded_off_loop: 1.6799349784851074 / 0.13164806365966797 = 12.77
future_whenallsucceed_100k_immediately_succeeded_on_loop: 0.5427179336547852 / 0.14923691749572754 = 3.64
future_whenallsucceed_100k_deferred_off_loop: 2.1837170124053955 / 0.8010389804840088 = 2.73
future_whenallsucceed_100k_deferred_on_loop: 0.542646050453186 / 0.2016160488128662 = 2.69

Original runtimes on a Core i7-6700k:

measuring: future_whenallsucceed_100k_immediately_succeeded_off_loop: 1.6799349784851074, 1.7866089344024658, 1.7765229940414429, 1.7685890197753906, 1.5157029628753662, 1.642738938331604, 1.6204090118408203, 1.5972599983215332, 1.487981915473938, 1.5103060007095337, 
measuring: future_whenallsucceed_100k_immediately_succeeded_on_loop: 0.5427179336547852, 0.5418450832366943, 0.5492779016494751, 0.5393710136413574, 0.5387519598007202, 0.5326310396194458, 0.5234640836715698, 0.5387870073318481, 0.5265940427780151, 0.5305559635162354, 
measuring: future_whenallsucceed_100k_deferred_off_loop: 2.1837170124053955, 2.3039519786834717, 2.1825860738754272, 2.3145300149917603, 2.4204330444335938, 2.4434269666671753, 2.2526910305023193, 2.2747159004211426, 2.1828880310058594, 2.404868006706238, 
measuring: future_whenallsucceed_100k_deferred_on_loop: 0.542646050453186, 0.6132320165634155, 0.5573389530181885, 0.5628399848937988, 0.5481549501419067, 0.5543719530105591, 0.5560970306396484, 0.5577199459075928, 0.5452660322189331, 0.5439059734344482, 
measuring: future_whenallcomplete_100k_immediately_succeeded_off_loop: 0.14835500717163086, 0.14790797233581543, 0.1478050947189331, 0.14823198318481445, 0.14922690391540527, 0.14710009098052979, 0.14890599250793457, 0.14921092987060547, 0.14850103855133057, 0.1489109992980957, 
measuring: future_whenallcomplete_100k_immediately_succeeded_on_loop: 0.15264499187469482, 0.14969193935394287, 0.14757001399993896, 0.1522080898284912, 0.14965295791625977, 0.15176701545715332, 0.1497650146484375, 0.1475529670715332, 0.14810407161712646, 0.15112793445587158, 
measuring: future_whenallcomplete_100k_deferred_off_loop: 0.9009320735931396, 0.8606759309768677, 0.8378961086273193, 0.8444629907608032, 0.8646301031112671, 0.8610509634017944, 0.8540569543838501, 0.8730940818786621, 0.904228925704956, 0.8585549592971802, 
measuring: future_whenallcomplete_100k_deferred_on_loop: 0.16903400421142578, 0.16870009899139404, 0.1682649850845337, 0.17055797576904297, 0.17044198513031006, 0.17038702964782715, 0.16999197006225586, 0.16937005519866943, 0.17072999477386475, 0.17130303382873535, 

Runtimes without the fast-path for already-completed futures:

measuring: future_whenallsucceed_100k_immediately_succeeded_off_loop: 0.17186903953552246, 0.16920197010040283, 0.170989990234375, 0.17129099369049072, 0.17086505889892578, 0.17077898979187012, 0.16939306259155273, 0.17117297649383545, 0.17174100875854492, 0.16971802711486816, 
measuring: future_whenallsucceed_100k_immediately_succeeded_on_loop: 0.1687079668045044, 0.17134392261505127, 0.1697540283203125, 0.17058801651000977, 0.172510027885437, 0.1710280179977417, 0.16765403747558594, 0.1700659990310669, 0.17064499855041504, 0.17007005214691162, 
measuring: future_whenallsucceed_100k_deferred_off_loop: 0.8068110942840576, 0.8203119039535522, 0.8302290439605713, 0.8282400369644165, 0.8519179821014404, 0.8158940076828003, 0.8139890432357788, 0.8262540102005005, 0.8299790620803833, 0.8337689638137817, 
measuring: future_whenallsucceed_100k_deferred_on_loop: 0.19305706024169922, 0.1964479684829712, 0.19446706771850586, 0.19203698635101318, 0.19434797763824463, 0.19169199466705322, 0.1933000087738037, 0.19249296188354492, 0.19386506080627441, 0.1929779052734375, 

(Values for whenallcomplete omitted because those methods have not changed.)

Runtimes with the fast-path for already-completed futures:

measuring: future_whenallsucceed_100k_immediately_succeeded_off_loop: 0.13164806365966797, 0.13887691497802734, 0.1480560302734375, 0.14489400386810303, 0.13263404369354248, 0.13490605354309082, 0.13424301147460938, 0.13548588752746582, 0.13828301429748535, 0.16892695426940918, 
measuring: future_whenallsucceed_100k_immediately_succeeded_on_loop: 0.14923691749572754, 0.15650200843811035, 0.13460004329681396, 0.13374602794647217, 0.13292694091796875, 0.148590087890625, 0.14843904972076416, 0.13741397857666016, 0.13701403141021729, 0.13629603385925293, 
measuring: future_whenallsucceed_100k_deferred_off_loop: 0.8010389804840088, 0.8186479806900024, 0.815155029296875, 0.8011970520019531, 0.7963169813156128, 0.8042939901351929, 0.8173959255218506, 0.7985730171203613, 0.8387489318847656, 0.8120409250259399, 
measuring: future_whenallsucceed_100k_deferred_on_loop: 0.2016160488128662, 0.2044239044189453, 0.1832129955291748, 0.21035611629486084, 0.20476889610290527, 0.20411407947540283, 0.20260095596313477, 0.2056729793548584, 0.20184290409088135, 0.2035679817199707, 

@MrMage
Copy link
Contributor Author

MrMage commented Apr 3, 2019

@swift-nio-bot test this please

@MrMage
Copy link
Contributor Author

MrMage commented Apr 3, 2019

Happy to backport this to NIO1 so that Vapor 3 can benefit, by the way.

@MrMage
Copy link
Contributor Author

MrMage commented Apr 3, 2019

Also, thanks to @Mordil for providing a great example with _reduceCompletions0.

Copy link
Contributor

@Mordil Mordil left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll leave it to the NIO team to make any real decisions - but I left some thoughts on these changes.

Do you plan to add the fast-track path for {and,when}AllComplete?

Sources/NIO/EventLoopFuture.swift Outdated Show resolved Hide resolved
@@ -1044,7 +1054,83 @@ extension EventLoopFuture {
/// - on: The `EventLoop` on which the new `EventLoopFuture` callbacks will fire.
/// - Returns: A new `EventLoopFuture` with all of the values fulfilled by the provided futures.
public static func whenAllSucceed(_ futures: [EventLoopFuture<Value>], on eventLoop: EventLoop) -> EventLoopFuture<[Value]> {
return .reduce(into: [], futures, on: eventLoop) { (results, value) in results.append(value) }
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Both of these methods were using .reduce, and I can't help but wonder if perhaps the performance boost could have been done in that method?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm sure reduce and fold have potential for further optimization, but I don't think their performance will come anywhere close to this bespoke implementation. For example, the argument to fold is a closure that returns a future. I.e. we'd need to create at least one extra future (several more with the current implementation of fold, actually) for each element in the array. That alone generates so much overhead that it can't match the performance of bespoke implementations.

There could be a point made about when the performance of a bespoke implementation is needed, however, Vapor & Co. process a ton of already-fulfilled futures where this is significant.

@MrMage
Copy link
Contributor Author

MrMage commented Apr 3, 2019

Do you plan to add the fast-track path for {and,when}AllComplete?

Good idea. Done. Results:

measuring: future_whenallsucceed_100k_immediately_succeeded_off_loop: 0.1378859281539917, 0.13671696186065674, 0.13865399360656738, 0.13784301280975342, 0.1380389928817749, 0.13869500160217285, 0.1374189853668213, 0.13842201232910156, 0.13793694972991943, 0.13764894008636475, 
measuring: future_whenallsucceed_100k_immediately_succeeded_on_loop: 0.13898098468780518, 0.13831603527069092, 0.13934898376464844, 0.13744008541107178, 0.1376969814300537, 0.13988101482391357, 0.13975894451141357, 0.13864600658416748, 0.13968896865844727, 0.1402740478515625, 
measuring: future_whenallsucceed_100k_deferred_off_loop: 0.8700159788131714, 0.8707640171051025, 0.8721489906311035, 0.8901969194412231, 0.8609009981155396, 0.8743259906768799, 0.8825899362564087, 0.8498630523681641, 0.8573849201202393, 0.842851996421814, 
measuring: future_whenallsucceed_100k_deferred_on_loop: 0.2021169662475586, 0.20193004608154297, 0.2004399299621582, 0.20185601711273193, 0.20127105712890625, 0.19970500469207764, 0.20102405548095703, 0.19995999336242676, 0.2022179365158081, 0.17802095413208008, 
measuring: future_whenallcomplete_100k_immediately_succeeded_off_loop: 0.1101679801940918, 0.10986900329589844, 0.10995292663574219, 0.10981106758117676, 0.10877501964569092, 0.10875797271728516, 0.10990703105926514, 0.10846090316772461, 0.10954999923706055, 0.10873901844024658, 
measuring: future_whenallcomplete_100k_immediately_succeeded_on_loop: 0.10942995548248291, 0.11032795906066895, 0.10942709445953369, 0.10890007019042969, 0.10935795307159424, 0.10984694957733154, 0.10956501960754395, 0.10968005657196045, 0.11066591739654541, 0.1102520227432251, 
measuring: future_whenallcomplete_100k_deferred_off_loop: 0.8521190881729126, 0.8377310037612915, 0.8450978994369507, 0.8614420890808105, 0.851652979850769, 0.8680590391159058, 0.8755439519882202, 0.877761960029602, 0.8509070873260498, 0.857975959777832, 
measuring: future_whenallcomplete_100k_deferred_on_loop: 0.17288005352020264, 0.17352807521820068, 0.17383790016174316, 0.17525804042816162, 0.17185401916503906, 0.17182600498199463, 0.17424309253692627, 0.1721489429473877, 0.1710139513015747, 0.1730639934539795, 

I.e. a nice 30% speedup on AllComplete as well.

@weissi weissi added the semver/minor Adds new public API. label Apr 3, 2019
@weissi weissi added this to the 2.0.1 milestone Apr 3, 2019
@weissi weissi added semver/patch No public API change. and removed semver/minor Adds new public API. labels Apr 3, 2019
Copy link
Contributor

@Lukasa Lukasa left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this generally looks really good! I have left a note in the diff.

Sources/NIO/EventLoopFuture.swift Outdated Show resolved Hide resolved
Copy link
Member

@weissi weissi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thank you very much, even performance tests 🙌! That looks good to me.

@Lukasa Lukasa merged commit 77d9025 into apple:master Apr 5, 2019
MrMage added a commit to MrMage/swift-nio that referenced this pull request Apr 5, 2019
@PopFlamingo
Copy link
Contributor

@MrMage Does this mean we can now use again the previous implementation of Vapor’s flatten now that you pushed your changes upstream ?

@MrMage
Copy link
Contributor Author

MrMage commented Apr 6, 2019

@MrMage Does this mean we can now use again the previous implementation of Vapor’s flatten now that you pushed your changes upstream ?

Vapor's NIOKit is using NIO's whenAllSucceed and will receive these benefits "for free". Once this PR's backport to NIO 1 (#947) is merged and tagged, we can change Core's flatten method to also use NIO 1's new whenAll. I am going to take care of that.

weissi pushed a commit that referenced this pull request Apr 8, 2019
* Cherry-pick "Significantly improve the performance of `EventLoopFuture.{and,when}AllSucceed`. (#943)"

* NIO 1 fixes.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
semver/patch No public API change.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants