Skip to content
This repository has been archived by the owner on Aug 1, 2024. It is now read-only.

Avoid Task.Run and Factory.StartNew with async #581

Merged
merged 9 commits into from
Dec 21, 2017

Conversation

danielmarbach
Copy link
Contributor

@danielmarbach danielmarbach commented Nov 28, 2017

Replaces #579

When doing performance investigations with ASQ on .NET Core I stumbled upon the improper usage of Task.Run and Task.Factory.StartNew around pure async code.

Task.Run and Task.Factory.StartNew is for compute-bound code and not for IO-bound.

Without explicitly forcing those operations unnecessarily to the thread pool the code will not suffer thread pool ramp up and drastically improve performance under highly concurrent asynchronous runs

@danielmarbach danielmarbach changed the title Performance breaking Avoid Task.Run and Factory.StartNew with async Nov 28, 2017
@danielmarbach
Copy link
Contributor Author

@erezvani1529 here we go

@danielmarbach
Copy link
Contributor Author

@erezvani1529 is there anything left I need to do so that this can get merged?

@erezvani1529
Copy link
Contributor

Hi @danielmarbach,

We have a 3 sign-off criteria. My team needs to sign-off on this PR before we can merge but I will make sure we take this change in for our next breaking release(current release train).

@IlyaGrebnov
Copy link

@erezvani1529 please do not bundle with braking release trains. Why not publish incremental 8.6.1 version? Bundling with breaking changes mean that in order to get this performance gains I would need to absorb breaking changes too.

Copy link
Contributor

@erezvani1529 erezvani1529 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This comment applies to all cases of PostProcessResponse methods touched by this change where the internal logic for instance ReadServiceStats does a synchronous read on a network stream where this change will lead to blocking the calling thread until the read is completed. Spinning up an extra thread here, although not a good solution, could still be considered better than blocking.

The good news is, we have a set of changes which is going to fix this issue by making the PostProcess actually async to enable us take async all the way down in the parsers.

So we suggest to hold off on this section of the changes and we will address this with our pending set of changes to make everything async all the way through. I will create an issue to track this task.

@erezvani1529 erezvani1529 dismissed their stale review December 6, 2017 23:07

replacing by a comment

@@ -559,7 +559,7 @@ private RESTCommand<ServiceStats> GetServiceStatsImpl(BlobRequestOptions request
retCmd.BuildRequest = (cmd, uri, builder, cnt, serverTimeout, ctx) => BlobHttpRequestMessageFactory.GetServiceStats(uri, serverTimeout, ctx, this.GetCanonicalizer(), this.Credentials);
retCmd.RetrieveResponseStream = true;
retCmd.PreProcessResponse = (cmd, resp, ex, ctx) => HttpResponseParsers.ProcessExpectedStatusCodeNoException(HttpStatusCode.OK, resp, null /* retVal */, cmd, ex);
retCmd.PostProcessResponse = (cmd, resp, ctx) => Task.Factory.StartNew(() => BlobHttpResponseParsers.ReadServiceStats(cmd.ResponseStream));
retCmd.PostProcessResponse = (cmd, resp, ctx) => Task.FromResult(BlobHttpResponseParsers.ReadServiceStats(cmd.ResponseStream));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This comment applies to all cases of PostProcessResponse methods touched by this change where the internal logic for instance ReadServiceStats does a synchronous read on a network stream where this change will lead to blocking the calling thread until the read is completed. Spinning up an extra thread here, although not a good solution, could still be considered better than blocking.

The good news is, we have a set of changes which is going to fix this issue by making the PostProcess actually async to enable us take async all the way down in the parsers.

So we suggest to hold off on this section of the changes and we will address this with our pending set of changes to make everything async all the way through. I will create an issue to track this task.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@erezvani1529 Please allow me to kindly disagree here for a moment. If you look into the executor class workflow you will see that PostProcessResponse func is called after several asynchronous continuations have already been scheduled and run. For example await client.SendAsync comes first on the HttpClient and then await executionState.Resp.Content.ReadAsStreamAsync() on the HttpContent and I only took into account the happy path. Since we consistently apply ConfigureAwait(false) the continuations will be rescheduled on the TaskScheduler.Default and can freely use the ThreadPool available. So there is no blocking happing and it is perfectly valid to execute synchronous blocking code in a continuation. In the future, if the truly async reading there is supported the code would be even higher optimized. But rescheduling unnecessary again to the thread pool explicitly inside a deep continuation of an already asynchronous call stack is only putting unnecessary pressure on the threadpool without achieving much.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

An additional comment. I get that for example the streaming API in .NET Core uses the value task. So in case the stream is already materialized it could synchronously complete. But even in that case you'd had other continuations scheduled before.

https://twitter.com/davidfowl/status/750580955564367872

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point, this was my concern, but you're right, we should no longer be on the calling thread. Thanks for doing all this!

@@ -95,7 +95,7 @@ public override int Read(byte[] buffer, int offset, int count)
/// <param name="count">The maximum number of bytes to read.</param>
/// <param name="cancellationToken">The token to monitor for cancellation requests.</param>
/// <returns>A task that represents the asynchronous read operation.</returns>
public override async Task<int> ReadAsync(byte[] buffer, int offset, int count, CancellationToken cancellationToken)
public override Task<int> ReadAsync(byte[] buffer, int offset, int count, CancellationToken cancellationToken)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is the slight change of behavior here by eliding the async keyword gaining us much perf improvement? other than skipping the internal state machine?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@erezvani1529 there should be no change in behaviour, solely performance gain.
If all the method returns is a task, rather than awaiting for the method and for the operation in the method, only a single await (state machine) is involved.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am referring to the change of behavior with Exceptions and where to expect them, when the method call and await are separated.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Exception will occur where you await the task, if I understood your question correctly. As Task returned by a method is not executed until awaited.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That is true if the async keyword exists or the method call and await aren't separated. Stephen Cleary has a great blog-post about this as well, which could be helpful.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@erezvani1529 regarding your comment in #581 (comment)

Eliding the keyword on the hot path means you can save the extra allocations of the statemachine display class. .NET 2.1 will contain improvements that will optimize the statemachine slightly see dotnet/coreclr#14178 and dotnet/coreclr#13105 but still there is allocation overhead.

See for example

BenchmarkDotNet=v0.10.11, OS=Windows 10 Redstone 2 [1703, Creators Update] (10.0.15063.726)
Processor=Intel Core i7-4980HQ CPU 2.80GHz (Haswell), ProcessorCount=8
Frequency=2728068 Hz, Resolution=366.5598 ns, Timer=TSC
  [Host]     : .NET Framework 4.7 (CLR 4.0.30319.42000), 64bit RyuJIT-v4.7.2115.0
  DefaultJob : .NET Framework 4.7 (CLR 4.0.30319.42000), 64bit RyuJIT-v4.7.2115.0

Method Mean Error StdDev StdErr Min Q1 Median Q3 Max Op/s Scaled Allocated
WithStatemachine 15.61 ms 0.0166 ms 0.0155 ms 0.0040 ms 15.58 ms 15.60 ms 15.61 ms 15.62 ms 15.63 ms 64.06 1.00 640 B
WithoutStatemachine 15.61 ms 0.0128 ms 0.0120 ms 0.0031 ms 15.59 ms 15.60 ms 15.61 ms 15.62 ms 15.63 ms 64.06 1.00 384 B
BenchmarkDotNet=v0.10.11, OS=Windows 10 Redstone 2 [1703, Creators Update] (10.0.15063.726)
Processor=Intel Core i7-4980HQ CPU 2.80GHz (Haswell), ProcessorCount=8
Frequency=2728068 Hz, Resolution=366.5598 ns, Timer=TSC
  [Host]     : .NET Framework 4.7 (CLR 4.0.30319.42000), 64bit RyuJIT-v4.7.2115.0
  Job-YTNNJT : .NET Core 2.0.3 (Framework 4.6.25815.02), 64bit RyuJIT

Toolchain=.NET Core 2.0  
Method Mean Error StdDev StdErr Min Q1 Median Q3 Max Op/s Scaled Allocated
WithStatemachine 15.61 ms 0.0155 ms 0.0138 ms 0.0037 ms 15.58 ms 15.60 ms 15.60 ms 15.61 ms 15.63 ms 64.08 1.00 528 B
WithoutStatemachine 15.60 ms 0.0186 ms 0.0174 ms 0.0045 ms 15.57 ms 15.59 ms 15.60 ms 15.62 ms 15.63 ms 64.08 1.00 312 B

As you can imagine the allocations here would add up pretty quickly.

It would be possible to unbundle those things from the PR. but I do think it is worth keeping.

@danielmarbach
Copy link
Contributor Author

🎆

@danielmarbach
Copy link
Contributor Author

One more approval, just one more...

@danielmarbach
Copy link
Contributor Author

@erezvani1529 erezvani1529 merged commit d11b34b into Azure:dev_breaking Dec 21, 2017
@niemyjski
Copy link

Any ideas when these changes will be pushed out in a point release? We could use these changes yesterday :)

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants