-
Notifications
You must be signed in to change notification settings - Fork 371
Avoid Task.Run and Factory.StartNew with async #581
Avoid Task.Run and Factory.StartNew with async #581
Conversation
…he async best practices
… and DataServicesResponseAdapterMessage
@erezvani1529 here we go |
@erezvani1529 is there anything left I need to do so that this can get merged? |
Hi @danielmarbach, We have a 3 sign-off criteria. My team needs to sign-off on this PR before we can merge but I will make sure we take this change in for our next breaking release(current release train). |
@erezvani1529 please do not bundle with braking release trains. Why not publish incremental 8.6.1 version? Bundling with breaking changes mean that in order to get this performance gains I would need to absorb breaking changes too. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This comment applies to all cases of PostProcessResponse methods touched by this change where the internal logic for instance ReadServiceStats does a synchronous read on a network stream where this change will lead to blocking the calling thread until the read is completed. Spinning up an extra thread here, although not a good solution, could still be considered better than blocking.
The good news is, we have a set of changes which is going to fix this issue by making the PostProcess actually async to enable us take async all the way down in the parsers.
So we suggest to hold off on this section of the changes and we will address this with our pending set of changes to make everything async all the way through. I will create an issue to track this task.
@@ -559,7 +559,7 @@ private RESTCommand<ServiceStats> GetServiceStatsImpl(BlobRequestOptions request | |||
retCmd.BuildRequest = (cmd, uri, builder, cnt, serverTimeout, ctx) => BlobHttpRequestMessageFactory.GetServiceStats(uri, serverTimeout, ctx, this.GetCanonicalizer(), this.Credentials); | |||
retCmd.RetrieveResponseStream = true; | |||
retCmd.PreProcessResponse = (cmd, resp, ex, ctx) => HttpResponseParsers.ProcessExpectedStatusCodeNoException(HttpStatusCode.OK, resp, null /* retVal */, cmd, ex); | |||
retCmd.PostProcessResponse = (cmd, resp, ctx) => Task.Factory.StartNew(() => BlobHttpResponseParsers.ReadServiceStats(cmd.ResponseStream)); | |||
retCmd.PostProcessResponse = (cmd, resp, ctx) => Task.FromResult(BlobHttpResponseParsers.ReadServiceStats(cmd.ResponseStream)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This comment applies to all cases of PostProcessResponse methods touched by this change where the internal logic for instance ReadServiceStats does a synchronous read on a network stream where this change will lead to blocking the calling thread until the read is completed. Spinning up an extra thread here, although not a good solution, could still be considered better than blocking.
The good news is, we have a set of changes which is going to fix this issue by making the PostProcess actually async to enable us take async all the way down in the parsers.
So we suggest to hold off on this section of the changes and we will address this with our pending set of changes to make everything async all the way through. I will create an issue to track this task.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@erezvani1529 Please allow me to kindly disagree here for a moment. If you look into the executor class workflow you will see that PostProcessResponse
func is called after several asynchronous continuations have already been scheduled and run. For example await client.SendAsync
comes first on the HttpClient and then await executionState.Resp.Content.ReadAsStreamAsync()
on the HttpContent
and I only took into account the happy path. Since we consistently apply ConfigureAwait(false)
the continuations will be rescheduled on the TaskScheduler.Default
and can freely use the ThreadPool available. So there is no blocking happing and it is perfectly valid to execute synchronous blocking code in a continuation. In the future, if the truly async reading there is supported the code would be even higher optimized. But rescheduling unnecessary again to the thread pool explicitly inside a deep continuation of an already asynchronous call stack is only putting unnecessary pressure on the threadpool without achieving much.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
An additional comment. I get that for example the streaming API in .NET Core uses the value task. So in case the stream is already materialized it could synchronously complete. But even in that case you'd had other continuations scheduled before.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point, this was my concern, but you're right, we should no longer be on the calling thread. Thanks for doing all this!
@@ -95,7 +95,7 @@ public override int Read(byte[] buffer, int offset, int count) | |||
/// <param name="count">The maximum number of bytes to read.</param> | |||
/// <param name="cancellationToken">The token to monitor for cancellation requests.</param> | |||
/// <returns>A task that represents the asynchronous read operation.</returns> | |||
public override async Task<int> ReadAsync(byte[] buffer, int offset, int count, CancellationToken cancellationToken) | |||
public override Task<int> ReadAsync(byte[] buffer, int offset, int count, CancellationToken cancellationToken) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is the slight change of behavior here by eliding the async keyword gaining us much perf improvement? other than skipping the internal state machine?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@erezvani1529 there should be no change in behaviour, solely performance gain.
If all the method returns is a task, rather than awaiting for the method and for the operation in the method, only a single await (state machine) is involved.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am referring to the change of behavior with Exceptions and where to expect them, when the method call and await are separated.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Exception will occur where you await the task, if I understood your question correctly. As Task returned by a method is not executed until awaited.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That is true if the async keyword exists or the method call and await aren't separated. Stephen Cleary has a great blog-post about this as well, which could be helpful.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@erezvani1529 regarding your comment in #581 (comment)
Eliding the keyword on the hot path means you can save the extra allocations of the statemachine display class. .NET 2.1 will contain improvements that will optimize the statemachine slightly see dotnet/coreclr#14178 and dotnet/coreclr#13105 but still there is allocation overhead.
See for example
BenchmarkDotNet=v0.10.11, OS=Windows 10 Redstone 2 [1703, Creators Update] (10.0.15063.726)
Processor=Intel Core i7-4980HQ CPU 2.80GHz (Haswell), ProcessorCount=8
Frequency=2728068 Hz, Resolution=366.5598 ns, Timer=TSC
[Host] : .NET Framework 4.7 (CLR 4.0.30319.42000), 64bit RyuJIT-v4.7.2115.0
DefaultJob : .NET Framework 4.7 (CLR 4.0.30319.42000), 64bit RyuJIT-v4.7.2115.0
Method | Mean | Error | StdDev | StdErr | Min | Q1 | Median | Q3 | Max | Op/s | Scaled | Allocated |
---|---|---|---|---|---|---|---|---|---|---|---|---|
WithStatemachine | 15.61 ms | 0.0166 ms | 0.0155 ms | 0.0040 ms | 15.58 ms | 15.60 ms | 15.61 ms | 15.62 ms | 15.63 ms | 64.06 | 1.00 | 640 B |
WithoutStatemachine | 15.61 ms | 0.0128 ms | 0.0120 ms | 0.0031 ms | 15.59 ms | 15.60 ms | 15.61 ms | 15.62 ms | 15.63 ms | 64.06 | 1.00 | 384 B |
BenchmarkDotNet=v0.10.11, OS=Windows 10 Redstone 2 [1703, Creators Update] (10.0.15063.726)
Processor=Intel Core i7-4980HQ CPU 2.80GHz (Haswell), ProcessorCount=8
Frequency=2728068 Hz, Resolution=366.5598 ns, Timer=TSC
[Host] : .NET Framework 4.7 (CLR 4.0.30319.42000), 64bit RyuJIT-v4.7.2115.0
Job-YTNNJT : .NET Core 2.0.3 (Framework 4.6.25815.02), 64bit RyuJIT
Toolchain=.NET Core 2.0
Method | Mean | Error | StdDev | StdErr | Min | Q1 | Median | Q3 | Max | Op/s | Scaled | Allocated |
---|---|---|---|---|---|---|---|---|---|---|---|---|
WithStatemachine | 15.61 ms | 0.0155 ms | 0.0138 ms | 0.0037 ms | 15.58 ms | 15.60 ms | 15.60 ms | 15.61 ms | 15.63 ms | 64.08 | 1.00 | 528 B |
WithoutStatemachine | 15.60 ms | 0.0186 ms | 0.0174 ms | 0.0045 ms | 15.57 ms | 15.59 ms | 15.60 ms | 15.62 ms | 15.63 ms | 64.08 | 1.00 | 312 B |
As you can imagine the allocations here would add up pretty quickly.
It would be possible to unbundle those things from the PR. but I do think it is worth keeping.
🎆 |
One more approval, just one more... |
Any ideas when these changes will be pushed out in a point release? We could use these changes yesterday :) |
Replaces #579
When doing performance investigations with ASQ on .NET Core I stumbled upon the improper usage of
Task.Run
andTask.Factory.StartNew
around pure async code.Task.Run
andTask.Factory.StartNew
is for compute-bound code and not for IO-bound.Without explicitly forcing those operations unnecessarily to the thread pool the code will not suffer thread pool ramp up and drastically improve performance under highly concurrent asynchronous runs