Skip to content
This repository has been archived by the owner on Jan 23, 2023. It is now read-only.

Avoid async method delegate allocation #14178

Merged
merged 1 commit into from
Sep 27, 2017

Conversation

stephentoub
Copy link
Member

Previously when a task-returning async method would yield for the first time, there would be four allocations: the task, the state machine object boxed to the heap, a context "runner" object, and a delegate that points to the boxed state machine's MoveNext method. A recent PR (#13105) changed this to avoid the separate box object and the runner, but that still left the task and the delegate.

This PR avoids the delegate as well in a common case. For async methods that only ever await Task/Task`1, that aren't using a custom sync context/scheduler, and for which tracing isn't enabled, we know the inner workings of both the builder and the awaiter and can thus bypass the awaiter's pattern APIs; instead of creating the delegate that gets passed to the awaiter and then stored in the wrapped task's continuation slot/list, we can instead just store the boxed state machine directly in the slot/list.

As a simple example just to highlight the allocation difference:

using System;
using System.Threading;
using System.Threading.Tasks;

class Program
{
    static async Task Main()
    {
        for (int i = 0; i < 1000; i++)
        {
            await SomeMethod();
        }
    }

    static async Task SomeMethod()
    {
        await Task.Run(() => Thread.Sleep(1));
    }
}

Before:
image

After:
image

cc: @kouvel, @tarekgh, @jkotas

@AndyAyersMS, I had to workaround #12877 and https://github.com/dotnet/coreclr/issues/14177, and the workaround for #12877 isn't stellar so I'll be happy to undo it once that issue is addressed.

@benaadams, it'd be good to know if/how this affects your scenarios.

@benaadams
Copy link
Member

😸

@tarekgh
Copy link
Member

tarekgh commented Sep 25, 2017

LGTM.

@benaadams
Copy link
Member

Will take me a little while to get results

@benaadams
Copy link
Member

benaadams commented Sep 25, 2017

I've seen TaskCompletionSource<bool> used for true/false and TaskCompletionSource<object> used with object to create an non-generic Task to await e.g. Kestrel.Core/Internal/HttpConnectionMiddleware or q=TaskCompletionSource; what awaiter would that be presented as?

Would it be worth adding an check for them (bool/object)? Or should they be changed to TaskCompletionSource<int> for example?

@stephentoub
Copy link
Member Author

stephentoub commented Sep 26, 2017

I can add paths for Task<bool> and Task<object>, but they're already generally handled by the InterfaceIsCheckWorkaround checks, and what really matters it the type that's being awaited: it doesn't matter if TCS<bool> or TCS<object> is used, if what's passed back is just Task, then it's covered by the checks for typeof(TaskAwaiter) and typeof(ConfiguredTaskAwaiter).

@benaadams
Copy link
Member

Would probably just be Task<bool> then for Task<bool> UpdateStateAsync()

e.g. in Async Streams proposal

Task<bool> MoveNextAsync();

or

Task<bool> WaitForNextAsync();

@stephentoub
Copy link
Member Author

Would probably just be Task then

I'll add it. I'm hoping all of the special-casing goes away once #12877 is addressed, and in the meantime we still avoid the delegate allocation for all tasks, regardless of TResult, the IL just isn't as good.

Previously when a task-returning async method would yield for the first time, there would be four allocations: the task, the state machine object boxed to the heap, a context "runner" object, and a delegate that points to the boxed state machine's MoveNext method.  A recent PR changed this to avoid the separate box object and the runner, but that still left the task and the delegate.

This PR avoids the delegate as well in a common case.  For async methods that only ever await Task/Task`1, that aren't using a custom sync context/scheduler, and for which tracing isn't enabled, we know the inner workings of both the builder and the awaiter and can thus bypass the awaiter's pattern APIs; instead of creating the delegate that gets passed to the awaiter and then stored in the wrapped task's continuation slot/list, we can instead just store the boxed state machine directly in the slot/list.
@benaadams
Copy link
Member

This is really good! (Trace - after commenting out ETW causality allocations)

Allocators 2 & 3 should be gone post dotnet/corefx#23715

Drilled in an remaining Action is from a custom awaiter; so is expected.

Need to talk to @davidfowl about those BufferSegment allocations - now my primary allocation source!

@davidfowl
Copy link
Member

Damn BufferSegment 😄 /cc @pakrym

@benaadams
Copy link
Member

@stephentoub @vancem is there a flag for Prefview that doesn't switch on TplEtwProvider.Log and AsyncCausalityTracer.LoggingOn for just getting allocations (i.e. without allocating)?

@vancem
Copy link

vancem commented Sep 26, 2017

@benaadams

@stephentoub @vancem is there a flag for Prefview that doesn't switch on TplEtwProvider.Log and AsyncCausalityTracer.LoggingOn for just getting allocations (i.e. without allocating)?

In general if you provide new arguments to a provider using the /providers qualifier and those will override anything that PerfView did by default. PerfView has a short name for the TPL provider called .NETTasks. Thus

PerfView perfVIew /providers=.NETTasks:0:0 collect

will do a normal PerfView collection but without the TPL events (and therefor any allocations with them.
You can also use the 'Additional Providers' textbox in the GUI to achieve the same effect.

@benaadams
Copy link
Member

Thanks @vancem adding .NETTasks:0:0 to the 'Additional Providers' textbox worked

2.0

dotnet/master

PR

Only downside is the class names are getting longer :)

@stephentoub
Copy link
Member Author

stephentoub commented Sep 26, 2017

This is really good!

Excellent :) Thanks for doing all of these runs.

Only downside is the class names are getting longer :)

Seems like that's mainly an artifact of how PerfView is rendering these, e.g. when the type is showing up standalone, it's just showing the short name of the type, but when it's showing up as the generic parameter of another type, the full name is being used.

@stephentoub stephentoub merged commit ef08c82 into dotnet:master Sep 27, 2017
@stephentoub stephentoub deleted the async_avoid_delegate branch September 27, 2017 02:39
dotnet-bot pushed a commit to dotnet/corert that referenced this pull request Sep 27, 2017
…delegate

Avoid async method delegate allocation

Signed-off-by: dotnet-bot <dotnet-bot@microsoft.com>
jkotas pushed a commit to dotnet/corert that referenced this pull request Sep 27, 2017
…delegate

Avoid async method delegate allocation

Signed-off-by: dotnet-bot <dotnet-bot@microsoft.com>
@AndyAyersMS
Copy link
Member

@stephentoub when you get a chance, point me at the code you wanted to write here, and let me see what else the jit needs to do...

@stephentoub
Copy link
Member Author

stephentoub commented Oct 10, 2017

@stephentoub when you get a chance, point me at the code you wanted to write here, and let me see what else the jit needs to do...

Thanks, @AndyAyersMS. Two things.

  1. More efficient handling of TaskAwaiters. In this code:
    https://github.com/dotnet/coreclr/pull/14178/files#diff-bc8ce62cfb625ddcc19d1f21b5c8b5c1R396
    rather than special-casing a bunch of concrete type parameters, and then in this code:
    https://github.com/dotnet/coreclr/pull/14178/files#diff-bc8ce62cfb625ddcc19d1f21b5c8b5c1R457
    rather than having this hacky InterfaceIsCheckWorkaround workaround, I want to be able to simply write:
if (awaiter is ITaskAwaiter)
{
    ref TaskAwaiter ta = ref Unsafe.As<TAwaiter, TaskAwaiter>(ref awaiter);
    TaskAwaiter.UnsafeOnCompletedInternal(ta.m_task, box, continueOnCapturedContext: true);
}

and have the JIT generate efficient code for that. The non-generic TaskAwaiter struct and the generic TaskAwaiter<TResult> struct both implement the marker interface ITaskAwaiter purely to be able to be recognized by this check. As this generic method is parameterized by the type of the awaiter, the JIT should be able to generate efficient code here, removing all casting and boxing and ending up with the entire method essentially just being that UnsafeOnCompletedInternal line.

  1. Handling of all ValueTaskAwaiter<TResult>s. With TaskAwaiter and TaskAwaiter<TResult> in (1) above, I'm able to use Unsafe.As as a trick to treat them all as TaskAwaiter and then access the m_task field on TaskAwaiter, due to both TaskAwaiter and TaskAwaiter<TResult> having the same layout. I can't play the same trick with ValueTaskAwaiter<TResult>, as ValueTask<TResult> (which a ValueTaskAwaiter<TResult> wraps) is a discrimated union of a TResult and a Task<TResult>, which means not all ValueTask<TResult>s have the same layout/size. Currently that means this method just special-cases a few common generic type parameters:
    https://github.com/dotnet/coreclr/pull/14178/files#diff-bc8ce62cfb625ddcc19d1f21b5c8b5c1R427
    which is not only klunky, but it also means that any other type parameters used with ValueTask<TResult> won't get this optimization. Instead, I'd like to be able to add an internal IValueTaskAwaiter interface that ValueTaskAwaiter<TResult> would implement, and which would have a single Task GetTask() method on it, at which point I can write code like:
if (awaiter is IValueTaskAwaiter)
{
    Task task = ((IValueTaskAwaiter)awaiter).GetTask();
    TaskAwaiter.UnsafeOnCompletedInternal(task, box, continueOnCapturedContext: true);
}

I can write that today, but it incurs boxing both for both casts, and to implement this optimization I need to be able to extract the Task allocation-free, so neither the is or the cast can box... essentially I'd want the is check to completely evaporate at compile-time and the GetTask invocation to behave as if it were a constrained call on a generic type parameter constrained to that interface.
https://github.com/dotnet/coreclr/issues/14213

In fact, if I could write code like (2), and if the interface call could be inlined, then I could just use the same interface on all of the TaskAwaiters and ValueTaskAwaiters and just have the code from (2) rather than both (1) and (2).

AndyAyersMS added a commit to AndyAyersMS/coreclr that referenced this pull request Oct 13, 2017
Implement the jit interface compareTypesForEquality method
to handle casts from known types to known types, and from
shared types to certain interface types.

Call this method in the jit for castclass and isinst, using
`gtGetClassHandle` to obtain the from type. Optimize sucessful
casts and unsuccessful isinsts when the from type is known
exactly.

Undo part of the type-equality based optimization/workaround
in the AsyncMethodBuilder code that was introduced in dotnet#14178
in favor of interface checks. There is more here that can
be done here before this issue is entirely closed and I will
look at this subsequently.

This optimization allows the jit to remove boxes that are
used solely to feed type casts, and so closes #12877.
AndyAyersMS added a commit that referenced this pull request Oct 13, 2017
JIT: optimize type casts

Implement the jit interface compareTypesForEquality method
to handle casts from known types to known types, and from
shared types to certain interface types.

Call this method in the jit for castclass and isinst, using
`gtGetClassHandle` to obtain the from type. Optimize sucessful
casts and unsuccessful isinsts when the from type is known
exactly.

Undo part of the type-equality based optimization/workaround
in the AsyncMethodBuilder code that was introduced in #14178
in favor of interface checks. There is more here that can
be done here before this issue is entirely closed and I will
look at this subsequently.

This optimization allows the jit to remove boxes that are
used solely to feed type casts, and so closes #12877.
dotnet-bot pushed a commit to dotnet/corefx that referenced this pull request Jan 13, 2018
…delegate

Avoid async method delegate allocation

Signed-off-by: dotnet-bot-corefx-mirror <dotnet-bot@microsoft.com>
dotnet-bot pushed a commit to dotnet/corefx that referenced this pull request Jan 13, 2018
…delegate

Avoid async method delegate allocation

Signed-off-by: dotnet-bot-corefx-mirror <dotnet-bot@microsoft.com>
safern pushed a commit to dotnet/corefx that referenced this pull request Jan 16, 2018
…delegate

Avoid async method delegate allocation

Signed-off-by: dotnet-bot-corefx-mirror <dotnet-bot@microsoft.com>
safern pushed a commit to dotnet/corefx that referenced this pull request Jan 16, 2018
…delegate

Avoid async method delegate allocation

Signed-off-by: dotnet-bot-corefx-mirror <dotnet-bot@microsoft.com>
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants