Avoid async method delegate allocation #14178
😸
LGTM.
Will take me a little while to get results
I've seen Would it be worth adding a check for them (
I can add paths for
Would probably just be e.g. in the Async Streams proposal: Task<bool> MoveNextAsync(); or Task<bool> WaitForNextAsync();
I'll add it. I'm hoping all of the special-casing goes away once #12877 is addressed. In the meantime we still avoid the delegate allocation for all tasks, regardless of TResult; the IL just isn't as good.
Previously when a task-returning async method would yield for the first time, there would be four allocations: the task, the state machine object boxed to the heap, a context "runner" object, and a delegate that points to the boxed state machine's MoveNext method. A recent PR changed this to avoid the separate box object and the runner, but that still left the task and the delegate. This PR avoids the delegate as well in a common case. For async methods that only ever await Task/Task`1, that aren't using a custom sync context/scheduler, and for which tracing isn't enabled, we know the inner workings of both the builder and the awaiter and can thus bypass the awaiter's pattern APIs; instead of creating the delegate that gets passed to the awaiter and then stored in the wrapped task's continuation slot/list, we can instead just store the boxed state machine directly in the slot/list.
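To make the difference concrete, here is a minimal sketch of the idea described above. It is not the runtime's code: `IStateMachineBox`, `CounterBox`, and `FakeTask` are hypothetical stand-ins for the internal box interface and the task's continuation slot, and the sketch ignores contexts, schedulers, and tracing entirely.

```csharp
using System;

// Hypothetical stand-in for the runtime's internal state machine box.
interface IStateMachineBox
{
    void MoveNext();
}

// Hypothetical box that just counts how often it was resumed.
sealed class CounterBox : IStateMachineBox
{
    public int Resumed;
    public void MoveNext() => Resumed++;
}

// Hypothetical stand-in for a task with a single continuation slot.
sealed class FakeTask
{
    private object _continuation;

    public void SetContinuation(object continuation) => _continuation = continuation;

    public void Complete()
    {
        switch (_continuation)
        {
            // New path: the slot holds the box itself, so it is resumed directly.
            case IStateMachineBox box: box.MoveNext(); break;
            // Old path: the slot holds a delegate that points at MoveNext.
            case Action action: action(); break;
        }
    }
}

static class Demo
{
    public static int RunWithDelegate()
    {
        var box = new CounterBox();
        var task = new FakeTask();
        task.SetContinuation(new Action(box.MoveNext)); // extra delegate object
        task.Complete();
        return box.Resumed;
    }

    public static int RunWithBox()
    {
        var box = new CounterBox();
        var task = new FakeTask();
        task.SetContinuation(box); // no delegate, only the box that already exists
        task.Complete();
        return box.Resumed;
    }
}
```

Both paths resume the state machine exactly once; the second simply skips the delegate object, which is the allocation this PR removes in the common case.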
Force-pushed from 0a3aaa0 to 9a06301.
This is really good! (Trace taken after commenting out the ETW causality allocations.) Allocators 2 & 3 should be gone post dotnet/corefx#23715. I drilled in, and the remaining Action is from a custom awaiter, so it is expected. Need to talk to @davidfowl about those.
Damn BufferSegment 😄 /cc @pakrym
@stephentoub @vancem is there a flag for PerfView that doesn't switch on
In general, if you provide new arguments to a provider using the /providers qualifier, those will override anything that PerfView did by default. PerfView has a short name for the TPL provider: .NETTasks. Thus PerfView /providers=.NETTasks:0:0 collect will do a normal PerfView collection but without the TPL events (and therefore without any allocations associated with them).
Thanks @vancem adding Only downside is the class names are getting longer :)
Excellent :) Thanks for doing all of these runs.
Seems like that's mainly an artifact of how PerfView is rendering these, e.g. when the type is showing up standalone, it's just showing the short name of the type, but when it's showing up as the generic parameter of another type, the full name is being used.
…delegate Avoid async method delegate allocation Signed-off-by: dotnet-bot <dotnet-bot@microsoft.com>
@stephentoub when you get a chance, point me at the code you wanted to write here, and let me see what else the jit needs to do...
Thanks, @AndyAyersMS. Two things.
if (awaiter is ITaskAwaiter)
{
ref TaskAwaiter ta = ref Unsafe.As<TAwaiter, TaskAwaiter>(ref awaiter);
TaskAwaiter.UnsafeOnCompletedInternal(ta.m_task, box, continueOnCapturedContext: true);
}
and have the JIT generate efficient code for that. The non-generic
if (awaiter is IValueTaskAwaiter)
{
Task task = ((IValueTaskAwaiter)awaiter).GetTask();
TaskAwaiter.UnsafeOnCompletedInternal(task, box, continueOnCapturedContext: true);
}
I can write that today, but it incurs boxing for both casts, and to implement this optimization I need to be able to extract the Task allocation-free, so neither the In fact, if I could write code like (2), and if the interface call could be inlined, then I could just use the same interface on all of the
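As a rough illustration of the boxing concern raised here, the sketch below contrasts casting a struct to an interface inside an unconstrained generic method with calling through a generic constraint. `ICounter`, `Counter`, and `BoxingDemo` are hypothetical names, not runtime types, and the sketch shows C# language semantics only; whether a given JIT elides the box is exactly what #12877 is about.

```csharp
using System;

// Hypothetical interface playing the role of ITaskAwaiter / IValueTaskAwaiter.
interface ICounter
{
    int Increment();
}

// Hypothetical struct playing the role of a struct awaiter.
struct Counter : ICounter
{
    private int _value;
    public int Value => _value;
    public int Increment() => ++_value;
}

static class BoxingDemo
{
    // Unconstrained T: the cast to ICounter boxes a copy of the struct,
    // so the call mutates the boxed copy and the caller's value is untouched.
    public static int ViaCast<T>(ref T value)
    {
        if (value is ICounter)
        {
            return ((ICounter)value).Increment();
        }
        return -1;
    }

    // Constrained T: the interface method is invoked on the struct in place,
    // with no box, so the caller's value is mutated.
    public static int ViaConstraint<T>(ref T value) where T : ICounter
        => value.Increment();
}
```

The constrained form is not available in the builder, since the awaiter type parameter there has no such constraint, which is why the comment asks for the JIT to make the cast-based form cheap instead.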
JIT: optimize type casts
Implement the jit interface compareTypesForEquality method to handle casts from known types to known types, and from shared types to certain interface types. Call this method in the jit for castclass and isinst, using `gtGetClassHandle` to obtain the from type. Optimize successful casts and unsuccessful isinsts when the from type is known exactly. Undo part of the type-equality based optimization/workaround in the AsyncMethodBuilder code that was introduced in #14178 in favor of interface checks. There is more that can be done here before this issue is entirely closed, and I will look at this subsequently. This optimization allows the jit to remove boxes that are used solely to feed type casts, and so closes #12877.
…delegate Avoid async method delegate allocation Signed-off-by: dotnet-bot-corefx-mirror <dotnet-bot@microsoft.com>
Previously when a task-returning async method would yield for the first time, there would be four allocations: the task, the state machine object boxed to the heap, a context "runner" object, and a delegate that points to the boxed state machine's MoveNext method. A recent PR (#13105) changed this to avoid the separate box object and the runner, but that still left the task and the delegate.
This PR avoids the delegate as well in a common case. For async methods that only ever await Task/Task`1, that aren't using a custom sync context/scheduler, and for which tracing isn't enabled, we know the inner workings of both the builder and the awaiter and can thus bypass the awaiter's pattern APIs; instead of creating the delegate that gets passed to the awaiter and then stored in the wrapped task's continuation slot/list, we can instead just store the boxed state machine directly in the slot/list.
As a simple example just to highlight the allocation difference:
Before:
After:
cc: @kouvel, @tarekgh, @jkotas
@AndyAyersMS, I had to work around #12877 and https://github.com/dotnet/coreclr/issues/14177, and the workaround for #12877 isn't stellar, so I'll be happy to undo it once that issue is addressed.
@benaadams, it'd be good to know if/how this affects your scenarios.