Skip to content
This repository has been archived by the owner on Jan 23, 2023. It is now read-only.

Add sos DumpAsync command #18160

Merged
merged 1 commit into from
May 30, 2018
Merged

Add sos DumpAsync command #18160

merged 1 commit into from
May 30, 2018

Conversation

stephentoub
Copy link
Member

@stephentoub stephentoub commented May 29, 2018

Debugging async methods with sos can be time consuming and relies on knowing how to use dumpheap, dumpvc, gcroot, and other commands to get the desired information, while often digging through a myriad of objects on the heap to find the desired info.

This commit adds a new DumpAsync command, which finds the async state machine objects on the GC heap and outputs relevant information about each, including the fields of its state machine, any registered continuation, and GC roots for the state machine object (as they often serve as a valid substitute for call stacks).

Example program used as a test:

using System.Threading.Tasks;

class Program
{
    static async Task Main() => await MethodA();
    static async Task MethodA() => await MethodB();
    static async Task MethodB() => await MethodC();
    static async Task MethodC() => await MethodD();
    static async Task MethodD() => await Task.Delay(int.MaxValue);
}

and example command output:

0:011> !DumpAsync -type MethodD
         Address               MT     Size Name
#0
000001989f413de0 00007ff88c506ba8      112  System.Runtime.CompilerServices.AsyncTaskMethodBuilder`1+AsyncStateMachineBox`1[[System.Threading.Tasks.VoidTaskResult, System.Private.CoreLib],[Program+<MethodD>d__4, test]]
StateMachine: Program+<MethodD>d__4 (struct)
              MT    Field   Offset                 Type VT     Attr            Value Name
00007ff8d3df4b80  400000d        0         System.Int32  1 instance                0 <>1__state
00007ff8d3e082c0  400000e        8 ...TaskMethodBuilder  1 instance 000001989f413e38 <>t__builder
00007ff8d3dfea90  400000f       10 ...vices.TaskAwaiter  1 instance 000001989f413e40 <>u__1
Continuation: 000001989f413e50 (System.Runtime.CompilerServices.AsyncTaskMethodBuilder`1+AsyncStateMachineBox`1[[System.Threading.Tasks.VoidTaskResult, System.Private.CoreLib],[Program+<MethodC>d__3, test]])
GC roots:
    Thread 2936c:
        000000071a37e050 00007ff8d3ac1657 System.Threading.Tasks.Task.SpinThenBlockingWait(Int32, System.Threading.CancellationToken) [d:\repos\coreclr\src\System.Private.CoreLib\src\System\Threading\Tasks\Task.cs @ 2977]
            rbp+10: 000000071a37e0c0
                ->  000001989f413fa0 System.Runtime.CompilerServices.AsyncTaskMethodBuilder`1+AsyncStateMachineBox`1[[System.Threading.Tasks.VoidTaskResult, System.Private.CoreLib],[Program+<Main>d__0, test]]
                ->  000001989f413f30 System.Runtime.CompilerServices.AsyncTaskMethodBuilder`1+AsyncStateMachineBox`1[[System.Threading.Tasks.VoidTaskResult, System.Private.CoreLib],[Program+<MethodA>d__1, test]]
                ->  000001989f413ec0 System.Runtime.CompilerServices.AsyncTaskMethodBuilder`1+AsyncStateMachineBox`1[[System.Threading.Tasks.VoidTaskResult, System.Private.CoreLib],[Program+<MethodB>d__2, test]]
                ->  000001989f413e50 System.Runtime.CompilerServices.AsyncTaskMethodBuilder`1+AsyncStateMachineBox`1[[System.Threading.Tasks.VoidTaskResult, System.Private.CoreLib],[Program+<MethodC>d__3, test]]
                ->  000001989f413de0 System.Runtime.CompilerServices.AsyncTaskMethodBuilder`1+AsyncStateMachineBox`1[[System.Threading.Tasks.VoidTaskResult, System.Private.CoreLib],[Program+<MethodD>d__4, test]]

    HandleTable:
        000001989d8415f8 (pinned handle)
        -> 00000198af3e1038 System.Object[]
        -> 000001989f413410 System.Threading.TimerQueue[]
        -> 000001989f413468 System.Threading.TimerQueue
        -> 000001989f413330 System.Threading.TimerQueueTimer
        -> 000001989f412e40 System.Threading.Tasks.Task+DelayPromise
        -> 000001989f413de0 System.Runtime.CompilerServices.AsyncTaskMethodBuilder`1+AsyncStateMachineBox`1[[System.Threading.Tasks.VoidTaskResult, System.Private.CoreLib],[Program+<MethodD>d__4, test]]

Found 1 state machines.

cc: @jkotas, @janvorli, @mikem8361, @noahfalk, @danmosemsft

@danmoseley
Copy link
Member

@jaredpar fyi

@stephentoub
Copy link
Member Author

@jaredpar FYI

Note that this doesn't take any dependencies on what the C# compiler does; I purposefully stayed away from that. So, for example, it's not parsing decorated names or expecting certain compiler-generated fields. Rather, it's only looking at data structures defined in System.Private.CoreLib, which seems like very fair game. This was all made easier by the changes in #13105, which changed how the boxing was done, putting the state machine struct into a strongly-typed field.

(Actually, as I wrote the above, I realize this isn't entirely true: it currently depends on the state machine type being a struct, which already isn't true in C# debug builds. I'll fix that.)

if (type != NULL)
{
delete[] type;
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can simplify to delete[] type

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed

@jaredpar
Copy link
Member

@stephentoub

Note that this doesn't take any dependencies on what the C# compiler does; I purposefully stayed away from that

I appreciate that but also part of me really really wants to say: please add an option where I can restrict to only tasks that are currently "waiting" which would require that dependency. That's really useful for items like figuring out why a unit test is hanging. Not sure if that's a good idea or not though.

{
// Get the async state machine object's StateMachine field.
CLRDATA_ADDRESS fieldMT;
int iOffset = GetObjFieldOffset(TO_CDADDR(itr->GetAddress()), itr->GetMT(), W("StateMachine"), TRUE, &fieldMT);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are there any non public identifiers that should now get a comment in the implemention warning that the name should not change? Eg 'StateMachine'

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, I can make sure there are comments in the corresponding .cs files.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also m_continuationObject perhaps

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed

@stephentoub
Copy link
Member Author

part of me really really wants to say: please add an option where I can restrict to only tasks that are currently "waiting" which would require that dependency

Yeah... I had that as an internal debate all weekend while thinking about adding this command. It would require interpreting <>1__state (-2 == done, -1 == running, 0+ at the corresponding numbered await)...

@mikem8361
Copy link
Member

I'm currently moving SOS (along with the lldb plugin and the managed part of SOS) to a new "diagnostics" repo and attempting to make SOS runtime version independent (hopefully even compatible going back to 1.1). Some of this will be documenting and asserting the runtime data structures SOS that haven't changed much over the years.

It sounds like your new command (which is great by the way) is dependent on the latest 2.2 (master branch) runtime. We may consider adding the version dependent part the command as a new interface to the DAC making it easy to determine. At the very least there should be a runtime version check in the command (currently easy on Windows via GetEEVersion(), but I need to do something for xplat).

I'll port any changes from coreclr to the new repo and (with your help) do the work to create the new DAC interface.

@stephentoub
Copy link
Member Author

It sounds like your new command (which is great by the way)

Thanks!

is dependent on the latest 2.2 (master branch)

It should work fine with 2.1, as it doesn't depend on anything that's changed since 2.1. But it won't work earlier than 2.1 (it won't find any state machine objects).

I'll port any changes from coreclr to the new repo

Thanks.

@jkotas
Copy link
Member

jkotas commented May 29, 2018

At the very least there should be a runtime version check in the command

I think it is better to just check for the presence of structures that the command depends on and fail without crashing if they are not present. The implementation does that already.

@mikem8361
Copy link
Member

mikem8361 commented May 29, 2018 via email

GCRoot (gcroot) EEStack (eestack)
PrintException (pe) ClrStack (clrstack)
GCInfo
DumpAsync IP2MD (ip2md)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor nit: I usually put the lowercase alias in paras in the help like "DumpAsync (dumpasync)".

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed

return E_FAIL;
}

WCHAR* type = NULL;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A nit - you could use ArrayHolder here to make the cleanup automatic instead of having to delete it in both the try and catch blocks.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed

@noahfalk
Copy link
Member

@kouvel

Thanks for putting this together Stephen! Above you mentioned:

Yeah... I had that as an internal debate all weekend while thinking about adding this command. It would require interpreting <>1__state (-2 == done, -1 == running, 0+ at the corresponding numbered await)...

In your opinion what is the downside of adding this interpretation? (concern about taking a dependency on the compiler naming scheme "<>1__"?) I'm hoping to push the diagnostics in this area further and it sounded like there might be obstacles here that needed to be addressed.

Thanks!

@stephentoub
Copy link
Member Author

what is the downside of adding this interpretation? (concern about taking a dependency on the compiler naming scheme "<>1__"?)

Yup, the downside is just a dependency on the implementation-level detail that the compiler uses for this: how and where the state for the state machine is stored, and how to interpret it. To my knowledge this has been fairly stable (@jaredpar?), it just makes me a bit nervous, as the compiler could easily change this and then the dependency would be broken. If you and Jared don't think it's a problem, though, I'll go ahead and add the filtering, as I agree it's valuable.

@jaredpar
Copy link
Member

@stephentoub

To my knowledge this has been fairly stable (@jaredpar?),

It's been this way since the feature was authored.

If you and Jared don't think it's a problem, though, I'll go ahead and add the filtering, as I agree it's valuable.

Our advice for production tools is generally: never, ever, depend on the shape of our emitted code. We do change it regularly and if we break you we won't really feel bad about it (well I do feel bad, but still going to take the change).

The one exception is debugging tools. In order to function at times they need to understand the shape of our generated code. It's the only way they can sensible unwind closures into the original locals for example.

At the same time though, we don't have much qualm about breaking our debuggers either. But we own them and tend to change them in concert with making the emit change.

@stephentoub
Copy link
Member Author

Hmmm. Ok, thanks. I'll leave it out for now, but we can potentially add it later.

Debugging async methods with sos can be time consuming and relies on knowing how to use dumpheap, dumpvc, gcroot, and other commands to get the desired information, while often digging through a myriad of objects on the heap to find the desired info.

This commit adds a new DumpAsync command, which finds the async state machine objects on the GC heap and outputs relevant information about each, including the fields of its state machine, any registered continuation, and GC roots for the state machine object (as they often serve as a valid substitute for call stacks).

Example program used as a test:
```C#
using System.Threading.Tasks;

class Program
{
    static async Task Main() => await MethodA();
    static async Task MethodA() => await MethodB();
    static async Task MethodB() => await MethodC();
    static async Task MethodC() => await MethodD();
    static async Task MethodD() => await Task.Delay(int.MaxValue);
}
```
and example command output:
```
0:011> !DumpAsync -type MethodD
         Address               MT     Size Name
000001989f413de0 00007ff88c506ba8      112  System.Runtime.CompilerServices.AsyncTaskMethodBuilder`1+AsyncStateMachineBox`1[[System.Threading.Tasks.VoidTaskResult, System.Private.CoreLib],[Program+<MethodD>d__4, test]]
StateMachine: Program+<MethodD>d__4
              MT    Field   Offset                 Type VT     Attr            Value Name
00007ff8d3df4b80  400000d        0         System.Int32  1 instance                0 <>1__state
00007ff8d3e082c0  400000e        8 ...TaskMethodBuilder  1 instance 000001989f413e38 <>t__builder
00007ff8d3dfea90  400000f       10 ...vices.TaskAwaiter  1 instance 000001989f413e40 <>u__1
Continuation: 000001989f413e50 (System.Runtime.CompilerServices.AsyncTaskMethodBuilder`1+AsyncStateMachineBox`1[[System.Threading.Tasks.VoidTaskResult, System.Private.CoreLib],[Program+<MethodC>d__3, test]])
GC roots:
    Thread 2936c:
        000000071a37e050 00007ff8d3ac1657 System.Threading.Tasks.Task.SpinThenBlockingWait(Int32, System.Threading.CancellationToken) [d:\repos\coreclr\src\System.Private.CoreLib\src\System\Threading\Tasks\Task.cs @ 2977]
            rbp+10: 000000071a37e0c0
                ->  000001989f413fa0 System.Runtime.CompilerServices.AsyncTaskMethodBuilder`1+AsyncStateMachineBox`1[[System.Threading.Tasks.VoidTaskResult, System.Private.CoreLib],[Program+<Main>d__0, test]]
                ->  000001989f413f30 System.Runtime.CompilerServices.AsyncTaskMethodBuilder`1+AsyncStateMachineBox`1[[System.Threading.Tasks.VoidTaskResult, System.Private.CoreLib],[Program+<MethodA>d__1, test]]
                ->  000001989f413ec0 System.Runtime.CompilerServices.AsyncTaskMethodBuilder`1+AsyncStateMachineBox`1[[System.Threading.Tasks.VoidTaskResult, System.Private.CoreLib],[Program+<MethodB>d__2, test]]
                ->  000001989f413e50 System.Runtime.CompilerServices.AsyncTaskMethodBuilder`1+AsyncStateMachineBox`1[[System.Threading.Tasks.VoidTaskResult, System.Private.CoreLib],[Program+<MethodC>d__3, test]]
                ->  000001989f413de0 System.Runtime.CompilerServices.AsyncTaskMethodBuilder`1+AsyncStateMachineBox`1[[System.Threading.Tasks.VoidTaskResult, System.Private.CoreLib],[Program+<MethodD>d__4, test]]

    HandleTable:
        000001989d8415f8 (pinned handle)
        -> 00000198af3e1038 System.Object[]
        -> 000001989f413410 System.Threading.TimerQueue[]
        -> 000001989f413468 System.Threading.TimerQueue
        -> 000001989f413330 System.Threading.TimerQueueTimer
        -> 000001989f412e40 System.Threading.Tasks.Task+DelayPromise
        -> 000001989f413de0 System.Runtime.CompilerServices.AsyncTaskMethodBuilder`1+AsyncStateMachineBox`1[[System.Threading.Tasks.VoidTaskResult, System.Private.CoreLib],[Program+<MethodD>d__4, test]]

Found 1 state machines.
```
@stephentoub stephentoub merged commit a9c56e2 into dotnet:master May 30, 2018
@stephentoub stephentoub deleted the dumpasync branch May 30, 2018 02:53
stephentoub added a commit to dotnet/buildtools that referenced this pull request Jun 1, 2018
dotnet/coreclr#18160 and dotnet/coreclr#18213 added a new `dumpasync` command to sos that will dump out data on async methods currently in use.  Including this in the core dump analysis has similar value to dumping out all call stacks, in that it'll give us more details into what was going on in the process.
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants