User Story: Enable diagnosing common Async problems #90
Comments
@noahfalk will drive this |
I think the next thing that needs to be done is to define what success looks like. I'm out for December, so either I'll make a proposal in January, or someone from our group of cohorts can drive it in my absence : ) |
That's great, thanks.
With the DumpAsync command I added for 3.0, I hope we're really close here. But few other than me have actually played with the command to my knowledge, so it'll be good to get someone else's take on whether there are remaining gaps in the functionality, in how the information is conveyed, etc. The main remaining thing I'd hoped to add but haven't yet is a parallel-stacks-like consolidation, where the command would optionally collapse all of the "stacks" in common, either entirely or in a partial fashion as parallel stacks does. |
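The parallel-stacks-like consolidation described above can be sketched roughly as follows. This is only an illustration in Python, not how DumpAsync is implemented: it assumes each async "stack" is already available as a list of frame names ordered root first, and the frame names and the `consolidate` and `render` helpers are hypothetical.

```python
def consolidate(stacks):
    """Merge stacks (lists of frame names, root first) into a prefix tree.

    Each node is a dict of the form {"count": int, "children": {frame: node}},
    where count is the number of stacks that pass through that node.
    """
    root = {"count": 0, "children": {}}
    for frames in stacks:
        root["count"] += 1
        node = root
        for frame in frames:
            # Reuse the child for a shared frame, or create it on first sight.
            node = node["children"].setdefault(frame, {"count": 0, "children": {}})
            node["count"] += 1
    return root


def render(node, depth=0):
    """Return indented lines, one per tree node, prefixed with its stack count."""
    lines = []
    for frame, child in node["children"].items():
        lines.append(f"{'  ' * depth}{child['count']}x {frame}")
        lines.extend(render(child, depth + 1))
    return lines


# Hypothetical async stacks; the names are made up for illustration.
stacks = [
    ["Main", "HandleRequestAsync", "ReadBodyAsync"],
    ["Main", "HandleRequestAsync", "ReadBodyAsync"],
    ["Main", "HandleRequestAsync", "WriteResponseAsync"],
    ["Main", "FlushLogsAsync"],
]
tree = consolidate(stacks)
report = render(tree)
print("\n".join(report))
```

Collapsing only whole identical stacks would correspond to grouping on the full frame list; the prefix tree gives the partial collapse, where stacks share a root and diverge further down.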
For reference here are the issues/work items that went into this command (you can see sample usages). |
Here are the docs for the DumpAsync command currently
|
Possibly relevant to this scenario is the SOS work item command that Stephen added in dotnet/coreclr#20872, which may be useful in determining whether the thread pool is starved (not enough threads to run items). It is not clear this is the way we would guide people to diagnose this, however... |
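For readers unfamiliar with the starvation pattern referred to above (queued work items with no free thread to run them), here is a minimal sketch. It uses Python's `concurrent.futures` with a deliberately tiny fixed-size pool as a stand-in; the .NET thread pool differs in detail, and the `outer` and `inner` functions are hypothetical.

```python
import threading
from concurrent.futures import ThreadPoolExecutor, TimeoutError as FutureTimeout

POOL_SIZE = 2
pool = ThreadPoolExecutor(max_workers=POOL_SIZE)
# Reusable rendezvous point so both pool threads are provably busy at once.
rendezvous = threading.Barrier(POOL_SIZE)


def inner():
    return "inner done"


def outer():
    rendezvous.wait()            # every pool thread is now running outer()
    queued = pool.submit(inner)  # lands in the queue; no thread is free to run it
    try:
        # Blocking a pool thread on work that needs a pool thread.
        outcome = queued.result(timeout=0.2)
    except FutureTimeout:
        outcome = "starved"
    rendezvous.wait()            # keep the thread occupied until both have given up
    return outcome


futures = [pool.submit(outer) for _ in range(POOL_SIZE)]
results = [f.result() for f in futures]
pool.shutdown(wait=True)         # the queued inner() items finally run here
print(results)
```

In a dump taken while both `outer` calls are blocked, the telltale sign would be a non-empty work queue alongside pool threads that are all waiting, which is the kind of evidence a work item listing is meant to surface.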
Just so we don't lose it: Stephen wrote up a slide deck and meeting notes on async. |
Update: I can create a minidump with the latest createdump utility. Can I use the latest SOS (dotnet/coreclr@b1e2c66) on 2.1? I created a dump with the latest createdump, and I want to debug it with the following script (https://github.com/dotnet/coreclr/blob/master/Documentation/building/debugging-instructions.md#debugging-core-dumps-with-lldb):
#!/usr/bin/env bash
# Open a core dump in lldb with the SOS plugin loaded.
if [ "$#" -ne 1 ]; then
    echo 'USAGE: debugcore $CORE_FILE_PATH'
    exit 1
fi
COREFILE=$1
RUNTIME_PATH='/usr/share/dotnet/shared/Microsoft.NETCore.App/2.1.5/'
#RUNTIME_PATH='/home/supei/workspace/coreclr/bin/Product/Linux.x64.Debug/'
#PATH_TO_LIBSOSPLUGIN='/usr/share/dotnet/shared/Microsoft.NETCore.App/2.1.5/libsosplugin.so'
PATH_TO_LIBSOSPLUGIN='/home/supei/workspace/coreclr/bin/Product/Linux.x64.Debug/libsosplugin.so'
HOST_PATH='/usr/share/dotnet/dotnet'
echo "$COREFILE"
# -O runs before the target is created, -o runs after the core file is loaded.
lldb-3.9 -O "settings set target.exec-search-paths $RUNTIME_PATH" -o "plugin load $PATH_TO_LIBSOSPLUGIN" --core "$COREFILE" "$HOST_PATH"
I got
|
If you want DumpAsync (2.1 doesn’t have that command), use the new SOS in the diagnostics repo: https://github.com/dotnet/diagnostics.git. The instructions to build it are here: https://github.com/dotnet/diagnostics#building-the-repository. This SOS is runtime version independent and has been tested on almost all of the distros and platforms we support.
|
@vancem - before my big vacation I tagged myself to write up some goalposts, but it looks like you took care of that for me (thanks!). Are you happy with the goalposts as they exist now or was there more you planned to add?
I've got some, except most of them are on desktop. I'm not sure we've got a wealth of customer dumps that both have the size you were hoping for and are running .NET Core 2.1+ (which is where our tools are most likely to work). @davidfowl might have some better examples from his experiments analyzing test apps at scale, or @stephentoub might have examples he has received? From what I have learned on the topic thus far, these issues are likely to show up in the .NET Core space:
Deadlocks trying to enter a synchronization context are a major issue for desktop, but I'm hoping that since ASP.NET Core eliminated its synchronization context this is substantially less of an issue on Core right now. If other libraries make it common again (say UI libraries with an STA thread) then it could easily shoot back to the top. |
We need to figure out the TaskCompletionSource issue that @vancem mentions here (#90 (comment)) |
I keep bumping up against dotnet/roslyn#22428 as well; it'd be great to have a fix for that as part of this effort. Example: dotnet/roslyn#22428 (comment) |
I was going to file a new issue, but I see we still have this one, so I'll pile onto it. I think the parallel tasks view in VS has set the bar for features (and I sent the team some updates they can do to make it nicer), and dumpasync should follow suit. I think the first order of business would be porting dumpasync to a ClrMD command so we can use managed code. Then there are a bunch of improvements that can be made to it. I'll file those as separate issues. |
As a developer for a .NET Core app, I can use SOS to get a list of all async thread stacks, so that I can understand what async operations are in progress and diagnose my app
As a developer for a .NET Core app, I can use Visual Studio to view a list of all async thread stacks in the parallel stacks window, so that I can understand what async operations are in progress and diagnose my app
There needs to be documentation that is easily discovered on docs.microsoft.com that guides you in the diagnosis procedure. Place the URL here for that guidance when it exists. It is expected that you will want guidance for the case you are debugging with Visual Studio, and other guidance when you are not. Note that we expect to have the dotnet analyze tool that will host SOS, and that can be a useful entryway into the tool needed.
We need to identify a set (let's say 5) of what we would suggest are representative dumps that show the typical problems (in particular deadlock and thread starvation; others?) at what we would consider large but still reasonable scale (e.g. > 2GB dump size, with a lot of async/threadpool activity), so that we can validate (by hand) that following the guidance identifies the problem.