--blame-crash-dump-type is running procdump.exe with wrong arguments #2971

Closed
KirillOsenkov opened this issue Jun 30, 2021 · 11 comments

@KirillOsenkov
Member

I've spent an inordinate amount of time fighting my CI to produce dumps when the test host process crashes.

I've noticed that the command-line passed to procdump.exe seems wrong:

Running ProcDump with arguments: '-accepteula -e 1 -g -ma -f STACK_OVERFLOW -f ACCESS_VIOLATION 1116 testhost.net472.x86_1116_20210627T000633_crashdump.dmp'.

I think this filters the exceptions down to only STACK_OVERFLOW and ACCESS_VIOLATION. Why? We want to capture a dump when any unhandled exception happens.

-e 1 - it says "include 1 to create dump on first chance exceptions" - I don't think 1 is needed? Test runs by definition will have tons of first-chance exceptions.

-g - why pass g?

-f STACK_OVERFLOW -f ACCESS_VIOLATION - why pass this?
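
To make the questions above easier to follow, here is a minimal Python sketch that splits the logged argument string into its parts and prints a note for each. The notes only paraphrase the flag meanings discussed in this thread and are not a complete procdump reference; the `explain` helper and `FLAG_NOTES` table are names made up for illustration.

```python
import shlex

# Notes paraphrase the flag meanings discussed in this thread; this is not a
# complete procdump reference.
FLAG_NOTES = {
    "-accepteula": "accept the Sysinternals EULA without prompting",
    "-e": "exception monitor (unhandled exceptions)",
    "-g": "run as a native debugger in a managed process",
    "-ma": "write a full dump",
    "-t": "write a dump when the process terminates",
}


def explain(arguments):
    """Return (part, note) pairs for a procdump argument string."""
    tokens = shlex.split(arguments)
    notes = []
    i = 0
    while i < len(tokens):
        token = tokens[i]
        nxt = tokens[i + 1] if i + 1 < len(tokens) else None
        if token == "-e" and nxt == "1":
            # "-e 1" also triggers on first-chance exceptions.
            notes.append(("-e 1", "exception monitor, including first-chance exceptions"))
            i += 2
        elif token == "-f" and nxt is not None:
            # "-f" takes the exception filter as its value.
            notes.append((f"-f {nxt}", "only dump for exceptions matching this filter"))
            i += 2
        elif token in FLAG_NOTES:
            notes.append((token, FLAG_NOTES[token]))
            i += 1
        else:
            notes.append((token, "positional argument (process ID or dump file name)"))
            i += 1
    return notes


# The argument string exactly as it appears in the log line quoted above.
LOGGED = (
    "-accepteula -e 1 -g -ma -f STACK_OVERFLOW -f ACCESS_VIOLATION 1116 "
    "testhost.net472.x86_1116_20210627T000633_crashdump.dmp"
)

for part, note in explain(LOGGED):
    print(f"{part:<22} {note}")
```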

@nohwnd
Member

nohwnd commented Jun 30, 2021

This is because we want to capture stack overflow and access violation exceptions, especially from native code (like the C++ test runner, which itself runs in managed code but calls into unmanaged code that often crashes the whole testhost process). Those failures are the main source of disastrous crashes that leave no information about the crash behind. The run just suddenly stops. Most other exceptions are either captured inside the test (e.g. test failure, setup failure) or recorded in the log (e.g. a null reference in our framework code).

-e 1 -g gives procdump a chance to handle the stack overflow before the process is killed. If we only observe unhandled exceptions (-e) or attach non-natively (without -g), the process terminates before procdump has a chance to dump it.
-f STACK_OVERFLOW -f ACCESS_VIOLATION prevents it from terminating on any exception.

I see that in some cases it would be useful to crash dump on other exceptions. Unfortunately I don't see how to do that. If you know the magic sauce for procdump to be able to capture stack overflow, as well as any unhandled exception that is not in a test then that would be awesome.

@KirillOsenkov
Member Author

I did some experimenting and can confirm that you do need -e to enable the exception monitor. I can also confirm that without -e 1 it won't catch the stack overflow.

However, if instead of -e 1 you specify -e -t (exception monitor and terminate monitor), it seems to work as desired. It captures the dump when an unhandled exception crashes the process, but doesn't capture first-chance exceptions (we don't want these).

Confirmed you do need -g for native debugging.

Here's what I ended up with:
procdump -e -t -g -ma 2132 1.dmp

This will capture all unhandled exceptions (not limited to StackOverflow and AccessViolation).
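
For anyone scripting this from CI, the command above could be assembled as follows. This is only a sketch: it assumes procdump is on PATH, and `build_procdump_args` and `attach_procdump` are hypothetical helper names, not part of vstest or procdump.

```python
import subprocess


def build_procdump_args(pid, dump_path):
    """Build the argument list for the command shown above."""
    return ["procdump", "-e", "-t", "-g", "-ma", str(pid), dump_path]


def attach_procdump(pid, dump_path):
    """Start procdump against a running process (assumes procdump is on PATH)."""
    # Popen returns immediately; procdump keeps monitoring until the target exits.
    return subprocess.Popen(build_procdump_args(pid, dump_path))


# Only print the command here; call attach_procdump() on a real Windows agent.
print(" ".join(build_procdump_args(2132, "1.dmp")))
```

Passing the arguments as a list avoids quoting problems when the dump path contains spaces.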

@KirillOsenkov
Member Author

Hmm -t seems to be useless because it will dump even when the process terminates normally.

@KirillOsenkov
Member Author

I think we may need to use -i to install procdump as the AeDebug postmortem debugger.

This is similar to the following .reg file (that I have locally). This file is amazing because it automatically captures the last 16 crashed processes. The problem with this is that it's machine-wide (and you may need to be admin to write to the registry).

Windows Registry Editor Version 5.00

[HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Windows\Windows Error Reporting\LocalDumps]
"DumpFolder"="C:\\CrashDumps"
"DumpCount"=dword:00000010
"DumpType"=dword:00000002

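If a CI job needs to generate that .reg content with a different folder or dump count, a small Python sketch along these lines might do. `render_localdumps_reg` is a hypothetical helper; the defaults mirror the file above (16 dumps, DumpType 2, which is a full dump). It only renders text. Importing the result still needs admin rights, as noted above.

```python
LOCALDUMPS_KEY = (
    r"HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Windows"
    r"\Windows Error Reporting\LocalDumps"
)


def render_localdumps_reg(dump_folder, dump_count=16, dump_type=2):
    """Render .reg file text for the LocalDumps key shown above."""
    # Backslashes inside .reg string values must be doubled.
    escaped_folder = dump_folder.replace("\\", "\\\\")
    lines = [
        "Windows Registry Editor Version 5.00",
        "",
        f"[{LOCALDUMPS_KEY}]",
        f'"DumpFolder"="{escaped_folder}"',
        # DWORD values are written as 8 hex digits.
        f'"DumpCount"=dword:{dump_count:08x}',
        f'"DumpType"=dword:{dump_type:08x}',
    ]
    return "\n".join(lines) + "\n"


print(render_localdumps_reg("C:\\CrashDumps"))
```
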
@KirillOsenkov
Member Author

The more research I'm doing the more it looks like procdump may not support exactly what we want. I'm disappointed.

Will continue doing some research. May need to ping Mark Russinovich for a definitive answer.

There's also DebugDiag, as well as the LocalDumps registry key. Disappointed there doesn't seem to be an easy answer for this.

@KirillOsenkov
Member Author

Jared says the most reliable way the Roslyn team has found is the LocalDumps registry key:

https://github.com/dotnet/roslyn/blob/315c2e149ba7889b0937d872274c33fcbfe9af5f/src/Tools/Source/RunTests/ProcDumpUtil.cs#L53

@KirillOsenkov
Member Author

Also asked here on Twitter:
https://twitter.com/KirillOsenkov/status/1410435555578441732

@KirillOsenkov
Member Author

Looks like procdump.exe indeed doesn't fully support what we need:
#2972 (comment)

@nohwnd
Member

nohwnd commented Jul 19, 2021

Good to know.

@nohwnd
Member

nohwnd commented Sep 10, 2021

We are running procdump with -e 1 -g -f and -t, which is optimized for catching critical errors (AccessViolationException and StackOverflowException) and Environment.FailFast. Those errors (especially on .NET Framework) result in the testhost just disappearing and are hard to diagnose otherwise. In other cases, where the exception is managed, we are almost always able to collect it in our diag log.

Unfortunately there is no mode for procdump to attach as both managed and native. So we can't observe both types of exceptions, and we are stuck filtering those basic cases.

There are also a few internal environment variables now that let you provide your own parameters, or add parameters to the ones we produce. This should give enough flexibility to switch approaches when you are diagnosing an issue and can't get a reasonable crash dump.

#3028 (comment)
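
To illustrate the override-or-append idea described above, here is a rough Python sketch. The variable names are placeholders invented for this example, since the comment does not name the real ones (check the linked comment for those), and the default list only restates the flags mentioned in this comment. It is not how vstest implements it.

```python
import os

# HYPOTHETICAL names for illustration only; the real variables are not named
# in this thread.
OVERRIDE_VAR = "EXAMPLE_PROCDUMP_ARGUMENTS"
EXTRA_VAR = "EXAMPLE_PROCDUMP_ADDITIONAL_ARGUMENTS"

# Defaults restating the flags mentioned in this comment.
DEFAULT_ARGS = [
    "-accepteula", "-e", "1", "-g", "-t", "-ma",
    "-f", "STACK_OVERFLOW", "-f", "ACCESS_VIOLATION",
]


def resolve_procdump_args(env=None):
    """Pick procdump arguments: full override first, else defaults plus extras."""
    env = os.environ if env is None else env
    override = env.get(OVERRIDE_VAR)
    if override:
        # A full override replaces the defaults entirely.
        return override.split()
    # Otherwise append any extra arguments to the defaults.
    return DEFAULT_ARGS + env.get(EXTRA_VAR, "").split()


print(resolve_procdump_args({OVERRIDE_VAR: "-e -t -g -ma"}))
```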

@KirillOsenkov
Member Author

Yes I think this can be closed now because procdump doesn’t fundamentally support what we want (dump on all crashes but not first-chance exceptions and not normal termination). We can reopen if one day procdump gets fixed to support this.

@nohwnd nohwnd closed this as completed Sep 13, 2021