Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Dump at least partial stack trace for stack overflow #825

Closed
janvorli opened this issue Oct 18, 2018 · 16 comments
Closed

Dump at least partial stack trace for stack overflow #825

janvorli opened this issue Oct 18, 2018 · 16 comments
Assignees
Milestone

Comments

@janvorli
Copy link
Member

When stack overflow happens, it is sometimes hard to determine the real cause, since at stack overflow, the stack trace is not printed.
Since we've recently switched to allocating the alternate stack space using mmap, we could actually reserve larger VM space and commit just the size needed by the regular sigsegv handling. On stack overflow, we could commit more of the space so that we have enough to dump at least partial stack trace.

@janvorli janvorli self-assigned this Oct 18, 2018
@markusschaber
Copy link

I currently have a problem where I cannot even get Stack Trace with Visual Studio debugger... So anything which could help us to get a clue would be welcome... :-)

@janvorli
Copy link
Member Author

@markusschaber - are you talking about debugging Windows process or Unix one?

@markusschaber
Copy link

markusschaber commented Nov 20, 2018

@janvorli I had the problem on Windows 10, using rather recent VS 2017 (Version 15.8.x or 15.9.0), running dotnet.exe (2.1, x64) on the same machine. As we were using p/invoke (but not in the code path involved), I had native debugging enabled. I was not able to get a stack trace of the stack involved, neither when running the service in the debugger, nor when attaching to it later or via "just in time" debugging.

@danmoseley
Copy link
Member

danmoseley commented Oct 11, 2019

@janvorli @tommcdon do you think this is possible in the 5.0 timeframe? Arguably this falls under "cloud native" since it requires you set up and attach a debugger to get an idea of the problem, which is not always easy if it only reproes in production. A callstack could be the clue you need to avoid that.

Pretty much every SO I have seen repeats after a few frames (if not 1) so even a few frames would give you almost all the full stack.

@GSPP
Copy link

GSPP commented Oct 11, 2019

@danmosemsft that is a good point. And without any stack you are pretty much lost. Stack overflows are tough to debug.

@tommcdon
Copy link
Member

@janvorli this seems like a reasonable request for 5.0. We have other issues on the debugger where SOS PE on the StackOverflowException object and SOS ClrStack commeands do not work either. dotnet/diagnostics#135 dotnet/diagnostics#66
@sdmaclea @mikem8361 @sergiy-k

@yyjdelete
Copy link

Besides the top N stack, the bottom N stack maybe also needed to make an diag(to see what is called before the stack loop) and reproduce.

@janvorli
Copy link
Member Author

I have made a working prototype that dumps the whole stack trace some time ago already, but I was waiting for the fix for dotnet/coreclr#21061 to finalize it, as the prototype was working only for cases when the stack overflow happened in methods without stack probing.
I was thinking it might be nice to try to "compress" the stack trace in some way to make it shorter and easier to read. E.g. in the case when a single frame repeats n times we could print the frame just once with the repeat count or we might even identify a repeated sequence of the same frames. But these were just quick ideas and I haven't thought much about the feasibility or complexity of doing that.

@danmoseley
Copy link
Member

If it's feasible to include the start of that stack as well (sounds like it, which is great) then I think compressing it as you describe is inevitable. The output would be huge and unwieldy otherwise.

Incidentally what would be the experience in VS? There is not always a console attached to dump it to - would it show up in the debugger output window or the stack window?

@janvorli
Copy link
Member Author

@danmosemsft It would dump to whatever output the fail fast output goes. I am not sure how that behaves in VS.

@mikem8361
Copy link
Member

Will what you are doing for this issue fix the DAC problems in dotnet/diagnostics#66? The stack unwinding code doesn't like that we are on the alternate stack. This is way SOS, etc. doesn't work that well for stack overflow.

@janvorli
Copy link
Member Author

@mikem8361 it would not fix that problem. I think that a solution to that one is to disable the asserts. I had to do it in couple of places in my prototype too. We cannot extend the stack in the original continuous VA range. Although we could possibly add the alternate stack ranges to the Thread and check SP against both the alternate and the regular stack ranges, it seems like an overkill just to make an assert happy.

@danmoseley danmoseley transferred this issue from dotnet/coreclr Dec 13, 2019
@Dotnet-GitSync-Bot Dotnet-GitSync-Bot added the untriaged New issue has not been triaged by the area owner label Dec 13, 2019
@jeffschwMSFT jeffschwMSFT added this to the 5.0 milestone Jan 8, 2020
@jeffschwMSFT jeffschwMSFT removed the untriaged New issue has not been triaged by the area owner label Jan 8, 2020
@janvorli
Copy link
Member Author

Fixed by #31956

@safern
Copy link
Member

safern commented Feb 14, 2020

@janvorli were you able to put the fix back again after the revert, or should this issue be re-opened?

@janvorli
Copy link
Member Author

@safern yes, I've fixed it and merged the version with the fix in #32167

@safern
Copy link
Member

safern commented Feb 14, 2020

Thanks @janvorli -- just making sure since I reverted it 😄

@ghost ghost locked as resolved and limited conversation to collaborators Dec 15, 2020
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

No branches or pull requests