Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Cleanup dbghelp so that stdlib can print backtraces on panic again #172

Closed
wants to merge 5 commits into from

Conversation

aloucks
Copy link
Contributor

@aloucks aloucks commented May 3, 2019

This PR adds cleanup for DbgHelp so that the standard library can print backtraces on panic again. It now keeps count of the number of internal requests for initialization within a single API call in order to minimize the number of init/cleanup cycles. This helps to mitigate the additional overhead of pairing cleanup and initialization. In addition, dbghelp is now initialized with the SYMOPT_DEFERRED_LOADS to further reduce the cost of initialization.

     Running target\release\deps\benchmarks-e36a49f193fde544.exe

running 6 tests
test new                                 ... bench:   5,853,185 ns/iter (+/- 329,515)
test new_unresolved                      ... bench:   4,646,070 ns/iter (+/- 359,873)
test new_unresolved_and_resolve_separate ... bench:   9,482,975 ns/iter (+/- 589,027)
test trace                               ... bench:   4,915,970 ns/iter (+/- 251,355)
test trace_and_resolve_callback          ... bench:   6,112,295 ns/iter (+/- 317,735)
test trace_and_resolve_separate          ... bench:  51,936,240 ns/iter (+/- 1,344,326)

test result: ok. 0 passed; 0 failed; 0 ignored; 6 measured; 0 filtered out

Fixes #165

@aloucks
Copy link
Contributor Author

aloucks commented May 5, 2019

I noticed that there's a kernel32 feature mentioned with RtlCaptureStackBackTrace, but it wasn't actually implemented. I wanted to see if that would perform any better and the results are pretty good.

Creating an unresolved backtrace with RtlCaptureStackBackTrace becomes extremely cheap.

     Running target\release\deps\benchmarks-bdd5a0737334f726.exe

running 6 tests
test new                                 ... bench:   5,931,245 ns/iter (+/- 317,433)
test new_unresolved                      ... bench:       1,818 ns/iter (+/- 16)
test new_unresolved_and_resolve_separate ... bench:   5,969,000 ns/iter (+/- 762,563)
test trace                               ... bench:       1,293 ns/iter (+/- 13)
test trace_and_resolve_callback          ... bench:  46,683,260 ns/iter (+/- 3,262,865)
test trace_and_resolve_separate          ... bench:  49,156,170 ns/iter (+/- 2,676,383)

test result: ok. 0 passed; 0 failed; 0 ignored; 6 measured; 0 filtered out

The challenge is that resolving the symbols in the callback becomes just as slow as resolving them separately because it's having to initialize dbghelp per frame again. I've solved that with some thread-local shenanigans to detect when resolving inside of a trace and then delay the cleanup until the end of the trace.

I'd like to add the RtlCaptureStackBackTrace optimizations in a separate pull request though.

     Running target\release\deps\benchmarks-bdd5a0737334f726.exe

running 6 tests
test new                                 ... bench:   5,788,350 ns/iter (+/- 595,957)
test new_unresolved                      ... bench:       1,818 ns/iter (+/- 53)
test new_unresolved_and_resolve_separate ... bench:   5,762,355 ns/iter (+/- 330,638)
test trace                               ... bench:       1,255 ns/iter (+/- 133)
test trace_and_resolve_callback          ... bench:   5,770,805 ns/iter (+/- 741,753)
test trace_and_resolve_separate          ... bench:  49,198,450 ns/iter (+/- 2,929,539)

test result: ok. 0 passed; 0 failed; 0 ignored; 6 measured; 0 filtered out

@alexcrichton
Copy link
Member

Thanks for this! If I understand it, the intention is to create a recursive lock here effectively (aka one that can be safely acquire recursively) and only on the first acquisition and last release do we initialize/cleanup symbols? That way it should always be paired on entering/leaving this library but we also don't attempt to reinitialize on each symbol by "locking" at a higher level.

If that's true, could this perhaps be organized a bit differently? Instead of automatically enabling the std feature could this instead expose an unsafe API which uses no synchronization internally for use with the library? When using the Backtrace wrapper struct (which requires std anyway) it can do all the synchronization there, and otherwise the raw underlying APIs are all unsafe right now anyway requiring external synchronization.

As for RtlCaptureStackBackTrace, I'm actually not sure why that was mentioned! It looks like it never actually existed in this repository beyond documentation... In any case doing that in a separate PR sounds like a good idea :)

@aloucks
Copy link
Contributor Author

aloucks commented May 6, 2019

Sure, I can take a look at re-organizing it. I think what you're getting at is that this PR makes no_std environments incompatible with backtraces on windows :) I was thinking that it was probably a rare case to need no_std on windows, but making that assumption is probably wrong now that you mention it.

When using the Backtrace wrapper struct (which requires std anyway) it can do all the synchronization there, and otherwise the raw underlying APIs are all unsafe right now anyway requiring external synchronization.

I think it'll need to be in trace and resolve, but those have #[cfg(feature = "std"] too.
Edit: yes, you're right. It needs to be in the Backtrace wrapper as well.

I was also wondering if we could eliminate the kernel32 feature and just always use RtlCaptureStackBackTrace instead of StackWalk64. I think would simplify things by having only a single way to capture the trace when dbghelp is enabled. The function is available as far back as Windows XP.

I was initially thinking that adding in RtlCaptureStackBackTrace in a separate PR would make more sense (my branch is based on this PR), but it may be easier to do it all at once if I'm refactoring the locking and organization anyway.

@aloucks
Copy link
Contributor Author

aloucks commented May 7, 2019

I think this is getting close now. I went ahead and added in the RtlCaptureStackBackTrace since it was already in my branch. I think the kernel32 feature should be enabled by default, but I've left it unmodified for now.

I'm not sure what's going on with the build. GitHub is reporting a 404 on the a file that exists and wasn't modified. The commit that it's complaining about doesn't exist though which is odd.

Could you re-trigger the build?

@alexcrichton
Copy link
Member

This has inspired me to take a look again at directly including backtrace-the-crate into libstd so there's only one Rust implementation of backtraces for all platforms. In doing so I realized (and remembered) that the backtraces here are pretty different from the ones in libstd, notably StackWalkEx isn't use here but is used in libstd because it can return inlined functions as well. I'm not sure if RtlCaptureStackBackTrace does, but I'd like to resolve that discrepancy soon.

I'd prefer that documentation where found to either choose RtlCaptureStackBackTrace or StackWalk64, I don't think it makes sense to support both.

I"ll try to find time later today, but I don't currently have time to finish review of this, but wanted to jot these thoughts down!

@alexcrichton
Copy link
Member

Ok I've extracted the cleanup bits to #175 which covers review as well, @aloucks mind taking a look at that and confirming that it looks ok?

@aloucks aloucks closed this May 14, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Capturing a backtrace prevents any future panic from printing a backtrace on Windows
2 participants