Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Synchronize dbghelp with the standard library #230

Closed
alexcrichton opened this issue Jul 31, 2019 · 3 comments · Fixed by #242
Closed

Synchronize dbghelp with the standard library #230

alexcrichton opened this issue Jul 31, 2019 · 3 comments · Fixed by #242

Comments

@alexcrichton
Copy link
Member

The dbghelp library on MSVC is not safe to use concurrently from multiple threads. There is no synchronization between the standard library's usage of RUST_BACKTRACE=1 and this crate generating a backtrace, meaning that it's not technically possible to 100% safely use this crate on MSVC.

Fixing this will likely require some form of global lock exported by the standard library specifically for backtraces and specifically for MSVC (it can be a noop on all other platforms). This library would then, when compiled on crates.io, need to coordinate with the standard library in terms of locking to ensure that only one implementation is using dbghelp at a time.

@arsing
Copy link

arsing commented Aug 13, 2019

This also causes heap corruption crashes currently if both libstd and a user crate take backtraces at approximately the same time. This happens with one of our crates' tests very reliably, where one test happens to run an assert!() (which triggers libstd backtrace collection) at about the same time as another test creates a failure error (which collect its own backtrace).

I happened to spot StackWalkEx in the crashing thread's backtrace, but I assume it would happen even with the older backtrace crate that solely used StackWalk64 ? So just downgrading the backtrace crate would not be useful unless we also used an older Rust that didn't use the backtrace crate?

arsing added a commit to arsing/iotedge that referenced this issue Aug 13, 2019
The original code would create a TcpListener and throw it away, which meant
that multiple calls to `get_unused_port` could end up with the same port.

Also, enable all the edgelet-docker tests that had been disabled on Windows
due to the original issue (even though it was not a Windows-specific problem).

However this means that the number of tests in the edgelet-docker crate running
at the same time is much higher, so it's more likely for the tests to hit
the issue described in rust-lang/backtrace-rs#230 (comment)
Specifically, any test that panics at the same time as another test is
using `failure` to collect a backtrace has a chance of crashing the entire
test binary with STATUS_HEAP_CORRUPTION on Windows, if the `RUST_BACKTRACE`
env var is set.

To avoid this, I've modified the tests that use
`#[should_panic]` + `Result::unwrap` to assert that an operation must fail to
instead use `Result::unwrap_err`.
arsing added a commit to arsing/iotedge that referenced this issue Aug 13, 2019
The original code would create a TcpListener and throw it away, which meant
that multiple calls to `get_unused_port` could end up with the same port.

Also, enable all the edgelet-docker tests that had been disabled on Windows
due to the original issue (even though it was not a Windows-specific problem).

However this means that the number of tests in the edgelet-docker crate running
at the same time is much higher, so it's more likely for the tests to hit
the issue described in rust-lang/backtrace-rs#230 (comment)
Specifically, any test that panics at the same time as another test is
using `failure` to collect a backtrace has a chance of crashing the entire
test binary with STATUS_HEAP_CORRUPTION on Windows, if the `RUST_BACKTRACE`
env var is set.

To avoid this, I've modified the tests that use
`#[should_panic]` + `Result::unwrap` to assert that an operation must fail to
instead use `Result::unwrap_err`.
@alexcrichton
Copy link
Member Author

Yes this unfortunately has always been a bug with the backtrace crate, no version has ever had this bug fixed. This is definitely a case where binaries can segfault because the documentation of Windows explicitly states that it needs synchronization, and we're not doing that! It "just so happens to work" most of the time because it's relatively rare to get two backtraces at the same time.

arsing added a commit to arsing/iotedge that referenced this issue Aug 13, 2019
The original code would create a TcpListener and throw it away, which meant
that multiple calls to `get_unused_port` could end up with the same port.

Also, enable all the edgelet-docker tests that had been disabled on Windows
due to the original issue (even though it was not a Windows-specific problem).

However this means that the number of tests in the edgelet-docker crate running
at the same time is much higher, so it's more likely for the tests to hit
the issue described in rust-lang/backtrace-rs#230 (comment)
Specifically, any test that panics at the same time as another test is
using `failure` to collect a backtrace has a chance of crashing the entire
test binary with STATUS_HEAP_CORRUPTION on Windows, if the `RUST_BACKTRACE`
env var is set.

To avoid this, I've modified the tests that assert that an operation must fail
via `#[should_panic]` + `Result::unwrap` to instead use `Result::unwrap_err`
I've only done this for edgelet-docker since it has a large number of
concurrent tests so it has a very high chance of hitting this, in comparison to
the other crates that use `#[should_panic]`
arsing added a commit to Azure/iotedge that referenced this issue Aug 14, 2019
The original code would create a `TcpListener` and throw it away, which meant
that multiple calls to `get_unused_port` could end up with the same port.

Also, enable all the edgelet-docker tests that had been disabled on Windows
due to the original issue (even though it was not a Windows-specific problem).

However this means that the number of tests in the edgelet-docker crate running
at the same time is much higher, so it's more likely for the tests to hit
the issue described in rust-lang/backtrace-rs#230 (comment)
Specifically, any test that panics at the same time as another test is
using `failure` to collect a backtrace has a chance of crashing the entire
test binary with STATUS_HEAP_CORRUPTION on Windows, if the `RUST_BACKTRACE`
env var is set.

To avoid this, I've modified the tests that assert that an operation must fail
via `#[should_panic]` + `Result::unwrap` to instead use `Result::unwrap_err`
@alexcrichton
Copy link
Member Author

I've posted a possible mitigation for what I'd hope is 90%-ish of cases at #241

alexcrichton added a commit that referenced this issue Aug 16, 2019
On Windows `dbghelp.dll` is required to be used from only one thread at
a time, and while this crate provides that guarantee this crate does not
synchronize with itself in the standard library. This commit solves this
issue by using a Windows-specific trick by creating a named mutex for
synchronization.

When this crate is updated in the standard library it means that the
named mutex here will be shared with the standard library, so the
standard library and this crate should be sharing the same
synchronization primitive and should be able to coordinate calls to
`dbghelp.dll`.

More details are included in the commit itself, but this should...

Closes #230
alexcrichton added a commit that referenced this issue Aug 19, 2019
On Windows `dbghelp.dll` is required to be used from only one thread at
a time, and while this crate provides that guarantee this crate does not
synchronize with itself in the standard library. This commit solves this
issue by using a Windows-specific trick by creating a named mutex for
synchronization.

When this crate is updated in the standard library it means that the
named mutex here will be shared with the standard library, so the
standard library and this crate should be sharing the same
synchronization primitive and should be able to coordinate calls to
`dbghelp.dll`.

More details are included in the commit itself, but this should...

Closes #230
alexcrichton added a commit to alexcrichton/rust that referenced this issue Aug 19, 2019
This commit updates the `backtrace` crate from 0.3.34 to 0.3.35. The
[included set of changes][changes] for this update mostly includes some
gimli-related improvements (not relevant for the standard library) but
critically includes a fix for rust-lang/backtrace-rs#230. The standard
library will not aqcuire a session-local lock whenever a backtrace is
generated on Windows to allow external synchronization with the
`backtrace` crate itself, allowing `backtrace` to be safely used while
other threads may be panicking.

[changes]: rust-lang/backtrace-rs@0.3.34...0.3.35
bors added a commit to rust-lang/rust that referenced this issue Aug 19, 2019
std: Update `backtrace` crate dependency

This commit updates the `backtrace` crate from 0.3.34 to 0.3.35. The
[included set of changes][changes] for this update mostly includes some
gimli-related improvements (not relevant for the standard library) but
critically includes a fix for rust-lang/backtrace-rs#230. The standard
library will not aqcuire a session-local lock whenever a backtrace is
generated on Windows to allow external synchronization with the
`backtrace` crate itself, allowing `backtrace` to be safely used while
other threads may be panicking.

[changes]: rust-lang/backtrace-rs@0.3.34...0.3.35
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants