-
Notifications
You must be signed in to change notification settings - Fork 13k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
std: Avoid locks during TLS destruction on Windows #41512
Conversation
r? @BurntSushi (rust_highfive has picked a reviewer for you, use r? to override) |
cc @brson |
cc @vadimcn |
@retep998 Is there a common solution for this problem on Windows? |
c5537b7
to
fffe254
Compare
Well, the standard Windows solution is "Just don't call any Windows APIs (with a few exceptions) while the loader lock is taken". Which means that if one would like to run arbitrary code on library load/unload, they must provide some kind of Initialize/Shutdown calls in said library. |
In this case, though, this isn't about running destructors on shutdown, but rather an unfortunate interaction of Widnows XP fallback code in the implementation of |
@vadimcn note that this PR fixes the TLS issue by just removing all locks. I tweaked the interface a bit so the locks weren't needed, and the public facing API hasn't changed at all. My main question would be that this is a pretty unfortunate drawback that affects all TLS destructors in Rust, so standard destructors that, say, grab a lock or otherwise go into the Windows runtime may be affected? That seems... bad :( |
Yeah, I don't know of a 100% safe way of running arbitrary code on DLL unload |
cc @brson (you may be able to review as well) |
c::TlsSetValue(key, ptr::null_mut()); | ||
dtor(ptr as *mut _); | ||
c::TlsSetValue((*cur).key, ptr::null_mut()); | ||
((*cur).dtor)(ptr as *mut _); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What prevents this section of code from racing? That is, can more than one thread be here trying to run the dtor of the same key? (This is likely for my own edification, since I can see the old code did this as well. One of my assumptions is that run_dtors
is running every time a thread exits, right?)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Correct yeah this can run concurrently, but the dtor
is just a raw function pointer so it's up to the dtor to have internal synchronization if neessary.
The list of global dtors is only prepended to so taking any location and iterating over it should always be valid in terms of avoiding use-after-free.
This LGTM and the approach seems sensible. I had one question about the code and one more:
Could you unpack this a bit? In particular, I don't know enough to understand how the second sentence follows from the first. |
Sure yeah, so apparently what's happening is that when a TLS destructor is run the Windows runtime holds at least one lock, the dynamic library lookup lock. I'm not sure if other locks are held. This means that if a TLS dtor attempts to grab lock A, whereas elsewhere in the code you grab lock A and then do something that grabs the dynamic library lookup lock you'll deadlock (as seen here). I'm basically just not sure there's a way to run TLS dtors on Windows in a situation where "almost arbitrary code is allowed" in the same sense that arbitrary code is allowed in the |
Seems sensible, thanks for explaining! @bors r+ |
📌 Commit fffe254 has been approved by |
⌛ Testing commit fffe254 with merge 91ddbd3... |
💔 Test failed - status-appveyor |
Tidy failed? This is odd, since I don't see any error message... I'm going to retry since it feels spurious but not opening an issue yet since, well, I'm not sure: @bors retry
|
…ck, r=BurntSushi std: Avoid locks during TLS destruction on Windows Gecko recently had a bug reported [1] with a deadlock in the Rust TLS implementation for Windows. TLS destructors are implemented in a sort of ad-hoc fashion on Windows as it doesn't natively support destructors for TLS keys. To work around this the runtime manages a list of TLS destructors and registers a hook to get run whenever a thread exits. When a thread exits it takes a look at the list and runs all destructors. Unfortunately it turns out that there's a lock which is held when our "at thread exit" callback is run. The callback then attempts to acquire a lock protecting the list of TLS destructors. Elsewhere in the codebase while we hold a lock over the TLS destructors we try to acquire the same lock held first before our special callback is run. And as a result, deadlock! This commit sidesteps the issue with a few small refactorings: * Removed support for destroying a TLS key on Windows. We don't actually ever exercise this as a public-facing API, and it's only used during `lazy_init` during racy situations. To handle that we just synchronize `lazy_init` globally on Windows so we never have to call `destroy`. * With no need to support removal the global synchronized `Vec` was tranformed to a lock-free linked list. With the removal of locks this means that iteration no long requires a lock and as such we won't run into the deadlock problem mentioned above. Note that it's still a general problem that you have to be extra super careful in TLS destructors. For example no code which runs a TLS destructor on Windows can call back into the Windows API to do a dynamic library lookup. Unfortunately I don't know of a great way around that, but this at least fixes the immediate problem that Gecko was seeing which is that with "well behaved" destructors the system would still deadlock! [1]: https://bugzilla.mozilla.org/show_bug.cgi?id=1358151
⌛ Testing commit fffe254 with merge 090ec35... |
💔 Test failed - status-appveyor |
Hmm, no, not a spurious failure, most likely. Maybe it's something to do with thread local storage on Windows? No idea if tidy uses thread locals, though. |
Yeah definitely looks like a segfault related to this, I'll take a look |
fffe254
to
bdf6194
Compare
@bors: r=BurntSushi |
📌 Commit bdf6194 has been approved by |
⌛ Testing commit bdf6194 with merge 51175c9... |
💔 Test failed - status-appveyor |
Failure seems legit:
|
Gecko recently had a bug reported [1] with a deadlock in the Rust TLS implementation for Windows. TLS destructors are implemented in a sort of ad-hoc fashion on Windows as it doesn't natively support destructors for TLS keys. To work around this the runtime manages a list of TLS destructors and registers a hook to get run whenever a thread exits. When a thread exits it takes a look at the list and runs all destructors. Unfortunately it turns out that there's a lock which is held when our "at thread exit" callback is run. The callback then attempts to acquire a lock protecting the list of TLS destructors. Elsewhere in the codebase while we hold a lock over the TLS destructors we try to acquire the same lock held first before our special callback is run. And as a result, deadlock! This commit sidesteps the issue with a few small refactorings: * Removed support for destroying a TLS key on Windows. We don't actually ever exercise this as a public-facing API, and it's only used during `lazy_init` during racy situations. To handle that we just synchronize `lazy_init` globally on Windows so we never have to call `destroy`. * With no need to support removal the global synchronized `Vec` was tranformed to a lock-free linked list. With the removal of locks this means that iteration no long requires a lock and as such we won't run into the deadlock problem mentioned above. Note that it's still a general problem that you have to be extra super careful in TLS destructors. For example no code which runs a TLS destructor on Windows can call back into the Windows API to do a dynamic library lookup. Unfortunately I don't know of a great way around that, but this at least fixes the immediate problem that Gecko was seeing which is that with "well behaved" destructors the system would still deadlock! [1]: https://bugzilla.mozilla.org/show_bug.cgi?id=1358151
bdf6194
to
495c998
Compare
@bors: r=BurntSushi |
📌 Commit 495c998 has been approved by |
…ck, r=BurntSushi std: Avoid locks during TLS destruction on Windows Gecko recently had a bug reported [1] with a deadlock in the Rust TLS implementation for Windows. TLS destructors are implemented in a sort of ad-hoc fashion on Windows as it doesn't natively support destructors for TLS keys. To work around this the runtime manages a list of TLS destructors and registers a hook to get run whenever a thread exits. When a thread exits it takes a look at the list and runs all destructors. Unfortunately it turns out that there's a lock which is held when our "at thread exit" callback is run. The callback then attempts to acquire a lock protecting the list of TLS destructors. Elsewhere in the codebase while we hold a lock over the TLS destructors we try to acquire the same lock held first before our special callback is run. And as a result, deadlock! This commit sidesteps the issue with a few small refactorings: * Removed support for destroying a TLS key on Windows. We don't actually ever exercise this as a public-facing API, and it's only used during `lazy_init` during racy situations. To handle that we just synchronize `lazy_init` globally on Windows so we never have to call `destroy`. * With no need to support removal the global synchronized `Vec` was tranformed to a lock-free linked list. With the removal of locks this means that iteration no long requires a lock and as such we won't run into the deadlock problem mentioned above. Note that it's still a general problem that you have to be extra super careful in TLS destructors. For example no code which runs a TLS destructor on Windows can call back into the Windows API to do a dynamic library lookup. Unfortunately I don't know of a great way around that, but this at least fixes the immediate problem that Gecko was seeing which is that with "well behaved" destructors the system would still deadlock! [1]: https://bugzilla.mozilla.org/show_bug.cgi?id=1358151
Gecko recently had a bug reported 1 with a deadlock in the Rust TLS
implementation for Windows. TLS destructors are implemented in a sort of ad-hoc
fashion on Windows as it doesn't natively support destructors for TLS keys. To
work around this the runtime manages a list of TLS destructors and registers a
hook to get run whenever a thread exits. When a thread exits it takes a look at
the list and runs all destructors.
Unfortunately it turns out that there's a lock which is held when our "at thread
exit" callback is run. The callback then attempts to acquire a lock protecting
the list of TLS destructors. Elsewhere in the codebase while we hold a lock over
the TLS destructors we try to acquire the same lock held first before our
special callback is run. And as a result, deadlock!
This commit sidesteps the issue with a few small refactorings:
Removed support for destroying a TLS key on Windows. We don't actually ever
exercise this as a public-facing API, and it's only used during
lazy_init
during racy situations. To handle that we just synchronize
lazy_init
globally on Windows so we never have to call
destroy
.With no need to support removal the global synchronized
Vec
was tranformedto a lock-free linked list. With the removal of locks this means that
iteration no long requires a lock and as such we won't run into the deadlock
problem mentioned above.
Note that it's still a general problem that you have to be extra super careful
in TLS destructors. For example no code which runs a TLS destructor on Windows
can call back into the Windows API to do a dynamic library lookup. Unfortunately
I don't know of a great way around that, but this at least fixes the immediate
problem that Gecko was seeing which is that with "well behaved" destructors the
system would still deadlock!