thread::unpark: Avoid notifying with mutex locked. #54806

parched · 2018-10-04T07:22:34Z

This means when the other thread wakes it can continue right away
instead of having to wait for the mutex.

Also add some comments explaining why the mutex needs to be locked in
the first place.

This is a follow up to #54174
I did some tests with relacy here (This PR is InnerV2). If anyone can think of some other test case worth adding let me know.

r? @RalfJung

rust-highfive · 2018-10-04T07:44:37Z

The job x86_64-gnu-llvm-5.0 of your PR failed on Travis (raw log). Through arcane magic we have determined that the following fragments from the build log may contain information about the problem.

Click to expand the log.

[00:04:54]    Compiling panic_unwind v0.0.0 (/checkout/src/libpanic_unwind)
[00:05:00] warning: unused `sync::mutex::MutexGuard` which must be used
[00:05:00]     --> libstd/thread/mod.rs:1093:9
[00:05:00]      |
[00:05:00] 1093 |         self.inner.lock.lock().unwrap();
[00:05:00]      |
[00:05:00]      = note: #[warn(unused_must_use)] on by default
[00:05:00]      = note: #[warn(unused_must_use)] on by default
[00:05:00]      = note: if unused the Mutex will immediately unlock
[00:05:07]     Finished release [optimized] target(s) in 44.15s
[00:05:07] Copying stage0 std from stage0 (x86_64-unknown-linux-gnu -> x86_64-unknown-linux-gnu / x86_64-unknown-linux-gnu)
[00:05:07] travis_fold:end:stage0-std

---
[00:20:15]    Compiling panic_unwind v0.0.0 (/checkout/src/libpanic_unwind)
[00:20:22] error: unused `sync::mutex::MutexGuard` which must be used
[00:20:22]     --> libstd/thread/mod.rs:1093:9
[00:20:22]      |
[00:20:22] 1093 |         self.inner.lock.lock().unwrap();
[00:20:22]      |
[00:20:22]      = note: `-D unused-must-use` implied by `-D warnings`
[00:20:22]      = note: `-D unused-must-use` implied by `-D warnings`
[00:20:22]      = note: if unused the Mutex will immediately unlock
[00:20:23] error: aborting due to previous error
[00:20:23] 
[00:20:23] error: Could not compile `std`.
[00:20:23] 
[00:20:23] 
[00:20:23] To learn more, run the command again with --verbose.
[00:20:23] travis_fold:end:stage1-std

[00:20:23] travis_time:end:stage1-std:start=1538639012400258770,finish=1538639070907210955,duration=58506952185

[00:20:23] command did not execute successfully: "/checkout/obj/build/x86_64-unknown-linux-gnu/stage0/bin/cargo" "build" "--target" "x86_64-unknown-linux-gnu" "-j" "4" "--release" "--locked" "--color" "always" "--features" "panic-unwind jemalloc backtrace" "--manifest-path" "/checkout/src/libstd/Cargo.toml" "--message-format" "json"
[00:20:23] expected success, got: exit code: 101
[00:20:23] thread 'main' panicked at 'cargo must succeed', bootstrap/compile.rs:1112:9
[00:20:23] failed to run: /checkout/obj/build/bootstrap/debug/bootstrap build
[00:20:23] Build completed unsuccessfully in 0:16:00
[00:20:23] Makefile:28: recipe for target 'all' failed
[00:20:23] Makefile:28: recipe for target 'all' failed
[00:20:23] make: *** [all] Error 1

The command "stamp sh -x -c "$RUN_SCRIPT"" exited with 2.
travis_time:start:04a05cb0
$ date && (curl -fs --head https://google.com | grep ^Date: | sed 's/Date: //g' || true)
---
21968 ./.git/modules/src/tools/cargo/objects/pack
21140 ./src/llvm-emscripten/test/Transforms
20960 ./.git/modules/src/liblibc
travis_time:end:14851dfa:start=1538639071241900019,finish=1538639071694095525,duration=452195506
travis_fold:end:after_failurative/asan/build/lib/asan/clang_rt.asan-dynamic-i386.vers || true
cat: ./obj/build/x86_64-unknown-linux-gnu/native/asan/build/lib/asan/clang_rt.asan-dynamic-i386.vers: No such file or directory
travis_fold:end:after_failure.5
travis_fold:start:after_failure.6
travis_time:start:094cc080
$ dmesg | grep -i kill

I'm a bot! I can only do what humans tell me to, so if this was not helpful or you have suggestions for improvements, please ping or otherwise contact @TimNN. (Feature Requests)

RalfJung · 2018-10-05T18:21:11Z

src/libstd/thread/mod.rs

+        // up) and when it actually waits on `cvar` which is when it can
+        // receive `notify_one`. Fortunately it has `lock` locked at this stage
+        // so we can acquire `lock` to wait until it is ready to receive the
+        // notification.


TBH I do not find this comment very helpful, but that may be just wording: It is unclear what the "which is" refers to; I first thought it refers to the period ("There is a period ... which is"). So please split this first sentence into at least two sentences.

Also it should say explicitly what the bad behavior we want to avoid is: When we call notify_one, we must be sure the other thread already went to sleep (it's fine if it has already woken up again), if it is still about to go to sleep that's no good as the notification just gets lost.

RalfJung · 2018-10-05T18:21:58Z

src/libstd/thread/mod.rs

+        // Releasing `lock` before the call to `notify_one` means that when the
+        // parked thread wakes it doesn't get woken only to have to wait for us
+        // to release `lock`.
+        self.inner.lock.lock().unwrap();


You need to let-bind it, use e.g. let _ = ...

Wouldn't that be counterproductive? Since this PR would be literally a NFC then?
But I think this line should be something like drop(self.inner.lock.lock().unwrap());

let _ = ... and let _x = ... are very different. The former drops immediately.

But you are right, for extra clarity it is probably better to call drop explicitly.

RalfJung · 2018-10-05T18:22:42Z

LGTM, but I'd prefer another set of eyes on this. @alexcrichton? @carllerche?

alexcrichton · 2018-10-05T22:24:06Z

Functionally this makes sense to me, I agree we don't need to hold the lock over the notification, we just need to acquire the lock to ensure the other thread, if any, is asleep and will receive the notification.

parched · 2018-10-10T12:44:45Z

Thanks, I'll make the changes next week when I'm back from holiday.

By the way, what's the policy for squashing fix commits in a PR? I seem to remember reading some time ago rust preferred not to, but now I can't find where that was stated.

alexcrichton · 2018-10-10T16:17:11Z

Either way's fine, whichever you prefer!

TimNN · 2018-10-30T17:33:44Z

Ping from triage @parched: what is the status of this PR?

This means when the other thread wakes it can continue right away instead of having to wait for the mutex. Also add some comments explaining why the mutex needs to be locked in the first place.

parched · 2018-10-30T22:57:30Z

Thanks for the reminder @TimNN. How does it look now @RalfJung?

RalfJung · 2018-10-31T07:38:10Z

Seems we got two people agreeing this makes sense, so let's land it. :)

@bors r+

bors · 2018-10-31T07:38:11Z

📌 Commit d3e71e4 has been approved by RalfJung

bors · 2018-10-31T09:23:12Z

⌛ Testing commit d3e71e4 with merge c61baf7c0f8e842dc73948ba3b132f43ae61442b...

bors · 2018-10-31T12:26:08Z

💔 Test failed - status-appveyor

RalfJung · 2018-10-31T12:44:37Z

Timed out after 3h.

@bors retry

bors · 2018-10-31T13:52:08Z

⌛ Testing commit d3e71e4 with merge de9666f...

@RalfJung

thread::unpark: Avoid notifying with mutex locked. This means when the other thread wakes it can continue right away instead of having to wait for the mutex. Also add some comments explaining why the mutex needs to be locked in the first place. This is a follow up to #54174 I did some tests with relacy [here](https://gist.github.com/parched/b7fb88c97755a81e5cb9f9048a15f7fb) (This PR is InnerV2). If anyone can think of some other test case worth adding let me know. r? @RalfJung

bors · 2018-10-31T16:44:55Z

☀️ Test successful - status-appveyor, status-travis
Approved by: RalfJung
Pushing de9666f to master...

ghost · 2018-11-06T09:50:11Z

This PR broke Tokio's sanitization tests in: tokio-rs/tokio#722
Failed job: https://travis-ci.org/tokio-rs/tokio/jobs/450952478#L957

I can reproduce the reported data race locally on my machine on nightly-2018-11-01, but can't reproduce on nightly-2018-10-31. The diff between those two releases is:

de9666f123 Auto merge of #54806 - parched:park, r=RalfJung
05812fa8c5 Auto merge of #55304 - alexcrichton:update-credentials, r=kennytm
0db7abe5b6 Auto merge of #54004 - tromey:enum-debuginfo, r=tromey
d3e71e4986 thread::unpark: Avoid notifying with mutex locked.
98b26888e5 Update lldb
d36e37e43e Add legacy debuginfo tests
8bbb62f849 Update enum debuginfo tests
da7b6b4b4d Avoid possible integer overflow in niche value computation
e7c49a738e Add more enum debug info tests
d8b0eb7caf Tighten enum-debug test
c32f402749 Address review comments
71ce4c3007 Fix DWARF generation for enums
3c25f80f85 ci: Move global credentials to web configuration

It can only be due to this PR. The reported data race is between a write in pthread_cond_destroy (called by Condvar::drop()) and a read in pthread_cond_signal (called by Condvar::notify_one() inside Thread::unpark()).

The PR looks correct to me so I don't know what to make of this. It might be some kind of synchronization inside pthread's condvar that thread sanitizer is not understanding. I'll probably just add this case to the TSAN suppressions file.

cc @carllerche

alexcrichton · 2018-11-06T15:10:43Z

I think there's been bugs with TSAN in the past around (maybe?) fenches which Arc uses to guarantee thread safety, and this seems specifically related to that sort of destruction where one thread signals a thread and then another thread destroys the last instance of the Arc, which in theory is already properly synchronized (and certainly not touched by this PR!)

parched · 2018-11-06T22:44:29Z

I think Alex is correct, it would be nice to get a minimal reproducer if possible to verify though.

rust-highfive assigned RalfJung Oct 4, 2018

rust-highfive added the S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. label Oct 4, 2018

RalfJung reviewed Oct 5, 2018

View reviewed changes

TimNN added S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Oct 23, 2018

thread::unpark: Avoid notifying with mutex locked.

d3e71e4

This means when the other thread wakes it can continue right away instead of having to wait for the mutex. Also add some comments explaining why the mutex needs to be locked in the first place.

parched force-pushed the park branch from d91bdbe to d3e71e4 Compare October 30, 2018 22:55

bors added S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. and removed S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. labels Oct 31, 2018

bors added S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. and removed S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. labels Oct 31, 2018

bors added S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Oct 31, 2018

bors merged commit d3e71e4 into rust-lang:master Oct 31, 2018

parched deleted the park branch November 6, 2018 22:07

ghost mentioned this pull request Nov 8, 2018

fix tsan errors tokio-rs/tokio#745

Merged

RalfJung mentioned this pull request Oct 25, 2019

Soundness of std::sync::Once #65796

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

thread::unpark: Avoid notifying with mutex locked. #54806

thread::unpark: Avoid notifying with mutex locked. #54806

parched commented Oct 4, 2018

rust-highfive commented Oct 4, 2018

RalfJung Oct 5, 2018 •

edited

Loading

RalfJung Oct 5, 2018

cynecx Oct 6, 2018

RalfJung Oct 6, 2018

RalfJung commented Oct 5, 2018

alexcrichton commented Oct 5, 2018

parched commented Oct 10, 2018

alexcrichton commented Oct 10, 2018

TimNN commented Oct 30, 2018

parched commented Oct 30, 2018

RalfJung commented Oct 31, 2018

bors commented Oct 31, 2018

bors commented Oct 31, 2018

bors commented Oct 31, 2018

RalfJung commented Oct 31, 2018

bors commented Oct 31, 2018

bors commented Oct 31, 2018

ghost commented Nov 6, 2018

alexcrichton commented Nov 6, 2018

parched commented Nov 6, 2018

thread::unpark: Avoid notifying with mutex locked. #54806

thread::unpark: Avoid notifying with mutex locked. #54806

Conversation

parched commented Oct 4, 2018

rust-highfive commented Oct 4, 2018

RalfJung Oct 5, 2018 • edited Loading

Choose a reason for hiding this comment

RalfJung Oct 5, 2018

Choose a reason for hiding this comment

cynecx Oct 6, 2018

Choose a reason for hiding this comment

RalfJung Oct 6, 2018

Choose a reason for hiding this comment

RalfJung commented Oct 5, 2018

alexcrichton commented Oct 5, 2018

parched commented Oct 10, 2018

alexcrichton commented Oct 10, 2018

TimNN commented Oct 30, 2018

parched commented Oct 30, 2018

RalfJung commented Oct 31, 2018

bors commented Oct 31, 2018

bors commented Oct 31, 2018

bors commented Oct 31, 2018

RalfJung commented Oct 31, 2018

bors commented Oct 31, 2018

bors commented Oct 31, 2018

ghost commented Nov 6, 2018

alexcrichton commented Nov 6, 2018

parched commented Nov 6, 2018

RalfJung Oct 5, 2018 •

edited

Loading