Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

thread::unpark: Avoid notifying with mutex locked. #54806

Merged
merged 1 commit into from
Oct 31, 2018
Merged

Conversation

parched
Copy link
Contributor

@parched parched commented Oct 4, 2018

This means when the other thread wakes it can continue right away
instead of having to wait for the mutex.

Also add some comments explaining why the mutex needs to be locked in
the first place.

This is a follow up to #54174
I did some tests with relacy here (This PR is InnerV2). If anyone can think of some other test case worth adding let me know.

r? @RalfJung

@rust-highfive rust-highfive added the S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. label Oct 4, 2018
@rust-highfive
Copy link
Collaborator

The job x86_64-gnu-llvm-5.0 of your PR failed on Travis (raw log). Through arcane magic we have determined that the following fragments from the build log may contain information about the problem.

Click to expand the log.
[00:04:54]    Compiling panic_unwind v0.0.0 (/checkout/src/libpanic_unwind)
[00:05:00] warning: unused `sync::mutex::MutexGuard` which must be used
[00:05:00]     --> libstd/thread/mod.rs:1093:9
[00:05:00]      |
[00:05:00] 1093 |         self.inner.lock.lock().unwrap();
[00:05:00]      |
[00:05:00]      = note: #[warn(unused_must_use)] on by default
[00:05:00]      = note: #[warn(unused_must_use)] on by default
[00:05:00]      = note: if unused the Mutex will immediately unlock
[00:05:07]     Finished release [optimized] target(s) in 44.15s
[00:05:07] Copying stage0 std from stage0 (x86_64-unknown-linux-gnu -> x86_64-unknown-linux-gnu / x86_64-unknown-linux-gnu)
[00:05:07] travis_fold:end:stage0-std

---
[00:20:15]    Compiling panic_unwind v0.0.0 (/checkout/src/libpanic_unwind)
[00:20:22] error: unused `sync::mutex::MutexGuard` which must be used
[00:20:22]     --> libstd/thread/mod.rs:1093:9
[00:20:22]      |
[00:20:22] 1093 |         self.inner.lock.lock().unwrap();
[00:20:22]      |
[00:20:22]      = note: `-D unused-must-use` implied by `-D warnings`
[00:20:22]      = note: `-D unused-must-use` implied by `-D warnings`
[00:20:22]      = note: if unused the Mutex will immediately unlock
[00:20:23] error: aborting due to previous error
[00:20:23] 
[00:20:23] error: Could not compile `std`.
[00:20:23] 
[00:20:23] 
[00:20:23] To learn more, run the command again with --verbose.
[00:20:23] travis_fold:end:stage1-std

[00:20:23] travis_time:end:stage1-std:start=1538639012400258770,finish=1538639070907210955,duration=58506952185

[00:20:23] command did not execute successfully: "/checkout/obj/build/x86_64-unknown-linux-gnu/stage0/bin/cargo" "build" "--target" "x86_64-unknown-linux-gnu" "-j" "4" "--release" "--locked" "--color" "always" "--features" "panic-unwind jemalloc backtrace" "--manifest-path" "/checkout/src/libstd/Cargo.toml" "--message-format" "json"
[00:20:23] expected success, got: exit code: 101
[00:20:23] thread 'main' panicked at 'cargo must succeed', bootstrap/compile.rs:1112:9
[00:20:23] failed to run: /checkout/obj/build/bootstrap/debug/bootstrap build
[00:20:23] Build completed unsuccessfully in 0:16:00
[00:20:23] Makefile:28: recipe for target 'all' failed
[00:20:23] Makefile:28: recipe for target 'all' failed
[00:20:23] make: *** [all] Error 1

The command "stamp sh -x -c "$RUN_SCRIPT"" exited with 2.
travis_time:start:04a05cb0
$ date && (curl -fs --head https://google.com | grep ^Date: | sed 's/Date: //g' || true)
---
21968 ./.git/modules/src/tools/cargo/objects/pack
21140 ./src/llvm-emscripten/test/Transforms
20960 ./.git/modules/src/liblibc
travis_time:end:14851dfa:start=1538639071241900019,finish=1538639071694095525,duration=452195506
travis_fold:end:after_failurative/asan/build/lib/asan/clang_rt.asan-dynamic-i386.vers || true
cat: ./obj/build/x86_64-unknown-linux-gnu/native/asan/build/lib/asan/clang_rt.asan-dynamic-i386.vers: No such file or directory
travis_fold:end:after_failure.5
travis_fold:start:after_failure.6
travis_time:start:094cc080
$ dmesg | grep -i kill

I'm a bot! I can only do what humans tell me to, so if this was not helpful or you have suggestions for improvements, please ping or otherwise contact @TimNN. (Feature Requests)

// up) and when it actually waits on `cvar` which is when it can
// receive `notify_one`. Fortunately it has `lock` locked at this stage
// so we can acquire `lock` to wait until it is ready to receive the
// notification.
Copy link
Member

@RalfJung RalfJung Oct 5, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TBH I do not find this comment very helpful, but that may be just wording: It is unclear what the "which is" refers to; I first thought it refers to the period ("There is a period ... which is"). So please split this first sentence into at least two sentences.

Also it should say explicitly what the bad behavior we want to avoid is: When we call notify_one, we must be sure the other thread already went to sleep (it's fine if it has already woken up again), if it is still about to go to sleep that's no good as the notification just gets lost.

// Releasing `lock` before the call to `notify_one` means that when the
// parked thread wakes it doesn't get woken only to have to wait for us
// to release `lock`.
self.inner.lock.lock().unwrap();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You need to let-bind it, use e.g. let _ = ...

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wouldn't that be counterproductive? Since this PR would be literally a NFC then?
But I think this line should be something like drop(self.inner.lock.lock().unwrap());

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

let _ = ... and let _x = ... are very different. The former drops immediately.

But you are right, for extra clarity it is probably better to call drop explicitly.

@RalfJung
Copy link
Member

RalfJung commented Oct 5, 2018

LGTM, but I'd prefer another set of eyes on this. @alexcrichton? @carllerche?

@alexcrichton
Copy link
Member

Functionally this makes sense to me, I agree we don't need to hold the lock over the notification, we just need to acquire the lock to ensure the other thread, if any, is asleep and will receive the notification.

@parched
Copy link
Contributor Author

parched commented Oct 10, 2018

Thanks, I'll make the changes next week when I'm back from holiday.

By the way, what's the policy for squashing fix commits in a PR? I seem to remember reading some time ago rust preferred not to, but now I can't find where that was stated.

@alexcrichton
Copy link
Member

Either way's fine, whichever you prefer!

@TimNN TimNN added S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Oct 23, 2018
@TimNN
Copy link
Contributor

TimNN commented Oct 30, 2018

Ping from triage @parched: what is the status of this PR?

This means when the other thread wakes it can continue right away
instead of having to wait for the mutex.

Also add some comments explaining why the mutex needs to be locked in
the first place.
@parched
Copy link
Contributor Author

parched commented Oct 30, 2018

Thanks for the reminder @TimNN. How does it look now @RalfJung?

@RalfJung
Copy link
Member

Seems we got two people agreeing this makes sense, so let's land it. :)

@bors r+

@bors
Copy link
Contributor

bors commented Oct 31, 2018

📌 Commit d3e71e4 has been approved by RalfJung

@bors bors added S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. and removed S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. labels Oct 31, 2018
@bors
Copy link
Contributor

bors commented Oct 31, 2018

⌛ Testing commit d3e71e4 with merge c61baf7c0f8e842dc73948ba3b132f43ae61442b...

@bors
Copy link
Contributor

bors commented Oct 31, 2018

💔 Test failed - status-appveyor

@bors bors added S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. and removed S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. labels Oct 31, 2018
@RalfJung
Copy link
Member

Timed out after 3h.

@bors retry

@bors bors added S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Oct 31, 2018
@bors
Copy link
Contributor

bors commented Oct 31, 2018

⌛ Testing commit d3e71e4 with merge de9666f...

bors added a commit that referenced this pull request Oct 31, 2018
thread::unpark: Avoid notifying with mutex locked.

This means when the other thread wakes it can continue right away
instead of having to wait for the mutex.

Also add some comments explaining why the mutex needs to be locked in
the first place.

This is a follow up to #54174
I did some tests with relacy [here](https://gist.github.com/parched/b7fb88c97755a81e5cb9f9048a15f7fb) (This PR is InnerV2). If anyone can think of some other test case worth adding let me know.

r? @RalfJung
@bors
Copy link
Contributor

bors commented Oct 31, 2018

☀️ Test successful - status-appveyor, status-travis
Approved by: RalfJung
Pushing de9666f to master...

@bors bors merged commit d3e71e4 into rust-lang:master Oct 31, 2018
@ghost
Copy link

ghost commented Nov 6, 2018

This PR broke Tokio's sanitization tests in: tokio-rs/tokio#722
Failed job: https://travis-ci.org/tokio-rs/tokio/jobs/450952478#L957

I can reproduce the reported data race locally on my machine on nightly-2018-11-01, but can't reproduce on nightly-2018-10-31. The diff between those two releases is:

de9666f123 Auto merge of #54806 - parched:park, r=RalfJung
05812fa8c5 Auto merge of #55304 - alexcrichton:update-credentials, r=kennytm
0db7abe5b6 Auto merge of #54004 - tromey:enum-debuginfo, r=tromey
d3e71e4986 thread::unpark: Avoid notifying with mutex locked.
98b26888e5 Update lldb
d36e37e43e Add legacy debuginfo tests
8bbb62f849 Update enum debuginfo tests
da7b6b4b4d Avoid possible integer overflow in niche value computation
e7c49a738e Add more enum debug info tests
d8b0eb7caf Tighten enum-debug test
c32f402749 Address review comments
71ce4c3007 Fix DWARF generation for enums
3c25f80f85 ci: Move global credentials to web configuration

It can only be due to this PR. The reported data race is between a write in pthread_cond_destroy (called by Condvar::drop()) and a read in pthread_cond_signal (called by Condvar::notify_one() inside Thread::unpark()).

The PR looks correct to me so I don't know what to make of this. It might be some kind of synchronization inside pthread's condvar that thread sanitizer is not understanding. I'll probably just add this case to the TSAN suppressions file.

cc @carllerche

@alexcrichton
Copy link
Member

I think there's been bugs with TSAN in the past around (maybe?) fenches which Arc uses to guarantee thread safety, and this seems specifically related to that sort of destruction where one thread signals a thread and then another thread destroys the last instance of the Arc, which in theory is already properly synchronized (and certainly not touched by this PR!)

@parched parched deleted the park branch November 6, 2018 22:07
@parched
Copy link
Contributor Author

parched commented Nov 6, 2018

I think Alex is correct, it would be nice to get a minimal reproducer if possible to verify though.

@ghost ghost mentioned this pull request Nov 8, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants