Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Print out both accesses involved in a data race #2205

Closed
cbeuw opened this issue Jun 6, 2022 · 7 comments · Fixed by #2646
Closed

Print out both accesses involved in a data race #2205

cbeuw opened this issue Jun 6, 2022 · 7 comments · Fixed by #2646
Labels
A-data-race Area: data race detector A-diagnostics errors and warnings emitted by miri C-enhancement Category: a PR with an enhancement or an issue tracking an accepted enhancement

Comments

@cbeuw
Copy link
Contributor

cbeuw commented Jun 6, 2022

Currently Miri throws a UB on the access that ultimately causes a data race, which is always the later one. It is not clear where the other access involved in the data race is in the source file.

error: Undefined Behavior: Data race detected between Write on Thread(id = 0, name = "main") and Write on Thread(id = 1) at alloc1 (current vector clock = VClock([5]), conflicting timestamp = VClock([0, 8]))
  --> src/lib.rs:21:18
   |
21 |         unsafe { V = 2 }
   |                  ^^^^^ Data race detected between Write on Thread(id = 0, name = "main") and Write on Thread(id = 1) at alloc1 (current vector clock = VClock([5]), conflicting timestamp = VClock([0, 8]))
   |
   = help: this indicates a bug in the program: it performed an invalid operation, and caused Undefined Behavior
   = help: see https://doc.rust-lang.org/nightly/reference/behavior-considered-undefined.html for further information
           
   = note: inside `tests::data_race` at src/lib.rs:21:18

Both TSan and Go's race detector print out the line numbers of both accesses involved in the data race, we'd like to do the same.

It shouldn't be too hard to implement, we just need to augment VTimestamp with the current Span from MiriEvalContext::cur_span()

pub type VTimestamp = u32;

@RalfJung
Copy link
Member

RalfJung commented Jun 6, 2022

Stacked Borrows recently got some support for showing extra spans thanks to @saethlin; we probably want to reuse that -- in particular it doesn't use cur_span but tries to find a span "in the local crate".

Also, augmenting every timestamp might be a bit too expensive in terms of performance?

@saethlin
Copy link
Member

saethlin commented Jun 6, 2022

I don't know that augmenting every timestamp would be too slow. TSan manages to report backtraces in a debug build for these races, and those backtraces are not small and we're only trying to report one measly span. If someone is interested in doing this, it's probably worth giving the naïve approach a shot. If it's too slow I already have some ideas.

@RalfJung RalfJung added C-enhancement Category: a PR with an enhancement or an issue tracking an accepted enhancement A-data-race Area: data race detector A-diagnostics errors and warnings emitted by miri labels Jun 23, 2022
@cbeuw
Copy link
Contributor Author

cbeuw commented Jun 24, 2022

Is there a way to get a backtrace in Miri, and how expensive would this be? Only showing one span is probably not too helpful

@saethlin
Copy link
Member

ecx.generate_stacktrace(), or you may be interested in looking over this:

miri/src/helpers.rs

Lines 825 to 860 in dcaa7a7

impl<'mir, 'tcx> Evaluator<'mir, 'tcx> {
pub fn current_span(&self) -> CurrentSpan<'_, 'mir, 'tcx> {
CurrentSpan { span: None, machine: self }
}
}
/// A `CurrentSpan` should be created infrequently (ideally once) per interpreter step. It does
/// nothing on creation, but when `CurrentSpan::get` is called, searches the current stack for the
/// topmost frame which corresponds to a local crate, and returns the current span in that frame.
/// The result of that search is cached so that later calls are approximately free.
#[derive(Clone)]
pub struct CurrentSpan<'a, 'mir, 'tcx> {
span: Option<Span>,
machine: &'a Evaluator<'mir, 'tcx>,
}
impl<'a, 'mir, 'tcx> CurrentSpan<'a, 'mir, 'tcx> {
pub fn get(&mut self) -> Span {
*self.span.get_or_insert_with(|| Self::current_span(self.machine))
}
#[inline(never)]
fn current_span(machine: &Evaluator<'_, '_>) -> Span {
machine
.threads
.active_thread_stack()
.iter()
.rev()
.find(|frame| {
let def_id = frame.instance.def_id();
def_id.is_local() || machine.local_crates.contains(&def_id.krate)
})
.map(|frame| frame.current_span())
.unwrap_or(rustc_span::DUMMY_SP)
}
}

@cbeuw
Copy link
Contributor Author

cbeuw commented Jun 24, 2022

@saethlin Thanks, that looks handy.

we just need to augment VTimestamp with the current Span from

I realised what I said wouldn't work, because

//! The timestamps used in the data-race detector assign each sequence of non-atomic operations
//! followed by a single atomic or concurrent operation a single timestamp.
//! Write, Read, Write, ThreadJoin will be represented by a single timestamp value on a thread.
//! This is because extra increment operations between the operations in the sequence are not
//! required for accurate reporting of data-race values.

On a data race, we know the offending timestamp, but that could cover multiple spans so we won't be able to pinpoint a line

@ibraheemdev
Copy link
Member

I was trying to debug a data race in https://github.com/ibraheemdev/seize and had it narrowed down to a simple test case with two threads. I wanted to see the order in which miri was interleaving the threads to narrow down the bug, but adding debug statements added extra synchronization that fixed the data race ;) I ended up having to use a thread-unsafe println shim to see the order without breaking it:

pub fn println(x: String) {
    unsafe {
        let mut f = std::fs::File::from_raw_fd(1);
        f.write_all(x.as_bytes()).unwrap();
        std::mem::forget(f);
    }
}

And I did manage to find the missing synchronization edge, but it would be nice if this was easier.

@saethlin
Copy link
Member

but that could cover multiple spans so we won't be able to pinpoint a line

We could give each timestamp a start and end span. I wonder how many times when we report a diagnostic the start and end span would be the same.

@bors bors closed this as completed in 4ec960f Dec 24, 2022
RalfJung pushed a commit to RalfJung/rust that referenced this issue Dec 24, 2022
Data race spans

Fixes rust-lang/miri#2205

This adds output to data race errors very similar to the spans we emit for Stacked Borrows errors. For example, from our test suite:
```
help: The Atomic Load on thread `<unnamed>` is here
  --> tests/fail/data_race/atomic_read_na_write_race1.rs:23:13
   |
23 | ...   (&*c.0).load(Ordering::SeqCst) //~ ERROR: Data race detected between Atomic Load on thread `<unnamed>` and Write o...
   |       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
help: The Write on thread `<unnamed>` is here
  --> tests/fail/data_race/atomic_read_na_write_race1.rs:19:13
   |
19 |             *(c.0 as *mut usize) = 32;
   |             ^^^^^^^^^^^^^^^^^^^^^^^^^```
```

Because of rust-lang/miri#2647 this comes without a perf regression, according to our benchmarks.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-data-race Area: data race detector A-diagnostics errors and warnings emitted by miri C-enhancement Category: a PR with an enhancement or an issue tracking an accepted enhancement
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants