Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Source sets #18

Merged
merged 2 commits into from
Jul 2, 2023
Merged

Source sets #18

merged 2 commits into from
Jul 2, 2023

Conversation

bartoszmodelski
Copy link
Contributor

@bartoszmodelski bartoszmodelski commented Mar 22, 2023

(Based on #19, #21)

PR replacing our current persistent sets with source sets as outlined in the Source Sets: A Foundation for Optimal Dynamic Partial Order Reuction. Source sets achieve a better reduction in general and optimal reduction on some tests.

Part of the improvement can be definitely attributed to the fact that spawns are not treated as independent operations and are not reordered with atomic operations (unless forced by required reorder of atomic operations).

Results

Test 1

(write A, write B), (write B, write A)

  Atomic.spawn (fun () ->
      Atomic.set a 1;
      Atomic.set b 2);
  Atomic.spawn (fun () ->
      Atomic.set b 3;
      Atomic.set a 4);
  Atomic.final (fun () ->
      Printf.printf "a=%d, b=%d\n%!" (Atomic.get a) (Atomic.get b))

main

a=4, b=3
a=4, b=3
a=4, b=2
a=4, b=3
a=4, b=2
a=4, b=2
a=1, b=2
num_traces: 7

Source-set

a=4, b=3
a=4, b=2
a=1, b=2
num_traces: 3

Test 2

(write A, write B), (write A, write B)

main:

a=3, b=4                            
a=3, b=4
a=3, b=4
a=3, b=2
a=3, b=4
a=3, b=4
a=3, b=2
a=1, b=4
a=1, b=2
num_traces: 9

Source-set

➜  dscheck git:(source-set) ✗ dune build && ./_build/default/tests/test_simple.exe -v
a=3, b=4                            
a=3, b=2
a=1, b=4
a=1, b=2

num_traces: 4

Existing tests

  • Safe list goes down from 130k traces to 23k traces. In this test all accesses target single location, so "by def" there's little DPOR algorithm can do with dependency relation specified as currently. I suspect the gain stems from being more careful with independent operations (i.e. spawns).
  • The most significant result is for MSQueue. For n=4 we drop from 12k states to 16. There's a lot of accesses on disjoint locations, which is precisely the scenario DPOR was developed for.

Logic

Just like with the previous algorithm, we execute the program on a single core and take control at calls to Atomic in order to force certain interleavings to be explored. We do that by stopping after every operation, looking at the most recent races and scheduling a trace that reverses the two racing accesses. Races here are pretty much accesses to shared variables made by two different threads.

We depart from the previous algorithm in the way sequences are scheduled. Suppose we have the following interleaving E.t.w.t', where t and t' are a race and w is an independent operation (say spawn of another thread).

  • Previously, we'd try to schedule the process of t' directly after E, to explore E.t'. .., and if proc(t') is not enabled, schedule all processes at E. Now, we're far more precise in that we're looking at all possible sequences that lead to the reversal (e.g. E.t'. .., E.w.t'. .., etc.). In fact, if E.w has already been explored, we know we don't have to schedule anything.
  • The sleep set is far more powerful, letting us ignore reversals of independent operations before they are scheduled.

Above may sound quite abstract. There is a lot of comments in the code that explain step-by-step how we achieve both of these things. Having said that, I would recommend at least skimming the sections 2-5 in the paper, even if just to double check me.

Other implementations

The paper focuses on the Concuerror project, which I didn't find particularly helpful as it focuses on Erlang's CSP. However, the same research group produced https://github.com/nidhugg/nidhugg, which is really close to what we want.

Testing

All existing tests pass and explore the same traces as DPOR. I've also used program_gen a lot, both to compare with existing dpor for smaller programs and with random exploration for larger ones (that source dpor can terminate on, while existing dpor not quite).

Future work

There's a few possible next steps:

  • Improve the dependency relation. At the moment we assume that all accesses on the same variable are races, but this is really not the case: two reads commute, two failing CASes commute, two writes with the same value commute, etc.
  • The paper with source sets also introduces wake-up trees. They are supposed to bring us to the optimal number of sequences (i.e. one sequence per trace). Apparently, they are not quite optimal but it's still an improvement. I'm a bit skeptical though as they use exponential memory and that could have impact on lockfree's testsuite.
  • TruST, which claims to solve all our problems: be linear in time, space and easily parallelizable. It's a fairly recent paper but with a public implementation.

@art-w
Copy link

art-w commented Mar 23, 2023

Very cool!! This is really subtle to review, I like the strategy but I don't fully understand the code, there's some weird bug lurking somewhere:

module Atomic = Dscheck.TracedAtomic

let test () =
  let a = Atomic.make 0 in
  let b = Atomic.make 0 in
  let c = Atomic.make 0 in
  let ok = Atomic.make false in
  Atomic.spawn (fun () ->
    Atomic.set a 1 ;
    Atomic.set b 1 ;
  );
  Atomic.spawn (fun () ->
    Atomic.set c 1 ;
    Atomic.set c 2 ;
    Atomic.set c 3 ;
    Atomic.set b 2 ;
  );
  Atomic.spawn (fun () ->
    if Atomic.get c = 2 && Atomic.get b = 0
    then Atomic.set ok true
  );
  Atomic.final (fun () ->
    Format.printf "a=%i b=%i c=%i ok=%b@."
      (Atomic.get a)
      (Atomic.get b)
      (Atomic.get c)
      (Atomic.get ok)
  )

let () = Atomic.trace test

(*
a=1 b=2 c=3 ok=false
a=1 b=2 c=3 ok=false
a=1 b=2 c=3 ok=false
a=1 b=2 c=3 ok=false
a=1 b=2 c=3 ok=false
a=1 b=1 c=3 ok=false

num_traces: 6
*)

There should be a window for ok=true?.. (dscheck main says yes, so I think this is new)

Do you think you could test it with #15? There are some existing bugs in dscheck so it's hard to tell if an optimization is correct or if it introduces new issues :/

@bartoszmodelski
Copy link
Contributor Author

Yeah, this is not looking correct. I managed to simplify the failing case to the following:

let test () =
  let c = Atomic.make 0 in
  let b = Atomic.make 0 in
  let ok = Atomic.make false in

  Atomic.spawn (fun () -> Atomic.set b 1);
  Atomic.spawn (fun () ->
      Atomic.set c 1;
      Atomic.set b 2);
  Atomic.spawn (fun () ->
      if Atomic.get c = 0 then if Atomic.get b = 0 then Atomic.set ok true);

  Atomic.final (fun () ->
      Format.printf "b=%i c=%i ok=%b@." (Atomic.get b) (Atomic.get c)
        (Atomic.get ok))

In the 3/4 trace we observe P2's accessing b but it does not bubble down to precede P0's.

@bartoszmodelski bartoszmodelski force-pushed the source-set branch 2 times, most recently from e4abc64 to 337fc79 Compare April 5, 2023 17:17
@bartoszmodelski
Copy link
Contributor Author

Ok, I have some idea about what's happening.

The general principle behind DPOR algos is discovery of reversible races. Whenever we discover one, we need to ensure that a new trace containing the reversal gets explored at some point. Source-DPOR generates "initials" of execution histories leading to reversal and if none of the initials is in the relevant backtrack, one (arbitrarily chosen) initial gets added. That should lead to the new trace. However, it is not what I'm seeing in the case above. There, the fourth (last) trace generates two initials: 0, 2. But 0 is in the sleep set already and inserting it into backtrack does not have any impact. On the other hand, choosing 2 lets us explore all the traces (while still visiting only 7% of the execution histories that vanilla DPOR sees). Afaict we generate the initials and sleep sets correctly, thus I've added a bias to prefer to backtrack unblocked/unexplored transitions and that seems to help.

I'll do more automated testing.

@kayceesrk
Copy link

This looks very promising. Does this algorithm explore all the "interesting" schedules that the original algorithm explores?

@bartoszmodelski
Copy link
Contributor Author

bartoszmodelski commented Apr 7, 2023

As far as the tests we have, yes.

I'm a little uneasy with the fact that we had to specialize on the algo from the paper to handle the case above correctly. I'm working on tooling to automatically search for programs (borrowing ideas from #14), where source-dpor explores less traces than the current implementation.

(Incidentally, it confirms that the 12k execution sequences we explore for michael scott indeed fold into just 16 traces).

Related: #19

IntSet.(min_elt_opt (diff initials prefix_top_sleep))
with
| Some initial -> initial
| None -> IntSet.min_elt initials
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hm so I don't understand how there can be multiple initials? When dscheck wants to commute the current operation with a previous one, it has to find the (unique) causal chain that allow the current operation to happen earlier (and then the unique initial is the first process of that chain).

I'm guessing the issues you are seeing (and the bias fix) is related to spawns and the chosen initial not being in s.enabled in the past, as scheduling an operation earlier may require the initial to be a parent domain (which will then spawn the current domain and enable the current operation to happen before the backtracked one)

The causal chain is a bit more tricky to determine than just following parent spawns, as the current operation can conditionally depend on some unrelated domain updating an atomic at the right time.

(... the wip code is a bit hard to read, I may be mistaken, but the initials set looks weird :P )

Copy link
Contributor Author

@bartoszmodelski bartoszmodelski Apr 13, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hm so I don't understand how there can be multiple initials?

There's many ways to reverse a race and we'd want to explore fewer alternatives where possible. Say, we are in a sequence E.t.w.t', where E, w are some sequences of transitions, and t, t' are single transitions, which constitute a reversible race (act on the same variable, belong to different proc). The strongest way to execute the reversal of t, t' is to add t' to E.backtrack. This guarantees that E.t' will be or has been explored.

But that's not the only way. Suppose w is a single operation that is independent with all of our E, t, t' business. A spawn or just an access of a different var on a different thread. With that, we can explore the reversal as E.w.t' as well. Therefore, there are two initials = {t', w}.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ha yes, you are right, I forgot that I was thinking of the shortest (strongest) sequence which is unique. So in your example, I don't think we need to consider the initial w?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We want to consider both initials to explore fewer interleavings. That's because the sequences E.w.t' and E.t'.w are equivalent and, generally, if we explore one initial, then we have explored both. I haven't penetrated the proof of it fully but it makes sense intuitively - throwing an indep operation in front of a racy program should not impact the result.

We're not seeing this above, without the bias patch, because our computation of indep and initials is too simplistic (essentially more or less forgetting about transitivity of happens-before). We end up with too many initials, and some of them belong to traces, rather than the one we want to schedule.

A curious observation: with intransitive happens-before the without-bias version is fine on even very complicated two-thread tests, but it's really easy to break it with 3+. That's the smallest offending test case I found:

let test () =
  let a = Atomic.make 0 in
  let b = Atomic.make 0 in

  Atomic.spawn (fun () ->
      Atomic.set b 1;
      Atomic.set a 2);

  Atomic.spawn (fun () -> Atomic.set a 4);

  Atomic.spawn (fun () -> Atomic.set b 8);

I have a patch that fixes the computation of indep and initials. I'll clean it up and push next week.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah so with two domains, one just needs to mark the backtrackable steps as there's only one alternative domain to choose from (so anything works ^^). With more domains, you can have a complex situation where backtracking one domain operation requires running (or not running) other domains first. The source set paper you linked has an example at the top of page 7, with 4 domains, to illustrate why wake-up trees are necessary for optimality.

In any case, sleeping sets are already pretty good, but:

  • If the strongest initial is in the sleeping set, then one shouldn't attempt to backtrack there as it was already explored
  • If the chosen initial is not in the enabled set at that time, then backtracking will silently ignore the branch

Pending your next fix, without the bias, I think you are in the second situation where the strongest initial is not correctly computed for some backtrackable steps (another domain should have been picked as initial, which would have enabled the branch). With the bias, I think you end up scheduling more branches since it doesn't actually check that the operation was backtrackable at that time (but as long as the enabled set is not considered, then you can still pick a random initial that won't be able to run).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah so with two domains, one just needs to mark the backtrackable steps as there's only one alternative domain to choose from (so anything works ^^).

Yep, basically there's no transitive case really. Something to be mindful of in the future.

@bartoszmodelski bartoszmodelski changed the base branch from main to program-gen April 18, 2023 13:16
@bartoszmodelski bartoszmodelski changed the base branch from program-gen to main April 18, 2023 13:16
@bartoszmodelski bartoszmodelski marked this pull request as ready for review April 18, 2023 13:18
@bartoszmodelski bartoszmodelski requested a review from a team April 18, 2023 14:09
@bartoszmodelski
Copy link
Contributor Author

bartoszmodelski commented Apr 18, 2023

Backed out the previous attempt and fixed the root cause. We're now fully in line with the paper and should be good for proper review.

| Some run_ptr -> dependent_vars := IntSet.add run_ptr !dependent_vars);

if happen_after then None else Some state_cell)
sequence
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you need to distinguish between set and get operations and their effect on run_ptr, otherwise you could end up filtering out steps that could have been backtracked (and hence skip a branch)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should not need to; as far as dpor or source dpor are concerned, every access is basically r/w. Are you referring to computation initials (or indep)?

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh right, that's a future problem!

Copy link

@art-w art-w left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! 🎉

@kayceesrk
Copy link

@bartoszmodelski what needs to happen to merge this?

@avsm
Copy link

avsm commented May 17, 2023

As a test, it might be worth re-checking this on cells.ml from Eio.Condition (ocaml-multicore/eio#397) as the previous versions OOMed on it.

@bartoszmodelski
Copy link
Contributor Author

@bartoszmodelski what needs to happen to merge this?

This is based on two PRs that await review. One adds a dpor-validator, another test generator. Afaik @lyrm has it as a priority.

As a test, it might be worth re-checking this on cells.ml from Eio.Condition (ocaml-multicore/eio#397) as the previous versions OOMed on it.

I ran tests with the eio suite. To terminate these tests in reasonable time we need #22 as well (it's a draft but a fully tested one, I planned to prep it for review once this is merged but will do so soon).

@kayceesrk
Copy link

kayceesrk commented May 18, 2023

This is based on two PRs that await review. One adds a dpor-validator, another test generator. Afaik @lyrm has it as a priority.

Thanks. What are these two PRs? This PRs message says

(Based on #18, #19)

But this PR is #18.

@bartoszmodelski
Copy link
Contributor Author

Oops, it's #19, #21. Amended the readme.

@kayceesrk
Copy link

Afaik @lyrm has it as a priority.

Hi @lyrm, wonder if you've had a chance to review this?

@lyrm
Copy link
Collaborator

lyrm commented May 30, 2023

I reviewed PR #19 and #21. I have meeting with @bartoszmodelski to talk about this PR and the future of dscheck today and tomorrow.

For this PR, I already had an informal look at it. I tried it with lockfree tests (including some that are not merged). It is working as expected : same results than before if bugs are introduced, way shorter than before if the tested code is working. The hash table tests from example were not finished after running for 36 hours and are now finishing in a few minutes.

I will do a short review of the code this afternoon but I won't have the time to do a deep review of the algorithm. However what PR #19 and #21 bring is here to give us some certainty about that. I will be mainly looking for missing documentation.

@kayceesrk
Copy link

Thanks @lyrm for the update.

May I please encourage updates in the Github thread even if the conversations are happening f2f or through other channels. Such updates help the community to engage with the project better. Thanks for the work on this so far.

Copy link
Collaborator

@lyrm lyrm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My understanding of the algorithm is still quite shallow but explore_dpor seems to follow properly the algorithm of the paper. Comments are definitely useful.

Here is a few general notes / questions :

  • you should mention somewhere that your variables are named according to the paper (with a link to it ?)
  • why is it useful to keep the previous algorithm ? (explore function and 'Dpor option for the trace function)

if happen_after then None else Some state_cell)
sequence

let rec explore_source func state sleep_sets =
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is sleep_sets a list ? I see where new sleep_set are added to it but I can't see where anything else then the last element of the list is used. In algorithm 1 of the paper, it is not a list.

I just tried with just a single sleep_set instead of a list and it does not seem to break anything.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, it does not have to be a list. Keeping just the one that's passed down works. Having said that:

  • It's useful for introspection; if anything looks odd I'd usually begin with printing backtrack and sleep set at every level.
  • Granular dependency relation branch ports a small optimization from nidhugg that uses this.

let rec explore_source func state sleep_sets =
let sleep = ref (last_element sleep_sets) in
let s = last_element state in
let p_maybe = IntSet.min_elt_opt (IntSet.diff s.enabled !sleep) in
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it an arbitrary choice ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes. Technically could be a random choice. Same when selecting the initial.

match proc'.run_ptr with
| None -> false
| Some run_ptr -> obj_ptr = run_ptr && proc'.run_proc <> p)
new_state
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This could be state instead of new_state as you know state_top is going to be filtered out.

sequences from E.
*)
let initials =
let rec f = function
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(loop or aux are better then f as, at least, you can make a search on it)

@bartoszmodelski
Copy link
Contributor Author

  • you should mention somewhere that your variables are named according to the paper (with a link to it ?)

Good idea, added a note at the top.

  • why is it useful to keep the previous algorithm ? (explore function and 'Dpor option for the trace function)

I left it in for reference and to make the review easier (removing it here increases size of the patch and opens the doors for changing other parts of the code). Still at this point in development I found it useful to sometimes switch over to the old DPOR and confirm that we are indeed doing the correct thing. I would not oppose removing it in a bit though.

@kayceesrk
Copy link

Ping. The conversation seems to have gotten stuck on this PR. Can we get this to a merge soon?

@lyrm
Copy link
Collaborator

lyrm commented Jun 27, 2023

I think the conversion is over. I can merge this today.

@bartoszmodelski bartoszmodelski merged commit 40073b7 into main Jul 2, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants