Memory model
Note: This page has been superseded by the PLDI 2018 paper Bounding Data Races in Space and Time.
This is the (draft!) memory model for multicore OCaml, which answers the question of what you get when you read a `ref`. In single-threaded code, the answer is "whatever you most recently wrote", but in multi-threaded code with complex synchronisation, exactly what "most recently" means becomes murky.
The first part of this document explains how to avoid the subtleties, and write reasonably straightforward synchronisation code in which "most recent" is not a complicated concept. The later parts delve into some of the details, but these are useful mainly for people writing very low-level synchronisation libraries or hacking on the compiler (if you're relying on such low-level details when writing application code, you're doing it wrong).
For all of the details, you should read and play with the formal specification of the memory ordering guarantees, specified using herd. The formal spec is more precise and much, much shorter than this document, but if you're not familiar with axiomatic memory models it may take longer to understand.
We can reason about the interaction between a running OCaml program and its memory by considering the trace of the program: a sequence of events, such as reading a value from or writing a value to a `ref`. (This document uses `ref` as a shorthand for "mutable location": the same rules apply to mutable record fields and mutable object members as to `ref` cells.) Each event corresponds to a read or a write operation in the source code, but a particular operation in the source code generates a new event every time it runs.
The order that a single-threaded program generates events is called program order. For multi-threaded programs, we still talk about program order, but it is no longer a total order: events generated by two different threads are neither before nor after each other in program order.
Program order describes the order operations were written in the source code, but doesn't necessarily specify the order in which events actually happen: events can be reordered, combined or optimised away both by the compiler and by the CPU. The orderings that can be relied upon are specified as a relation called happens-before: if one event happens before a second, then no thread will ever observe the second but not the first. We say that e₁ happens after e₂ if e₂ happens before e₁, and that they happen concurrently if neither happens before the other. Happens-before is transitive (if e₁ happens before e₂ and e₂ happens before e₃, then e₁ happens before e₃) and irreflexive (no event happens before itself).
The API for synchronisation constructs will generally specify happens-before relationships. For instance, a mutex specifies that one thread's `Lock` operation happens before the corresponding `Unlock`, and that the `Unlock` happens before the next thread's `Lock`. Below, the happens-before relationships are specified for refs and atomic references (`Atomic.t`).
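As a concrete illustration, here is a minimal sketch of the mutex guarantee, assuming OCaml 5's `Domain` and `Mutex` modules. The write to `data` is program-order-before the writer's `Unlock`, which happens before the reader's `Lock`, so the reader can never observe a half-done update:

```ocaml
(* Sketch (assumes OCaml 5): a mutex creates happens-before edges
   between domains via its Lock/Unlock events. *)
let data = ref 0
let m = Mutex.create ()

let writer () =
  Mutex.lock m;
  data := 42;          (* happens before this Unlock ... *)
  Mutex.unlock m       (* ... which happens before the next Lock *)

let reader () =
  Mutex.lock m;
  let v = !data in     (* sees 0 or 42, depending on who locked first *)
  Mutex.unlock m;
  v

let observed =
  let d = Domain.spawn writer in
  let v = reader () in
  Domain.join d;
  v
```

Which of the two values `observed` holds depends on scheduling, but the mutex rules out any other outcome.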
This model doesn't talk about high-level synchronisation constructs (channels, etc.) yet, mainly because we haven't implemented many of them. However, the guarantees made about atomics are sufficient to implement high-level synchronisation on top, and this model will be updated with details of high-level synchronisation features.
The most basic happens-before guarantee is that made about a single thread's accesses to a particular `ref`:

If e₁ and e₂ are events in program order (that is, performed in order by the same thread), they access the same `ref`, and they're not both reads, then e₁ happens before e₂.
Suppose we have the following program:
```
1. let r = ref 0 in
2. let a = !r in
3. r := 1;
4. let b = !r in
5. let c = !r in
...
```
The above means that lines 1, 2, 3 happen in order (for memory model purposes, the initialisation of the reference on line 1 counts as a "write"). Line 3 happens before lines 4 and 5, but lines 4 and 5 happen concurrently since they are both reads.
When a read event r occurs, it reads the value written by some write w to the same location (again, initialisation counts as a "write" here). Which write that is can usually be determined by the following rule:

If a write w happens before a read r on the same `ref`, and every other write w' either happens before w or after r, then r reads the value written by w.
This rule does not always apply. It may be the case that there is no such w: for instance, there may be a write that happens concurrently with r, or there may be two writes that happen before r but happen concurrently with each other. When this rule does not apply to a read event, we say that there is a data race.
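A minimal sketch of such a race, assuming OCaml 5's `Domain` module: the read of `r` in the spawning domain happens concurrently with the write in the spawned domain, so the rule above does not pick out a unique write for it. Note that `Domain.join` does create a happens-before edge, so a read after the join is race-free:

```ocaml
(* Sketch (assumes OCaml 5): an unsynchronised ref read concurrently
   with a write. The racy read may see either value. *)
let r = ref 0

let racy_read =
  let d = Domain.spawn (fun () -> r := 1) in  (* concurrent write *)
  let v = !r in                               (* data race: 0 or 1 *)
  Domain.join d;
  v

(* Domain.join makes the write happen before this read, so there is
   no race here and the read must see 1. *)
let after_join = !r
```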
Programs should avoid data races by using synchronisation mechanisms, such as the atomics described below. Unlike e.g. the C++ memory model, this memory model does give relatively strong guarantees even in the presence of data races. However, programs should still strive to avoid races: the guarantees provided by this model (see "Coherence and data races" below) serve to limit the damage done by a data race, rather than to make data races a reasonable way of writing programs (for instance, in the OCaml memory model a data race on a particular `ref` will only cause future reads of that `ref` to produce odd results, rather than making the entire program have undefined behaviour as in C++).
The rule above is sufficient to ensure the expected behaviour for references used only by one thread. In the example program given previously, the read on line 2 will read the value written by the initialisation on line 1, since the only other write (line 3) happens after line 2. Similarly, the reads on lines 4 and 5 read the value written on line 3, since the only other write (line 1) happens before line 3.
While the above rules ensure that `ref` cells have sensible behaviour when accessed by a single thread, they do not work well for synchronising multiple threads. For instance, consider this message-passing idiom, where `message` is an `int ref` initialised to `0` and `flag` is a `bool ref` initialised to `false`:
```
thread 1:
1. message := 42
2. flag := true

thread 2:
3. let seen = !flag in
4. let value = !message in
5. if seen then print_int value
```
This program contains data races. The read of `flag` on line 3 is racy, since it happens concurrently with the write on line 2, and the read of `message` on line 4 similarly races with the write on line 1. This program might well print `0`, if the read on line 3 reads from the write on line 2, and the read on line 4 reads from the initialisation of `message`.
Concretely, this can happen for a number of reasons. For instance, suppose thread 2 accesses `message` beforehand:

```
thread 2:
let old = !message in
let seen = !flag in
let value = !message in
if seen then print_int value
```
The compiler might optimise this by removing the second access to `message`:

```
thread 2:
let old = !message in
let seen = !flag in
let value = old in
if seen then print_int value
```
Now, this program can have `seen = true` yet `value = 0`, even if executed on a sequentially consistent machine.

Even without any optimisations, this program can print `0` if run on a weakly ordered machine such as an ARM or PowerPC: those machines will happily reorder the reads from, or the writes to, `flag` and `message` on their own.
In the memory model, such bad behaviour (printing `0`) is permitted because there are no happens-before relations between the operations in thread 1 and thread 2. To provide the needed synchronisation, atomics should be used. We rewrite the program, making `flag` an `Atomic.t`:

```
thread 1:
1. message := 42
2. Atomic.set flag true

thread 2:
3. let seen = Atomic.get flag in
4. let value = !message in
5. if seen then print_int value
```
This program will never print `0`: it may not print anything, but if it prints, it will print `42`. This is due to the two rules about atomics, one specifying happens-before on a single thread:

Everything program-order-before an atomic event happens before it, and everything program-order-after an atomic event happens after it.

and one specifying happens-before between threads:

For any two events on the same atomic reference which are not both reads, one of them happens before the other.

By the first rule, the event on line 1 happens before line 2, and line 3 happens before line 4. By the second, one of lines 2 and 3 happens before the other. If 2 happens before 3, then 1 happens before 4 and the program prints `42`. Otherwise, line 3 reads `false` from the initialisation of `flag`, and the program does not print anything.
Note that since the events on a particular atomic reference are totally ordered by happens-before, data races cannot happen on atomics.
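One practical consequence of this total order, sketched here assuming OCaml 5's `Atomic` and `Domain` modules: concurrent read-modify-write operations such as `Atomic.incr` cannot lose updates, so a shared counter needs no further synchronisation:

```ocaml
(* Sketch (assumes OCaml 5): all events on one atomic are totally
   ordered by happens-before, so no increment can be lost. *)
let counter = Atomic.make 0

let () =
  let incr_many () =
    for _ = 1 to 1000 do
      Atomic.incr counter   (* atomic read-modify-write *)
    done
  in
  let d1 = Domain.spawn incr_many in
  let d2 = Domain.spawn incr_many in
  Domain.join d1;
  Domain.join d2

let total = Atomic.get counter   (* always 2000 *)
```

Had `counter` been a plain `int ref`, the concurrent `incr`s would race and updates could be lost.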
This model specifies behaviour in the presence of data races, a feature shared by Java's memory model but not by that of C++.
The atomic references of this model are analogous to the `volatile` variables of Java or `seq_cst` atomics in C++, but OCaml atomics have stronger properties regarding reordering with nonatomic variables. For instance, consider this program:
```
thread 1:
Atomic.set x 1
let a = !y

thread 2:
y := 1
let b = Atomic.get x
```
In OCaml, one of the atomic accesses to `x` happens before the other. If the `set` happens first, then `b = 1`. If the `get` happens first, then the assignment to `y` happens before the `get`, which happens before the `set`, which happens before the read of `y`, so `a = 1`. However, in both Java and C++, `a = b = 0` is a possible outcome.
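The example above can be sketched as a runnable litmus test, assuming OCaml 5's `Domain` and `Atomic` modules. The reasoning just given implies that `a = 1 || b = 1` in every execution, whereas the analogous C++ or Java program also allows `a = b = 0`:

```ocaml
(* Sketch (assumes OCaml 5) of the store-buffering example: x is
   atomic, y is a plain ref. OCaml guarantees a = 1 || b = 1. *)
let run_once () =
  let x = Atomic.make 0 in
  let y = ref 0 in
  let a = ref (-1) and b = ref (-1) in
  let t1 = Domain.spawn (fun () -> Atomic.set x 1; a := !y) in
  let t2 = Domain.spawn (fun () -> y := 1; b := Atomic.get x) in
  Domain.join t1;
  Domain.join t2;
  (!a, !b)

(* Run it repeatedly: the forbidden outcome (0, 0) never appears. *)
let outcomes = List.init 100 (fun _ -> run_once ())
```

A single run cannot demonstrate a guarantee, of course; the `herd` specification linked above is the authoritative way to check which outcomes are allowed.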
This (incomplete!) section is useful mainly for reasoning about behaviour in the presence of data races, and verifying that compiler optimisations are sound. Applications (other than low-level synchronisation libraries) should avoid depending on any of this.
The formal memory model is specified by defining the happens-before relation as above, and asserting two conditions: Causality and Coherence.
The Causality axiom specifies:

```
acyclic(hb | po | rf)
```

where `hb` is the happens-before relation, `po` is program order, and `rf` is the reads-from relation (where `w rf r` if `r` reads the value written by `w`). This axiom forbids all load-buffering behaviours, which removes the "values out of thin air" problem facing C++ and Java.
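The load-buffering shape that Causality forbids can be sketched as follows, assuming OCaml 5's `Domain` and `Atomic` modules. For both loads to observe `1`, each load would have to read from a store that is `po`-after the other load, forming a cycle in `po | rf`; the axiom rules this out:

```ocaml
(* Sketch (assumes OCaml 5) of a load-buffering litmus test: each
   domain loads one atomic, then stores the other. Causality forbids
   the outcome a = b = 1. *)
let run_lb () =
  let x = Atomic.make 0 and y = Atomic.make 0 in
  let t1 = Domain.spawn (fun () ->
    let a = Atomic.get x in
    Atomic.set y 1;
    a)
  in
  let t2 = Domain.spawn (fun () ->
    let b = Atomic.get y in
    Atomic.set x 1;
    b)
  in
  (Domain.join t1, Domain.join t2)

(* Run it repeatedly: (1, 1) never appears. *)
let lb_outcomes = List.init 100 (fun _ -> run_lb ())
```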
Demanding that there be no cycles in `hb | po | rf` has been proposed as a solution to the thin-air problem in C++, but rejected on performance grounds: on weakly-ordered machines (Power, ARM) it requires that subsequent stores be made dependent on prior loads, which means inserting many useless branches. I believe this is an acceptable trade-off in OCaml in order to ensure causality, because:

- Most loads in OCaml are of immutable locations, for which an additional branch need not be inserted.
- Multicore OCaml's GC design already has a branching read-barrier, so the desired branches are generally already present.
The Coherence axiom specifies:

```
acyclic(co | rf | fr | (hb & loc))
```

where `co` is the coherence order, a total order on the writes to any given location, `fr` is the from-reads relation (defined as `rf^(-1); co`), and `loc` relates events to the same location.
This means that even if there are data races on a non-atomic location, one of the racing writes will win the race and the value of the location will eventually stabilise.
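A minimal sketch of this stabilisation, assuming OCaml 5's `Domain` module: two domains race to write the same `ref`, and coherence totally orders the two writes, so once both domains have finished, every later read agrees on the coherence-last value:

```ocaml
(* Sketch (assumes OCaml 5): racing writes to one ref. Coherence
   orders them, so after both domains finish, the location has
   stabilised to one of the two written values. *)
let cell = ref 0

let () =
  let d1 = Domain.spawn (fun () -> cell := 1) in
  let d2 = Domain.spawn (fun () -> cell := 2) in
  Domain.join d1;
  Domain.join d2

let final = !cell        (* the coherence-last write: 1 or 2 *)
let still_final = !cell  (* later reads see the same stable value *)
```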