All models are wrong, some are useful, and not having any models is worse than wrong.
Distributed systems like µONOS can be difficult to design and even harder to implement. When systems don't strictly conform to their intended properties, unpredictable behaviors can emerge. Even with good design and coding practices in place, building confidence in a distributed system can require extensive testing and evaluation.
To aid in the identification of concurrency bugs and help build confidence that can service and ensure we adhere to the formal specifications. Manifold Finance also supports a special form of monitoring designed to detect violations of the system's properties. Conformance monitoring in µONOS uses formal modelling techniques to specify and verify system properties in near-real time.
When performance conformance monitoring on a production cluster, users must use [Helm] to deploy a single monitor pod per [TLA+] specification.
The conformance monitor currently requires Kafka to stream traces to TLC.
To deploy Kafka using Helm, first add the incubator repository to your Helm configuration:
$ helm repo add incubator http://storage.googleapis.com/kubernetes-charts-incubator
Define a values.yaml
file for the Kafka cluster, configuring the topics to be used by the conformance monitor
to consume traces and produce alerts:
replicas: 1
zookeeper:
replicaCount: 1
topics:
- name: traces
partitions: 3
replicationFactor: 1
- name: alerts
partitions: 1
replicationFactor: 1
Then, install the incubator/kafka
chart with the desired configuration overrides:
$ helm install kafka incubator/kafka --values values.yaml
The successful completion of the kafka-config
pod indicates the Kafka brokers have completed startup and
configured topics have been created.
Once the Kafka cluster has been configured, this chart can be deployed to perform near real-time conformance monitoring on Kafka streams. The monitor uses [TLA+] specifications to evaluate traces received from Kafka, and invariants specified in the chart configuration can detect safety violations in the trace stream.
Several artifacts are required to by the chart:
model
- the name of the module to evaluatemodules
- an array of TLA+ module files to mount to the monitor podspec
- the specification to evaluateinit
- the state initialization predicate (required ifspec
is not configured)next
- the next state relation (required ifspec
is not configured)
Additional options can be used to specify invariants and other constraints on the model checker:
invariants
- an array of invariants to check for each traceconstants
- a mapping of constant values to assign to the modelconstraints
- an array of state constraintsproperties
- an array of model properties
$ helm install my-monitor --set modules={Cache.tla,CacheHistory.tla} --set model=Cache.tla --set config.spec=Spec --set config.invariants={TypeOK}
The role of the conformance monitor is to track the system state over time and detect violations of safety guarantees by analyzing the system state. To do so, [TLA+] specifications are used to provide a formal model for state transitions and specify the invariants to which the system must conform.
TLA+ specifications consist of one or more modules defined in .tla
files. A module defines a set
of formulas for evaluating the state of the system. The specification provides a formula describing
the initial system state and its transitions:
---------------------------- MODULE MonotonicTrace --------------------------
EXTENDS Naturals, Sequences
\* A list of variables in the spec
vars == <<...>>
\* The init predicate describing the initial state of the system
Init == ...
\* The next state relation describing possible state transitions
Next == ...
\* The system specification describes the initial system state and next state relations
Spec == Init /\ [][Next]_<<vars>>
=============================================================================
Typically, the TLA+ model checker, TLC, computes and evaluates every state that can be reached
by the spec according to its initial state and next state relation. Conformance monitoring specs
operate on an infinite stream of traces, using the NextTrace
operator to consume and check
traces in near-real time:
INSTANCE Traces
\* The previous trace version
VARIABLE prevVersion
\* The current trace version
VARIABLE nextVersion
\* A list of variables in the spec
vars == <<prevVersion, nextVersion>>
\* Read a trace record from the stream and update the previous and next versions
Read ==
LET record == NextTrace
IN
/\ PrintT(record)
/\ prevVersion' = nextVersion
/\ nextVersion' = record.version
\* The init predicate describing the initial state of the system
Init ==
/\ prevVersion = 0
/\ nextVersion = 0
\* The next state relation describing possible state transitions
Next ==
\/ Read
\/ UNCHANGED <<vars>>
The final component of a conformance spec is the invariant(s). Invariants are predicates that describe the properties of a well behaved (conforming) system. After each trace is consumed and processed from the traces stream, TLC will evaluate invariants to determine whether the system's state conforms to its safety properties:
\* An invariant verifying that the trace version is monotonically increasing
TypeOK == nextVersion # 0 => nextVersion < prevVersion
In the event the invariant is violated by the system traces, the PublishAlert
operator
can be used to publish an alert.
INSTANCE Alerts
\* An invariant verifying that the trace version is monotonically increasing
TypeOK ==
\/ nextVersion # 0 => nextVersion < prevVersion
\/ PublishAlert([msg |-> "Invariant violated",
prevVersion |-> prevVersion,
nextVersion |-> nextVersion])
With the initial state predicate, the next state relation, and the type invariants, a complete conformance monitoring spec can be compiled:
---------------------------- MODULE MonotonicTrace --------------------------
EXTENDS Naturals, Sequences
INSTANCE Traces
INSTANCE Alerts
\* The previous trace version
VARIABLE prevVersion
\* The current trace version
VARIABLE nextVersion
\* A list of variables in the spec
vars == <<prevVersion, nextVersion>>
\* An invariant verifying that the trace version is monotonically increasing
TypeOK ==
\/ nextVersion # 0 => nextVersion < prevVersion
\/ PublishAlert([msg |-> "Invariant violated",
prevVersion |-> prevVersion,
nextVersion |-> nextVersion])
\* Read a trace record from the traces stream and update the previous and next versions
Read ==
LET record == NextTrace
IN
/\ PrintT(record)
/\ prevVersion' = nextVersion
/\ nextVersion' = record.version
\* The init predicate describing the initial state of the system
Init ==
/\ prevVersion = 0
/\ nextVersion = 0
\* The next state relation describing possible state transitions
Next ==
\/ Read
\/ UNCHANGED <<vars>>
\* The system specification describes the initial system state and next state relations
Spec == Init /\ [][Next]_<<vars>>
=============================================================================
If Condition is false or null, then:
MTF “passes” validation [by default] for Rule.
If Condition is true [by assessment or by default], then:
If Statement is true, MTF “passes” validation for Rule.
If Statement is null, MTF “passes” validation [by default] for Rule.
If Statement is false, MTF “fails” validation for Rule.
// TODO
https://github.com/Azure/azure-cosmos-tla
https://github.com/Starydark/PaxosStore-tla
https://github.com/elastic/elasticsearch-formal-models
a.k.a blame game
https://buildkite.com/blog/outage-post-mortem-for-august-22nd
GPL-2.0