Merge branch 'master' into thread

ssbc · May 9, 2023 · c0a1cd8 · c0a1cd8
2 parents 5718a69 + 81f8ae4
commit c0a1cd8
Show file tree

Hide file tree

Showing 4 changed files with 456 additions and 4 deletions.
diff --git a/009.md b/009.md
@@ -0,0 +1,213 @@
+# Tangles
+
+Authors: Mix Irving <mix@protozoa.nz>, Andre Staltz <contact@staltz.com>
+
+Date: 2023-03-23
+
+License: CC0-1.0
+
+
+## Abstract
+
+In any given set of SSB messages, it is impossible to determine their partial or
+total order based on timestamps alone.  However, including the hashes of
+previous messages in a message allows us to determine the order of messages.  In
+this SIP, we specify a consistent way of declaring hashes of previous messages,
+such that they form a directed acyclic graph (DAG).  This DAG is useful for many
+use cases, including replication, and multi-writer "records".
+
+
+## Motivation
+
+As a permissionless, decentralized, and eventually consistent database, SSB
+cannot rely on timestamps to determine the order of messages.  Instead, when a
+message refers to the hash of another message, we can infer that the message
+that was referred to must have been created first, because we cannot create that
+hash without knowing the original data.  These referred hashes effectively form
+a proof that a message was created after another message.
+
+A "tangle" in SSB is a pattern of declaring previous message hashes, which will
+define a directed acyclic graph (DAG) of messages.  A tangle is a useful way to
+determine a partial ordering which in turn is useful for everything from
+replication to building multi-writer "records".
+
+A topological sort of a tangle renders a linear ordering of messages, which is
+useful for displaying messages in user interfaces.
+
+
+## Terminology
+
+The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD",
+"SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be
+interpreted as described in RFC 2119.
+
+
+## Specification
+
+There are different types of tangle, but they all MUST specify:
+
+1. **Candidates messages**: the set of messages which _could_ be part of the
+  tangle
+2. **Recipe**: how the tangle (the DAG) is constructed from these candidates
+
+It is RECOMMENDED for every recipe to start with some "root message" and extend
+out from that point, checking the validity of messages as they are added.  Some
+candidate messages may not have connections to the graph, or may be "invalid"
+extensions, in which case they are excluded from the tangle.
+
+A given message MAY belong to zero, one, or many tangles.  Each tangle
+is a separate DAG.  A tangle is identified by a human-readable string.
+
+We define:
+- **the root message** as the earliest message in a tangle. Causally it is the
+  "oldest", and is often used to identify all other candidate messages.
+- **the tip(s)** of the tangle are the message(s) at the leading or "newest" end
+  of the tangle i.e. it can transitively connected to the root, and no other
+  messages yet link back to it.
+
+There MUST only be one root message per tangle.
+
+
+### Tangle fields in messages
+
+Each message `m` in a tangle SHOULD have a `tangles` field in the `content`
+field, i.e. `m.content.tangles`.  If `m.content.tangles` exists, its value MUST
+be an object with one field for each tangle.  The key of each field MUST be a
+human-readable identifier string (lets refer to it here as `x`) for that tangle,
+and the value MUST be an object (called the "tangle data") with the fields:
+
+- `root`: the ID of the root message of the tangle or `null` if `m` is the root
+- `previous`: an Array of message IDs of the known tip(s) of the DAG at the time
+  `m` was published, or `null` if `m` is the root
+
+
+### Classic feed: single-author tangle
+
+The most trivial tangle is the classic SSB feed.  In this case, the tangle for
+feed `A` is defined as:
+
+1. **Candidates messages**:
+   - MUST have `author` field equal to `A`
+2. **Recipe**:
+   1. If the candidate message has `sequence` field equal to `1`, then its field
+    `previous` MUST be `null`
+   2. If the candidate message has `sequence` field equal to `i` where `i` is a
+    number greater than `1`, then its field `previous` MUST be the message ID of
+    the message that has `sequence` field equal to `i - 1`
+
+
+```mermaid
+flowchart RL
+A(A):::root;B(B);C(C);D(D);
+
+subgraph "feed"
+  direction RL
+D-->C-->B-->A
+end
+
+classDef default fill: #e953da, stroke-width: 0, color: white
+classDef root fill: #4c44cf, stroke-width: 0, color: white
+classDef cluster stroke:#e953da,fill:none;
+```
+
+The tangle data these messages carry looks like:
+```javascript
+A => { sequence: 1, previous: null, ... }
+B => { sequence: 2, previous: A, ... }
+C => { sequence: 3, previous: B, ... }
+D => { sequence: 4, previous: C, ... }
+```
+
+(The fields `author`, `signature`, `timestamp`, `hash`, and `content` have been
+ommited here to make the backlinking pattern clearer)
+
+This is a special case of a tangle, for a few reasons:
+
+- This tangle does not have "tangle data" with `root` and `previous`
+- The DAG is linear, with no branches or merges
+
+
+### Multi-author tangle
+
+While the classic feed is a special case of a tangle without branches, in the
+case of multiple authors, we need to support branches and merges.  This is
+because different authors may contribute concurrently, or while offline.
+
+The following is not a description of a specific "tangle data" with candidates
+and recipes, but rather a template for tangle data when there are messages from
+multiple authors.
+
+As a general pattern for multi-author tangle data: when a message is published,
+its `previous` field SHOULD have the IDs of all known **tips** of the tangle,
+and it SHOULD NOT have any other message IDs in its `previous` field.
+
+As an example, suppose there are messages with tangle data that look like:
+
+```javascript
+A => { root: null, previous: null }
+B => { root: A,    previous: [A] }
+X => { root: A,    previous: [B] }
+Y => { root: A,    previous: [B] }
+M => { root: A,    previous: [X, Y] }
+```
+
+Then the directed acyclic graph (DAG) looks like:
+
+```mermaid
+flowchart RL
+A(A):::root;B(B);X(X);Y(Y);M(M);
+
+M-->X-->B-->A
+M-->Y--->B
+
+classDef default fill: #e953da, stroke-width: 0, color: white
+classDef root fill: #4c44cf, stroke-width: 0, color: white
+```
+_Diagram where messages X, Y were both published concurrently (so were unaware
+of one another).  Message M is aware of both X and Y, and extends the tangle
+from them.  M is now the new "tip" of the tangle._
+
+Note that message `M` points to `X` and `Y` as its `previous` messages.  This
+conforms to the general pattern for multi-author tangle data.
+
+Further, the root message MUST NOT include its own ID (not known until
+published), so it sets its `root` value as `null`, which means "I am the root".
+
+The "tangle data" `{ root, previous }` for e.g. a "chess game tangle" SHOULD be
+stored at `msg.value.content.tangles.chess`, note the choice of a human-friendly
+name `chess` for the domain of a chess game.
+
+Some tangles may have _candidates_ that define a strict subset of messages as
+valid, e.g. only messages authored by peers the original author follows.
+
+
+## Considerations
+
+**Privacy Considerations** – if a tangle is encrypted (e.g. to a group) then all
+messages in it should be encrypted similarly, because message backlinks reveal
+the other authors who may have participated in the tangle, which may be a
+privacy risk.  This means that you also want to be careful of revealing
+individual messages in a tangle (a feature box1, box2 support).
+
+**Implementation Considerations** – it's important that when defining a tangle,
+the candidate messages and recipe are very clearly defined.  The recipe may also
+need to include how a tangle should be presented to users (e.g. whether a tangle
+is linearised, and how).
+
+## References
+
+### Informative
+
+- [ssbdrv's document about tangles](https://github.com/cn-uofbasel/ssbdrv/blob/master/doc/tangle.md)
+- [Tangle Origin](./009/tangle_origin.md) Story
+- [tangle-js](https://gitlab.com/tangle-js) - a collection of NodeJS modules
+  used for building tangles data, and reducing data held in messages.
+  Effectively turns tangles into CRDTs
+
+### Implementation
+
+JavaScript:
+- [`ssb-sort`](https://github.com/ssbc/ssb-sort) – used for tangle sorting used
+by many JS clients.
+- [`ssb-tribes/lib/get-group-tangle.js`](https://github.com/ssbc/ssb-tribes/blob/master/lib/get-group-tangle.js) – uses `@tangle` modules to determine tangle tips
+