specs: universal share prefix (#856)
rootulp and adlerjohn committed Oct 31, 2022
1 parent eb3791e commit 26f11cb
Showing 12 changed files with 256 additions and 71 deletions.
1 change: 1 addition & 0 deletions specs/src/specs/consensus.md
@@ -55,6 +55,7 @@
| `MAX_VALIDATORS` | `uint16` | `64` | | Maximum number of active validators. |
| `NAMESPACE_ID_BYTES` | `uint64` | `8` | `byte` | Size of namespace ID, in bytes. |
| `NAMESPACE_ID_MAX_RESERVED` | `uint64` | `255` | | Value of maximum reserved namespace ID (inclusive). 1 byte worth of IDs. |
| `SHARE_INFO_BYTES` | `uint64` | `1` | `byte` | Number of bytes used for [share](data_structures.md#share) information. |
| `SHARE_RESERVED_BYTES` | `uint64` | `1` | `byte` | Bytes reserved at the beginning of each [share](data_structures.md#share). Must be sufficient to represent `SHARE_SIZE`. |
| `SHARE_SIZE` | `uint64` | `256` | `byte` | Size of transaction and message [shares](data_structures.md#share), in bytes. |
| `STATE_SUBTREE_RESERVED_BYTES` | `uint64` | `1` | `byte` | Number of bytes reserved to identify state subtrees. |
46 changes: 36 additions & 10 deletions specs/src/specs/data_structures.md
@@ -482,28 +482,54 @@ Finally, the `availableDataRoot` of the block [Header](#header) is computed as t

A share is a fixed-size data chunk associated with a namespace ID, whose data will be erasure-coded and committed to in [Namespace Merkle trees](#namespace-merkle-tree).

A share's raw data `rawData` is interpreted differently depending on the namespace ID.
A sequence is a contiguous set of shares that contain semantically relevant data. A sequence should be parsed together because data may be split across share boundaries. One sequence exists per reserved namespace and per message.

- The first [`NAMESPACE_ID_BYTES`](./consensus.md#constants) of a share's raw data `rawData` is the namespace ID of that share, `namespaceID`.
- The next [`SHARE_INFO_BYTES`](./consensus.md#constants) bytes are for share information with the following structure:
  - The first 7 bits represent the share version in big endian form (initially, this will be `0000000` for version `0`);
  - The last bit is a sequence start indicator, which is `1` if the share is at the start of a sequence or `0` if it is a continuation share.
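
The info byte layout described above can be sketched as a pair of helpers. This is a minimal illustration, not code from the specification: the function names and `MAX_SHARE_VERSION` are hypothetical, and it assumes "the first 7 bits" means the 7 most significant bits of the byte.

```python
from typing import Tuple

# Hypothetical constant: the largest version that fits in 7 bits.
MAX_SHARE_VERSION = 127


def encode_info_byte(version: int, is_sequence_start: bool) -> int:
    """Pack a 7-bit share version and a 1-bit sequence start flag into one byte.

    Assumption: the version occupies the 7 most significant bits and the
    sequence start indicator occupies the least significant bit.
    """
    if not 0 <= version <= MAX_SHARE_VERSION:
        raise ValueError("share version must fit in 7 bits")
    return (version << 1) | int(is_sequence_start)


def decode_info_byte(info_byte: int) -> Tuple[int, bool]:
    """Split an info byte back into (version, is_sequence_start)."""
    if not 0 <= info_byte <= 0xFF:
        raise ValueError("info byte must fit in 8 bits")
    return info_byte >> 1, bool(info_byte & 1)
```

Under these assumptions, a version `0` start share has info byte `0x01` and a version `0` continuation share has info byte `0x00`.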

The remainder of a share's raw data `rawData` is interpreted differently depending on the namespace ID.

#### Compact Share

For shares **with a reserved namespace ID through [`NAMESPACE_ID_MAX_RESERVED`](./consensus.md#constants)**:

- The first [`NAMESPACE_ID_BYTES`](./consensus.md#constants) of a share's raw data `rawData` is the namespace ID of that share, `namespaceID`.
- The next [`SHARE_RESERVED_BYTES`](./consensus.md#constants) bytes (the `*` in the example layout figure below) is the starting byte of the length of the [canonically serialized](#serialization) first request that starts in the share, or `0` if there is none, as a one-byte big-endian unsigned integer (i.e. canonical serialization is not used). In this example, with a share size of `256` the first byte would be `80` (or `0x50` in hex).
- The remaining [`SHARE_SIZE`](./consensus.md#constants)`-`[`NAMESPACE_ID_BYTES`](./consensus.md#constants)`-`[`SHARE_RESERVED_BYTES`](./consensus.md#constants) bytes are request data.
> **Note** The first [`NAMESPACE_ID_BYTES`](./consensus.md#constants) of a share's raw data `rawData` is the namespace ID of that share, `namespaceID`. The next [`SHARE_INFO_BYTES`](./consensus.md#constants) bytes are for share information.
- If this is the first share of a sequence, the next 1 to 10 bytes contain a [varint](https://developers.google.com/protocol-buffers/docs/encoding) of the `uint64` length of the sequence that follows, in bytes.
- The next [`SHARE_RESERVED_BYTES`](./consensus.md#constants) bytes are the starting byte of the length of the [canonically serialized](#serialization) first request that starts in the share, or `0` if there is none, as a one-byte big-endian unsigned integer (i.e. canonical serialization is not used). In the example below, with a share size of `256` the reserved byte would be `80` (or `0x50` in hex).
- The remaining [`SHARE_SIZE`](./consensus.md#constants)`-`[`NAMESPACE_ID_BYTES`](./consensus.md#constants)`-`[`SHARE_INFO_BYTES`](./consensus.md#constants) `-` 1 to 10 bytes (if this is the first share of a sequence) `-` [`SHARE_RESERVED_BYTES`](./consensus.md#constants) bytes are transactions, intermediate state roots, or evidence data depending on the namespace of this share. Each transaction, intermediate state root, or evidence is prefixed with a [varint](https://developers.google.com/protocol-buffers/docs/encoding) of the length of that unit.
- If there is insufficient transaction, intermediate state root, or evidence data to fill the share, the remaining bytes are filled with `0`.
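
The compact share layout listed above can be read back with a small parser. The following is a sketch under stated assumptions rather than a reference implementation: `read_uvarint`, `CompactShare`, and `parse_compact_share` are hypothetical names, the constants are copied from the constants table, and the info byte is assumed to hold the version in its 7 most significant bits.

```python
from typing import NamedTuple, Optional, Tuple

# Values taken from the constants table in consensus.md.
NAMESPACE_ID_BYTES = 8
SHARE_INFO_BYTES = 1
SHARE_RESERVED_BYTES = 1
SHARE_SIZE = 256


def read_uvarint(buf: bytes, pos: int) -> Tuple[int, int]:
    """Decode a protobuf-style varint at buf[pos:]; return (value, next_pos)."""
    value = 0
    shift = 0
    for i in range(10):  # a uint64 needs at most 10 varint bytes
        byte = buf[pos + i]
        value |= (byte & 0x7F) << shift
        if (byte & 0x80) == 0:  # continuation bit clear: last byte
            return value, pos + i + 1
        shift += 7
    raise ValueError("varint longer than 10 bytes")


class CompactShare(NamedTuple):
    namespace_id: bytes
    version: int
    is_sequence_start: bool
    sequence_length: Optional[int]  # present only on the first share of a sequence
    reserved_byte: int  # offset of the first unit starting in this share, or 0
    data: bytes  # length-prefixed units plus any trailing zero padding


def parse_compact_share(raw: bytes) -> CompactShare:
    """Split a raw compact share into its fields (hypothetical helper)."""
    if len(raw) != SHARE_SIZE:
        raise ValueError("share must be exactly SHARE_SIZE bytes")
    pos = NAMESPACE_ID_BYTES
    namespace_id = raw[:pos]
    info = raw[pos]
    pos += SHARE_INFO_BYTES
    # Assumption: version in the upper 7 bits, start indicator in the last bit.
    version, is_start = info >> 1, bool(info & 1)
    sequence_length = None
    if is_start:
        sequence_length, pos = read_uvarint(raw, pos)
    reserved = int.from_bytes(raw[pos:pos + SHARE_RESERVED_BYTES], "big")
    pos += SHARE_RESERVED_BYTES
    return CompactShare(namespace_id, version, is_start,
                        sequence_length, reserved, raw[pos:])
```

Because the sequence length varint is only present on the first share of a sequence, the data region starts at a different offset in start and continuation shares, which is why the parser checks the sequence start bit before reading the reserved byte.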

First share in a sequence:
![fig: compact start share.](./figures/compact_start_share.svg)

![fig: Reserved share.](./figures/share.svg)
Continuation share in a sequence:
![fig: compact continuation share.](./figures/compact_continuation_share.svg)

#### Sparse Share

For shares **with a namespace ID above [`NAMESPACE_ID_MAX_RESERVED`](./consensus.md#constants) but below [`PARITY_SHARE_NAMESPACE_ID`](./consensus.md#constants)**:

- The first [`NAMESPACE_ID_BYTES`](./consensus.md#constants) of a share's raw data `rawData` is the namespace ID of that share, `namespaceID`.
- If this is the first share of a message, the next 1-10 bytes contain a [varint](https://developers.google.com/protocol-buffers/docs/encoding) of the uint64 length of the message that follows. If this isn't the first share of a message, the next 1-10 bytes are used for request data.
- The remaining bytes are request data. In other words, the remaining bytes have no special meaning and are simply used to store data.
> **Note** The first [`NAMESPACE_ID_BYTES`](./consensus.md#constants) of a share's raw data `rawData` is the namespace ID of that share, `namespaceID`. The next [`SHARE_INFO_BYTES`](./consensus.md#constants) bytes are for share information.
- If this is the first share of a sequence, the next 1 to 10 bytes contain a [varint](https://developers.google.com/protocol-buffers/docs/encoding) of the `uint64` length of the sequence that follows.
- The remaining [`SHARE_SIZE`](./consensus.md#constants)`-`[`NAMESPACE_ID_BYTES`](./consensus.md#constants)`-`[`SHARE_INFO_BYTES`](./consensus.md#constants) `-` 1 to 10 bytes (if this is the first share of a sequence) are message data. Message data are opaque bytes of data that are included in the block but do not impact the state. In other words, the remaining bytes have no special meaning and are simply used to store data.
- If there is insufficient message data to fill the share, the remaining bytes are filled with `0`.
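
The sparse share rules above can be illustrated by splitting one message into shares. This is a rough sketch, not the reference implementation: `encode_uvarint` and `split_message_into_sparse_shares` are hypothetical names, the constants are copied from the constants table, and the sequence length is assumed to count only the message bytes (not the varint prefix itself).

```python
from typing import List

# Values taken from the constants table in consensus.md.
NAMESPACE_ID_BYTES = 8
SHARE_INFO_BYTES = 1
SHARE_SIZE = 256


def encode_uvarint(value: int) -> bytes:
    """Encode an unsigned 64-bit integer as a protobuf-style varint."""
    if not 0 <= value < (1 << 64):
        raise ValueError("value must fit in a uint64")
    out = bytearray()
    while value >= 0x80:
        out.append((value & 0x7F) | 0x80)  # low 7 bits, continuation bit set
        value >>= 7
    out.append(value)
    return bytes(out)


def split_message_into_sparse_shares(
    namespace_id: bytes, message: bytes, version: int = 0
) -> List[bytes]:
    """Split one message into SHARE_SIZE-byte sparse shares (hypothetical helper)."""
    if len(namespace_id) != NAMESPACE_ID_BYTES:
        raise ValueError("namespace ID must be NAMESPACE_ID_BYTES long")
    # Assumption: the sequence length covers the message bytes only.
    pending = encode_uvarint(len(message)) + message
    shares = []
    is_start = True
    while is_start or pending:
        # Assumption: version in the upper 7 bits, start indicator in the last bit.
        info = (version << 1) | int(is_start)
        header = namespace_id + bytes([info])
        capacity = SHARE_SIZE - len(header)
        chunk, pending = pending[:capacity], pending[capacity:]
        # Fill any unused space at the end of the share with zero bytes.
        shares.append(header + chunk + bytes(capacity - len(chunk)))
        is_start = False
    return shares
```

With these constants, a 300-byte message needs two shares: the first carries the 2-byte length varint plus 245 message bytes, and the second carries the remaining 55 bytes followed by zero padding.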

First share in a sequence:
![fig: sparse start share.](./figures/sparse_start_share.svg)

Continuation share in a sequence:
![fig: sparse continuation share.](./figures/sparse_continuation_share.svg)

#### Parity Share

For shares **with a namespace ID equal to [`PARITY_SHARE_NAMESPACE_ID`](./consensus.md#constants)** (i.e. parity shares):

- Bytes carry no special meaning.

For non-parity shares, if there is insufficient request data to fill the share, the remaining bytes are filled with `0`.

### Arranging Available Data Into Shares

The previous sections described how some original data, arranged into a `k * k` matrix, can be extended into a `2k * 2k` matrix and committed to with NMT roots. This section specifies how [available data](#available-data) (which includes [transactions](#transactiondata), [intermediate state roots](#intermediatestaterootdata), [evidence](#evidencedata), and [messages](#messagedata)) is arranged into the matrix in the first place.
27 changes: 27 additions & 0 deletions specs/src/specs/figures/compact_continuation_share.dot
@@ -0,0 +1,27 @@
digraph G {
node [shape = record, penwidth = 0];

share [label=<
<table border="0" cellborder="1" cellspacing="0">
<tr>
<td align="left" border="0" cellpadding="0">0</td>
<td align="left" border="0" cellpadding="0">8</td>
<td align="left" border="0" cellpadding="0">9</td>
<td align="left" border="0" cellpadding="0">10</td>
<td align="left" border="0" cellpadding="0">80</td>
<td align="left" border="0" cellpadding="0">82</td>
<td align="left" border="0" cellpadding="0">200</td>
<td align="left" border="0" cellpadding="0">256</td>
</tr>
<tr>
<td width="8" cellpadding="4">namespace id</td>
<td width="1" cellpadding="4">info byte</td>
<td width="1" cellpadding="4">reserved byte</td>
<td width="100" cellpadding="4">end of tx2</td>
<td width="2" cellpadding="4">len(tx3)</td>
<td width="100" cellpadding="4">tx3</td>
<td width="100" cellpadding="4">zero-padding</td>
</tr>
</table>
>];
}
34 changes: 34 additions & 0 deletions specs/src/specs/figures/compact_continuation_share.svg
29 changes: 29 additions & 0 deletions specs/src/specs/figures/compact_start_share.dot
@@ -0,0 +1,29 @@
digraph G {
node [shape = record, penwidth = 0];

share [label=<
<table border="0" cellborder="1" cellspacing="0">
<tr>
<td align="left" border="0" cellpadding="0">0</td>
<td align="left" border="0" cellpadding="0">8</td>
<td align="left" border="0" cellpadding="0">9</td>
<td align="left" border="0" cellpadding="0">12</td>
<td align="left" border="0" cellpadding="0">13</td>
<td align="left" border="0" cellpadding="0">15</td>
<td align="left" border="0" cellpadding="0">200</td>
<td align="left" border="0" cellpadding="0">202</td>
<td align="left" border="0" cellpadding="0">256</td>
</tr>
<tr>
<td width="8" cellpadding="4">namespace id</td>
<td width="1" cellpadding="4">info byte</td>
<td width="4" cellpadding="4">sequence length</td>
<td width="2" cellpadding="4">reserved byte</td>
<td width="2" cellpadding="4">len(tx1)</td>
<td width="100" cellpadding="4">tx1</td>
<td width="2" cellpadding="4">len(tx2)</td>
<td width="100" cellpadding="4">tx2</td>
</tr>
</table>
>];
}
37 changes: 37 additions & 0 deletions specs/src/specs/figures/compact_start_share.svg
25 changes: 0 additions & 25 deletions specs/src/specs/figures/share.dot

This file was deleted.
