Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ADR-007: Universal Share Prefix #660

Merged
merged 24 commits into from
Oct 6, 2022
Merged
Changes from 5 commits
Commits
Show all changes
24 commits
Select commit Hold shift + click to select a range
ad050e2
Draft universal share encoding ADR
rootulp Aug 24, 2022
fb338a1
address @evan-forbes feedback
rootulp Aug 25, 2022
bde5a90
Prefer reserved / unreserved namespace terminology
rootulp Aug 25, 2022
5b42edf
Merge branch 'main' into rp/adr-universal-share-encoding
rootulp Aug 26, 2022
3ddf2dd
Open question: continuation share indicator
rootulp Aug 26, 2022
2a9b792
Update docs/architecture/ADR-002-universal-share-encoding.md
rootulp Aug 29, 2022
40d195b
clarify: share -> message
rootulp Aug 29, 2022
ae2f06b
Merge branch 'main' into rp/adr-universal-share-encoding
rootulp Aug 30, 2022
c9a2fd5
docs: use compact vs sparse
rootulp Aug 31, 2022
b4b44d0
fix typo, message length -> data length
rootulp Aug 31, 2022
82191cf
Add example
rootulp Sep 2, 2022
af0d256
Update docs/architecture/ADR-002-universal-share-encoding.md
rootulp Sep 7, 2022
8cc6476
prefer namespace_id over nid
rootulp Sep 16, 2022
a0c7359
update impl details
rootulp Sep 16, 2022
2eb9e35
clarify existing schema
rootulp Sep 16, 2022
41fbf05
Clarify proposal
rootulp Sep 16, 2022
b6872a8
Merge branch 'main' into rp/adr-universal-share-encoding
rootulp Sep 26, 2022
21e7fe1
chore: rename encoding to prefix, fix adr number
rootulp Sep 26, 2022
2f26c38
rename ADR number, introduce share sequence, add new validity rules
rootulp Oct 3, 2022
07c0a9f
Update docs/architecture/adr-007-universal-share-prefix.md
rootulp Oct 6, 2022
c76c0fc
Update docs/architecture/adr-007-universal-share-prefix.md
rootulp Oct 6, 2022
17a0d05
Update docs/architecture/adr-007-universal-share-prefix.md
rootulp Oct 6, 2022
e680e8d
Update docs/architecture/adr-007-universal-share-prefix.md
rootulp Oct 6, 2022
8aa8a5c
docs: Current Sparse Share Schema
rootulp Oct 6, 2022
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
161 changes: 161 additions & 0 deletions docs/architecture/ADR-002-universal-share-encoding.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,161 @@
# ADR 009: Universal Share Encoding
rootulp marked this conversation as resolved.
Show resolved Hide resolved
<!-- disable markdownlint MD010 because Go code snippet uses tabs -->
<!-- markdownlint-disable MD010 -->

## Changelog

- 2022/9/22: inital draft of InfoReservedByte
- 2022/9/24: update draft to Universal Share Encoding

## Context

- **nid** (8 bytes): namespace id
- **reserved** (1 byte): is the location of the first transaction, ISR, or evidence in this share if there is one and `0` if there isn't one
- **message length** (varint 1 to 10 bytes): is the length of the entire message in bytes
rootulp marked this conversation as resolved.
Show resolved Hide resolved

The current reserved namespace (transaction, ISRs, evidence) share format is:<br>`nid (8 bytes) | reserved (1 byte) | share data`

The current unreserved namespace (message) share format is:

- First share of message:<br>`nid (8 bytes) | message length (varint 1 to 10 bytes) | share data`
- Contiguous share in message:<br>`nid (8 bytes) | share data`

The current share format poses multiple challenges:

1. Clients must have two share parsing implementations (one for reserved namespace shares and one for unreserved namespace shares).
rootulp marked this conversation as resolved.
Show resolved Hide resolved
1. It is difficult to make changes to the share format in a backwards compatible way because clients can't determine which version of the share format an individual share conforms to.
1. It is not possible for a client that samples a random share to determine if the share is the first share of that namespace or a contiguous share.
rootulp marked this conversation as resolved.
Show resolved Hide resolved

## Proposal

Introduce a universal share encoding that applies to both reserved and unreserved share formats:

- First share of namespace (for reserved namespaces) or message (for unreserved namespaces):<br>`nid (8 bytes) | info (1 byte) | message length (varint 1 to 10 bytes) | data`
- Contiguous shares in namespace:<br>`nid (8 bytes) | info (1 byte) | data`
rootulp marked this conversation as resolved.
Show resolved Hide resolved

The reserved share format has the added constraint: the first byte of `data` is a reserved byte so the format is:<br>`nid (8 bytes) | info (1 byte) | message length (varint 1 to 10 bytes) | reserved (1 byte) | data`

Where **info** (1 byte) is a byte with the following structure:

- the first 7 bits are reserved for the version information in big endian form (initially, this will be `0000000` for version 0);
- the last bit is a *message start indicator*, that is `1` if the share is at the start of a namespace or `0` if it is a contiguous share within a namespace.

Rationale:

1. The first 9 bytes of a share are formatted in a consistent way regardless of the type of share (reserved or unreserved namespace). Clients can therefore parse shares into data via one mechanism rather than two.
1. The message start indicator allows clients to parse a whole share in the middle of a namespace, without needing to read the whole namespace.
rootulp marked this conversation as resolved.
Show resolved Hide resolved
1. The version bits allow us to upgrade the share format in the future, if we need to do so in a way that different share formats can be mixed within a block.

## Questions

1. Does the info byte introduce any new attack vectors?
1. Should one bit in the info byte be used to signify that a continuation share is expected after this share?
- This **continuation share indicator** is inspired by [protocol buffer varints](https://developers.google.com/protocol-buffers/docs/encoding#varints) and [UTF-8](https://en.wikipedia.org/wiki/UTF-8).
- The **continuation share indicator** is distinct from the **message start indicator**. Consider a message with 3 contiguous shares:

indicator | share 1 | share 2 | share 3
--- | --- | --- | ---
message start | `1` | `0` | `0`
continuation share | `1` | `1` | `0` <- client stops requesting contiguous shares when they encounter `0`

- This would enable clients to begin parsing a message by sampling a share in the middle of a namespace and proceed to parsing contiguous shares until the end without ever encountering the first share of the message which contains the message length. However, this use case seems contrived because a subset of the message shares may not be meaningful to the client.
- Without the continuation share indicator, the client would have to request the first share of the next message to learn that they had completed requesting the previous message.

1. What happens if a block producer publishes a message with a version that isn't in the list of supported versions?
- It seems like this could be a `ProcessProposal` validity check. Validators already compute the shares in `ProcessProposal` [here](https://github.com/rootulp/celestia-app/blob/ad050e28678119adae02536db3ef5ce083ea1436/app/process_proposal.go#L104-L110) so we can add a check to verify that every share has a valid version.
1. What happens if a block producer publishes a message where the message start indicator isn't set correctly?
- Add a check similar to the one above.

## Alternative Approaches

We briefly considered adding the info byte to only unreserved namespace shares, see <https://github.com/celestiaorg/celestia-app/pull/651>. This approach was a miscommunication or earlier proposal and was deprecated in favor of this ADR.

## Decision

// TODO

## Implementation Details

A share version is not set by a user who submits a `PayForData`. Instead, it is set by the block producer when data is split into shares.

### Constants

1. Define a new constant for `InfoReservedBytes = 1`.
1. Update [`MsgShareSize`](https://github.com/celestiaorg/celestia-core/blob/v0.34.x-celestia/pkg/consts/consts.go#L26) to account for one less byte available
1. Update [`TxShareSize`](https://github.com/celestiaorg/celestia-core/blob/v0.34.x-celestia/pkg/consts/consts.go#L24) to account for one less byte available

**NOTE**: Currently constants are defined in celestia-core's [consts.go](https://github.com/celestiaorg/celestia-core/blob/master/pkg/consts/consts.go) but some will be moved to celestia-app's [appconsts.go](https://github.com/celestiaorg/celestia-app/tree/evan/non-interactive-defaults-feature/pkg/appconsts). See [celestia-core#841](https://github.com/celestiaorg/celestia-core/issues/841).

### Types

1. Introduce a new type `InfoReservedByte` to encapsulate the logic around getting the `Version()` or `IsMessageStart()` from a share.

```golang
// InfoReservedByte is a byte with the following structure: the first 7 bits are
// reserved for version information in big endian form (initially `0000000`).
// The last bit is a "message start indicator", that is `1` if the share is at
// the start of a message and `0` otherwise.
type InfoReservedByte byte

func NewInfoReservedByte(version uint8, isMessageStart bool) (InfoReservedByte, error) {
if version > 127 {
return 0, fmt.Errorf("version %d must be less than or equal to 127", version)
}

prefix := version << 1
if isMessageStart {
return InfoReservedByte(prefix + 1), nil
}
return InfoReservedByte(prefix), nil
}

// Version returns the version encoded in this InfoReservedByte.
// Version is expected to be between 0 and 127 (inclusive).
func (i InfoReservedByte) Version() uint8 {
version := uint8(i) >> 1
return version
}

// IsMessageStart returns whether this share is the start of a message.
func (i InfoReservedByte) IsMessageStart() bool {
return uint(i)%2 == 1
}
```

### Logic

#### celestia-core

1. Account for the new `InfoReservedByte` in `./types/share_splitting.go` and `./types/share_merging.go`.
- **NOTE**: These files are subject to be deleted soon. See <https://github.com/celestiaorg/celestia-core/issues/842>

#### celestia-app

1. Account for the new `InfoReservedByte` in all share splitting and merging code. There is an in-progress refactor of the relevant files. See <https://github.com/celestiaorg/celestia-app/pull/637>

## Status

Proposed

## Consequences

### Positive

This proposal resolves challenges posed above.

### Negative

This proposal reduces the number of bytes a share can use for data by one byte.

### Neutral

If 127 versions is larger than required, the share format can be updated (in a subsequent version) to reserve fewer bits for the version in order to use some bits for other purposes.

If 127 versions is smaller than required, the share format can be updated (in a subsequent version) to occupy multiple bytes for the version. For example if the 7 bits are `1111111` then read an additional byte.

## References

- <https://github.com/celestiaorg/celestia-core/issues/839>
- <https://github.com/celestiaorg/celestia-core/issues/759>
- <https://github.com/celestiaorg/celestia-core/issues/757>
- <https://github.com/celestiaorg/celestia-app/issues/659>