Simplify sync protocol and update to calculate optimistic heads #2746

Merged · 15 commits · Dec 15, 2021

Changes from all commits

2 changes: 2 additions & 0 deletions presets/mainnet/altair.yaml
@@ -22,3 +22,5 @@ EPOCHS_PER_SYNC_COMMITTEE_PERIOD: 256

```yaml
# ---------------------------------------------------------------
# 1
MIN_SYNC_COMMITTEE_PARTICIPANTS: 1
# SLOTS_PER_EPOCH * EPOCHS_PER_SYNC_COMMITTEE_PERIOD (= 32 * 256)
UPDATE_TIMEOUT: 8192
```

2 changes: 2 additions & 0 deletions presets/minimal/altair.yaml
@@ -22,3 +22,5 @@ EPOCHS_PER_SYNC_COMMITTEE_PERIOD: 8

```yaml
# ---------------------------------------------------------------
# 1
MIN_SYNC_COMMITTEE_PARTICIPANTS: 1
# SLOTS_PER_EPOCH * EPOCHS_PER_SYNC_COMMITTEE_PERIOD (= 8 * 8)
UPDATE_TIMEOUT: 64
```
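
As a quick sanity check on these preset values (a standalone sketch; the 12-second slot time is mainnet's `SECONDS_PER_SLOT` and is an assumption here, not part of this diff):

```python
# Sanity-check the UPDATE_TIMEOUT presets and the "~27.3 hours" figure below.
SECONDS_PER_SLOT = 12  # mainnet configuration value; assumed for illustration

mainnet_timeout = 32 * 256  # SLOTS_PER_EPOCH * EPOCHS_PER_SYNC_COMMITTEE_PERIOD
minimal_timeout = 8 * 8

assert mainnet_timeout == 8192
assert minimal_timeout == 64

hours = mainnet_timeout * SECONDS_PER_SLOT / 3600
print(f"mainnet UPDATE_TIMEOUT spans {hours:.1f} hours")  # ~27.3 hours
```
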
1 change: 1 addition & 0 deletions setup.py
@@ -683,6 +683,7 @@ def combine_dicts(old_dict: Dict[str, T], new_dict: Dict[str, T]) -> Dict[str, T

```python
    'uint8', 'uint16', 'uint32', 'uint64', 'uint128', 'uint256',
    'bytes', 'byte', 'ByteList', 'ByteVector',
    'Dict', 'dict', 'field', 'ceillog2', 'floorlog2', 'Set',
    'Optional',
]
```


199 changes: 131 additions & 68 deletions specs/altair/sync-protocol.md
@@ -13,12 +13,14 @@

- [Preset](#preset)
  - [Misc](#misc)
- [Containers](#containers)
  - [`LightClientUpdate`](#lightclientupdate)
  - [`LightClientStore`](#lightclientstore)
- [Helper functions](#helper-functions)
  - [`get_subtree_index`](#get_subtree_index)
  - [`get_active_header`](#get_active_header)
  - [`get_safety_threshold`](#get_safety_threshold)
- [Light client state updates](#light-client-state-updates)
  - [`process_slot_for_light_client_store`](#process_slot_for_light_client_store)
  - [`validate_light_client_update`](#validate_light_client_update)
  - [`apply_light_client_update`](#apply_light_client_update)
  - [`process_light_client_update`](#process_light_client_update)
@@ -47,38 +49,27 @@ uses sync committees introduced in [this beacon chain extension](./beacon-chain.

### Misc

| Name | Value | Notes |
| - | - | - |
| `MIN_SYNC_COMMITTEE_PARTICIPANTS` | `1` | |
| `UPDATE_TIMEOUT` | `SLOTS_PER_EPOCH * EPOCHS_PER_SYNC_COMMITTEE_PERIOD` | ~27.3 hours |

## Containers

### `LightClientUpdate`

```python
class LightClientUpdate(Container):
    # The beacon block header that is attested to by the sync committee
    attested_header: BeaconBlockHeader
    # Next sync committee corresponding to the active header
    next_sync_committee: SyncCommittee
    next_sync_committee_branch: Vector[Bytes32, floorlog2(NEXT_SYNC_COMMITTEE_INDEX)]
    # The finalized beacon block header attested to by Merkle branch
    finalized_header: BeaconBlockHeader
    finality_branch: Vector[Bytes32, floorlog2(FINALIZED_ROOT_INDEX)]
    # Sync committee aggregate signature
    sync_committee_aggregate: SyncAggregate
    # Fork version for the aggregate signature
    fork_version: Version
```
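
Compared to the old container, the loose `sync_committee_bits`/`sync_committee_signature` pair is folded into the standard `SyncAggregate` container, so participation is read through one field. A small illustration (the helper name is hypothetical, not part of the spec):

```python
def count_active_participants(update: LightClientUpdate) -> int:
    # Number of committee members that contributed to the aggregate signature
    return sum(update.sync_committee_aggregate.sync_committee_bits)
```
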
@@ -88,8 +79,18 @@ class LightClientUpdate(Container):

### `LightClientStore`

```python
@dataclass
class LightClientStore(object):
    # Beacon block header that is finalized
    finalized_header: BeaconBlockHeader
    # Sync committees corresponding to the header
    current_sync_committee: SyncCommittee
    next_sync_committee: SyncCommittee
    # Best available header to switch finalized head to if we see nothing else
    best_valid_update: Optional[LightClientUpdate]
    # Most recent available reasonably-safe header
    optimistic_header: BeaconBlockHeader
    # Max number of active participants in a sync committee (used to calculate safety threshold)
    previous_max_active_participants: uint64
    current_max_active_participants: uint64
```
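
Nothing in this PR defines how a store comes into existence; a client has to seed it from a header it already trusts (e.g. a hard-coded checkpoint) together with that header's two sync committees. A minimal initialization sketch — the function name and bootstrap source are assumptions, not spec:

```python
def initialize_light_client_store(trusted_header: BeaconBlockHeader,
                                  current_sync_committee: SyncCommittee,
                                  next_sync_committee: SyncCommittee) -> LightClientStore:
    # Start with both the finalized and optimistic head at the trusted header,
    # no pending best update, and no observed participation yet.
    return LightClientStore(
        finalized_header=trusted_header,
        current_sync_committee=current_sync_committee,
        next_sync_committee=next_sync_committee,
        best_valid_update=None,
        optimistic_header=trusted_header,
        previous_max_active_participants=0,
        current_max_active_participants=0,
    )
```
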

## Helper functions
@@ -101,95 +102,157 @@

### `get_subtree_index`

```python
def get_subtree_index(generalized_index: GeneralizedIndex) -> uint64:
    return uint64(generalized_index % 2**(floorlog2(generalized_index)))
```
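
For example, with the Altair generalized indices `FINALIZED_ROOT_INDEX = 105` and `NEXT_SYNC_COMMITTEE_INDEX = 55` (defined earlier in this spec but collapsed out of this diff):

```python
# floorlog2(105) == 6, so get_subtree_index strips the top bit: 105 % 64 == 41
assert get_subtree_index(GeneralizedIndex(105)) == uint64(41)
# floorlog2(55) == 5: 55 % 32 == 23
assert get_subtree_index(GeneralizedIndex(55)) == uint64(23)
```
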

### `get_active_header`

```python
def get_active_header(update: LightClientUpdate) -> BeaconBlockHeader:
# The "active header" is the header that the update is trying to convince us
# to accept. If a finalized header is present, it's the finalized header,
# otherwise it's the attested header
if update.finalized_header != BeaconBlockHeader():
return update.finalized_header
else:
return update.attested_header
```
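
Callers can use the same sentinel comparison to classify an update before deciding how to treat it; a hypothetical helper (not part of the spec):

```python
def is_finality_update(update: LightClientUpdate) -> bool:
    # Mirrors the check in get_active_header: "no finalized header" is
    # encoded as the default-valued BeaconBlockHeader
    return update.finalized_header != BeaconBlockHeader()
```
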

### `get_safety_threshold`

```python
def get_safety_threshold(store: LightClientStore) -> uint64:
    return max(
        store.previous_max_active_participants,
        store.current_max_active_participants,
    ) // 2
```
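
A worked example, assuming `store` is a `LightClientStore` and using mainnet's 512-member sync committee for scale (an assumption, not part of this diff):

```python
# Best participation seen in the previous and current halves of the period:
store.previous_max_active_participants = uint64(400)
store.current_max_active_participants = uint64(260)
# The threshold is half the better of the two: max(400, 260) // 2 == 200,
# so an update needs more than 200 of 512 participants to move the optimistic head.
assert get_safety_threshold(store) == 200
```
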

## Light client state updates

A light client maintains its state in a `store` object of type `LightClientStore` and receives `update` objects of type `LightClientUpdate`. Every `update` triggers `process_light_client_update(store, update, current_slot)` where `current_slot` is the current slot based on some local clock. `process_slot_for_light_client_store` is called every time the current slot increments.

#### `process_slot_for_light_client_store`

```python
def process_slot_for_light_client_store(store: LightClientStore, current_slot: Slot) -> None:
    if current_slot % UPDATE_TIMEOUT == 0:
        store.previous_max_active_participants = store.current_max_active_participants
        store.current_max_active_participants = 0
    if (
        current_slot > store.finalized_header.slot + UPDATE_TIMEOUT
        and store.best_valid_update is not None
    ):
        # Forced best update when the update timeout has elapsed
        apply_light_client_update(store, store.best_valid_update)
        store.best_valid_update = None
```
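
The spec does not pin down how `current_slot` is derived. A minimal sketch, assuming mainnet's 12-second `SECONDS_PER_SLOT` and a known `genesis_time` (both assumptions here):

```python
import time

SECONDS_PER_SLOT = 12  # mainnet configuration value; an assumption here

def tick(store: LightClientStore, genesis_time: int) -> Slot:
    # Derive the wall-clock slot and run the per-slot processing;
    # callers should invoke this once per slot increment.
    current_slot = Slot((int(time.time()) - genesis_time) // SECONDS_PER_SLOT)
    process_slot_for_light_client_store(store, current_slot)
    return current_slot
```
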

#### `validate_light_client_update`

```python
def validate_light_client_update(store: LightClientStore,
                                 update: LightClientUpdate,
                                 current_slot: Slot,
                                 genesis_validators_root: Root) -> None:
    # Verify update slot is larger than slot of current best finalized header
    active_header = get_active_header(update)
    assert current_slot >= active_header.slot > store.finalized_header.slot

    # Verify update does not skip a sync committee period
    finalized_period = compute_epoch_at_slot(store.finalized_header.slot) // EPOCHS_PER_SYNC_COMMITTEE_PERIOD
    update_period = compute_epoch_at_slot(active_header.slot) // EPOCHS_PER_SYNC_COMMITTEE_PERIOD
    assert update_period in (finalized_period, finalized_period + 1)

    # Verify that the `finalized_header`, if present, actually is the finalized header saved in the
    # state of the `attested_header`
    if update.finalized_header == BeaconBlockHeader():
        assert update.finality_branch == [Bytes32() for _ in range(floorlog2(FINALIZED_ROOT_INDEX))]
    else:
        assert is_valid_merkle_branch(
            leaf=hash_tree_root(update.finalized_header),
            branch=update.finality_branch,
            depth=floorlog2(FINALIZED_ROOT_INDEX),
            index=get_subtree_index(FINALIZED_ROOT_INDEX),
            root=update.attested_header.state_root,
        )

    # Verify update next sync committee if the update period incremented
    if update_period == finalized_period:
        sync_committee = store.current_sync_committee
        assert update.next_sync_committee_branch == [Bytes32() for _ in range(floorlog2(NEXT_SYNC_COMMITTEE_INDEX))]
    else:
        sync_committee = store.next_sync_committee
        assert is_valid_merkle_branch(
            leaf=hash_tree_root(update.next_sync_committee),
            branch=update.next_sync_committee_branch,
            depth=floorlog2(NEXT_SYNC_COMMITTEE_INDEX),
            index=get_subtree_index(NEXT_SYNC_COMMITTEE_INDEX),
            root=active_header.state_root,
        )

    sync_aggregate = update.sync_committee_aggregate

    # Verify sync committee has sufficient participants
    assert sum(sync_aggregate.sync_committee_bits) >= MIN_SYNC_COMMITTEE_PARTICIPANTS

    # Verify sync committee aggregate signature
    participant_pubkeys = [
        pubkey for (bit, pubkey) in zip(sync_aggregate.sync_committee_bits, sync_committee.pubkeys)
        if bit
    ]
    domain = compute_domain(DOMAIN_SYNC_COMMITTEE, update.fork_version, genesis_validators_root)
    signing_root = compute_signing_root(update.attested_header, domain)
    assert bls.FastAggregateVerify(participant_pubkeys, signing_root, sync_aggregate.sync_committee_signature)
```

> **Review thread** on the period check (`assert update_period in (finalized_period, finalized_period + 1)`):
>
> *Reviewer:* Does that mean the light client should fail if there is a skipped period? This seems to be a fairly normal path when a client stops running for a few days.
>
> *Contributor:* The light client can request historic `LightClientUpdate`s from the network. It needs at least one update per period to follow along, as it only knows `current_sync_committee` and `next_sync_committee` and can only verify a `LightClientUpdate` from those periods. However, what is still suboptimal is the case where `finalized_update` is in a different period than `attested_update`, but this is not a problem introduced by this PR. Another tricky case is the one where `attested_update` is in a different period than the committee that signed it, which probably even requires some heuristics to figure out (as it depends on there being missed slots at the start of an epoch). For now, these edge cases are all ignored, and updates are only accepted if the `finalized_update`, the `attested_update`, and the sync committee signing it all come from the same sync committee period.
>
> *Reviewer:* Yes, agreed that the update path will trace the sync-committee linkage, and that this is not an issue raised by this PR. With regard to the edge case: it could cause some weird behavior temporarily. For example, `apply_light_client_update` is called due to a timeout; then, when a valid `update.finalized_header` arrives, it will get rejected. Again, this could be better handled if we assume the light client can request specific `LightClientUpdate`s when it times out or falls out of sync with the current stream of updates. The sync logic would become a lot cleaner if separated into two modes: skip-sync and normal sync.
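
For an update without a finality proof, the `finality_branch` must consist entirely of zero hashes of the fixed depth. A quick check of that shape, assuming the Altair value `FINALIZED_ROOT_INDEX = 105` (not shown in this diff):

```python
# An optimistic-only update carries floorlog2(105) == 6 zero hashes
# in place of a real finality proof.
empty_finality_branch = [Bytes32() for _ in range(floorlog2(FINALIZED_ROOT_INDEX))]
assert len(empty_finality_branch) == 6
assert all(node == Bytes32() for node in empty_finality_branch)
```
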

#### `apply_light_client_update`

```python
def apply_light_client_update(store: LightClientStore, update: LightClientUpdate) -> None:
    active_header = get_active_header(update)
    finalized_period = compute_epoch_at_slot(store.finalized_header.slot) // EPOCHS_PER_SYNC_COMMITTEE_PERIOD
    update_period = compute_epoch_at_slot(active_header.slot) // EPOCHS_PER_SYNC_COMMITTEE_PERIOD
    if update_period == finalized_period + 1:
        store.current_sync_committee = store.next_sync_committee
        store.next_sync_committee = update.next_sync_committee
    store.finalized_header = active_header
```

> **Review thread:**
>
> *Contributor:* If the `optimistic_header` was older, I guess it should also be updated here (to `finalized_header`).
>
> *Reviewer:* Does this mean that `store.finalized_header` may not actually be a finalized header when `apply_light_client_update` is called through the update timeout?
>
> *Contributor:* This was the case in the old version as well, but there it was called just `header`. `finalized_header` here has a different meaning than in other contexts: it is only finalized for the light client (which won't revert it anymore). Agreed that the naming is suboptimal. Likewise, `optimistic_header` also has a different meaning from the one discussed as part of the merge effort.
>
> *Reviewer:* Hmm, if this is the intended "finalization" for the light client, that is not great. In the case of a timeout, why not just go to the network and ask for a committee-changing update? This spec does not yet specify how to get that information, but any implementation will need to be able to ask for historic updates corresponding to some sync committee. If that is available, "finalizing" whatever `store.best_valid_update` holds is not great; I doubt real client implementations will take this route.
>
> *Contributor:* If sync committee participation is low, and none of the blocks exceeds the 2/3 majority for a day, there still needs to be a way to proceed though. Not sure how realistic that is for mainnet.
>
> *Reviewer:* I think that is fine. If that indeed happens once in a blue moon, the light client would stop syncing. The manual fix for a light client operator is to use a newly acquired, trusted starting point; code owners could also update their client's hard-coded starting point. In a way, these manual interventions should be considered desirable, because we have an unexpected level of participation. However, if it happens a lot, that is more of an incentive design issue, and we should consider how to fix it at the protocol level.
>
> *Contributor (author):* Light clients are intended to be able to follow the chain in as similar a way to regular clients as possible, and one of the Ethereum staking protocol's core design goals has all along been to have some path to continue making progress under >1/3 offline conditions. So the light client protocol should include some way to do that. (I'm assuming light clients are going to be used in a lot of contexts, including automated ones, where manual intervention is hard and should be left to resolving 51% attacks.) What is a better alternative to taking `store.best_valid_update`? The regular Ethereum protocol advances in the non-finalization case by using the LMD GHOST fork choice rule, which follows the chain that has the most validators supporting it; `store.best_valid_update` approximates that. Is there a better approximation?
>
> *@jinfwhuang (Dec 13, 2021):* I would suggest that the light client's ability to continue rely on its "data source", which is invariably backed by full nodes. The exact meaning of "data source" is not well defined yet, because the networking layer could be a portal network, a LES-like p2p network, or a server-client RPC pairing. When the light client experiences a timeout or falls behind the current sync committee, i.e. the incoming updates are not good enough to advance its `finalized_header`, the client would revert to a skip-sync mode and ask the data source for an update that advances its sync committee. A light client does not advance until it somehow finds a way to access finality; because finality is guaranteed to exist in some data source, a light client is only stuck if it cannot access the correct data sources (i.e. correct updates). The guarantee that a light client will find a way to advance therefore depends on it having a way to find the right updates. Again, networking is not defined yet; once it is defined, we can evaluate under which conditions the light client might not be able to find the appropriate updates.
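
The rotation in `apply_light_client_update` only fires when the active header is exactly one sync committee period ahead of the stored finalized header; `validate_light_client_update` has already rejected anything further out. A small sketch of the period arithmetic, assuming the mainnet presets above (32 slots per epoch, 256 epochs per period):

```python
def sync_committee_period(slot: Slot) -> uint64:
    # Same period computation as validate_ and apply_light_client_update
    return compute_epoch_at_slot(slot) // EPOCHS_PER_SYNC_COMMITTEE_PERIOD

# 32 slots/epoch * 256 epochs/period = 8192 slots per period on mainnet:
assert sync_committee_period(Slot(8191)) == 0  # same period: no rotation
assert sync_committee_period(Slot(8192)) == 1  # next period: committees rotate
```
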

#### `process_light_client_update`

```python
def process_light_client_update(store: LightClientStore,
                                update: LightClientUpdate,
                                current_slot: Slot,
                                genesis_validators_root: Root) -> None:
    validate_light_client_update(store, update, current_slot, genesis_validators_root)

    sync_committee_bits = update.sync_committee_aggregate.sync_committee_bits

    # Update the best update in case we have to force-update to it if the timeout elapses
    if (
        store.best_valid_update is None
        or sum(sync_committee_bits) > sum(store.best_valid_update.sync_committee_aggregate.sync_committee_bits)
    ):
        store.best_valid_update = update

    # Track the maximum number of active participants in the committee signatures
    store.current_max_active_participants = max(
        store.current_max_active_participants,
        sum(sync_committee_bits),
    )

    # Update the optimistic header
    if (
        sum(sync_committee_bits) > get_safety_threshold(store)
        and update.attested_header.slot > store.optimistic_header.slot
    ):
        store.optimistic_header = update.attested_header

    # Update finalized header
    if (
        sum(sync_committee_bits) * 3 >= len(sync_committee_bits) * 2
        and update.finalized_header != BeaconBlockHeader()
    ):
        # Normal update through 2/3 threshold
        apply_light_client_update(store, update)
        store.best_valid_update = None
```
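
Taken together, an implementation wires these two entry points into its clock and network layer. The following event-loop sketch is illustrative only: `current_slot_from_clock` and `fetch_new_updates` are hypothetical stand-ins, since this PR deliberately leaves networking undefined:

```python
def light_client_event_loop(store: LightClientStore, genesis_validators_root: Root) -> None:
    previous_slot = Slot(0)
    while True:
        current_slot = current_slot_from_clock()  # hypothetical clock helper
        if current_slot > previous_slot:
            # Per-slot bookkeeping, including the forced update on timeout
            process_slot_for_light_client_store(store, current_slot)
            previous_slot = current_slot
        for update in fetch_new_updates():  # hypothetical update feed
            try:
                process_light_client_update(store, update, current_slot, genesis_validators_root)
            except AssertionError:
                continue  # discard updates that fail validation
```
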