EIP 2124: Fork identifier for chain compatibility checks #2124

Merged (13 commits) on Aug 8, 2019.
264 changes: 264 additions & 0 deletions EIPS/eip-2124.md
---
eip: 2124
title: Zero RTT chain compatibility check
author: Péter Szilágyi <peterke@gmail.com>
discussions-to: https://github.com/ethereum/EIPs/issues/2125
status: Draft
type: Standards Track
category: Networking
requires: 778
created: 2019-05-03
---

## Simple Summary

Currently, nodes in the Ethereum network try to find each other by establishing random connections to remote machines "looking" like Ethereum nodes (public networks, private networks, test networks, cloned networks, etc.) and hoping they have found a useful peer. This shooting in the dark is both time-consuming and wasteful, much like connecting to random websites hoping one of them is Google. This EIP proposes an extension to the discovery protocol that lets a node decide in advance whether a remote machine is useful, without ever having to connect to it.

## Abstract

There are many public and private Ethereum networks, but the discovery protocol doesn't differentiate between them. The only way to check whether a peer is good or bad (on the same chain or not) is to establish a TCP/IP connection, wrap it with RLPx cryptography, then execute an `eth` handshake. This is an extreme cost to bear if the remote peer turns out to be on a different network. The cost is magnified for small networks, where many more trial-and-error attempts are needed to find good nodes.

Even if the peer **is** on the same network, not everybody updates their nodes in time during non-controversial consensus upgrades (developer nodes, leftovers, etc.). These stale nodes put a needless burden on the peer-to-peer network, since they latch on to good nodes but don't accept upgraded blocks. This wastes valuable peer slots and bandwidth until the stale nodes are finally updated. The issue is even more pronounced on test networks, where leftover nodes can linger for many months.

This EIP proposes an enhancement to the [Ethereum Node Record (ENR)](http://eips.ethereum.org/EIPS/eip-778) extension of the discovery protocol to detect when two nodes are incompatible and never connect them in the first place, instead of wasting resources in vain. The EIP solves a number of issues:

* If two nodes are on different networks, they should never even consider connecting.
* If a hard fork passes, upgraded nodes should reject non-upgraded ones, but **NOT** before.
* If two chains share the same genesis but not the same forks (ETH / ETC), they should reject each other.
* Ideally the rejection should be short-circuited during discovery, before the expensive RLPx handshakes.

This EIP does not attempt to solve the clean separation of three-way forks! If at the same future block number the network splits into three (non-fork, fork-A and fork-B), separating the forkers from each other will need case-by-case special handling. Not handling this case keeps the proposal pragmatic and simple, and also avoids making it too easy to fork off mainnet.

## Motivation

Peer-to-peer networking is messy and hard due to firewalls and network address translation (NAT). Generally only a small fraction of nodes have publicly routable addresses, and P2P networks rely mainly on these to relay data for everyone else. The best way to maximize the utility of the public nodes is to ensure their resources aren't wasted on tasks that are worthless to the network.

By aggressively cutting off incompatible nodes from each other, we can extract a lot more value from the public nodes, making the entire P2P network much more robust and reliable. Supporting this network partitioning at the discovery layer can further enhance performance, as we avoid the costly cryptography and the latency/bandwidth hit associated with establishing a stream connection in the first place.

## Specification

Each node maintains the following values:

- **`GENESIS_CHECKSUM`**: Bitwise XOR of the genesis hash split into 4-byte chunks (4 bytes).
- **`FORK_CHECKSUM`**: Bitwise XOR of all fork block numbers that the local chain already passed (4 bytes).
  - If multiple forks are applied at the same block, the block number should be XOR-ed only once.
  - Block numbers are truncated to `uint32` and encoded in big endian format before XOR-ing.
- **`FORK_NEXT`**: Block number of the next upcoming fork (4 bytes, `0x00000000` if no next fork is known).
  - The upcoming fork block number is truncated to `uint32` and encoded in big endian format.

When advertising the local chain to a remote node (ENR during discovery), each node shares its own `RLP(GENESIS_CHECKSUM || FORK_CHECKSUM || FORK_NEXT)` under the `eth` ENR key. These values are cross-validated (**NOT** compared for equality) to accept or reject connectivity. Both parties must come to the same conclusion, to avoid indefinite reconnect attempts from one side.
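
A minimal sketch of the 12-byte payload, assuming the three fields are simply concatenated as big-endian `uint32`s, which is the layout the `ENR{...}` test vectors in this EIP use. The `announcement` type and its helpers are hypothetical names, and the outer RLP wrapping is deliberately left out.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// announcement is a hypothetical container for the three advertised fields.
type announcement struct {
	GenesisChecksum uint32
	ForkChecksum    uint32
	ForkNext        uint32
}

// encode concatenates the fields as big-endian uint32s (before RLP wrapping).
func (a announcement) encode() [12]byte {
	var out [12]byte
	binary.BigEndian.PutUint32(out[0:4], a.GenesisChecksum)
	binary.BigEndian.PutUint32(out[4:8], a.ForkChecksum)
	binary.BigEndian.PutUint32(out[8:12], a.ForkNext)
	return out
}

// decodeAnnouncement is the inverse of encode.
func decodeAnnouncement(b [12]byte) announcement {
	return announcement{
		GenesisChecksum: binary.BigEndian.Uint32(b[0:4]),
		ForkChecksum:    binary.BigEndian.Uint32(b[4:8]),
		ForkNext:        binary.BigEndian.Uint32(b[8:12]),
	}
}

func main() {
	// Mainnet at the first Homestead block (1150000), next fork DAO (1920000).
	homestead := announcement{0x52baab2d, 0x00118c30, 0x001d4c00}
	enc := homestead.encode()
	fmt.Printf("% x\n", enc[:]) // 52 ba ab 2d 00 11 8c 30 00 1d 4c 00
}
```

Keeping every field at a fixed 4 bytes means the payload never grows, however many forks a chain accumulates.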

#### Validation rules

0) If the local and remote `GENESIS_CHECKSUM` don't match, reject.
1) If the local and remote `FORK_CHECKSUM` match, connect.
   - The two nodes are currently in the same fork state. They might know of differing future forks, but that's not relevant until the fork triggers (it might be postponed, or nodes might be updated to match).
2) If the remote `FORK_CHECKSUM` is a subset of the local past forks and the remote `FORK_NEXT` matches the block number of the local fork that followed that subset, connect.
   - The remote node is currently syncing. It might eventually diverge from us, but at this point in time we don't have enough information.
3) If the remote `FORK_CHECKSUM` is a superset of the local past forks and can be completed with locally known future forks, connect.
   - The local node is currently syncing. It might eventually diverge from the remote, but at this point in time we don't have enough information.
4) Reject in all other cases.

#### Stale software examples

The examples below attempt to enumerate the fork combinations that arise when nodes do not run matching software versions but otherwise follow the same chain (mainnet nodes, testnet nodes, etc.).

| Local past forks | Local future forks | Remote `FORK_CHECKSUM` | Remote `FORK_NEXT` | Connect (rule) | Reason |
|:---:|:---:|:---:|:---:|:---:|:---:|
| A | | A | | Yes (1) | Same forks, same sync state. |
| A | | A | B | Yes (1) | Remote is advertising a future fork, but that is uncertain. |
| A | B | A | | Yes (1) | Local knows about a future fork, but that is uncertain. |
| A | B | A | B | Yes (1) | Both know about a future fork, but that is uncertain. |
| A | B1 | A | B2 | Yes (1) | Both know about differing future forks, but those are uncertain. |
| [A,B] | | A | B | Yes (2) | Remote out of sync. |
| [A,B,C] | | A | B | Yes¹ (2) | Remote out of sync. Remote will need a software update, but we don't know it yet. |
| A | B | A ⊕ B | | Yes (3) | Local out of sync. |
| A | [B,C] | A ⊕ B | | Yes (3) | Local out of sync. Local also knows about a future fork, but that is not yet certain. |
| A | | A ⊕ B | | No (4) | Local needs software update. |
| A | B | A ⊕ B ⊕ C | | No² (4) | Local needs software update. |
| [A,B] | | A | | No (4) | Remote needs software update. |

*Note: there is one asymmetry in the table, marked with ¹ and ². Since we don't have access to a remote node's future fork list (just the next one), we can't detect that its software is stale until it syncs up. This is acceptable as 1) the remote node will disconnect from us anyway, and 2) this is a temporary fluke during sync, not a permanent issue with a leftover node.*

#### Mismatching chain examples (local perspective)

TODO: Give some examples as to what happens if the nodes follow different forks.

## Rationale

##### Why flatten the genesis into 4 bytes? Why not share the entire 32 bytes?

Whilst the `eth` devp2p protocol permits arbitrary amounts of data to be transmitted, the discovery protocol relies on MTU-limited UDP messages. The total space allowance for all ENR entries is 300 bytes.

Reducing the genesis to a 4-byte checksum leaves ample room in the ENR for future extensions, and 4 bytes (2^32 possible values) make accidental collisions very unlikely for any realistic number of Ethereum networks.

##### Why flatten the fork block numbers into 4 bytes? Why not share as a list?

Whilst the `eth` devp2p protocol permits arbitrary amounts of data to be transmitted, the discovery protocol relies on MTU-limited UDP messages. The total space allowance for all ENR entries is 300 bytes.

Flattening the list of fork blocks into a single value ensures that, no matter how many hard forks Ethereum goes through, we will never hit the discovery protocol's size limits.

##### We're not using `FORK_NEXT` for much, can't we get rid of it somehow?

We need to be able to differentiate whether a remote node is out of sync or whether its software is stale. Sharing only the past forks cannot tell us if the node is legitimately behind or stuck.

##### Why advertise only one next fork, instead of hashing all known future ones?

As opposed to past forks, which have already passed (for us locally) and can be considered immutable, we don't know anything about future ones. Maybe we're out of sync, or maybe the fork hasn't passed yet. If it hasn't passed yet, it might be postponed, so enforcing it would split the network apart. It could also happen that we're not yet aware of all future forks (haven't updated our software in a while).

## Backwards Compatibility

- Consensus-wise, this EIP is backwards compatible, as it only touches networking.
- Discovery-protocol-wise, ENR is not yet supported across the ecosystem. These restrictions are only enforced between nodes that support ENR and also advertise the fork fields. Nodes without ENR support, nodes without the necessary fields, or mixed nodes will simply fall back to enforcing the network split on the `eth` protocol layer.

## Test Cases

Here's a full suite of tests covering all possible ENR entries that Mainnet, Ropsten, Rinkeby and Görli can advertise, given Petersburg as the latest fork at the time of writing.

```go
type testcase struct {
	head uint64
	want ENR
}
tests := []struct {
	config  *params.ChainConfig
	genesis common.Hash
	cases   []testcase
}{
	// Mainnet test cases
	{
		params.MainnetChainConfig,
		params.MainnetGenesisHash,
		[]testcase{
			{0, ENR{0x52, 0xba, 0xab, 0x2d, 0x00, 0x00, 0x00, 0x00, 0x00, 0x11, 0x8c, 0x30}},       // Unsynced
			{1149999, ENR{0x52, 0xba, 0xab, 0x2d, 0x00, 0x00, 0x00, 0x00, 0x00, 0x11, 0x8c, 0x30}}, // Last Frontier block
			{1150000, ENR{0x52, 0xba, 0xab, 0x2d, 0x00, 0x11, 0x8c, 0x30, 0x00, 0x1d, 0x4c, 0x00}}, // First Homestead block
			{1919999, ENR{0x52, 0xba, 0xab, 0x2d, 0x00, 0x11, 0x8c, 0x30, 0x00, 0x1d, 0x4c, 0x00}}, // Last Homestead block
			{1920000, ENR{0x52, 0xba, 0xab, 0x2d, 0x00, 0x0c, 0xc0, 0x30, 0x00, 0x25, 0x95, 0x18}}, // First DAO block
			{2462999, ENR{0x52, 0xba, 0xab, 0x2d, 0x00, 0x0c, 0xc0, 0x30, 0x00, 0x25, 0x95, 0x18}}, // Last DAO block
			{2463000, ENR{0x52, 0xba, 0xab, 0x2d, 0x00, 0x29, 0x55, 0x28, 0x00, 0x28, 0xd1, 0x38}}, // First Tangerine block
			{2674999, ENR{0x52, 0xba, 0xab, 0x2d, 0x00, 0x29, 0x55, 0x28, 0x00, 0x28, 0xd1, 0x38}}, // Last Tangerine block
			{2675000, ENR{0x52, 0xba, 0xab, 0x2d, 0x00, 0x01, 0x84, 0x10, 0x00, 0x42, 0xae, 0x50}}, // First Spurious block
			{4369999, ENR{0x52, 0xba, 0xab, 0x2d, 0x00, 0x01, 0x84, 0x10, 0x00, 0x42, 0xae, 0x50}}, // Last Spurious block
			{4370000, ENR{0x52, 0xba, 0xab, 0x2d, 0x00, 0x43, 0x2a, 0x40, 0x00, 0x6f, 0x15, 0x80}}, // First Byzantium block
			{7279999, ENR{0x52, 0xba, 0xab, 0x2d, 0x00, 0x43, 0x2a, 0x40, 0x00, 0x6f, 0x15, 0x80}}, // Last Byzantium block
			{7280000, ENR{0x52, 0xba, 0xab, 0x2d, 0x00, 0x2c, 0x3f, 0xc0, 0x00, 0x00, 0x00, 0x00}}, // First and last Constantinople, first Petersburg block
			{7987396, ENR{0x52, 0xba, 0xab, 0x2d, 0x00, 0x2c, 0x3f, 0xc0, 0x00, 0x00, 0x00, 0x00}}, // Today Petersburg block
		},
	},
	// Ropsten test cases
	{
		params.TestnetChainConfig,
		params.TestnetGenesisHash,
		[]testcase{
			{0, ENR{0xe4, 0x91, 0x9a, 0xeb, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x0a}},       // Unsynced, last Frontier, Homestead and first Tangerine block
			{9, ENR{0xe4, 0x91, 0x9a, 0xeb, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x0a}},       // Last Tangerine block
			{10, ENR{0xe4, 0x91, 0x9a, 0xeb, 0x00, 0x00, 0x00, 0x0a, 0x00, 0x19, 0xf0, 0xa0}},      // First Spurious block
			{1699999, ENR{0xe4, 0x91, 0x9a, 0xeb, 0x00, 0x00, 0x00, 0x0a, 0x00, 0x19, 0xf0, 0xa0}}, // Last Spurious block
			{1700000, ENR{0xe4, 0x91, 0x9a, 0xeb, 0x00, 0x19, 0xf0, 0xaa, 0x00, 0x40, 0x8b, 0x70}}, // First Byzantium block
			{4229999, ENR{0xe4, 0x91, 0x9a, 0xeb, 0x00, 0x19, 0xf0, 0xaa, 0x00, 0x40, 0x8b, 0x70}}, // Last Byzantium block
			{4230000, ENR{0xe4, 0x91, 0x9a, 0xeb, 0x00, 0x59, 0x7b, 0xda, 0x00, 0x4b, 0x5e, 0x82}}, // First Constantinople block
			{4939393, ENR{0xe4, 0x91, 0x9a, 0xeb, 0x00, 0x59, 0x7b, 0xda, 0x00, 0x4b, 0x5e, 0x82}}, // Last Constantinople block
			{4939394, ENR{0xe4, 0x91, 0x9a, 0xeb, 0x00, 0x12, 0x25, 0x58, 0x00, 0x00, 0x00, 0x00}}, // First Petersburg block
			{5822692, ENR{0xe4, 0x91, 0x9a, 0xeb, 0x00, 0x12, 0x25, 0x58, 0x00, 0x00, 0x00, 0x00}}, // Today Petersburg block
		},
	},
	// Rinkeby test cases
	{
		params.RinkebyChainConfig,
		params.RinkebyGenesisHash,
		[]testcase{
			{0, ENR{0xca, 0xb4, 0xc3, 0x84, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x01}},       // Unsynced, last Frontier block
			{1, ENR{0xca, 0xb4, 0xc3, 0x84, 0x00, 0x00, 0x00, 0x01, 0x00, 0x00, 0x00, 0x02}},       // First and last Homestead block
			{2, ENR{0xca, 0xb4, 0xc3, 0x84, 0x00, 0x00, 0x00, 0x03, 0x00, 0x00, 0x00, 0x03}},       // First and last Tangerine block
			{3, ENR{0xca, 0xb4, 0xc3, 0x84, 0x00, 0x00, 0x00, 0x00, 0x00, 0x0f, 0xcc, 0x25}},       // First Spurious block
			{1035300, ENR{0xca, 0xb4, 0xc3, 0x84, 0x00, 0x00, 0x00, 0x00, 0x00, 0x0f, 0xcc, 0x25}}, // Last Spurious block
			{1035301, ENR{0xca, 0xb4, 0xc3, 0x84, 0x00, 0x0f, 0xcc, 0x25, 0x00, 0x37, 0xdb, 0x77}}, // First Byzantium block
			{3660662, ENR{0xca, 0xb4, 0xc3, 0x84, 0x00, 0x0f, 0xcc, 0x25, 0x00, 0x37, 0xdb, 0x77}}, // Last Byzantium block
			{3660663, ENR{0xca, 0xb4, 0xc3, 0x84, 0x00, 0x38, 0x17, 0x52, 0x00, 0x41, 0xef, 0xd2}}, // First Constantinople block
			{4321233, ENR{0xca, 0xb4, 0xc3, 0x84, 0x00, 0x38, 0x17, 0x52, 0x00, 0x41, 0xef, 0xd2}}, // Last Constantinople block
			{4321234, ENR{0xca, 0xb4, 0xc3, 0x84, 0x00, 0x79, 0xf8, 0x80, 0x00, 0x00, 0x00, 0x00}}, // First Petersburg block
			{4586649, ENR{0xca, 0xb4, 0xc3, 0x84, 0x00, 0x79, 0xf8, 0x80, 0x00, 0x00, 0x00, 0x00}}, // Today Petersburg block
		},
	},
	// Goerli test cases
	{
		params.GoerliChainConfig,
		params.GoerliGenesisHash,
		[]testcase{
			{0, ENR{0xc7, 0x24, 0x73, 0x98, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00}},      // Unsynced, last Frontier, Homestead, Tangerine, Spurious, Byzantium, Constantinople and first Petersburg block
			{795329, ENR{0xc7, 0x24, 0x73, 0x98, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00}}, // Today Petersburg block
		},
	},
}
```

Here's a suite of tests of the different states a Mainnet node might be in and the different ENR announcements it might be required to validate and decide to accept or reject:

```go
tests := []struct {
	head uint64
	enr  ENR
	err  error
}{
	// Local is mainnet Petersburg, remote announces the same. No future fork is announced.
	{7987396, ENR{0x52, 0xba, 0xab, 0x2d, 0x00, 0x2c, 0x3f, 0xc0, 0x00, 0x00, 0x00, 0x00}, nil},

	// Local is mainnet Petersburg, remote announces the same. Remote also announces a next fork
	// at block 0xffffffff, but that is uncertain.
	{7987396, ENR{0x52, 0xba, 0xab, 0x2d, 0x00, 0x2c, 0x3f, 0xc0, 0xff, 0xff, 0xff, 0xff}, nil},

	// Local is mainnet currently in Byzantium only (so it's aware of Petersburg), remote also
	// announces Byzantium, but it's not yet aware of Petersburg (e.g. non-updated node before the fork).
	// In this case we don't know if Petersburg passed yet or not.
	{7279999, ENR{0x52, 0xba, 0xab, 0x2d, 0x00, 0x43, 0x2a, 0x40, 0x00, 0x00, 0x00, 0x00}, nil},

	// Local is mainnet currently in Byzantium only (so it's aware of Petersburg), remote also
	// announces Byzantium, and it's also aware of Petersburg (e.g. updated node before the fork).
	// We don't know if Petersburg passed yet (will pass) or not.
	{7279999, ENR{0x52, 0xba, 0xab, 0x2d, 0x00, 0x43, 0x2a, 0x40, 0x00, 0x6f, 0x15, 0x80}, nil},

	// Local is mainnet currently in Byzantium only (so it's aware of Petersburg), remote also
	// announces Byzantium, and it's also aware of some random fork (e.g. misconfigured Petersburg).
	// As the fork has passed on neither node, they may mismatch, but we still connect for now.
	{7279999, ENR{0x52, 0xba, 0xab, 0x2d, 0x00, 0x43, 0x2a, 0x40, 0xff, 0xff, 0xff, 0xff}, nil},

	// Local is mainnet Petersburg, remote announces Byzantium + knowledge about Petersburg. Remote
	// is simply out of sync, accept.
	{7987396, ENR{0x52, 0xba, 0xab, 0x2d, 0x00, 0x43, 0x2a, 0x40, 0x00, 0x6f, 0x15, 0x80}, nil},

	// Local is mainnet Petersburg, remote announces Spurious + knowledge about Byzantium. Remote
	// is definitely out of sync. It may or may not need the Petersburg update, we don't know yet.
	{7987396, ENR{0x52, 0xba, 0xab, 0x2d, 0x00, 0x01, 0x84, 0x10, 0x00, 0x42, 0xae, 0x50}, nil},

	// Local is mainnet Byzantium, remote announces Petersburg. Local is out of sync, accept.
	{7279999, ENR{0x52, 0xba, 0xab, 0x2d, 0x00, 0x2c, 0x3f, 0xc0, 0x00, 0x00, 0x00, 0x00}, nil},

	// Local is mainnet Spurious, remote announces Byzantium, but is not aware of Petersburg. Local
	// is out of sync. Local also knows about a future fork, but that is not yet certain.
	{4369999, ENR{0x52, 0xba, 0xab, 0x2d, 0x00, 0x43, 0x2a, 0x40, 0x00, 0x00, 0x00, 0x00}, nil},

	// Local is mainnet Petersburg, and isn't aware of more forks. Remote announces Petersburg +
	// 0xffffffff. Local needs software update, reject.
	{7987396, ENR{0x52, 0xba, 0xab, 0x2d, 0xff, 0xd3, 0xc0, 0x3f, 0x00, 0x00, 0x00, 0x00}, errENRLocalStale},

	// Local is mainnet Byzantium, and is aware of Petersburg. Remote announces Petersburg +
	// 0xffffffff. Local needs software update, reject.
	{7279999, ENR{0x52, 0xba, 0xab, 0x2d, 0xff, 0xd3, 0xc0, 0x3f, 0x00, 0x00, 0x00, 0x00}, errENRLocalStale},

	// Local is mainnet Petersburg. Remote announces Byzantium but is not aware of further forks.
	// Remote needs software update.
	{7987396, ENR{0x52, 0xba, 0xab, 0x2d, 0x00, 0x43, 0x2a, 0x40, 0x00, 0x00, 0x00, 0x00}, errENRRemoteStale},

	// Local is mainnet Petersburg, remote is Rinkeby Petersburg.
	{7987396, ENR{0xca, 0xb4, 0xc3, 0x84, 0x00, 0x79, 0xf8, 0x80, 0x00, 0x00, 0x00, 0x00}, errENRGenesisMismatch},
}
```

## Implementation

https://github.com/ethereum/go-ethereum/pull/19738

## Copyright

Copyright and related rights waived via [CC0](https://creativecommons.org/publicdomain/zero/1.0/).