Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

EIP 2124: Fork identifier for chain compatibility checks #2124

Merged
merged 13 commits into from
Aug 8, 2019
294 changes: 294 additions & 0 deletions EIPS/eip-2124.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,294 @@
---
eip: 2124
title: Fork identifier for chain compatibility checks
author: Péter Szilágyi <peterke@gmail.com>, Felix Lange <fjl@ethereum.org>
discussions-to: https://github.com/ethereum/EIPs/issues/2125
status: Draft
type: Standards Track
category: Networking
created: 2019-05-03
---

axic marked this conversation as resolved.
Show resolved Hide resolved
## Simple Summary

Currently nodes in the Ethereum network try to find each other by establishing random connections to remote machines "looking" like an Ethereum node (public networks, private networks, test networks, etc), hoping that they found a useful peer (same genesis, same forks). This wastes time and resources, especially for smaller networks.

To avoid this overhead, Ethereum needs a mechanism that can precisely identify whether a node will be useful, as early as possible. Such a mechanism requires a way to summarize chain configurations, as well as a way to disseminate said summaries in the network.

This proposal focuses only on the definition of said summary - a generally useful *fork identifier* - and it's validation rules, allowing it to be embedded into arbitrary network protocols (e.g. [discovery ENRs](https://eips.ethereum.org/EIPS/eip-778) or `eth/6x` handshakes).

## Abstract

There are many public and private Ethereum networks, but the discovery protocol doesn't differentiate between them. The only way to check if a peer is good or bad (same chain or not), is to establish a TCP/IP connection, wrap it with RLPx cryptography, then execute an `eth` handshake. This is an extreme cost to bear if it turns out that the remote peer is on a different network and it's not even precise enough to differentiate Ethereum and Ethereum Classic. This cost is magnified for small networks, where a lot more trial and errors are needed to find good nodes.

Even if the peer **is** on the same chain, during non-controversial consensus upgrades, not everybody updates their nodes in time (developer nodes, leftovers, etc). These stale nodes put a meaningless burden on the peer-to-peer network, since they just latch on to good nodes, but don't accept upgraded blocks. This causes valuable peer slots and bandwidth to be lost until the stale nodes finally update. This is a serious issue for test networks, where leftovers can linger for months.

This EIP proposes a new identity scheme to both precisely and concisely summarize the chain's current status (genesis and all applied forks). The conciseness is particularly important to make the identity useful across datagram protocols too. The EIP solves a number of issues:

* If two nodes are on different networks, they should never even consider connecting.
* If a hard fork passes, upgraded nodes should reject non-upgraded ones, but **NOT** before.
* If two chains share the same genesis, but not forks (ETH / ETC), they should reject each other.

This EIP does not attempt to solve the clean separation of 3-way-forks! If at the same future block number, the network splits into three (non-fork, fork-A and fork-B), separating the forkers from each another will need case-by-case special handling. Not handling this keeps the proposal pragmatic, simple and also avoids making it too easy to fork off mainnet.

To keep the scope limited, this EIP only defines the identity scheme and validation rules. The same scheme and algorithm can be embedded into various networking protocols, allowing both the `eth/6x` handshake to be more precise (Ethereum vs. Ethereum Classic); as well as the discovery to be more useful (reject surely peers without ever connecting).
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"only defines the identity scheme..." is confusing because ENR also has 'identity schemes' which are unrelated.
Maybe best to just delete the entire paragraph TBH.

Gotta say the whole Abstract could be removed and some of the content moved into Motivation + Rationale where it really belongs.

Copy link
Member Author

@karalabe karalabe Jul 3, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, but the EIP process requires these 3 completely redundant pieces to exist, and @axic requested that I explicitly have all of them. My original EIP didn't have them. It's stupid to have them. If I've however been requested to have them, I'm not deleting them any more.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"only defines the identity scheme..." is confusing because ENR also has 'identity schemes' which are unrelated.

Well, this EIP is not about ENR per your request, so you should not know about ENR identity schemes at all (as a matter of fact, I've no clue what they are :D).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, but the EIP process requires these 3 completely redundant pieces to exist, and @axic requested that I explicitly have all of them.

Don't blame it on me, I'm just following EIP-1, hehe.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think the intention of EIP-1 is to make you repeat the same text twice with tweaked wording just so you have both sections. Maybe EIP-1 should be changed. Honestly, "simple summary" is the definition of Abstract.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@karalabe I don't understand why the fork ID is an 'identity scheme'. This sentence is the only place where 'identity' appears in the whole document. An 'identity scheme' is a way to assign identities, i.e. on a national identity card. In ENR, an identity scheme is the name of a signing function and public key crypto parameters that define how to assign a node identity.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well, id in forkid means identity :)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought it means identifier.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, yes, the forkid is an identifier, but doesn't the EIP define the rules (i.e. scheme) of how the ID is generated and regenerated over time? Maybe I have a different mental definition of "scheme". Anyway, I don't much care about the exact wording if you have a better alternative.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Like I said I'd just kill the entire paragraph.


## Motivation

Peer-to-peer networking is messy and hard due to firewalls and network address translation (NAT). Generally only a small fraction of nodes have publicly routed addresses and P2P networks rely mainly on these for forwarding data for everyone else. The best way to maximize the utility of the public nodes is to ensure their resources aren't wasted on tasks that are worthless to the network.

By aggressively cutting off incompatible nodes from each other we can extract a lot more value from the public nodes, making the entire P2P network much more robust and reliable. Supporting this network partitioning at a discovery layer can further enhance performance as we avoid the costly crypto and latency/bandwidth hit associated with establishing a stream connection in the first place.

## Specification

Each node maintains the following values:

- **`FORK_HASH`**: IEEE CRC32 checksum (`[4]byte`) of the genesis hash and fork blocks numbers that already passed.
- The fork block numbers are fed into the CRC32 checksum in ascending order.
- If multiple forks are applied at the same block, the block number is checksummed only once.
- Block numbers are regarded as `uint64` integers, encoded in big endian format when checksumming.
- If a chain is configured to start with a non-Frontier ruleset already in its genesis, that is NOT considered a fork.
- **`FORK_NEXT`**: Block number (`uint64`) of the next upcoming fork, or `0` if no next fork is known.

E.g. `FORK_HASH` for mainnet would be:

- forkhash₀ = `0xfc64ec04` (Genesis) = `CRC32(<genesis-hash>)`
- forkhash₁ = `0x97c2c34c` (Homestead) = `CRC32(<genesis-hash> || uint64(1150000))`
- forkhash₂ = `0x91d1f948` (DAO fork) = `CRC32(<genesis-hash> || uint64(1150000) || uint64(1920000))`

The *fork identifier* is defined as `RLP([FORK_HASH, FORK_NEXT])`. This `forkid` is cross validated (**NOT** naively compared) to assess a remote chain's compatibility. Irrespective of fork state, both parties must come to the same conclusion to avoid indefinite reconnect attempts from one side.

#### Validation rules

1) If local and remote `FORK_HASH` matches, connect.
- The two nodes are in the same fork state currently. They might know of differing future forks, but that's not relevant until the fork triggers (might be postponed, nodes might be updated to match).
2) If the remote `FORK_HASH` is a subset of the local past forks and the remote `FORK_NEXT` matches with the locally following fork block number, connect.
- Remote node is currently syncing. It might eventually diverge from us, but at this current point in time we don't have enough information.
3) If the remote `FORK_HASH` is a superset of the local past forks and can be completed with locally known future forks, connect.
- Local node is currently syncing. It might eventually diverge from the remote, but at this current point in time we don't have enough information.
4) Reject in all other cases.

#### Stale software examples

The examples below try to exhaust the fork combination possibilities that arise when nodes do not run matching software versions, but otherwise follow the same chain (mainnet nodes, testnet nodes, etc).

| Past forks | Future forks | Remote `FORK_HASH` | Remote `FORK_NEXT` | Connect | Reason |
|:---:|:---:|:---:|:---:|:---:|:---:|
| A | | A | | Yes (1) | Same forks, same sync state. |
| A | | A | B | Yes (1) | Remote is advertising a future fork, but that is uncertain. |
| A | B | A | | Yes (1) | Local knows about a future fork, but that is uncertain. |
| A | B | A | B | Yes (1) | Both know about a future fork, but that is uncertain. |
| A | B1 | A | B2 | Yes (1) | Both know about differing future forks, but those are uncertain. |
| [A,B] | | A | B | Yes (2) | Remote out of sync. |
| [A,B,C] | | A | B | Yes¹ (2) | Remote out of sync. Remote will need a software update, but we don't know it yet. |
| A | B | A ⊕ B | | Yes (3) | Local out of sync. |
| A | B,C | A ⊕ B | | Yes (3) | Local out of sync. Local also knows about a future fork, but that is uncertain yet. |
| A | | A ⊕ B | | No (4) | Local needs software update. |
| A | B | A ⊕ B ⊕ C | | No² (4) | Local needs software update. |
| [A,B] | | A | | No (4) | Remote needs software update. |

*Note, there's one asymmetry in the table, marked with ¹ and ². Since we don't have access to a remote node's future fork list (just the next one), we can't detect that it's software is stale until it syncs up. This is acceptable as 1) the remote node will disconnect from us anyway, and 2) this is a temporary fluke during sync, not permanent with a leftover node.*

#### Mismatching chain examples (local perspective)

TODO: Give some examples as to what happens if the nodes follow different forks.

## Rationale

##### Why flatten `FORK_HASH` into 4 bytes? Why not share the entire genesis and fork list?

Whilst the `eth` devp2p protocol permits arbitrarily much data to be transmitted, the discovery protocol's total space allowance for all ENR entries is 300 bytes.

karalabe marked this conversation as resolved.
Show resolved Hide resolved
Reducing the `FORK_HASH` into a 4 bytes checksum ensures that we leave ample room in the ENR for future extensions; and 4 bytes is more than enough for arbitrarily many Ethereum networks from a (practical) collision perspective.

##### Why use IEEE CRC32 as the checksum instead of Keccak256?

We need a mechanism that can flatten arbitrary data into 4 bytes, without ignoring any of the input. Any other checksum or hashing algorithm would work, but since nodes can lie at any time, there's no value in cryptographic hash functions.

Instead of just taking the first 4 bytes of a Keccak256 hash (seems odd) or XOR-ing all the 4-byte groups (messy), CRC32 is a better alternative, as this is exactly what it was designed for. IEEE CRC32 is also used by ethernet, gzip, zip, png, etc, so every programming language support should not be a problem.

##### We're not using `FORK_NEXT` for much, can't we get rid of it somehow?

We need to be able to differentiate whether a remote node is out of sync or whether its software is stale. Sharing only the past forks cannot tell us if the node is legitimately behind or stuck.

##### Why advertise only one next fork, instead of "hashing" all known future ones like the `FORK_HASH`?

Opposed to past forks that have already passed (for us locally) and can be considered immutable, we don't know anything about future ones. Maybe we're out of sync or maybe the fork didn't pass yet. If it didn't pass yet, it might be postponed, so enforcing it would split the network apart. It could also happen that we're not yet aware of all future forks (haven't updated our software in a while).

## Backwards Compatibility

This EIP only defines an identity scheme, it does not define functional changes.

## Test Cases

Here's a full suite of tests for all possible fork IDs that Mainnet, Ropsten, Rinkeby and Görli can advertise given the Petersburg fork cap (time of writing).

```go
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should find a format for the test cases that isn't Go.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Meh. The Go code is table driven, so it's fairly trivial to copy paste. Whether it's a Go "table" or a markdown table, it makes no difference. You still need to copy paste each cell individually.

From my perspective as an author however, I don't have to screw around with converting Go tables to markdown tables, and then when something changes (like XOR to CRC), I don't have to throw all that work out and redo everything all over.

I.e. It saves time for me, it's more robust against change, and it doesn't matter to anyone else.

type testcase struct {
head uint64
want ID
}
tests := []struct {
config *params.ChainConfig
genesis common.Hash
cases []testcase
}{
// Mainnet test cases
{
params.MainnetChainConfig,
params.MainnetGenesisHash,
[]testcase{
{0, ID{Hash: 0xfc64ec04, Next: 1150000}}, // Unsynced
{1149999, ID{Hash: 0xfc64ec04, Next: 1150000}}, // Last Frontier block
{1150000, ID{Hash: 0x97c2c34c, Next: 1920000}}, // First Homestead block
{1919999, ID{Hash: 0x97c2c34c, Next: 1920000}}, // Last Homestead block
{1920000, ID{Hash: 0x91d1f948, Next: 2463000}}, // First DAO block
{2462999, ID{Hash: 0x91d1f948, Next: 2463000}}, // Last DAO block
{2463000, ID{Hash: 0x7a64da13, Next: 2675000}}, // First Tangerine block
{2674999, ID{Hash: 0x7a64da13, Next: 2675000}}, // Last Tangerine block
{2675000, ID{Hash: 0x3edd5b10, Next: 4370000}}, // First Spurious block
{4369999, ID{Hash: 0x3edd5b10, Next: 4370000}}, // Last Spurious block
{4370000, ID{Hash: 0xa00bc324, Next: 7280000}}, // First Byzantium block
{7279999, ID{Hash: 0xa00bc324, Next: 7280000}}, // Last Byzantium block
{7280000, ID{Hash: 0x668db0af, Next: 0}}, // First and last Constantinople, first Petersburg block
{7987396, ID{Hash: 0x668db0af, Next: 0}}, // Today Petersburg block
},
},
// Ropsten test cases
{
params.TestnetChainConfig,
params.TestnetGenesisHash,
[]testcase{
{0, ID{Hash: 0x30c7ddbc, Next: 10}}, // Unsynced, last Frontier, Homestead and first Tangerine block
{9, ID{Hash: 0x30c7ddbc, Next: 10}}, // Last Tangerine block
{10, ID{Hash: 0x63760190, Next: 1700000}}, // First Spurious block
{1699999, ID{Hash: 0x63760190, Next: 1700000}}, // Last Spurious block
{1700000, ID{Hash: 0x3ea159c7, Next: 4230000}}, // First Byzantium block
{4229999, ID{Hash: 0x3ea159c7, Next: 4230000}}, // Last Byzantium block
{4230000, ID{Hash: 0x97b544f3, Next: 4939394}}, // First Constantinople block
{4939393, ID{Hash: 0x97b544f3, Next: 4939394}}, // Last Constantinople block
{4939394, ID{Hash: 0xd6e2149b, Next: 0}}, // First Petersburg block
{5822692, ID{Hash: 0xd6e2149b, Next: 0}}, // Today Petersburg block
},
},
// Rinkeby test cases
{
params.RinkebyChainConfig,
params.RinkebyGenesisHash,
[]testcase{
{0, ID{Hash: 0x3b8e0691, Next: 1}}, // Unsynced, last Frontier block
{1, ID{Hash: 0x60949295, Next: 2}}, // First and last Homestead block
{2, ID{Hash: 0x8bde40dd, Next: 3}}, // First and last Tangerine block
{3, ID{Hash: 0xcb3a64bb, Next: 1035301}}, // First Spurious block
{1035300, ID{Hash: 0xcb3a64bb, Next: 1035301}}, // Last Spurious block
{1035301, ID{Hash: 0x8d748b57, Next: 3660663}}, // First Byzantium block
{3660662, ID{Hash: 0x8d748b57, Next: 3660663}}, // Last Byzantium block
{3660663, ID{Hash: 0xe49cab14, Next: 4321234}}, // First Constantinople block
{4321233, ID{Hash: 0xe49cab14, Next: 4321234}}, // Last Constantinople block
{4321234, ID{Hash: 0xafec6b27, Next: 0}}, // First Petersburg block
{4586649, ID{Hash: 0xafec6b27, Next: 0}}, // Today Petersburg block
},
},
// Goerli test cases
{
params.GoerliChainConfig,
params.GoerliGenesisHash,
[]testcase{
{0, ID{Hash: 0xa3f5ab08, Next: 0}}, // Unsynced, last Frontier, Homestead, Tangerine, Spurious, Byzantium, Constantinople and first Petersburg block
{795329, ID{Hash: 0xa3f5ab08, Next: 0}}, // Today Petersburg block
},
},
}
```

Here's a suite of tests of the different states a Mainnet node might be in and the different remote fork identifiers it might be required to validate and decide to accept or reject:

```go
tests := []struct {
head uint64
id ID
err error
}{
// Local is mainnet Petersburg, remote announces the same. No future fork is announced.
{7987396, ID{Hash: 0x668db0af, Next: 0}, nil},

// Local is mainnet Petersburg, remote announces the same. Remote also announces a next fork
// at block 0xffffffff, but that is uncertain.
{7987396, ID{Hash: 0x668db0af, Next: math.MaxUint64}, nil},

// Local is mainnet currently in Byzantium only (so it's aware of Petersburg), remote announces
// also Byzantium, but it's not yet aware of Petersburg (e.g. non updated node before the fork).
// In this case we don't know if Petersburg passed yet or not.
{7279999, ID{Hash: 0xa00bc324, Next: 0}, nil},

// Local is mainnet currently in Byzantium only (so it's aware of Petersburg), remote announces
// also Byzantium, and it's also aware of Petersburg (e.g. updated node before the fork). We
// don't know if Petersburg passed yet (will pass) or not.
{7279999, ID{Hash: 0xa00bc324, Next: 7280000}, nil},

// Local is mainnet currently in Byzantium only (so it's aware of Petersburg), remote announces
// also Byzantium, and it's also aware of some random fork (e.g. misconfigured Petersburg). As
// neither forks passed at neither nodes, they may mismatch, but we still connect for now.
{7279999, ID{Hash: 0xa00bc324, Next: math.MaxUint64}, nil},

// Local is mainnet Petersburg, remote announces Byzantium + knowledge about Petersburg. Remote
// is simply out of sync, accept.
{7987396, ID{Hash: 0x668db0af, Next: 7280000}, nil},

// Local is mainnet Petersburg, remote announces Spurious + knowledge about Byzantium. Remote
// is definitely out of sync. It may or may not need the Petersburg update, we don't know yet.
{7987396, ID{Hash: 0x3edd5b10, Next: 4370000}, nil},

// Local is mainnet Byzantium, remote announces Petersburg. Local is out of sync, accept.
{7279999, ID{Hash: 0x668db0af, Next: 0}, nil},

// Local is mainnet Spurious, remote announces Byzantium, but is not aware of Petersburg. Local
// out of sync. Local also knows about a future fork, but that is uncertain yet.
{4369999, ID{Hash: 0xa00bc324, Next: 0}, nil},

// Local is mainnet Petersburg. remote announces Byzantium but is not aware of further forks.
// Remote needs software update.
{7987396, ID{Hash: 0xa00bc324, Next: 0}, ErrRemoteStale},

// Local is mainnet Petersburg, and isn't aware of more forks. Remote announces Petersburg +
// 0xffffffff. Local needs software update, reject.
{7987396, ID{Hash: 0x5cddc0e1, Next: 0}, ErrLocalIncompatibleOrStale},

// Local is mainnet Byzantium, and is aware of Petersburg. Remote announces Petersburg +
// 0xffffffff. Local needs software update, reject.
{7279999, ID{Hash: 0x5cddc0e1, Next: 0}, ErrLocalIncompatibleOrStale},

// Local is mainnet Petersburg, remote is Rinkeby Petersburg.
{7987396, ID{Hash: 0xafec6b27, Next: 0}, ErrLocalIncompatibleOrStale},
}
```

Here's a couple of tests to verify the proper RLP encoding (since `FORK_HASH` is a 4 byte binary but `FORK_NEXT` is an 8 byte quantity):

```go
tests := []struct {
id ID
want []byte
}{
{
ID{Hash: checksumToBytes(0), Next: 0},
common.Hex2Bytes("c6840000000080"),
},
{
ID{Hash: checksumToBytes(0xdeadbeef), Next: 0xBADDCAFE},
common.Hex2Bytes("ca84deadbeef84baddcafe"),
},
{
ID{Hash: checksumToBytes(math.MaxUint32), Next: math.MaxUint64},
common.Hex2Bytes("ce84ffffffff88ffffffffffffffff"),
},
}
```

## Implementation

https://github.com/ethereum/go-ethereum/pull/19738

## Copyright

Copyright and related rights waived via [CC0](https://creativecommons.org/publicdomain/zero/1.0/).