State tree format conversion with the tree overlay method #2584

Merged
merged 16 commits into from
Aug 27, 2020
108 changes: 108 additions & 0 deletions EIPS/eip-2584.md
@@ -0,0 +1,108 @@
---
eip: 2584
title: Trie format transition with overlay trees
author: Guillaume Ballet (@gballet)
discussions-to: https://ethresear.ch/t/overlay-method-for-hex-bin-tree-conversion/7104
status: Draft
type: Standards Track
category: Core
created: 2020-04-03
---

## Simple Summary

This EIP proposes a method to convert the state trie format from hexary to binary: new values are directly stored in a binary trie “laid over” the hexary trie. Meanwhile, the hexary trie is converted to a binary trie in the background. When the process is finished, both layers are merged.

## Abstract

This EIP describes a three-phase process to complete the conversion.

* In the first phase, all new state writes are made to an overlay binary trie, while the hexary trie is being converted to binary. The block format is changed to have two storage roots: the root of the hexary trie (hereafter called the "base" trie) and the root of the overlay binary trie.
* After miners have been given enough time to perform the conversion, the second phase begins. The overlay trie is progressively merged back into the newly converted binary base trie: a constant number of entries is deleted from the overlay and inserted into the base trie.
* The third and final phase begins when the overlay trie is empty. The field holding its root is removed from the block header.

## Motivation

There is a long-running interest in switching the state trie from a hexary format to a binary format, for reasons pertaining to proof and storage sizes. The conversion process poses a catch-up issue, caused by the sheer size of the full state: it cannot be translated in a reasonable time (i.e. on the same order of magnitude as the block time).

## Specification

This specification follows the notation introduced by the [Yellow Paper](https://ethereum.github.io/yellowpaper). Readers are advised to be familiar with the Yellow Paper before reading it.

### Binary tries

This EIP assumes that a binary trie is defined like the MPT, except that:

* The series of bytes in I₀ is seen as a series of _bits_ and so ∀i≤256, I₀[i] is the ith bit in key I₀
* In the first item of an **extension** or a **leaf**, nibbles are replaced with bits;
* A **branch** is a 2-item structure whose items correspond to the two possible bit values for the keys at this point in their traversal;
* c(𝕴,i) ≡ RLP((u(0), u(1))) at a branch, where u(j) = n({I : I ∈ 𝕴 ⋀ I₀[i] = j}, i+1)

Let ß be the function that, given a hexary trie, computes the equivalent representation of that trie in the aforementioned binary trie format.
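
To make the change of key encoding concrete, the sketch below splits the same key into a hexary path (nibbles) and a binary path (bits). It is only an illustration: the helper names are hypothetical, and the most-significant-bit-first ordering is an assumption, since this EIP does not fix a bit order. Each nibble expands to four bits, so one hexary branch level corresponds to four binary levels and a 32-byte key yields 64 nibbles or 256 bits.

```python
from typing import List


def nibbles(key: bytes) -> List[int]:
    """Hexary path: two 4-bit nibbles per byte, high nibble first."""
    path = []
    for byte in key:
        path.append(byte >> 4)
        path.append(byte & 0x0F)
    return path


def bits(key: bytes) -> List[int]:
    """Binary path: eight bits per byte.

    Assumption: most significant bit first (not specified by the EIP).
    """
    return [(byte >> (7 - i)) & 1 for byte in key for i in range(8)]


def nibbles_to_bits(path: List[int]) -> List[int]:
    """Expand a nibble path into the equivalent bit path (4 bits per nibble)."""
    return [(nibble >> (3 - i)) & 1 for nibble in path for i in range(4)]
```

Under these assumptions, `nibbles_to_bits(nibbles(key))` and `bits(key)` describe the same path, which is the property a conversion function such as ß relies on when it re-keys the trie.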

### Phase 1

Let _h₁_ be the previously agreed-upon block height at which phase 1 starts, and _h₂_ the block height at which phase 2 starts. For each block of height h₁ ≤ _h_ < h₂:

0. A conversion process is started in the background, to turn the hexary trie into its binary equivalent. The end goal of this process is the calculation of the _root hash of the converted binary trie_, denoted Hᵣ². The root of the hexary trie is hereafter called Hᵣ¹⁶. Formally, this process is written as Hᵣ² ≡ ß(Hᵣ¹⁶).
1. Block headers contain a new Hₒ field, which is the _root of the overlay binary trie_;
2. Hᵣ ≡ P(H)ᵣ¹⁶, i.e. as long as the conversion from hexary to binary is not complete, the hexary trie root is the same as that of its parent block.

The following is changed in the execution environment:

* Upon executing a _state read_, ϒ first searches for the address in the overlay trie. If the key cannot be found there, ϒ then searches the base trie as it did at block heights h' < h₁;
* Upon executing a _state write_, ϒ will insert or update the value in the overlay trie. The base trie is left untouched.
Contributor

What happens on state deletions, when an account is deleted from the state trie? Do we insert some "temporary" null value into the trie, or do we need to maintain an explicit "deletion" marker?


Phase 1 ends at block height h₂, which is set far enough from h₁ to offer miners enough time to perform the conversion.
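
The phase 1 read and write rules can be sketched as follows. This is a minimal model, not client code: both tries are stood in for by plain dictionaries, and the class and method names are hypothetical. Deletions are deliberately left out, because this specification does not say how they are represented in the overlay.

```python
from typing import Dict, Optional


class Phase1State:
    """Toy model of state access during phase 1 (tries modeled as dicts)."""

    def __init__(self, base: Dict[bytes, bytes]):
        # Hexary "base" trie: never modified during phase 1.
        self.base = base
        # Binary overlay trie: receives every new write.
        self.overlay: Dict[bytes, bytes] = {}

    def read(self, key: bytes) -> Optional[bytes]:
        # Search the overlay first, then fall back to the base trie.
        if key in self.overlay:
            return self.overlay[key]
        return self.base.get(key)

    def write(self, key: bytes, value: bytes) -> None:
        # Writes only ever touch the overlay.
        self.overlay[key] = value
```

Because the base trie is never written to, its root stays constant for the whole phase, which is what lets the background conversion work on a fixed input.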

### Phase 2

The following changes occur in phase 2:

* Before the execution of ϒ, Hᵣ ≡ Hᵣ², i.e. the value before the execution of the transition function is set to the root of the converted _binary base trie_.
* N accounts are deleted from the binary overlay trie and inserted into the binary base trie.
* Upon executing a _state write_, ϒ will insert or update the value into the _base_ trie. If the search key exists in the overlay trie, it is deleted.

When the overlay trie is empty, phase 2 ends and phase 3 begins.
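
A rough sketch of the phase 2 rules, again with dictionaries standing in for the tries and with hypothetical names. Two points are assumptions rather than part of this specification: that the merge step runs once per block, and that accounts are picked in sorted key order (the EIP does not say which N accounts are moved).

```python
from typing import Dict


class Phase2State:
    """Toy model of phase 2: the overlay is drained into the binary base trie."""

    def __init__(self, base: Dict[bytes, bytes], overlay: Dict[bytes, bytes]):
        self.base = base        # converted binary base trie
        self.overlay = overlay  # binary overlay trie left over from phase 1

    def merge_step(self, n: int) -> int:
        """Move up to n entries from the overlay into the base trie.

        Assumption: entries are taken in sorted key order.
        Returns the number of entries actually moved.
        """
        moved = 0
        for key in sorted(self.overlay)[:n]:
            self.base[key] = self.overlay.pop(key)
            moved += 1
        return moved

    def write(self, key: bytes, value: bytes) -> None:
        # Writes go to the base trie; any overlay entry for the key is deleted
        # so that a later merge step cannot overwrite the newer value.
        self.base[key] = value
        self.overlay.pop(key, None)

    def finished(self) -> bool:
        # Phase 2 ends once the overlay is empty.
        return not self.overlay
```

Deleting the overlay entry on write matters: without it, a later merge step would copy the older overlay value over the newer one in the base trie.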

### Phase 3

Phase 3 is the same as phase 2, except for the following change:

* Hₒ is dropped from the block header

## Rationale

Methods discussed so far include a "stop the world" approach, in which the chain is stopped for the significant amount of time that the conversion requires, and a "copy on write" approach, in which branches are converted upon being accessed.
The approach suggested here has the advantage that the chain continues to operate normally during the conversion process, and that the trie is fully converted to a binary format within a predictable time.

## Backwards Compatibility

This requires a fork and will break backwards compatibility, as the hashes and block formats will necessarily be different. Clients that do not implement the overlay trie, or that do not accept the new binary root, will end up on a separate fork. No mitigation is proposed, as this is a hard fork.

## Test Cases

* For testing phase 1, it suffices to check that every key in the hexary trie is also available in the binary trie. A looser but faster test picks 1% of keys in the hexary trie at random, and checks that they are present in the binary trie;
* TBD for phase 2 & 3
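
The phase 1 check described above could look roughly like the following. It is a sketch under simplifying assumptions: both tries are modeled as dictionaries keyed by the same byte strings, and `missing_keys`, `fraction` and `seed` are hypothetical names. With `fraction=1.0` it performs the exhaustive check; with `fraction=0.01` it performs the looser 1% sample.

```python
import random
from typing import Dict, List, Optional


def missing_keys(
    hexary: Dict[bytes, bytes],
    binary: Dict[bytes, bytes],
    fraction: float = 1.0,
    seed: Optional[int] = None,
) -> List[bytes]:
    """Return the checked hexary keys that are absent from the binary trie.

    fraction=1.0 checks every key; a smaller fraction checks a random sample
    of that share of the keys (at least one key if the trie is non-empty).
    """
    keys = sorted(hexary)
    if fraction < 1.0 and keys:
        rng = random.Random(seed)
        sample_size = max(1, int(len(keys) * fraction))
        keys = rng.sample(keys, sample_size)
    return [key for key in keys if key not in binary]
```

An empty result means every checked key was found. The sampled variant can only give probabilistic assurance, so a single missing key may go unnoticed.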

## Implementation
<!-- The implementations must be completed before any EIP is given status "Final", but it need not be completed before the EIP is accepted. While there is merit to the approach of reaching consensus on the specification and rationale before writing code, the principle of "rough consensus and running code" is still useful when it comes to resolving many discussions of API details.-->

A prototype version of the conversion process (phase 1) is available for `geth` in [this PR](https://github.com/holiman/go-ethereum/pull/12).

## Security Considerations
<!-- All EIPs must contain a section that discusses the security implications/considerations relevant to the proposed change. Include information that might be important for security discussions, surfaces risks and can be used throughout the life cycle of the proposal. E.g. include security-relevant design decisions, concerns, important discussions, implementation-specific guidance and pitfalls, an outline of threats and risks and how they are being addressed. EIP submissions missing the "Security Considerations" section will be rejected. An EIP cannot proceed to status "Final" without a Security Considerations discussion deemed sufficient by the reviewers. -->

There are three attack vectors that I can foresee:

* A targeted attack that would cause the overlay trie to be unreasonably large. Since gas costs will likely increase during the transition process, lengthening phase 2 will make Ethereum more expensive during an extended period of time. This could be solved by increasing the cost of `SSTORE` during phase 1.
* Conversely, if h₂ comes too soon, a majority of miners might not be able to produce the correct value for Hᵣ² in time.
* If a group of miners representing a majority (more than 50%) of the network reports an invalid value, it could steal funds without anyone else having a say.

## Community feedback

* Preliminary tests indicate that a fast machine can perform the conversion in roughly 30 minutes.
* The initial version of this EIP expected miners to vote on the value of the binary base root. This has been removed because of the complexity of this process, and because this functionality is already guaranteed by the "longest chain" rule.
gballet marked this conversation as resolved.

## Copyright
Copyright and related rights waived via [CC0](https://creativecommons.org/publicdomain/zero/1.0/).