Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

State tree format conversion with the tree overlay method #2584

Merged
merged 16 commits into from
Aug 27, 2020

Conversation

gballet
Copy link
Member

@gballet gballet commented Apr 3, 2020

This EIP describes the tree overlay method to change the state trie format from hexary to binary, as presented in https://ethresear.ch/t/overlay-method-for-hex-bin-tree-conversion/7104

EIPS/eip-overlay_tree.md Outdated Show resolved Hide resolved
EIPS/eip-overlay_tree.md Outdated Show resolved Hide resolved
Co-Authored-By: Alex Beregszaszi <alex@rtfs.hu>

## Simple Summary

This EIP proposes a method to convert the state trie format from hexary to binary: new values are directly stored in a binary trie “laid over” the hexary trie. Meanwhile, the hexary trie is converted to a binary trie in the background. When the process is finished, both layers are merged.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Clarification: does this EIP only concern the account trie, and not storage tries?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That is voluntarily left vague at the moment, because I don't know at this time if both trees are going to be merged (it has been discussed at the 1x workshop and there's an EIP by Vitalik that I haven't found yet).

I currently assume that it will be both the account and storage tries, but the exact shape of things depend on how and when other teams work this out.

EIPS/eip-overlay_tree.md Outdated Show resolved Hide resolved
EIPS/eip-overlay_tree.md Outdated Show resolved Hide resolved

This EIP describes a four phase process to complete the conversion.

* In the first phase, all new state writes are made to an overlay binary trie, while the hexary trie is being converted to binary. The block format is changed to have two storage roots: the root of the nhexary trie and the root of the binary trie.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there really any particular point in having the state writes go to an overlay binary trie? Would it make any difference if we made writes to e.g. an overlay hexary trie?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the overlay is hexary, then it would have to be converted during the phase 3 merge. Having an overlay binary trie allows the clients to directly switch to binary tries. Then it gives clients time to perform the conversion in the background.

EIPS/eip-overlay_tree.md Outdated Show resolved Hide resolved

Phase 2 is the same as phase 1, except for the following:

* Hᵣ' ≡ Hᵣ²
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please flesh this out a bit. binary root hash equals hexary root hash? I understand what you mean, but technically, the roots won't be the same, so it looks odd to me.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What I mean is that the new Header root is replaced with the binary root. I'm trying to follow the yp notation, in spite of unicode missing some of the symbols. Gotta rethink this then 👍

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm trying to follow the yp notation

Ah that clarifies my (not yet asked) question. Can you perhaps make this clear in the beginning of the specification section?

EIPS/eip-overlay_tree.md Outdated Show resolved Hide resolved

The following changes occur in phase 3:

* N accounts are being deleted from the binary overlay trie and inserted into the binary base trie.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will N be specified later? Or would it happen implicitly via the next bullet-item?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

N still has to be specified. The way I see it, there are two approaches, and I'd like to get some feedback on that:

Approach 1: N is set to roughly 700. This is because a quick estimate based on my tests say it represents about 1 second. This number would be bigger if we use extensions in binary trees, which I will try when I complete the code that does this.

Approach 2: Dynamically calculate N based on the difficulty, to ensure it takes around 1 second.

Any thoughts?

gballet and others added 3 commits April 4, 2020 08:42
Co-Authored-By: Martin Holst Swende <martin@swende.se>
Co-Authored-By: Martin Holst Swende <martin@swende.se>
EIPS/eip-overlay_tree.md Outdated Show resolved Hide resolved
The following is changed in the execution environment:

* Upon executing a _state read_, ϒ first searches for the address in the overlay trie. If the key can not be found there, ϒ then searches the hexary base trie as it did at block heights h' < H₁;
* Upon executing a _state write_, ϒ will insert or update the value into the overlay tree.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't this also clarify that:

  1. Before every write a read also happens (needed to determine costs).
  2. Upon writing into the overlay, it is deleted from the base trie?

I'm not sure if 2) is proposed here or whether it is a good idea or not, but clarity would be nice.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

During phase 1, values are not deleted from the base trie, as the base trie is read-only. I'll make it clear 👍

EIPS/eip-overlay_tree.md Outdated Show resolved Hide resolved
gballet and others added 3 commits April 4, 2020 19:25
Co-Authored-By: Alex Beregszaszi <alex@rtfs.hu>
Co-Authored-By: Alex Beregszaszi <alex@rtfs.hu>
@gballet gballet force-pushed the add-overlay-tree-eip branch from e291af3 to 0414dc6 Compare April 5, 2020 07:51
EIPS/eip-2584.md Outdated
The following is changed in the execution environment:

* Upon executing a _state read_, ϒ first searches for the address in the overlay trie. If the key can not be found there, ϒ then searches the base trie as it did at block heights h' < h₁;
* Upon executing a _state write_, ϒ will insert or update the value into the overlay tree. The base tree is left untouched.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what happens on state deletions, when an account is deleted from the state trie. Do we insert some "temporary" null-value into the trie or do we need to maintain an explicit "deletion" thing?

@holiman
Copy link
Contributor

holiman commented Apr 29, 2020

Alternative variant:
During phase 1 , we don't stop the world, but we stop transactions. The side-effect would be, that the only things that goes into the overlay trie is the changes to the miner coinbases. Thus, the overlay trie will be very small (realistically a handful, theoretically the same as the number of blocks in phase 1).
The result will be that the merging of overlay trie into base trie can happen in one go, and we can skip the move-700-accounts phase altogether.

I think phase 2 is complex, as we need to check conversion-pointer, read from overlay trie, and then read from main trie if the account was deleted from the bintrie due to an earlier write.

This is not 'stop the world', just means that we won't process transactions for (a couple of days? one day? a few hours?), but we still pay miners for work performed.

@holiman
Copy link
Contributor

holiman commented Apr 30, 2020

One more upside with no-tx-in-phase-1 is that we won't have to worry about deletions.
So here's the problem with deletions:

  • The hexary trie is 'frozen', so if an account is deleted, we somehow have to place a deleted-marker into the "ephemeral" overlay binary trie.
  • However, if an account is created during this phase, and then deleted, it must not get that same deleted-marker.
  • There's no precedent about a deleted-marker in tries, and there's no definitions on how to hash it.

So you might run into the solution that we'll have to maintain an additional "deletion-trie", or "deletion-list", which also must be in the blockhash. IT becomes a pretty hairy problem to solve.

So if the only effect after a block is the addition of mining rewards to the miner and any ommer-miners, then that problem goes away.

TLDR; - there are two reasons for not allowing txs during phase 1:

  • Phase 2 can be done instantly, in one block
  • No deletions need to be managed,

@gballet
Copy link
Member Author

gballet commented Apr 30, 2020

There's no precedent about a deleted-marker in tries, and there's no definitions on how to hash it.

You can use the empty root hash as a deleted marker. You walk the down the trie, and instead of encountering an account, you encounter the empty hash.

So you might run into the solution that we'll have to maintain an additional "deletion-trie", or "deletion-list", which also must be in the blockhash. IT becomes a pretty hairy problem to solve.

I don't believe that a deletion tree is needed, since the deletion is already recorded inside the trie.

(...) it must not get that same deleted-marker.

Could you elaborate on why a different marker is necessary? If you create an account in phase 1, and then delete it in that same phase, the overlay tree would end up with a deleted marker and the behavior is exactly the same: when encountering an empty hash in the overlay trie, don't search the base trie.

I get the rationale for a quasi-instant phase 2. I would still prefer an approach in which you increase the cost of CREATE and SSTORE during phase 1 - you can still execute transactions if you really need to, and you don't have to disable writing.

@holiman
Copy link
Contributor

holiman commented Apr 30, 2020

Could you elaborate on why a different marker is necessary?

No, actually, you're probably right, those could be treated the same way. But what we wind up with is a third trie variant, which functions slightly differently than the other tries.

This third trie variant is IMO kind of hacky: there is no unambiguous representation of an empty trie. It gets a sort of 'memory'. I would probably prefer some extra data structure instead of "abusing" the trie this way, if we were to go with your approach.

EIPS/eip-2584.md Outdated Show resolved Hide resolved
@axic
Copy link
Member

axic commented May 15, 2020

@gballet when should this be merged as a draft?

Co-authored-by: Alex Beregszaszi <alex@rtfs.hu>
@gballet
Copy link
Member Author

gballet commented Jun 22, 2020

@axic it could be merged now.

@axic axic merged commit 1e5db59 into ethereum:master Aug 27, 2020
tkstanczak pushed a commit to tkstanczak/EIPs that referenced this pull request Nov 7, 2020
)

* Tree format conversion by the tree overlay method

* Correct spelling error

* Update EIPS/eip-overlay_tree.md

Co-Authored-By: Alex Beregszaszi <alex@rtfs.hu>

* Update EIPS/eip-overlay_tree.md

Co-Authored-By: Martin Holst Swende <martin@swende.se>

* Update EIPS/eip-overlay_tree.md

Co-Authored-By: Martin Holst Swende <martin@swende.se>

* Review feedback from axic

* Update EIPS/eip-overlay_tree.md

Co-Authored-By: Alex Beregszaszi <alex@rtfs.hu>

* Update EIPS/eip-overlay_tree.md

Co-Authored-By: Alex Beregszaszi <alex@rtfs.hu>

* Integrate review feedback

* merge phase 1 and 2 into a single phase

* No more voting in the phase 1 -> phase 2 transition

* specify phase 1 tests

* Apply some of holiman's feedback

* Update EIPS/eip-2584.md

Co-authored-by: Alex Beregszaszi <alex@rtfs.hu>

* Remove template comments

Co-authored-by: Alex Beregszaszi <alex@rtfs.hu>
Co-authored-by: Martin Holst Swende <martin@swende.se>
Arachnid pushed a commit to Arachnid/EIPs that referenced this pull request Mar 6, 2021
)

* Tree format conversion by the tree overlay method

* Correct spelling error

* Update EIPS/eip-overlay_tree.md

Co-Authored-By: Alex Beregszaszi <alex@rtfs.hu>

* Update EIPS/eip-overlay_tree.md

Co-Authored-By: Martin Holst Swende <martin@swende.se>

* Update EIPS/eip-overlay_tree.md

Co-Authored-By: Martin Holst Swende <martin@swende.se>

* Review feedback from axic

* Update EIPS/eip-overlay_tree.md

Co-Authored-By: Alex Beregszaszi <alex@rtfs.hu>

* Update EIPS/eip-overlay_tree.md

Co-Authored-By: Alex Beregszaszi <alex@rtfs.hu>

* Integrate review feedback

* merge phase 1 and 2 into a single phase

* No more voting in the phase 1 -> phase 2 transition

* specify phase 1 tests

* Apply some of holiman's feedback

* Update EIPS/eip-2584.md

Co-authored-by: Alex Beregszaszi <alex@rtfs.hu>

* Remove template comments

Co-authored-by: Alex Beregszaszi <alex@rtfs.hu>
Co-authored-by: Martin Holst Swende <martin@swende.se>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants