Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Adding support for eip-1186 proofs #146

Merged
merged 10 commits into from
Mar 2, 2022
Merged

Conversation

lightyear15
Copy link
Contributor

EIP-1186[1] is the standard for generating proofs used by EVM-based clients.

The current proof crate in trie-db generates (and verifies) compact proofs that are not compatible with eip-1186.
This commit adds functions to generate and verify such proofs, adding also proper testing of such new APIs

[1] https://eips.ethereum.org/EIPS/eip-1186

EIP-1186[1] is the standard for generating proofs used by EVM-based clients.

The current proof crate in trie-db generates (and verifies) compact proofs that are not compatible with eip-1186.
This commit adds functions to generate and verify such proofs, adding also proper testing of such new APIs

[1] https://eips.ethereum.org/EIPS/eip-1186
Copy link
Contributor

@cheme cheme left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code looks good to me. Nice to see it.
Using its own module, it is not obtrusive and could probably be in his own crate (may need to expose some internal structs or methods).

Next step could be to implement it to run over multiple keys.

But I am just not sure if it should be include in the code base.
On one hand I am not sure about the state of this eip (@arkpar or @sorpaas maybe you did know).

On the other hand I am wondering if the additional code is really required: in parity-ethereum/open-ethereum we just did load all encoded nodes in a memory-db (which does the hash calculation and is queried by this hash) and run the key query on a TrieDB that uses this partial MemoryDB.
For me, the additional value of this PR is that it enforces nodes to be ordered by key on verification and it provides more error information (but I do not know if eip requires it).
Also it is more efficient than using a MemoryDB because we do not need to move/copy the encoded nodes to the 'memory-db' map.

If those point are really needed (ordering check, errors and performance), modifying https://github.com/paritytech/trie/blob/master/trie-db/src/proof/verify.rs to keep the useless hashes may also be a good way to factor code a bit and already have multiple key case managed. But as I mentioned I am not sure this is needed (using a TrieDB over a partial MemoryDB build from proof nodes is really what I prefer as it is more flexible).

}
},
(Some(Value::Inline(inline_data)), _, None) => Err(VerifyError::ExistingValue(inline_data.to_vec())),
(Some(Value::Node(plain_hash, _)), Some(next_proof_item), Some(value)) => {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice addition to eip1196 (could also just return an error since eth do not have this kind of nodes).

Copy link
Contributor Author

@lightyear15 lightyear15 Jan 13, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this comment referring line 287 or 286? I think case 287 is the one actually used, your comment probably refers to line 286, isn't it?
If we leave the case as it is, even if eth does not use such node type, is it going to cause any issue in proof verification?
Can we leave it as it is? and make this code even more general? (not strictly linked to ethereum, but also, maybe, embracing other ethereum-like networks)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry just refering to handling the Value::Node case.
Value::Node is never use with ethereum trie layout, values are always inlined in either a branch or a leaf (also there is no NibbledBranch with ethereum codec).
But the code you write looks good to me (adding the value node as a following node is what I did for the other proofs), so I'd rather keep it (it allows running the tests on all layout which looks nice).

}
Some(Some(NodeHandle::Inline(encoded_node))) => {
key.advance(1);
let root = L::Hash::hash(encoded_node);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This hash calculation is not needed.

Copy link
Contributor Author

@lightyear15 lightyear15 Jan 13, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry,
I am not sure I understand your comment. root is passed to the process_node function the line after.
How can I avoid this hashing operation?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IIReadC, in process_node this calculated hash is only compared again with encoded hash.
Making the input in process_node optional (or with an enum { Inline, Node(H) }) would work to avoid this check for inline node.

}

fn process_node<'a, L>(
root: &<L::Hash as Hasher>::Out,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would call it expected_node_hash or something that relates to a node (root makes me think it is the root node hash).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 I will also change calculate_root to calculated_node_hash then.

(root, proof.0, proof.1)
}

test_layouts!(trie_proof_works2, trie_proof_works_internal2);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 for running on all layout.

@sorpaas
Copy link
Member

sorpaas commented Jan 12, 2022

On one hand I am not sure about the state of this eip

The status of that EIP is in draft. But this doesn't matter much for our case. I don't see that blocking us merging this PR.

On the other hand, I'd prefer to have this code living in a separate crate. It should be easily doable as the code doesn't touch any other part of trie. I don't have any objections to include this code in this repo -- it looks short and easily maintainable.

@lightyear15
Copy link
Contributor Author

Thanks @cheme,
for the review.
To your doubts:

Code looks good to me. Nice to see it. Using its own module, it is not obtrusive and could probably be in his own crate (may need to expose some internal structs or methods).

Next step could be to implement it to run over multiple keys.
I don't think it would make much sense as users with multiple storage keys to be proved can always do something like

storage_proofs.iter().map(|storage_proof| verify_proof(root, storage_proof.proof, storage_proof.key, storage_proof.value).collect::<Result<Vec<()>,_>>()?;

EIP-1186 might be used for different purposes: to prove multiple storage keys at once (they must belong to the same contract though) or just to prove an account balance, that is why I left the verify_proof function signature as loose as possible so users can create wrappers around it for their own use cases.
Let me know if I misinterpreted your idea.

But I am just not sure if it should be include in the code base. On one hand I am not sure about the state of this eip (@arkpar or @sorpaas maybe you did know).

Agreed, I don't it either, it looks stalled and not offcially approved yet, however every eth client seems to implement it

On the other hand I am wondering if the additional code is really required: in parity-ethereum/open-ethereum we just did load all encoded nodes in a memory-db (which does the hash calculation and is queried by this hash) and run the key query on a TrieDB that uses this partial MemoryDB. For me, the additional value of this PR is that it enforces nodes to be ordered by key on verification and it provides more error information (but I do not know if eip requires it). Also it is more efficient than using a MemoryDB because we do not need to move/copy the encoded nodes to the 'memory-db' map.

The real purpose of this PR is to implement a verifier function for EIP-1186, as the generate/verify in tried_db::proof aren't of any use in an hypothetical ethereum-like light-client.

If those point are really needed (ordering check, errors and performance), modifying https://github.com/paritytech/trie/blob/master/trie-db/src/proof/verify.rs to keep the useless hashes may also be a good way to factor code a bit and already have multiple key case managed. But as I mentioned I am not sure this is needed (using a TrieDB over a partial MemoryDB build from proof nodes is really what I prefer as it is more flexible).

EIP-1186 as described in https://eips.ethereum.org/EIPS/eip-1186 states:

accountProof: ARRAY - Array of rlp-serialized MerkleTree-Nodes, starting with the stateRoot-Node, following the path of the SHA3 (address) as key.

and also

proof: ARRAY - Array of rlp-serialized MerkleTree-Nodes, starting with the storageHash-Node, following the path of the SHA3 (key) as path.

So I assume it is actually a requirement that the nodes come ordered.
This, as you already noticed, allows for a huge memory optimization as the function can recursively check the proof withou requiring any extra memory for memory-DB nor trie-DB

@cheme
Copy link
Contributor

cheme commented Jan 12, 2022

I don't think it would make much sense as users with multiple storage keys to be proved can always do something like

it is to reduce number of encoded node in the proof, when you prove multiple key/value for an account, then you remove duplicate nodes by only having them once in the key query order (you are iterating on n keys rather than hitting them from root n time).

Looking again at the eip, I think you're right.

Multiple keys is not really part of the eip (I mean without redundancy).
The json sample seems to use an array on the storage proof indicating multiple keys are possible, but it uses its own proof field for each key so they probably don't share nodes.

To be interesting, eip would need to allow a single sequence of encoded node for multiple account and for each account a single sequence of encoded node for multiple keys in the account.

So I assume it is actually a requirement that the nodes come ordered.

yes, what I wonder if not verifying this ordering (when using trie-db over partial memory-db) is a big issue, I mean the state would be checked properly just it would allow to use different proofs.

@lightyear15
Copy link
Contributor Author

Hello @cheme ,
is there any more action required from me to have this PR merged?

@arkpar
Copy link
Member

arkpar commented Jan 18, 2022

I'm fine with merging it here. I think this crate should be forked for use in substrate eventually. The trie formats between ethereum and substrate have diverged significantly and will probably diverge further. Maintaining a single codebase requires too many abstractions already.

@cheme
Copy link
Contributor

cheme commented Jan 18, 2022

About current PR code there is a few point that should be done:

  • remove warnings (at least no warnings when running cargo test, it is just a few unused import and a phantomdata to use for layout).
  • run cargo +nightly fmt --all (need cargo format and nightly toolchain, I can also run it).
  • not hashing inline node would be more correct (we never should hash an inline node).

About:

modifying https://github.com/paritytech/trie/blob/master/trie-db/src/proof/verify.rs to keep the useless hashes may also be a good way to factor code a bit and already have multiple key case managed.

I just checked this, it is totally doable (same ordering of nodes), basically create an optional mode where instead of filling nodes with omitted elements (value or children hash), the elements are checked (hash of children and values).
Could be a way to reduce the code, but I also realize it is not very straightforward, so probably not a good idea from my part.

@lightyear15
Copy link
Contributor Author

lightyear15 commented Jan 18, 2022

@arkpar @cheme ,
thanks a lot for your feedbacks.
@cheme to your points:

  • I don't see any warnings where running cargo test, let me know if I am overlooking something please.
  • I hope I will get cargo fmt ok at some point, easier said than done
  • I changed the process_node function to accept an optional root hash, so whenever the hash is not available I can pass a a None rather than doing pointless hash computation.

I like squashing away minor commits before merging... let me know if you wish me to do it ✌️

@cheme
Copy link
Contributor

cheme commented Jan 18, 2022

I don't see any warnings where running cargo test, let me know if I am overlooking something please.

second round fixes the unused imports, there is still one, but unrelated to your PR (probably new rustc compiler finding new warning).

I forgot this, can you include:

diff --git a/trie-db/CHANGELOG.md b/trie-db/CHANGELOG.md
index 98aa56e..739ba7f 100644
--- a/trie-db/CHANGELOG.md
+++ b/trie-db/CHANGELOG.md
@@ -5,6 +5,7 @@ The format is based on [Keep a Changelog].
 [Keep a Changelog]: http://keepachangelog.com/en/1.0.0/
 
 ## [Unreleased]
+- Support eip 1186 trie proofs. [#146](https://github.com/paritytech/trie/pull/146)
 
 ## [0.23.0] - 2021-10-19
 - Support for value nodes. [#142](https://github.com/paritytech/trie/pull/142)

I like squashing away minor commits before merging...

Don't worry, I will use github squash and merge, I will probably just write Support eip 1186 trie proofs.

Copy link
Member

@sorpaas sorpaas left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you move the code to a separate crate (maybe call it trie-eip1186) or is there a reason that you insist on the current code structure? Many current users will not need the 1186 feature and it saves compile time for them, while at the same time has no downside for 1186 users.

@lightyear15
Copy link
Contributor Author

Can you move the code to a separate crate (maybe call it trie-eip1186) or is there a reason that you insist on the current code structure? Many current users will not need the 1186 feature and it saves compile time for them, while at the same time has no downside for 1186 users.

No particular reason, I thought it could have been a good follow-up PR, but if you think it's fundamental I can do it in this one.

trie-db/CHANGELOG.md Outdated Show resolved Hide resolved
Co-authored-by: cheme <emericchevalier.pro@gmail.com>
trie-eip1186/src/lib.rs Outdated Show resolved Hide resolved
Co-authored-by: Wei Tang <accounts@that.world>
@lightyear15
Copy link
Contributor Author

Is this PR waiting for something to happen on my side?

@arkpar
Copy link
Member

arkpar commented Mar 2, 2022

@cheme Could you please take another look

@cheme
Copy link
Contributor

cheme commented Mar 2, 2022

Code looks good. I like the fact that it is now using its own crate (could even be an external repo).
I remember that I thought let's merge it before next release, but I know I will forget when the time come.
So merging it now.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants