More efficient storage proofs when verifier already knows values #3782
Comments
CC @cheme
@jimpo, going in this direction, please keep in mind that proofs are mainly used for light clients, where the API takes into account the fact that values are included in the proof (the reply to an on_demand request is only the proof, and the results are extracted from the proof check).
Removing hashes that are self-contained in the proof can be a size saving; I think it is a cool idea. Edit: maybe I misread your comment; I now understand it as: do not put in the proof hashes that can be calculated from the proof content. Actually, the idea of fetching missing data from the hashdb meets the same question I raised in my previous comment.
@cheme So there are two complaints. First, I think this challenges the notion of a "proof", which is typically just used to check some statement, the statement here being "the value at this key in the state trie with this root is XXX". So I find the documentation sort of confusing.

Second, there is a use case I'm running into where this seems to make a material difference: when you already know the value from a past query and want to make sure that it has not changed. For the substrate bridge project, we want to do this with the Grandpa authorities. Say we have some header, we have the authorities at that header, and we have checked the proof. Then we see a later block header, and the prover claims that the authority set has not changed. Instead of including the authority set redundantly in the proof, they should be able to provide a shorter proof, since the verifier already has the authority set. Another case might be checking that the runtime code has not changed, which is a very large value that changes infrequently.
Makes sense, but in case it did change you will need to get the new value; in case it did not, it is a gain. This also applies to light clients, in fact. For light clients it might not be a good idea, as it will involve another query round trip for something that is likely to be a small value (or the value should not be in the trie). One could argue no big value should be in the state trie (especially the runtime wasm blob), but practically it actually seems to make sense for the bridge case. So, if we do not change the existing proof logic, what you want is a proof for it. It seems more complicated than simply having a proof recorder variant that does not record the leaf nodes, but it somehow seems more correct to me.
Related: We store the trees radix 16 for the space savings, but once upon a time we used radix 16 for computing the hashes and making our proofs. I suggested we switch to radix 2 for hashing and proofs, which requires more computation for provers but halves our proof sizes and proof verification time.
I am hopeful (though things are a bit stuck behind other issues) that at some point we could have heterogeneous child tries so we can easily switch proof implementations (I once had a branch of the trie crate for radix 2, 4, and 256, but dropped it for the sake of the PR review and for lack of testing).
@burdges I assume this is what you mean, but the problem with a smaller radix is not so much that the proving time goes up as that the update time to write changes to the trie increases during regular node operation. Generally the provers are running full nodes, so their computation load does increase, but in a different capacity.
@jimpo You still want radix 16 storage but with radix 2 hash computation. At least recently we hashed like H(child[0], child[1], ..., child[15]), which requires 15 extra values in a proof, while instead we should hash the children pairwise as a binary tree, i.e. something resembling H(H(H(child[0], child[1]), H(child[2], child[3])), ...).
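A minimal sketch of the binary-tree hashing idea (illustrative only, not Substrate code; `DefaultHasher` stands in for a real cryptographic hash such as Blake2): the root of a radix-16 node is computed from 15 two-input hashes, and a proof for one child then needs only log2(16) = 4 sibling hashes instead of the 15 siblings a flat radix-16 hash requires.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::Hasher;

// Stand-in two-input hash; a real trie would use e.g. Blake2 or Keccak.
fn h2(left: u64, right: u64) -> u64 {
    let mut s = DefaultHasher::new();
    s.write_u64(left);
    s.write_u64(right);
    s.finish()
}

// Root of a radix-16 node computed as a binary tree over its 16
// children: 8 + 4 + 2 + 1 = 15 invocations of the two-input hash.
fn radix2_root(children: &[u64; 16]) -> u64 {
    let mut layer: Vec<u64> = children.to_vec();
    while layer.len() > 1 {
        // Pair up adjacent nodes and hash each pair into the next layer.
        layer = layer.chunks(2).map(|p| h2(p[0], p[1])).collect();
    }
    layer[0]
}

fn main() {
    let root = radix2_root(&[1u64; 16]);
    println!("root = {}", root);
}
```

Storage can stay radix 16; only the per-node hash computation changes, so the trade is more hash invocations on write in exchange for smaller proofs and cheaper verification.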
We already have caching infrastructure that avoids recomputing all these hashes with every change until the whole block gets processed, right? We can tune the hash function so that these 16 invocations do not dramatically increase the computation time over the 16 input operations required for the radix-16 hash. @cheme It's possible your storage radix could/should exceed 16 once you separate it from the hash computation radix, not sure. :)

I believe radix-2 hash computation and proofs should be more efficient than radix 16 whenever you verify more than you prove, including all relay chain handling of parachains. We'll need tries with alternative hash functions so that parachains can use SNARK-friendly hash functions, which dramatically decrease SNARK circuit complexity but actually run far slower by wall-clock time. We'll want sparse Merkle trees eventually too, which help with non-membership proofs. I'm actually not sure right now how much these differ, but they do reduce computation time using one special non-collision-resistant value, i.e. H(x,0) = x and H(0,x) = x, but H(x,y) with x,y nonzero is collision resistant.

There are other mechanisms for arranging the leaves of Merkle trees, as well as other storage proof techniques, like accumulators built with algebra rather than Merkle trees. We might not ever care about these for production, but we might get more academics playing around on Polkadot if they can make these work.
We can move this side conversation over to my original issue paritytech/trie#23 if it's a derail here. |
The `substrate-state-machine` crate exposes APIs for generating and verifying storage proofs over the Merkle-Patricia trie. Notably, the main verification function does not take key-value pairs as input and verify that they are correct. Rather, the proof encodes the values themselves, and "checking the proof" extracts and returns them.

Not only is this an unusual definition of "proof", but it can unnecessarily increase the proof size in contexts where the verifier already has the values it wants to check (through computation, or by checking that some value has not changed) and the values are large.
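For context, the verification entry point described here had roughly this shape (reconstructed from the description above; the name and exact types are an assumption, not verbatim from the crate):

```rust
// Approximate shape of the existing API: the proof is a bag of encoded
// trie nodes, and checking the proof *returns* the value at the key
// rather than taking an expected value as input to verify against.
pub fn read_proof_check<H: Hasher>(
    root: H::Out,
    proof: Vec<Vec<u8>>, // encoded trie nodes
    key: &[u8],
) -> Result<Option<Vec<u8>>, Box<dyn std::error::Error>>;
```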
An interface that instead takes the expected key-value pairs as verifier input would enable proof size savings.
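Such an interface might look like the following (hypothetical names; the point is that the expected key-value pairs are inputs, so their bytes need not appear in the proof at all):

```rust
// Hypothetical: verify that each (key, value) pair is consistent with
// the given trie root, using only the non-leaf nodes in the proof.
// `None` as a value asserts the key is absent from the trie.
pub fn verify_storage_proof<H: Hasher>(
    root: H::Out,
    proof: Vec<Vec<u8>>,                  // encoded non-leaf trie nodes
    items: &[(Vec<u8>, Option<Vec<u8>>)], // expected key-value pairs
) -> Result<(), VerifyError>;
```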
Proof structure
Merkle trie proofs are currently the subset of trie nodes traversed when reading some value. As such, a proof contains all of the leaf nodes of the values read. The nodes are encoded, and the proof is a vector of encoded trie nodes.
Removing values from proofs
The easiest way to reduce the proof size would be to remove the leaf nodes from the proof. Instead, the proof would consist of only the branch and extension nodes traversed when reading the values. To verify, the verifier attempts a lookup in the trie, and if it encounters a read of a trie node by hash that it does not have in the database, it attempts to construct a leaf node for the value matching the partial key. This can be implemented with a custom `HashDB`, since the `get` method on a `HashDB` takes a partial key (the prefix). The custom `HashDB` can match this against the key being looked up and compute the partial key for the leaf node, and thus the entire leaf node. If the hash of the reconstructed leaf node matches, it returns it and the lookup succeeds.

Removing hashes from proofs
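A toy sketch of this reconstruction step (all names here are hypothetical; real code would implement the `hash-db` crate's `HashDB` trait and use the actual trie leaf encoding and a cryptographic hash):

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashMap;
use std::hash::Hasher;

// Stand-in for a cryptographic hash over encoded node bytes.
fn node_hash(data: &[u8]) -> u64 {
    let mut s = DefaultHasher::new();
    s.write(data);
    s.finish()
}

// Stand-in leaf encoding: a real trie leaf encodes the partial key
// together with the value; here we just concatenate with a tag.
fn encode_leaf(partial_key: &[u8], value: &[u8]) -> Vec<u8> {
    let mut node = vec![b'L'];
    node.extend_from_slice(partial_key);
    node.push(0);
    node.extend_from_slice(value);
    node
}

/// Toy database wrapper: serves nodes recorded in the proof, and on a
/// miss tries to reconstruct a leaf node from values the verifier
/// already knows, accepting it only if its hash matches the lookup.
struct KnownValuesDB {
    proof_nodes: HashMap<u64, Vec<u8>>,   // branch/extension nodes only
    known: Vec<(Vec<u8>, Vec<u8>)>,       // (partial key, value) pairs
}

impl KnownValuesDB {
    fn get(&self, key: u64) -> Option<Vec<u8>> {
        if let Some(node) = self.proof_nodes.get(&key) {
            return Some(node.clone());
        }
        for (pk, value) in &self.known {
            let leaf = encode_leaf(pk, value);
            if node_hash(&leaf) == key {
                return Some(leaf); // reconstructed leaf, verified by hash
            }
        }
        None // neither in the proof nor reconstructible: lookup fails
    }
}

fn main() {
    let leaf = encode_leaf(b"ab", b"authority-set");
    let db = KnownValuesDB {
        proof_nodes: HashMap::new(),
        known: vec![(b"ab".to_vec(), b"authority-set".to_vec())],
    };
    assert!(db.get(node_hash(&leaf)).is_some());
    println!("leaf reconstructed from known value");
}
```

Because the reconstructed leaf is only accepted when its hash matches the hash demanded by the parent node, this is exactly as sound as shipping the leaf in the proof; the leaf bytes are simply supplied by the verifier instead of the prover.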
Further proof size savings can be achieved by omitting not only the leaf nodes but also inlined values and intermediate hashes in branch and extension nodes. However, this is much more invasive and would almost certainly require changes to the `trie-db` crate itself, so I will not explore an implementation plan in this issue.