[Stateless validation] per-receipt hard limit on recorded trie storage proof size #11019

walnut-the-cat · 2024-04-10T21:27:54Z

When a receipt is executed it performs trie storage operations. Every such operation touches some nodes from the pre-state trie, which have to be recorded and added to the state witness storage proof.
The size of the recorded storage proof could get very large: in #9378 (comment) it was noted that a single receipt could generate hundreds of MBs of state!
A `ChunkStateWitness` this large would cause problems, so we have to add a limit to make sure that its size doesn't explode.
We should introduce a limit on the generated storage proof size, similar to the gas limit. While executing a receipt we would check whether the size of the recorded storage proof got too large, and fail the receipt if it did.
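To make the gas-limit analogy concrete, here is a minimal sketch of per-receipt accounting. It assumes a hypothetical `ProofSizeTracker` that is told the size of every trie node recorded while a receipt executes; `ProofSizeTracker`, `ReceiptError`, and `execute_receipt` are illustrative names, not nearcore's API, and a receipt is modeled simply as the list of node sizes it records.

```rust
/// Hypothetical error returned when a receipt exceeds its storage proof budget.
#[derive(Debug, PartialEq)]
pub enum ReceiptError {
    StorageProofLimitExceeded { recorded: usize, limit: usize },
}

/// Hypothetical per-receipt accounting of recorded storage proof size,
/// analogous to tracking burnt gas against a gas limit.
pub struct ProofSizeTracker {
    limit: usize,
    recorded: usize,
}

impl ProofSizeTracker {
    pub fn new(limit: usize) -> Self {
        Self { limit, recorded: 0 }
    }

    /// Called whenever a trie node of `node_size` bytes is recorded.
    /// Returns an error as soon as the running total exceeds the limit,
    /// so the caller can abort and fail the receipt.
    pub fn record_node(&mut self, node_size: usize) -> Result<(), ReceiptError> {
        self.recorded = self.recorded.saturating_add(node_size);
        if self.recorded > self.limit {
            return Err(ReceiptError::StorageProofLimitExceeded {
                recorded: self.recorded,
                limit: self.limit,
            });
        }
        Ok(())
    }

    pub fn recorded(&self) -> usize {
        self.recorded
    }
}

/// Executes a receipt modeled as the sizes of the nodes it records,
/// failing it if the storage proof budget is exceeded.
pub fn execute_receipt(node_sizes: &[usize], limit: usize) -> Result<usize, ReceiptError> {
    let mut tracker = ProofSizeTracker::new(limit);
    for &size in node_sizes {
        // `?` stops execution at the first node that crosses the limit.
        tracker.record_node(size)?;
    }
    Ok(tracker.recorded())
}
```

Failing eagerly at the first node that crosses the limit mirrors how an out-of-gas receipt stops as soon as gas runs out, rather than after the whole receipt has executed.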

This might be a breaking change, as some existing contracts that generate a large storage proof could stop working, but it's necessary: the size of `ChunkStateWitness` has to be limited, so the size of the storage proof has to be limited too. This is an inherent limitation of stateless validation. We should carefully analyze how much storage proof is generated by receipts currently present on mainnet and choose a limit that doesn't break existing contracts.
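One way to run this analysis is to compute percentiles of the observed per-receipt proof sizes and leave a safety margin above the largest value seen. The sketch below uses the nearest-rank percentile definition; `percentile`, `limit_with_margin`, the sample data, and the idea of a multiplicative safety factor are illustrative assumptions, not necessarily how the mainnet metrics are computed.

```rust
/// Nearest-rank percentile: the smallest sample such that at least
/// `p` percent of the samples are less than or equal to it.
/// Returns None for an empty sample set or a `p` outside (0, 100].
pub fn percentile(samples: &[usize], p: f64) -> Option<usize> {
    if samples.is_empty() || !(p > 0.0 && p <= 100.0) {
        return None;
    }
    let mut sorted = samples.to_vec();
    sorted.sort_unstable();
    // 1-based rank: ceil(p / 100 * n), clamped to at least 1.
    let rank = ((p / 100.0) * sorted.len() as f64).ceil() as usize;
    Some(sorted[rank.max(1) - 1])
}

/// Hypothetical rule of thumb: the largest observed size times a safety factor.
pub fn limit_with_margin(samples: &[usize], safety_factor: usize) -> Option<usize> {
    samples.iter().max().map(|&max| max.saturating_mul(safety_factor))
}
```

A high percentile (for example p99) shows how many chunks or receipts a soft limit would touch, while a hard limit, which fails receipts outright, is safer when placed above the maximum observed value.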

…tion (#11069)

During receipt execution we record all touched nodes from the pre-state trie. Those recorded nodes form the storage proof that is sent to validators, and validators use it to execute the receipts and validate the results. In #9378 it's stated that in a worst-case scenario a single receipt can generate hundreds of megabytes of storage proof. That would cause problems: the `ChunkStateWitness` would also be hundreds of megabytes in size, and sending this much data over the network would be problematic. Because of that we need to limit the size of the storage proof. We plan to have two limits:

* per-chunk soft limit: once a chunk has more than X MB of storage proof we stop processing new receipts and move the remaining ones to the delayed receipt queue. This has been implemented in #10703.
* per-receipt hard limit: once a receipt generates more than X MB of storage proof we fail the receipt, similarly to what happens when a receipt goes over the allowed gas limit. This one is implemented in this PR.

Most of the hard-limit code is straightforward: we need to track the size of recorded storage and fail the receipt if it goes over the limit. But there is one ugly problem: #10890. Because of the way the current `TrieUpdate` works, we don't record all of the storage proof in real time. There are some corner cases (deleting one of two children of a branch) in which some nodes are not recorded until we call `finalize()` at the end of the chunk. This means that we can't really use `Trie::recorded_storage_size()` to limit the size, as it isn't fully accurate. If we did, a malicious actor could prepare receipts which seem to have only 1MB of storage proof during execution, but actually record 10MB during `finalize()`.

There is a long discussion in #10890 along with some possible solution ideas; please read it if you need more context. This PR implements Idea 1 from #10890.
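The interaction of the two limits can be sketched as a loop over a chunk's receipts. This is a simplified model, not nearcore's runtime code: each receipt is reduced to the storage proof size it would record, `SOFT_CHUNK_LIMIT`, `HARD_RECEIPT_LIMIT`, `ChunkResult`, and `apply_chunk` are hypothetical names, and the constants use the 3MB/4MB values proposed later in this PR. It also assumes a failed receipt stops exactly at the hard limit.

```rust
const MB: usize = 1_000_000;
/// Per-chunk soft limit (hypothetical constant; 3MB as proposed in this PR).
const SOFT_CHUNK_LIMIT: usize = 3 * MB;
/// Per-receipt hard limit (hypothetical constant; 4MB as proposed in this PR).
const HARD_RECEIPT_LIMIT: usize = 4 * MB;

/// Hypothetical summary of applying one chunk.
#[derive(Debug, Default, PartialEq)]
pub struct ChunkResult {
    /// Total storage proof size recorded for the chunk.
    pub proof_size: usize,
    /// Indices of receipts that ran to completion.
    pub succeeded: Vec<usize>,
    /// Indices of receipts failed by the per-receipt hard limit.
    pub failed: Vec<usize>,
    /// Indices of receipts moved to the delayed receipt queue.
    pub delayed: Vec<usize>,
}

/// `receipts[i]` is the storage proof size receipt `i` would record if
/// nothing stopped it. Simplification: a receipt that hits the hard limit
/// is assumed to stop exactly at the limit, contributing HARD_RECEIPT_LIMIT.
pub fn apply_chunk(receipts: &[usize]) -> ChunkResult {
    let mut result = ChunkResult::default();
    for (i, &size) in receipts.iter().enumerate() {
        // Soft limit: once the chunk is over budget, stop starting new
        // receipts and defer the rest to the delayed receipt queue.
        if result.proof_size > SOFT_CHUNK_LIMIT {
            result.delayed.extend(i..receipts.len());
            break;
        }
        // Hard limit: a receipt that records too much is failed, like a
        // receipt that runs out of gas.
        if size > HARD_RECEIPT_LIMIT {
            result.proof_size += HARD_RECEIPT_LIMIT;
            result.failed.push(i);
        } else {
            result.proof_size += size;
            result.succeeded.push(i);
        }
    }
    result
}
```

In this model a chunk sitting exactly at the soft limit can still start one more receipt of up to the hard limit, so its total can reach, but not exceed, 3MB + 4MB.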
Instead of using `Trie::recorded_storage_size()` we'll use `Trie::recorded_storage_size_upper_bound()`, which estimates an upper bound of the recorded storage size by assuming that every trie removal records an additional 2000 bytes:

```rust
/// Size of the recorded state proof plus some additional size added to cover removals.
/// An upper-bound estimation of the true recorded size after finalization.
/// See #10890 and #11000 for details.
pub fn recorded_storage_size_upper_bound(&self) -> usize {
    // Charge 2000 bytes for every removal
    let removals_size = self.removal_counter.saturating_mul(2000);
    self.recorded_storage_size().saturating_add(removals_size)
}
```

As long as the upper bound is below the limit, we can be sure that the real recorded size is also below the limit. It's a rough estimate, which often exaggerates the actual recorded size (sometimes by 20 times or more), but it should be good enough as an MVP solution for now. Doing it in a better way would require a lot of refactoring in the Trie code. We're now [moving fast](https://near.zulipchat.com/#narrow/stream/407237-core.2Fstateless-validation/topic/Faster.20decision.20making), so I decided to go with this solution for now.
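Below is a self-contained model of the upper-bound check, assuming a hypothetical `RecorderStats` struct in place of the real `Trie`. It keeps the 2000-byte charge per removal from the snippet above and shows why comparing the upper bound against the limit is conservative: the bound is never smaller than the size recorded during execution, and it also covers the nodes recorded later by `finalize()` as long as each removal adds at most 2000 bytes there (the assumption behind Idea 1).

```rust
/// Bytes charged per trie removal, matching the constant in the PR snippet.
const REMOVAL_CHARGE: usize = 2000;

/// Hypothetical stand-in for the trie's recording state.
#[derive(Default)]
pub struct RecorderStats {
    /// Bytes of trie nodes recorded so far during execution.
    recorded_bytes: usize,
    /// Number of trie removals performed so far.
    removal_counter: usize,
}

impl RecorderStats {
    pub fn record_node(&mut self, bytes: usize) {
        self.recorded_bytes = self.recorded_bytes.saturating_add(bytes);
    }

    pub fn record_removal(&mut self) {
        self.removal_counter = self.removal_counter.saturating_add(1);
    }

    pub fn recorded_storage_size(&self) -> usize {
        self.recorded_bytes
    }

    /// Same formula as `Trie::recorded_storage_size_upper_bound()`.
    pub fn recorded_storage_size_upper_bound(&self) -> usize {
        let removals_size = self.removal_counter.saturating_mul(REMOVAL_CHARGE);
        self.recorded_storage_size().saturating_add(removals_size)
    }

    /// A receipt may continue only while the upper bound stays within the limit.
    pub fn within_limit(&self, limit: usize) -> bool {
        self.recorded_storage_size_upper_bound() <= limit
    }
}
```

The saturating arithmetic means a huge removal count clamps to `usize::MAX` instead of wrapping around to a small value, which would otherwise let an oversized receipt slip under the limit.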
The upper bound calculation was added in a previous PR, along with metrics to check whether such a rough estimate is viable: #11000

I set up a mainnet node with shadow validation to gather data about the size distribution under mainnet traffic: [Metrics link](https://nearinc.grafana.net/d/edbl9ztm5h1q8b/stateless-validation?orgId=1&var-chain_id=mainnet&var-shard_id=All&var-node_id=ci-b20a9aef-mainnet-rpc-europe-west4-01-84346caf&from=1713225600000&to=1713272400000)

![image](https://github.com/near/nearcore/assets/149345204/dc3daa88-5f59-4ae5-aa9e-ab2802f034b8)
![image](https://github.com/near/nearcore/assets/149345204/90602443-7a0f-4503-9bce-8fbce352c0ba)

The metrics show that:

* For all receipts, both the recorded size and the upper bound estimate are below 2MB
* The overwhelming majority of receipts generate < 50KB of storage proof
* For all chunks, the upper bound estimate is below 6MB
* For 99% of chunks, the upper bound estimate is below 3MB

Based on this I believe that we can:

* Set the hard per-receipt limit to 4MB. All receipts were below 2MB, but it's good to have a safety margin here. This is a hard limit, so it might break existing contracts if they turn out to generate more storage proof than the limit.
* Set the soft per-chunk limit to 3MB. 99% of chunks will not be affected by this limit. The 1% that hit the limit will execute fewer receipts, with the remaining receipts put into the delayed receipt queue. This slightly lowers the throughput of those chunks, but it isn't a big slowdown (~1%).

Having a 4MB per-receipt hard limit and a 3MB per-chunk soft limit would give us a hard guarantee that the total storage proof size of every chunk is below 7MB: a chunk stops processing new receipts once its storage proof goes over 3MB, and any single receipt is failed once it goes over 4MB. It's worth noting that gas usage already limits the storage proof size quite effectively. For 98% of chunks the storage proof size is already below 2MB, so the limit isn't really needed for typical mainnet traffic.
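The 7MB guarantee can be written as a one-line bound. Here $S_k$ is the storage proof size (measured with the upper-bound estimate) after the first $k$ receipts of a chunk and $r_{k+1}$ is the size added by receipt $k+1$; this notation is ours, and the bound assumes the soft-limit check runs before each receipt starts and that a failed receipt adds no more than $L_{\text{hard}}$ (ignoring the last node recorded when a receipt crosses the limit).

```math
\begin{aligned}
S_0 &= 0, \qquad S_{k+1} = S_k + r_{k+1},\\
\text{receipt } k+1 \text{ starts only if } S_k &\le L_{\text{soft}} = 3\,\text{MB},\\
r_{k+1} &\le L_{\text{hard}} = 4\,\text{MB},\\
\Rightarrow\quad S_{k+1} &\le L_{\text{soft}} + L_{\text{hard}} = 7\,\text{MB}.
\end{aligned}
```

Every receipt that runs was started from a state with $S_k \le L_{\text{soft}}$, so the bound holds for the chunk's final storage proof size as well.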
The limit matters mostly for stopping malicious actors who would try to DoS the network by generating large storage proofs.

Fixes: #11019

walnut-the-cat added the A-stateless-validation Area: stateless validation label Apr 10, 2024

walnut-the-cat assigned jancionear Apr 10, 2024

This was referenced Apr 10, 2024

[Tracking issue] State Witness size limit #10259 (Open)

[ProjectTracking]: Stateless validation Mainnet Release near/near-one-project-tracking#46 (Open)

jancionear changed the title ~~[Stateless validation] State witness hard limit~~ [Stateless validation] per-receipt hard limit on recorded trie storage proof size Apr 15, 2024

jancionear mentioned this issue Apr 15, 2024

Per-receipt hard limit on storage proof size using upper bound estimation #11069 (Merged)

jancionear closed this as completed in #11069 Apr 19, 2024

github-actions bot mentioned this issue May 1, 2024

Monthly issue metrics report #11194 (Open)

walnut-the-cat commented Apr 10, 2024 (edited by jancionear)
