Restructure TrieStorage and implement correct semantics for state proofs #9350

robin-near · 2023-07-26T05:07:39Z

Background

For stateless validation, it must be possible to generate a state proof when applying a chunk, and we must ensure that

During generation of the state proof, the execution outcome is identical to if we weren't generating a state proof
When reapplying the same chunk using only the state proof (no full trie or flat storage), the outcome must also be identical.

Problem

The tricky part is storage gas accounting.

For background, the trie is stored as two kinds of entries: trie nodes, and values. For trie nodes, the key is the hash of the node and the value is the node. For values, the key is the hash of the whole value, and the value is the value.

When doing a lookup of either a trie node or a value, there can be three cases:

If looking up a trie node, the node is returned from flat storage (unless flat storage is disabled for testing)
Otherwise, the node or value is looked up first from the chunk cache (deterministic cache that is accumulated during chunk application). If present, it is considered an in-memory access.
Otherwise, the node or value is fetched from disk, and that's considered an from-db access. (The actual fetch however may come from another cache called the 'shard cache', which is non-deterministic and best-effort.)

Each of these cases incurs a different storage operation cost. Therefore, when recording and replaying, we must also compute the exact same cost, by correctly determining which case happens for each access.

Previously, this was difficult, because the chunk cache lives in TrieCachingStorage, while the flat storage layer lives in Trie. (As a background, Trie contains TrieStorage, which is a base class that is implemented by TrieCachingStorage, TrieRecordingStorage, and TriePartialMemoryStorage). So when recording or replaying, the TrieStorage used is NOT a TrieCachingStorage, but either TrieRecordingStorage or TrieMemoryStorage. This makes the chunk cache inaccessible, and so the other two storage implementations had to replicate the behavior; however this behavior was impossible to replicate because the flat storage layer is independent:

Consider when a lookup happens through flat storage. The Trie looks it up from flat storage and then skips calling the underlying TrieStorage. This is undesirable if we're recording, because even though the result came from flat storage, we must still include it in the recorded state proof. That means we should call the underlying TrieStorage to do a lookup, but then that would increment in-memory and db access counters which would be incorrect.

What this PR does

This PR moves the critical chunk cache logic out of TrieCachingStorage and into a field of Trie, which ensures that no matter what TrieStorage is being used, the chunk cache logic is always the same. Additionally, TrieRecordingStorage is removed and the logic is also moved into Trie as an optional field. This allows Trie to have total control of when to cache and when to record:

In the flat storage case, if we're recording, we would additionally look up via the trie (so we can produce the state proof), but this lookup goes straight to the TrieStorage layer, which means we're not counting the cost of such trie accesses. This ensures that the gas computation is the same whether we're recording or not.
If we're recording, any access that went to the TrieStorage layer also goes into the recorded state proof. This is done in a helper method so we ensure we don't miss anything.
If we're replaying, instead of using TrieCachingStorage, we use TriePartialMemoryStorage. The chunk cache layer remains the same since that's independent. There's one difference, which is that we don't have flat storage when replaying. The trick here is to do trie lookups without counting the cost, just like when recording.

This is still not a perfect design as there's still quite a few places where we penetrate the abstraction layers, but it's better than before. At least we don't need to have a TrieRecordingStorage having a TrieCachingStorage as the backend and having to reimplement chunk caching behavior everywhere.

Testing

Eight randomized tests have been added (see trie_recording.rs) to check that in various scenarios (chunk cache enabled or disabled; looking up existing or missing keys; using flat storage or not), the trie access results (values and counters) are exactly the same between normal trie, recording trie, and replaying trie. This should give us confidence that this implementation is suitable for stateless validation.

Also, I'm running a mainnet node with this code. For a few hundred blocks it's been surviving fine, so the implementation should be consistent with the previous, too.

Nayduck: https://nayduck.near.org/#/run/3150

…ofs.

Longarithm

Looks good, that's a great step forward to improving trie structure and #8741.
Leaving comments, mostly about naming stuff.

Longarithm · 2023-07-27T14:20:58Z

core/store/src/trie/chunk_cache.rs

+/// Note that in general, it is NOT true that all storage access is either a
+/// db read or mem read. It can also be a flat storage read, which is not
+/// tracked via TrieChunkCache.
+pub struct TrieChunkCache {


As discussed offline - please consider TrieAccountingCache because it handles both deterministic cache and computing access metrics.

Longarithm · 2023-07-27T14:22:01Z

core/store/src/trie/chunk_cache.rs

+        self.enable = enabled;
+    }
+
+    pub fn retrieve_trie_node_with_cache(


retrieve_trie_node_with_storage? Maybe worth a little comment.

Ah I see the confusion. I'll just simplify the name and add a comment.

Longarithm · 2023-07-27T14:23:40Z

core/store/src/trie/mod.rs

+    /// toggled on the fly), trie nodes that have been looked up once will be
+    /// guaranteed to be cached, and further reads to these nodes will
+    /// encounter less gas cost.
+    chunk_cache: RefCell<TrieChunkCache>,


As you work on this, it would be really great if you also could do additional renaming and help with #9054. We are considering shard/chunk cache as a poor naming.

Perhaps let's focus on the accounting cache name for now since I'm not touching the shard cache yet. And then I can follow up with another rename.

Longarithm · 2023-07-27T14:25:40Z

core/store/src/trie/mod.rs

@@ -321,9 +326,31 @@ impl std::fmt::Debug for TrieNode {
 }

 pub struct Trie {


unrelated: do you think we can easily achieve #6308 (comment)?

I doubt it's easy - the problem is that reading from the trie also requires mutating the cache. And we provide the vm runtime with the ability to read from the trie, which obviously cannot carry a &mut in there or else we'd have to propagate a lifetime parameter all the way in which is going to be completely infeasible, so if the vm runtime is going to start with a & but somehow end up calling a &mut then somewhere in the middle we need a RefCell no matter what. And so putting the RefCell on the cache (which is a common thing to do) seems more reasonable.

Longarithm · 2023-07-27T16:23:34Z

core/store/src/trie/mod.rs

+    fn internal_retrieve_trie_node(
+        &self,
+        hash: &CryptoHash,
+        count_cost: bool,


I suggest renaming it with cache_node[|s] here and all the way, up to renaming force_use_flat_storage_accounting with skip_cache_nodes. I think it explains the behaviour better and doesn't require understanding of flat storage from the reader.

The term "cache" is overloaded - but how about use_accounting_cache, and skip_accounting_cache_for_trie_nodes?

Longarithm

🚀

…ofs (#9350) ## Background For stateless validation, it must be possible to generate a state proof when applying a chunk, and we must ensure that - During generation of the state proof, the execution outcome is identical to if we weren't generating a state proof - When reapplying the same chunk using only the state proof (no full trie or flat storage), the outcome must also be identical. ## Problem The tricky part is storage gas accounting. For background, the trie is stored as two kinds of entries: trie nodes, and values. For trie nodes, the key is the hash of the node and the value is the node. For values, the key is the hash of the whole value, and the value is the value. When doing a lookup of either a trie node or a value, there can be three cases: - If looking up a trie node, the node is returned from flat storage (unless flat storage is disabled for testing) - Otherwise, the node or value is looked up first from the chunk cache (deterministic cache that is accumulated during chunk application). If present, it is considered an in-memory access. - Otherwise, the node or value is fetched from disk, and that's considered an from-db access. (The actual fetch however may come from another cache called the 'shard cache', which is non-deterministic and best-effort.) **Each of these cases incurs a different storage operation cost.** Therefore, when recording and replaying, we must also compute the exact same cost, by correctly determining which case happens for each access. Previously, this was difficult, because the chunk cache lives in TrieCachingStorage, while the flat storage layer lives in Trie. (As a background, Trie contains TrieStorage, which is a base class that is implemented by TrieCachingStorage, TrieRecordingStorage, and TriePartialMemoryStorage). So when recording or replaying, the TrieStorage used is NOT a TrieCachingStorage, but either TrieRecordingStorage or TrieMemoryStorage. This makes the chunk cache inaccessible, and so the other two storage implementations had to replicate the behavior; however this behavior was impossible to replicate because the flat storage layer is independent: * Consider when a lookup happens through flat storage. The Trie looks it up from flat storage and then skips calling the underlying TrieStorage. This is undesirable if we're recording, because even though the result came from flat storage, we must still include it in the recorded state proof. That means we should call the underlying TrieStorage to do a lookup, but then that would increment in-memory and db access counters which would be incorrect. ## What this PR does This PR moves the critical chunk cache logic out of TrieCachingStorage and into a field of Trie, which ensures that no matter what TrieStorage is being used, the chunk cache logic is always the same. Additionally, TrieRecordingStorage is removed and the logic is also moved into Trie as an optional field. This allows Trie to have total control of when to cache and when to record: * In the flat storage case, if we're recording, we would additionally look up via the trie (so we can produce the state proof), but this lookup goes straight to the TrieStorage layer, which means we're not counting the cost of such trie accesses. This ensures that the gas computation is the same whether we're recording or not. * If we're recording, any access that went to the TrieStorage layer also goes into the recorded state proof. This is done in a helper method so we ensure we don't miss anything. * If we're replaying, instead of using TrieCachingStorage, we use TriePartialMemoryStorage. The chunk cache layer remains the same since that's independent. There's one difference, which is that we don't have flat storage when replaying. The trick here is to do trie lookups without counting the cost, just like when recording. This is still not a perfect design as there's still quite a few places where we penetrate the abstraction layers, but it's better than before. At least we don't need to have a TrieRecordingStorage having a TrieCachingStorage as the backend and having to reimplement chunk caching behavior everywhere. ## Testing Eight randomized tests have been added (see trie_recording.rs) to check that in various scenarios (chunk cache enabled or disabled; looking up existing or missing keys; using flat storage or not), the trie access results (values and counters) are exactly the same between normal trie, recording trie, and replaying trie. This should give us confidence that this implementation is suitable for stateless validation. Also, I'm running a mainnet node with this code. For a few hundred blocks it's been surviving fine, so the implementation should be consistent with the previous, too. Nayduck: https://nayduck.near.org/#/run/3150

…ofs (near#9350) ## Background For stateless validation, it must be possible to generate a state proof when applying a chunk, and we must ensure that - During generation of the state proof, the execution outcome is identical to if we weren't generating a state proof - When reapplying the same chunk using only the state proof (no full trie or flat storage), the outcome must also be identical. ## Problem The tricky part is storage gas accounting. For background, the trie is stored as two kinds of entries: trie nodes, and values. For trie nodes, the key is the hash of the node and the value is the node. For values, the key is the hash of the whole value, and the value is the value. When doing a lookup of either a trie node or a value, there can be three cases: - If looking up a trie node, the node is returned from flat storage (unless flat storage is disabled for testing) - Otherwise, the node or value is looked up first from the chunk cache (deterministic cache that is accumulated during chunk application). If present, it is considered an in-memory access. - Otherwise, the node or value is fetched from disk, and that's considered an from-db access. (The actual fetch however may come from another cache called the 'shard cache', which is non-deterministic and best-effort.) **Each of these cases incurs a different storage operation cost.** Therefore, when recording and replaying, we must also compute the exact same cost, by correctly determining which case happens for each access. Previously, this was difficult, because the chunk cache lives in TrieCachingStorage, while the flat storage layer lives in Trie. (As a background, Trie contains TrieStorage, which is a base class that is implemented by TrieCachingStorage, TrieRecordingStorage, and TriePartialMemoryStorage). So when recording or replaying, the TrieStorage used is NOT a TrieCachingStorage, but either TrieRecordingStorage or TrieMemoryStorage. This makes the chunk cache inaccessible, and so the other two storage implementations had to replicate the behavior; however this behavior was impossible to replicate because the flat storage layer is independent: * Consider when a lookup happens through flat storage. The Trie looks it up from flat storage and then skips calling the underlying TrieStorage. This is undesirable if we're recording, because even though the result came from flat storage, we must still include it in the recorded state proof. That means we should call the underlying TrieStorage to do a lookup, but then that would increment in-memory and db access counters which would be incorrect. ## What this PR does This PR moves the critical chunk cache logic out of TrieCachingStorage and into a field of Trie, which ensures that no matter what TrieStorage is being used, the chunk cache logic is always the same. Additionally, TrieRecordingStorage is removed and the logic is also moved into Trie as an optional field. This allows Trie to have total control of when to cache and when to record: * In the flat storage case, if we're recording, we would additionally look up via the trie (so we can produce the state proof), but this lookup goes straight to the TrieStorage layer, which means we're not counting the cost of such trie accesses. This ensures that the gas computation is the same whether we're recording or not. * If we're recording, any access that went to the TrieStorage layer also goes into the recorded state proof. This is done in a helper method so we ensure we don't miss anything. * If we're replaying, instead of using TrieCachingStorage, we use TriePartialMemoryStorage. The chunk cache layer remains the same since that's independent. There's one difference, which is that we don't have flat storage when replaying. The trick here is to do trie lookups without counting the cost, just like when recording. This is still not a perfect design as there's still quite a few places where we penetrate the abstraction layers, but it's better than before. At least we don't need to have a TrieRecordingStorage having a TrieCachingStorage as the backend and having to reimplement chunk caching behavior everywhere. ## Testing Eight randomized tests have been added (see trie_recording.rs) to check that in various scenarios (chunk cache enabled or disabled; looking up existing or missing keys; using flat storage or not), the trie access results (values and counters) are exactly the same between normal trie, recording trie, and replaying trie. This should give us confidence that this implementation is suitable for stateless validation. Also, I'm running a mainnet node with this code. For a few hundred blocks it's been surviving fine, so the implementation should be consistent with the previous, too. Nayduck: https://nayduck.near.org/#/run/3150

Restructure TrieStorage and implement correct semantics for state pro…

136587f

…ofs.

robin-near requested review from wacban and Longarithm July 26, 2023 05:07

robin-near requested a review from a team as a code owner July 26, 2023 05:07

robin-near mentioned this pull request Jul 26, 2023

Make state witness possible to implement. #9327

Closed

Longarithm requested changes Jul 27, 2023

View reviewed changes

robin-near added 2 commits July 27, 2023 10:43

Merge remote-tracking branch 'origin/master' into HEAD

e3a4c8a

Address comments.

4645a50

robin-near requested a review from Longarithm July 27, 2023 19:53

Longarithm approved these changes Jul 28, 2023

View reviewed changes

Merge branch 'master' into proof2

7f9e848

robin-near added the S-automerge label Jul 28, 2023

near-bulldozer bot merged commit ab7e238 into near:master Jul 28, 2023
2 checks passed

Longarithm mentioned this pull request Jun 5, 2024

Implement chunk nodes cache for TrieRecordingStorage #6316

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Restructure TrieStorage and implement correct semantics for state proofs #9350

Restructure TrieStorage and implement correct semantics for state proofs #9350

robin-near commented Jul 26, 2023 •

edited

Loading

Longarithm left a comment

Longarithm Jul 27, 2023

Longarithm Jul 27, 2023

robin-near Jul 27, 2023

Longarithm Jul 27, 2023

robin-near Jul 27, 2023

Longarithm Jul 27, 2023

robin-near Jul 27, 2023 •

edited

Loading

Longarithm Jul 27, 2023

robin-near Jul 27, 2023

Longarithm left a comment

		@@ -321,9 +326,31 @@ impl std::fmt::Debug for TrieNode {
		}

		pub struct Trie {

Restructure TrieStorage and implement correct semantics for state proofs #9350

Restructure TrieStorage and implement correct semantics for state proofs #9350

Conversation

robin-near commented Jul 26, 2023 • edited Loading

Background

Problem

What this PR does

Testing

Longarithm left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

robin-near Jul 27, 2023 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Longarithm left a comment

Choose a reason for hiding this comment

robin-near commented Jul 26, 2023 •

edited

Loading

robin-near Jul 27, 2023 •

edited

Loading