
fix: record nodes for writes in memtrie #10841

Merged
merged 7 commits into from
Mar 25, 2024

Conversation

Longarithm
Member

@Longarithm Longarithm commented Mar 20, 2024

Fixes #10769.

UPD: New motivation

While everything said below is true, the main motivation is to account for nodes impacted by squash_nodes, which can only be touched without much pain at Trie::update.

Old motivation

The root cause of the "node not found" errors on reads from recorded storage is that nodes touched only at Trie::update - the KV updates at the end of chunk application - were never recorded for memtries.
This happens automatically for disk tries. For memtries we retroactively account for trie node accesses, but this wasn't done in a similar way for memtrie updates.
While the infrastructure does need to be improved so these issues stop surprising us, the easiest fix is to account for all accesses the same way move_node_to_mutable does - on memtrie paths like ensure_updated, which move a node out of storage into memory. That means the node has to be recorded so the trie can be validated. Unfortunately, refcount changes are not enough to cover this, so I record trie accesses separately. Also, the memtrie doesn't have access to values, so I have to retrieve the values and record them in Trie::update.
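The shape of the fix can be sketched as follows. This is a toy model with illustrative names, not the real nearcore types: the memtrie update reports which node hashes it touched (with their serialized bytes) and which value hashes it touched (without bytes, since values don't live in the memtrie), and at the end of the update the recorder is fed both, fetching value bytes from storage.

```rust
use std::collections::{HashMap, HashSet};

/// Toy stand-in for a node/value hash; the real code uses CryptoHash.
type Hash = u64;

/// Accesses gathered while applying updates on the in-memory trie.
struct TrieAccesses {
    /// Nodes touched on the update path, with their serialized bytes.
    nodes: HashMap<Hash, Vec<u8>>,
    /// Values are not stored in the memtrie, so only their hashes are known.
    values: HashSet<Hash>,
}

/// Collects everything needed for the recorded partial state.
#[derive(Default)]
struct Recorder {
    recorded: HashMap<Hash, Vec<u8>>,
}

impl Recorder {
    fn record(&mut self, hash: Hash, bytes: Vec<u8>) {
        self.recorded.insert(hash, bytes);
    }
}

/// Sketch of the retroactive step at the end of `Trie::update`:
/// nodes already carry their bytes, values are fetched from storage.
fn record_accesses(
    recorder: &mut Recorder,
    accesses: TrieAccesses,
    storage: &HashMap<Hash, Vec<u8>>,
) {
    for (node_hash, serialized) in accesses.nodes {
        recorder.record(node_hash, serialized);
    }
    for value_hash in accesses.values {
        if let Some(value) = storage.get(&value_hash) {
            recorder.record(value_hash, value.clone());
        }
    }
}
```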

Testing

  • New logic is covered by the trie_recording tests. I also discovered an issue there - destructively_delete_in_memory_state_from_disk was removing all keys because key decoding was incorrect. These tests now fully cover the trie access scenario: first we have get and get_ref accesses, and at the very end we have an update call for some KV pairs. I confirmed that with an incorrect implementation some tests fail with the same "node not found" error, and also that the new sanity check on Trie::update works.
  • test_memtrie_discrepancy is renamed and works now

@Longarithm Longarithm changed the title draft: record trie writes draft: record nodes for writes in memtrie Mar 20, 2024
@Longarithm Longarithm marked this pull request as ready for review March 20, 2024 13:34
@Longarithm Longarithm requested a review from a team as a code owner March 20, 2024 13:34
@Longarithm Longarithm changed the title draft: record nodes for writes in memtrie fix: record nodes for writes in memtrie Mar 20, 2024
@jancionear jancionear self-requested a review March 20, 2024 13:53

codecov bot commented Mar 20, 2024

Codecov Report

Attention: Patch coverage is 97.84946%, with 2 lines in your changes missing coverage. Please review.

Project coverage is 71.42%. Comparing base (b8048da) to head (c01638c).
Report is 4 commits behind head on master.

Files Patch % Lines
core/store/src/trie/mod.rs 88.88% 2 Missing ⚠️
Additional details and impacted files
@@           Coverage Diff            @@
##           master   #10841    +/-   ##
========================================
  Coverage   71.41%   71.42%            
========================================
  Files         763      764     +1     
  Lines      153717   153956   +239     
  Branches   153717   153956   +239     
========================================
+ Hits       109781   109965   +184     
- Misses      38938    38991    +53     
- Partials     4998     5000     +2     
Flag Coverage Δ
backward-compatibility 0.24% <0.00%> (-0.01%) ⬇️
db-migration 0.24% <0.00%> (-0.01%) ⬇️
genesis-check 1.41% <0.00%> (-0.01%) ⬇️
integration-tests 36.82% <65.59%> (+0.03%) ⬆️
linux 70.05% <97.84%> (-0.02%) ⬇️
linux-nightly 70.91% <97.84%> (-0.02%) ⬇️
macos 54.62% <97.84%> (-0.04%) ⬇️
pytests 1.64% <0.00%> (-0.01%) ⬇️
sanity-checks 1.42% <0.00%> (-0.01%) ⬇️
unittests 67.14% <97.84%> (-0.01%) ⬇️
upgradability 0.29% <0.00%> (-0.01%) ⬇️

Flags with carried forward coverage won't be shown.


@walnut-the-cat
Contributor

Out of curiosity, how much additional memory do we use with this change?

}

// Retroactively record all accessed trie items to account for
// key-value pairs which were only written but never read, thus
Contributor

There's a gap in my logical understanding: where is the link that goes from "a key is written to but not read from" to "we must have the previous value for that key"?

For the specific trie-restructuring case I knew of (converting a branch with one child into an extension), the child node must be read in order to do the conversion, and at the same time the child node is deleted. So in that case, yes, the child is written to (its refcount was decremented) but not read from (it was never queried), and its previous value was needed (to fulfill the restructuring). But is that in general the only case where the update phase would cause a value to be written to but not read from?

So in other words, is the logic this: "if a node is written to but not read from, then that node must have been replaced during a trie restructuring, and therefore its previous value must be known" (which I need to be convinced of), or is there some other way to draw this conclusion?

Member Author

@Longarithm Longarithm Mar 20, 2024

if a node is written to but not read from, then that node must have been replaced during a trie restructuring, and therefore its previous value must be known

I think "key" got replaced with "node" somewhere in the message.

The logic is: "if a key is written to but not read from" => "all nodes on the trie path to it must be known".
Say you don't know some node on the path to it, and take the lowest such node. Then you can't recompute it after the write, because you don't know the hashes of the key's neighbours, so you can't compute the new state root.

And, yeah, when we descend into node corresponding to key, all refcounts on the way are decremented.

I actually want to propose simpler logic with this PR: "for every key for which any state operation was called (get/has/write/remove), all nodes on the old path must be recorded". That's what the current disk trie logic does, and it's easy to explain and follow.
For memtries, the union of (nodes recorded during chunk application) and (nodes recorded during trie restructuring) should give this set. So I didn't think much about whether this is strictly necessary; it's enough for me that it is verifiable.
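The argument above can be seen in a toy model (a binary Merkle tree with a non-cryptographic stand-in hash, not the real trie): recomputing the root after writing a leaf requires the hash of the sibling at every level of the path, and those sibling hashes live in the path nodes, so every node on the path must be known.

```rust
/// Toy hash combiner standing in for a cryptographic hash.
fn combine(left: u64, right: u64) -> u64 {
    left.wrapping_mul(31).wrapping_add(right).wrapping_mul(131)
}

/// Recompute the root after replacing a leaf, given, for each level from
/// bottom to top, the sibling hash and whether the updated child is the
/// left one. Without the sibling hashes (i.e. without the path nodes),
/// the new root cannot be computed.
fn new_root(mut updated: u64, path: &[(u64, bool)]) -> u64 {
    for &(sibling, updated_is_left) in path {
        updated = if updated_is_left {
            combine(updated, sibling)
        } else {
            combine(sibling, updated)
        };
    }
    updated
}
```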

Contributor

I see, ok, thanks for clarifying your intent in the mathematical form :)

I think my question is a bit more detailed than this.

Suppose we replace a node in the trie. Do we actually need the node that we're replacing? The parent of that node already has the hashes of the sibling nodes that we need to reconstruct the parent. So for example, we have A -> B -> C -> D and we're deleting D. We don't need to have D, because to reconstruct A' -> B' -> C' -> D', we only need A, B, C, and D'. D is not needed. But the logic in this PR seems to require D. That's the part I don't understand.

Member Author

It makes sense.
Then, my argument goes to:

  • that's what the current disk logic does. Every time we descend to a node, we pull it out via move_node_to_mutable or delete_value. By default I would avoid adding "if this is the last node, don't do it" logic in a couple of places, plus a corner case for when this node is a restructured branch.
  • the logic "we take all nodes on the path" is simpler to explain than "we take all nodes on reads, plus all nodes except the last one on writes"
  • the savings from not including the previous value shouldn't be high, because modifying a value without reading it is rare - action_deploy_contract reads the contract before redeploying it, etc. We could store only the ValueRefs of previous values, but I'm not 100% confident I'd get that right the first time :D

Contributor

If we are recording node D in A -> B -> C -> D for on disk tries, let's try to keep consistency and do the same for memtries even if it's not the most efficient.

pub nodes: HashMap<CryptoHash, Arc<[u8]>>,
/// Hashes of accessed values - because values themselves are not
/// necessarily present in memtrie.
pub values: HashSet<CryptoHash>,
Contributor

Should this have an optional value, if the value is present in memtrie?

if let Some(trie_refcount_changes) = self.trie_refcount_changes.as_mut() {
    trie_refcount_changes.subtract(hash, 1);
    if let Some(tracked_node_changes) = self.tracked_trie_changes.as_mut() {
        tracked_node_changes.accesses.values.insert(hash);
Contributor

Similar to the comment below: does subtracting the refcount for the value guarantee that we need the previous value? If so, why?

@robin-near
Contributor

Out of curiosity, how much additional memory do we use with this change?

Should be zero, it doesn't change the contents of the memtries.

Comment on lines 1520 to 1530
// not recorded before.
if let Some(recorder) = &self.recorder {
    for (node_hash, serialized_node) in trie_accesses.nodes {
        recorder.borrow_mut().record(&node_hash, serialized_node);
    }
    for value_hash in trie_accesses.values {
        let value = self.storage.retrieve_raw_bytes(&value_hash)?;
        recorder.borrow_mut().record(&value_hash, value);
    }
}
Ok(trie_changes)
Contributor

It's sad that the changes are recorded retroactively.
For the state witness size limit it would be best to record them as they happen; that would make it possible to stop executing a receipt once it generates more than X MB of PartialState. With retroactive recording we can only measure how much PartialState was generated after the update is applied, which AFAIU happens after applying the whole chunk :/

Actually, doesn't this break #10703? The soft size limit looks at the size of the recorded state after applying each receipt and stops executing new ones when this size gets above the limit. It depends on online recording; it wouldn't work with a retroactive one.
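The soft-limit mechanism being referred to can be sketched like this (illustrative names, not the real nearcore API): the recorder's size is checked after each receipt, and remaining receipts are skipped once the recorded PartialState exceeds the limit. This check only works if recording happens online; with retroactive recording the size is known only after the whole chunk is applied.

```rust
/// Toy stand-in for the recorder; tracks the size of recorded state.
struct SizeRecorder {
    recorded_bytes: usize,
}

impl SizeRecorder {
    fn recorded_size(&self) -> usize {
        self.recorded_bytes
    }
}

/// Returns how many receipts were applied before the soft limit stopped
/// execution. Each receipt's recorded-state footprint is given up front
/// here purely for illustration.
fn apply_receipts(
    recorder: &mut SizeRecorder,
    receipt_sizes: &[usize],
    soft_limit: usize,
) -> usize {
    let mut applied = 0;
    for &size in receipt_sizes {
        // The soft limit check: skip remaining receipts once the recorded
        // PartialState already exceeds the limit.
        if recorder.recorded_size() > soft_limit {
            break;
        }
        recorder.recorded_bytes += size; // stand-in for applying one receipt
        applied += 1;
    }
    applied
}
```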

Contributor

Would it be possible to somehow put the recording logic in TrieUpdate::set?

Contributor

And I guess it works the same way with normal trie writes. That sucks, it really throws a wrench into the witness size limit. This solution makes sense for now, but I think the whole logic of trie updating will need a refactor for the size limit :/

Contributor

Okay actually we don't need to record writes for the state witness, only the reads need to be recorded :)

Contributor

As discussed in the meeting today, it turns out we do want to record the nodes for updates/deletions :(
I believe this update function is only called in trie_update finalize for all practical purposes?

We would need to revisit this and find a better solution for recording on the fly, like in the get function, instead of at trie_update finalize. This is required for both the soft and hard state witness limits.

Contributor

Currently (other than trie restructuring), I guess we just rely on the fact that runtime first calls a get before update/delete for gas cost estimation.

/// Hashes of accessed values - because values themselves are not
/// necessarily present in memtrie.
pub values: HashSet<CryptoHash>,
}
Contributor

I don't fully understand which additional nodes this change records.

AFAIU runtime updates the trie by calling storage_write, which always does a trie read before writing a value:

let evicted_ptr = self.ext.storage_get(&key, StorageGetMode::Trie)?;

This trie read records all nodes that were accessed to reach this value, even in the case of memtries:

if let Some(recorder) = &self.recorder {

Doesn't that record everything that is needed to prove execution of the contract? Why do we need an additional access log?

Contributor

@jancionear jancionear Mar 21, 2024

Ah I guess it doesn't record nodes that are created when adding new values...

Member Author

And this path is only called during contract execution. The gateway to updating the trie outside of contract execution is set<T: BorshSerialize>(state_update: &mut TrieUpdate, key: TrieKey, value: &T).

@shreyan-gupta shreyan-gupta self-requested a review March 21, 2024 17:01
Contributor

@shreyan-gupta shreyan-gupta left a comment

Overall looks good, although I would wait for an approval from Robin.

Contributor

@robin-near robin-near left a comment

We still don't know what to do with witness size limit right?

@Longarithm
Member Author

We still don't know what to do with witness size limit right?

Yeah, that's for a follow-up PR.

@Longarithm Longarithm added this pull request to the merge queue Mar 25, 2024
Merged via the queue into near:master with commit 73b8827 Mar 25, 2024
31 checks passed
@Longarithm Longarithm deleted the record-trie-writes branch March 25, 2024 10:43
@walnut-the-cat walnut-the-cat added the A-stateless-validation Area: stateless validation label Mar 27, 2024
Labels
A-stateless-validation Area: stateless validation
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[Memtrie] Memtrie trie recording needs to capture read nodes during the update code path
5 participants