Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: state boundaries on KV pairs #8995

Merged
merged 20 commits into from
May 15, 2023
Merged

Conversation

Longarithm
Copy link
Member

@Longarithm Longarithm commented May 2, 2023

First step towards #8984. Here I want to guarantee that state part boundaries always correspond to some key-value pair except two corner cases:

  • part_id = 0 -> trie key is empty which is lower than all keys
  • part_id = num_parts -> trie key is [16] which is larger than all keys
    This guarantees that parts cover all nodes in state.

It solves an inconvenience on the way to #8898. It is useful to assume that boundaries are keys, because it allows to restore all keys in part by making trivial range query to flat storage. Otherwise you need a hack to convert one last nibble to byte. This is also necessary if we switch to AVL or other tree some day - AVL should not know about trie nodes, and interface should be defined in terms of state key-value pairs.

Some auxiliary work:

  • more documentation for state_parts
  • removing visit_nodes_for_size_range_old as it was needed for backwards compatibility for version deprecated long ago

@nikurt note that after this, existing state parts become incompatible with newly generated ones.

Testing

Testing is a pain because current testset is not well organised. I'm adding two tests specifically for new behaviour:

  • boundary_is_state_key - checks that state boundary is a key for sampling small trie. Doesn't pass without this change.
  • single_path_trie - small sanity check that keys are evenly distributed among state parts.

Also testing revealed that run_test_parts_not_huge doesn't check anything, see Zulip thread. I refactored test in such way that we separately check proof size and whole part size. Manually checked that two parts doesn't fit in memory limit for that test.

@Longarithm Longarithm self-assigned this May 2, 2023
@Longarithm Longarithm added the A-storage Area: storage and databases label May 2, 2023
@Longarithm Longarithm marked this pull request as ready for review May 4, 2023 10:45
@Longarithm Longarithm requested a review from a team as a code owner May 4, 2023 10:45
@nikurt
Copy link
Contributor

nikurt commented May 4, 2023

existing state parts become incompatible with newly generated ones.

It breaks backwards compatibility, but state sync is currently not working for testnet and mainnet. Sounds good to me to proceed without explicitly taking care of backwards compatibility.

/// state part boundaries and storing state items for state part range.
pub enum PartialState {
/// State represented by the trie nodes.
Nodes(Vec<StateItem>),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nodes is confusing, because it contains both Trie Nodes and Values referenced by those nodes.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about TrieItems?
I really want to resolve this confusion in the whole codebase and call these entities "items" instead of "nodes" or "nodes-or-values"

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TrieItem sounds good.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Took another look and renamed with TrieValue. Because here we store only DB values, and item is more like KV pair in my mind.

} else {
if *memory_skipped + node_size <= memory_threshold {
*memory_skipped += node_size;
} else if node.node.has_value() {
Copy link
Contributor

@nikurt nikurt May 4, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don't we need to extend key_nibbles in case the current node is a Leaf, which is done at line 137?
But IIUC, this code will return at line 130 and not reach line 137.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think we ever should actually hit the line 137 and that whole clause of match. Because if we got to the Leaf, then we know that this is THE node, so line 127 should return false and we should return Ok(false) with incomplete key.
But, I also don't think that completeness of the key maters. What maters is that it is a key prefix that defines the same set of vertices as a full key -- only the Leaf node. Slice of key in the Leaf node is kinda redundant suffix in terms of the trie structure.
My intuition here is that we only use this key_nibbles to iterate trie -- not to retrieve nodes (needs checking), and we iterate trie based on function that works with prefixes (again, needs checking, I'm talking about seek_nibble_slice)
In slightly different words, I think we only working in Trie based on key comparison as in <=, >=, not equality. All of these inequalities will yield the same value if we switch full leaf key for that truncated leaf key, but only if we are only working with this Trie elements. That is true because we constructed such a key prefix that is long enough to include at least one different character from every other Trie element.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is right that key_nibbles determines node uniquely. However, now I explicitly want key_nibbles to correspond to existing state key in the end. So it was a bug and I fixed it - for a Leaf, I extend key with remaining nibbles and then return.

Also it makes logic more consistent in the sense that key_nibbles always stores all information found during traversing trie.

return Ok(false);
}

match &node.node {
TrieNode::Empty => Ok(false),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What are Empty nodes?

Copy link
Member Author

@Longarithm Longarithm May 11, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a single cornercase when Trie is completely empty. The comment looks misleading though, I'll look at it later.

} else {
if *memory_skipped + node_size <= memory_threshold {
*memory_skipped += node_size;
} else if node.node.has_value() {
return Ok(false);
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are we concerned with the integrity of the memory_skipped? Because I think we skip some nodes without values that still have non zero size, but don't record it in memory_skipped, because there is no other else.
This only happens when memory_skipped is already over threshold, so it doesn't matter for this function, but maybe memory_skipped is used outside somewhere?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fortunately it is not used outside, but you are right, fixed.

near-bulldozer bot pushed a commit that referenced this pull request May 11, 2023
Subj. If during iteration over range `[path_begin, path_end)` you ended up in a Value, it is still necessary to check that current key (key_nibbles) doesn't exceed `path_end`. This is done already when we descend into internal node, and we need to treat Value in the same way.

Fortunately there is no visible impact, as we use it to eventually write the whole state, so we just write some values more than once. But it messes with refcounts during testing and leads to refcount leaks.

## Testing

Adding `test_visit_interval` specifically for that case. It doesn't pass without fix. `visit_nodes_interval` obviously needs more tests, but at least #8995 should pass tests after this.
@Longarithm
Copy link
Member Author

Longarithm commented May 11, 2023

Re-requesting review. Addressed existing comments and added more tests. Now PR has more changes so I also added more description to it.

@Longarithm Longarithm requested a review from nikurt May 11, 2023 13:02
/// TODO (#8984): consider supporting format containing trie values only for
/// state part boundaries and storing state items for state part range.
pub enum PartialState {
/// State represented by the set of unique trie values.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe mention explicitly that this includes Nodes and Values.

// This line should be unreachable if we descended into current node.
// TODO (#8997): test this case properly by simulating trie data corruption.
Err(StorageError::StorageInconsistentState(format!(
"Skipped all children of node {node:?} while finding memory \
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
"Skipped all children of node {node:?} while finding memory \
"Skipped all children of node {node:?} while searching for memory \

@near-bulldozer near-bulldozer bot merged commit f3bc243 into near:master May 15, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-storage Area: storage and databases
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants