Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Non optimal TrieDBMut key values queries on batch processing. #109

Open
cheme opened this issue Jul 31, 2020 · 0 comments
Open

Non optimal TrieDBMut key values queries on batch processing. #109

cheme opened this issue Jul 31, 2020 · 0 comments

Comments

@cheme
Copy link
Contributor

cheme commented Jul 31, 2020

paritytech/substrate#6780 did reveal that different input order of changes when processing a root with triedbmut can result in a slightly different query plan and thus different proof of execution.

The root cause is that triedbmut algorithm do 'fuse a branch without value and a single child with this single child' too eagerly.

The operation is applied in function fix and do access the single child.

If later we add a children to the deleted (when fused) branch, the deleted branch can be restored and there will be no use to query the single child.

So we end up querying an unneeded node in respect to a batch update.
This is not really bad as the hash of the fuse node is not calculated (hash are calculated lazily), but it ends up being an issue with the way we register proof in substrate (we store all node queried on the kv backend).

A first way to solve this should be to apply fix lazily (on root calculation) as we do with hash calculation and db writing, but it seems to me that this will require ordering the fix calls and is not as simple as insert node into memorydb.

A second way should be to rewrite the batch update.
I would propose/illustrate with (see #110 draft pr), to use a batch update that works on a sorted change and thus allow applying the node fusing only when we are sure no other change will be done on the branch (when exiting it). This kind of batch update is also good when looking at memory footprint (only a stack of max 16 nodes needs to be kept in memory), but it is not really relevant in substrate use case.

A third way would be to add the additional fix related query to substrate spec.
This is not a sound idea, specifying that the proof should be the strictly the required set of trie nodes to run the operation seems more correct to me and would make conflict with other implementation easier to rules out.

@cheme cheme mentioned this issue Jul 31, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant