Refactor WalkTrie #135

jochem-brouwer · 2020-11-28T22:59:52Z

Related: #131

This introduces a WalkStrategy object, which is responsible for handling the event queue. Consumers of this class are themselves responsible for adding events to the queue.

It adds three queuing methods: one to just put a specific node in the queue: pushNode(nodeRef: Buffer, key: Nibbles = [], priority?: number), one to only add a certain branch of a BranchNode into the queue: onlyBranchIndex(node: TrieNode, key: Nibbles = [], childIndex: number, priority?: number) and one to add all children to the event queue: allChildren(node: TrieNode, key: Nibbles = []) .

The logic itself is mostly based upon the original logic, but now it is stripped down in what I think is a much more readable fashion.

Have to check a few gotcha's here, write some better docs and also see if we can refactor findPath a bit, and check if we can use another type of event queue here. We do need an event queue due to OOM errors on too large MPTs. I also have to verify this is correct, because tests still seem to pass even though those changes are rather huge ( 😅 ).

Also explicitly exposes walkTrie and lookupNode for our consumers. WalkStrategy is now also exposed. This has to do with Beam Sync in the client.

Linking the issue here.

Some things to do in future PRs:

Check if we can also refactor the consumers of walkTrie especially findPath.
Update the prioritizedTaskExecutor to not sort the array each time.
Add an optional function onNodeNotFound which we can use to serially walkTrie and request the nodes which we need on-the-fly.

introduce WalkStrategy fix checkpoint benchmark

coveralls · 2020-11-28T23:00:55Z

Coverage increased (+0.2%) to 94.765% when pulling 7e7780d on refactor-walktrie into 3be5389 on master.

src/util/walkStrategy.ts

walkStrategy: cleanup the logic export walkStrategy so external consumers can use this

jochem-brouwer · 2020-11-30T17:34:33Z

This should be ready for review now. I battle-tested this by linking it in the VM and running npm run test:blockchain - no errors. Which should give enough confidence that the implementation is correct.

src/util/walkStrategy.ts

ryanio · 2020-11-30T18:44:36Z

nice work! this is definitely much more readable. do you have any initial insights on the performance improvements?

src/util/walkStrategy.ts

jochem-brouwer · 2020-11-30T19:01:28Z

I did run benchmarks/random.ts before and after these changes, before this took about 25 seconds, now it takes about 20 seconds. I don't know if this would imply that there's a 20% speed improvement since that is also highly dependent on the environment. We might see some changes if we import this in the VM and run the benchmarks there.

jochem-brouwer · 2020-11-30T21:24:24Z

I added a test here. I don't know if this is the expected behavior as it pushes two nodes on the stack; the ExtensionNode which is still in the DB but also null which indicates that the hash key pointed to by the ExtensionNode is not found. I think I am OK with that - since the node found also reports null we already know that we were not able to find a path (or, in other words; the key is currently empty).

src/baseTrie.ts

ryanio · 2020-12-03T20:20:14Z

src/baseTrie.ts

-    })
+  /**
+   * @hidden
+   * Backwards compatibility


could you elaborate more here, it's a little confusing that an underscore-prefixed method would need to be kept around for backwards compatibility, do you think others are using this that would lead to a breaking change if removed?

We can drop it, but this changes the function signature of the library and would thus imply a major release? It was not officially marked as private. But I doubt any external user is using it.

makes sense thanks, we can see what @holgerd77 thinks, i'm fine either way 👍

Yeah, I think it might make sense to keep this for now - judging from our own loose usage pattern of private methods (we might actually want to give this topic some more structured thought respectively it likely makes sense to think this a bit along once one stumbles upon a private method used to raise the question: why are we actually doing this and can we a) achieve the same goal in a way using the public API or b) is the public API of the library used not sufficient and should we additionally expose something there? Doesn't need to be solved "on site" but can also very well lead to a new issue)

For now I've opened a new v5 planning issue #136 where we can collect breaking-change wishes as well as deprecation tasks and have added the _walkTrie() removal as a TODO. One step in the direction that we generally get to a bit more structured on major release planning.

Co-authored-by: Ryan Ghods <ryan@ryanio.com>

src/baseTrie.ts

src/index.ts

update docs, add walkController to docs update changelog

typedoc.json

fix consumers of walkTrie with new type findPath does not put a null on the path in case key does not exist rebuild docs

jochem-brouwer · 2020-12-03T21:26:17Z

Okay, just pushed 9275cc3 which also changes the logic of findPath, I battle-tested this again in the VM with ~100 Istanbul tests, so it seems fine to me. I think this is the intended use case for findPath (also updated the corrupt DB test). Note that this generalized the FoundNodeFunction such that it also works on incomplete/corrupt DBs.

Note: null will ONLY be used as argument in onNode (in e.g. findPath) in case that the DB is incomplete/corrupt.

holgerd77 · 2020-12-04T11:10:08Z

CHANGELOG.md

@@ -6,6 +6,12 @@ The format is based on [Keep a Changelog](http://keepachangelog.com/en/1.0.0/)
 (modification: no type change headlines) and this project adheres to
 [Semantic Versioning](http://semver.org/spec/v2.0.0.html).

+## [4.1.0] - 2020-12-03


Please always use UNRELEASED in occasions of opening a new CHANGELOG entry, this otherwise has some high potential in misleading others to think that v4.1.0 would have already been released.

(this actually e.g. happened recently on the block library with the v3 release having a date in CHANGELOG but wasn't released for months which at some point was the source for a high level of confusion)

A yes, you are right, it felt a bit strange to also put a date on this one and a version number.

src/baseTrie.ts

holgerd77 · 2020-12-04T11:27:37Z

src/util/walkController.ts

+  ): Promise<void> {
+    const strategy = new WalkController(onNode, trie, poolSize ?? 500)
+    await strategy.startWalk(root)
+  }


Haven't read all the comment on the PR, so might have been already raised: I am not really seeing the need for this extra newWalk() function which just instantiates the WalkController and then calls startWalk() so I would rather stick with letting people call these consecutively themselves:

const strategy = new WalkController() strategy.startWalk()

I think I generally like this also better from a readability PoV since newWalk() from the naming doesn't imply yet that there is something happening (could be very well just only an instantiation function), so this can be misleading.

I explicitly did this to ensure that our consumers will not run two walks in parallel (that is not what this class is designed for). I could rename it to startWalk?

I'm assuming this is resolved now due to your new comments?

From my side yes

holgerd77

Some comments to address, this otherwise looks great, thanks Jochem 👍 , a great extract from the baseTrie class making things a lot more readable.

Side note: I ran the benchmarks, first time a bit slower, second run similar to current master. So seems there is not too much of a significant difference here by the code changes.

holgerd77 · 2020-12-07T10:38:56Z

src/baseTrie.ts

-        }
-      }
+  async walkTrie(root: Buffer, onFound: FoundNodeFunction): Promise<void> {
+    await WalkController.newWalk(onFound, this, root)


When seeing this here in usage this reads pretty well, think we can leave this! 😄

Actually a great API setup with this new exposed walkTrie() method, the backwards-compatibility with _walkTrie() and the separate WalkController which is wrapped here but also can be used separately and e.g. also extended or replaced. Cool! 👍 🙂

holgerd77 · 2020-12-07T10:48:09Z

src/util/walkController.ts

+   * @param childIndex - The child index to add to the event queue
+   * @param priority - Optional priority of the event, defaults to the total key length
+   */
+  async onlyBranchIndex(


Is there some async part in this function I am not seeing? 🤔 I tested removing this and the tests still ran fine.

You are right, this is indeed not an async function and is also not intended to be. Adding tasks to the queue is synchronous.

holgerd77 · 2020-12-07T10:50:52Z

src/util/walkController.ts

+/**
+ * WalkController is an interface to control how the trie is being traversed.
+ */
+export default class WalkController {


We have no other default exports in the MPT library (as far as my checks go), do me might want to do a normal export here as well? Then we get more flexible if we might want to add another walk controller by this file at some PIT and we also wouldn't need this import-export control flow in src/index.ts which would look more consistent to me.

I will edit this, we can export it as a default later if we decide to.

holgerd77 · 2020-12-07T10:58:58Z

CHANGELOG.md

+
+## New features
+
+There is a new exported `WalkController` object. Also, `trie.walkTrie` is now a public method. Using the `WalkController`, one can create their own custom ways to traverse the trie.


Some nit-picks:

object -> class

"one" + "their" doesn't fit, just remove "their" -> "create own custom ways)

trie.walkTrie -> trie.walkTrie()

Can you now also please add a reference to the PR on an update, since this is now created? Side note: it is also often possible (respectively: matches in most cases) to guess the PR number pre-push by just using the last submitted issue or PR url and then manually enter larger numbers. Once a 404 comes up the URL can almost-for-sure used as a PR URL in the CHANGELOG. 😄

holgerd77 · 2020-12-07T11:00:47Z

src/util/walkController.ts

+  /**
+   * Creates a new WalkController
+   * @param onNode - The `FoundNodeFunction` to call if a node is found
+   * @param trie - The `Trie` to walk on


poolSize is missing here

holgerd77 · 2020-12-07T11:04:16Z

src/util/walkController.ts

+        .catch((e) => {
+          reject(e)
+        })
+    })


I think this would be better readable using await together with try/catch.

I agree, but if I do that then the linter complains, because this is in a new Promise. I will rewrite it and add a eslint-ignore there.

holgerd77 · 2020-12-07T11:09:44Z

src/util/walkController.ts

+      const childNode = await this.trie._lookupNode(nodeRef)
+      taskCallback()
+      this.processNode(nodeRef as Buffer, childNode as TrieNode, key)
+    })


I have a bit hard time to read this. Is this executed (as suggested when reading taskExecutor.execute()) or just pushed to a queue like reading in the code documentation? Any way we can name this better? Is the existing taskExecutor.execute() named properly? 🤔

If it is more on the queuing side I would prefer a more explicit name like pushNodeToQueue, I remember that I always had my problems with reading/interpreting pushNode(), this time on review actually again.

(comment is about the whole pushNode() function)

You are right that the names are a bit ambiguous, I will edit those.

update docs update changelog rebuild docs

jochem-brouwer · 2020-12-07T22:56:31Z

Just pushed a commit which should address your comments, @holgerd77, ready for re-review! 😄

holgerd77

Great PR, thanks, looks good now! 😄

holgerd77 · 2020-12-08T08:27:24Z

src/prioritizedTaskExecutor.ts

   * @private
   * @param priority The priority of the task
   * @param fn The function that accepts the callback, which must be called upon the task completion.
   */
-  execute(priority: number, fn: Function) {
+  executeOrQueue(priority: number, fn: Function) {


refactor walk trie

480a072

introduce WalkStrategy fix checkpoint benchmark

jochem-brouwer added the PR state: WIP label Nov 28, 2020

jochem-brouwer commented Nov 28, 2020

View reviewed changes

src/util/walkStrategy.ts Outdated Show resolved Hide resolved

jochem-brouwer commented Nov 28, 2020

View reviewed changes

src/util/walkStrategy.ts Outdated Show resolved Hide resolved

jochem-brouwer added 2 commits November 30, 2020 18:01

trie: expose some methods

f4bddd4

walkStrategy: cleanup the logic export walkStrategy so external consumers can use this

fix lint problems

f4c6493

jochem-brouwer commented Nov 30, 2020

View reviewed changes

src/util/walkStrategy.ts Outdated Show resolved Hide resolved

ryanio reviewed Nov 30, 2020

View reviewed changes

src/util/walkStrategy.ts Outdated Show resolved Hide resolved

add null-value in DB test

a536e12

jochem-brouwer added PR state: needs review and removed PR state: WIP labels Nov 30, 2020

jochem-brouwer marked this pull request as ready for review November 30, 2020 21:24

jochem-brouwer requested a review from holgerd77 December 2, 2020 18:40