-
Notifications
You must be signed in to change notification settings - Fork 89
Conversation
introduce WalkStrategy fix checkpoint benchmark
walkStrategy: cleanup the logic export walkStrategy so external consumers can use this
This should be ready for review now. I battle-tested this by linking it in the VM and running |
nice work! this is definitely much more readable. do you have any initial insights on the performance improvements? |
I did run |
I added a test here. I don't know if this is the expected behavior as it pushes two nodes on the stack; the ExtensionNode which is still in the DB but also |
}) | ||
/** | ||
* @hidden | ||
* Backwards compatibility |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
could you elaborate more here, it's a little confusing that an underscore-prefixed method would need to be kept around for backwards compatibility, do you think others are using this that would lead to a breaking change if removed?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can drop it, but this changes the function signature of the library and would thus imply a major release? It was not officially marked as private
. But I doubt any external user is using it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
makes sense thanks, we can see what @holgerd77 thinks, i'm fine either way 👍
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, I think it might make sense to keep this for now - judging from our own loose usage pattern of private methods (we might actually want to give this topic some more structured thought respectively it likely makes sense to think this a bit along once one stumbles upon a private method used to raise the question: why are we actually doing this and can we a) achieve the same goal in a way using the public API or b) is the public API of the library used not sufficient and should we additionally expose something there? Doesn't need to be solved "on site" but can also very well lead to a new issue)
For now I've opened a new v5
planning issue #136 where we can collect breaking-change wishes as well as deprecation tasks and have added the _walkTrie()
removal as a TODO. One step in the direction that we generally get to a bit more structured on major release planning.
Co-authored-by: Ryan Ghods <ryan@ryanio.com>
update docs, add walkController to docs update changelog
fix consumers of walkTrie with new type findPath does not put a null on the path in case key does not exist rebuild docs
Okay, just pushed 9275cc3 which also changes the logic of Note: |
CHANGELOG.md
Outdated
@@ -6,6 +6,12 @@ The format is based on [Keep a Changelog](http://keepachangelog.com/en/1.0.0/) | |||
(modification: no type change headlines) and this project adheres to | |||
[Semantic Versioning](http://semver.org/spec/v2.0.0.html). | |||
|
|||
## [4.1.0] - 2020-12-03 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please always use UNRELEASED
in occasions of opening a new CHANGELOG entry, this otherwise has some high potential in misleading others to think that v4.1.0
would have already been released.
(this actually e.g. happened recently on the block library with the v3
release having a date in CHANGELOG but wasn't released for months which at some point was the source for a high level of confusion)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A yes, you are right, it felt a bit strange to also put a date on this one and a version number.
): Promise<void> { | ||
const strategy = new WalkController(onNode, trie, poolSize ?? 500) | ||
await strategy.startWalk(root) | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Haven't read all the comment on the PR, so might have been already raised: I am not really seeing the need for this extra newWalk()
function which just instantiates the WalkController
and then calls startWalk()
so I would rather stick with letting people call these consecutively themselves:
const strategy = new WalkController()
strategy.startWalk()
I think I generally like this also better from a readability PoV since newWalk()
from the naming doesn't imply yet that there is something happening (could be very well just only an instantiation function), so this can be misleading.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I explicitly did this to ensure that our consumers will not run two walks in parallel (that is not what this class is designed for). I could rename it to startWalk
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm assuming this is resolved now due to your new comments?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
From my side yes
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some comments to address, this otherwise looks great, thanks Jochem 👍 , a great extract from the baseTrie
class making things a lot more readable.
Side note: I ran the benchmarks, first time a bit slower, second run similar to current master
. So seems there is not too much of a significant difference here by the code changes.
} | ||
} | ||
async walkTrie(root: Buffer, onFound: FoundNodeFunction): Promise<void> { | ||
await WalkController.newWalk(onFound, this, root) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When seeing this here in usage this reads pretty well, think we can leave this! 😄
Actually a great API setup with this new exposed walkTrie()
method, the backwards-compatibility with _walkTrie()
and the separate WalkController
which is wrapped here but also can be used separately and e.g. also extended or replaced. Cool! 👍 🙂
src/util/walkController.ts
Outdated
* @param childIndex - The child index to add to the event queue | ||
* @param priority - Optional priority of the event, defaults to the total key length | ||
*/ | ||
async onlyBranchIndex( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there some async
part in this function I am not seeing? 🤔 I tested removing this and the tests still ran fine.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You are right, this is indeed not an async
function and is also not intended to be. Adding tasks to the queue is synchronous.
src/util/walkController.ts
Outdated
/** | ||
* WalkController is an interface to control how the trie is being traversed. | ||
*/ | ||
export default class WalkController { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We have no other default exports in the MPT library (as far as my checks go), do me might want to do a normal export here as well? Then we get more flexible if we might want to add another walk controller by this file at some PIT and we also wouldn't need this import-export control flow in src/index.ts
which would look more consistent to me.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I will edit this, we can export it as a default later if we decide to.
CHANGELOG.md
Outdated
|
||
## New features | ||
|
||
There is a new exported `WalkController` object. Also, `trie.walkTrie` is now a public method. Using the `WalkController`, one can create their own custom ways to traverse the trie. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some nit-picks:
- object -> class
- "one" + "their" doesn't fit, just remove "their" -> "create own custom ways)
trie.walkTrie
->trie.walkTrie()
Can you now also please add a reference to the PR on an update, since this is now created? Side note: it is also often possible (respectively: matches in most cases) to guess the PR number pre-push by just using the last submitted issue or PR url and then manually enter larger numbers. Once a 404 comes up the URL can almost-for-sure used as a PR URL in the CHANGELOG. 😄
src/util/walkController.ts
Outdated
/** | ||
* Creates a new WalkController | ||
* @param onNode - The `FoundNodeFunction` to call if a node is found | ||
* @param trie - The `Trie` to walk on |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
poolSize
is missing here
.catch((e) => { | ||
reject(e) | ||
}) | ||
}) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this would be better readable using await
together with try/catch
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree, but if I do that then the linter complains, because this is in a new Promise
. I will rewrite it and add a eslint-ignore
there.
src/util/walkController.ts
Outdated
const childNode = await this.trie._lookupNode(nodeRef) | ||
taskCallback() | ||
this.processNode(nodeRef as Buffer, childNode as TrieNode, key) | ||
}) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have a bit hard time to read this. Is this executed (as suggested when reading taskExecutor.execute()
) or just pushed to a queue like reading in the code documentation? Any way we can name this better? Is the existing taskExecutor.execute()
named properly? 🤔
If it is more on the queuing side I would prefer a more explicit name like pushNodeToQueue
, I remember that I always had my problems with reading/interpreting pushNode()
, this time on review actually again.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(comment is about the whole pushNode()
function)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You are right that the names are a bit ambiguous, I will edit those.
update docs update changelog rebuild docs
Just pushed a commit which should address your comments, @holgerd77, ready for re-review! 😄 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great PR, thanks, looks good now! 😄
* @private | ||
* @param priority The priority of the task | ||
* @param fn The function that accepts the callback, which must be called upon the task completion. | ||
*/ | ||
execute(priority: number, fn: Function) { | ||
executeOrQueue(priority: number, fn: Function) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice. 👍
Related: #131
This introduces a
WalkStrategy
object, which is responsible for handling the event queue. Consumers of this class are themselves responsible for adding events to the queue.It adds three queuing methods: one to just put a specific node in the queue:
pushNode(nodeRef: Buffer, key: Nibbles = [], priority?: number)
, one to only add a certain branch of aBranchNode
into the queue:onlyBranchIndex(node: TrieNode, key: Nibbles = [], childIndex: number, priority?: number)
and one to add all children to the event queue:allChildren(node: TrieNode, key: Nibbles = [])
.The logic itself is mostly based upon the original logic, but now it is stripped down in what I think is a much more readable fashion.
Have to check a few gotcha's here, write some better docs and also see if we can refactor
findPath
a bit, and check if we can use another type of event queue here. We do need an event queue due to OOM errors on too large MPTs. I also have to verify this is correct, because tests still seem to pass even though those changes are rather huge ( 😅 ).Also explicitly exposes
walkTrie
andlookupNode
for our consumers.WalkStrategy
is now also exposed. This has to do with Beam Sync in the client.Linking the issue here.
Some things to do in future PRs:
walkTrie
especiallyfindPath
.sort
the array each time.onNodeNotFound
which we can use to seriallywalkTrie
and request the nodes which we need on-the-fly.