-
Notifications
You must be signed in to change notification settings - Fork 127
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Jordan/transaction pool study #799
Conversation
why: Flexible framework for constructing sorted lists with data entries that can be transparently related/linked. details: The underlying red-black tree can theoretically replaced by any other sorted table.
why: The DoublyLinkedList data type does not allow for efficient random data access (find() is O(n).)
It's not customary in Nim to use name prefixes in APIs (these are usually needed in languages with weaker support for namespaces/modules). When designing containers, it's best to use established names such as Besides the suggestion to enhance Fusion's B-Tree implementation, it occurred to me that @cheatfate has worked on another B-Tree implementation in nim-presto. To promote the reuse of these general purpose data-structures, we should really add all such modules to nim-stew. |
I have recently came across of comparison of different data structures which may be useful for mempool - https://github.com/lithdew/rheia#mempool, it maybe nice data point to the discussion. |
why: was on `todo` status: all black nodes on any chain down the tree count the same lengths
why: No need for module name in methods/functions, see #799 (comment) details: update tests, comments
why: No need for module name in methods/functions, see #799 (comment) details: update tests, comments
details: Test setup and some simple stupid data collectors. It will not do much at the moment.
why: Added table [] sematics where appropriate
nice one, thank you |
why: Allow deleting current element in a `for` loop details: More unit tests, lots of small fixes
details: Basic structure is a keyed queue, organised by data arrival. Operations are consistent, verified by unit tests.
why: Logic was inspired by go reference implementation. details: Lists and queue pulled out into NIM sub-module source files.
why: Enhances code readability details: Heavy use of return codes `rc` of type Result[] lead to repeated `value` symbols like `rc.value.value`. Now this reads `rc.value.data`.
We might want the input of @c-blake the author of https://github.com/c-blake/adix since he dived deeply in those questions in all Nim table-related PRs.
|
why: Garbled links after deleting the second entry relative to either queue end.
why: Logic was inspired by go reference implementation.
Not using ordered tables but the collision stuff might be relevant, thanks. |
For search trees I prefer to separate search & mutation which is not done here, having various search APIs populate a "path" object which things like insert/delete/rebalance can use. This allows having search-by-rank or search-by-key or search by structure (e.g. min/max/root) all as simple single API calls populating a path and then sharing a couple edit operations. Besides being manifestly non-recursive (function calls cost a lot if keys are integers, say), this factoring also makes it easy to both support duplicate keys or not and to have such keys obey a FIFO or LIFO order along duplicate chains. The economies here are not merely "less code" but also conceptual economy for users. Another such conceptual economy is to not use "explicit" left/right notation but an array of 2 pointers and [0] [1] everywhere. This allows the left-right symmetry of the binary tree to be exposed. The opposite side is the Overall, binary search trees (AVL, red-black or otherwise) are generally quite deep & slow compared to B-trees. To me their main area of application is "intra B-tree node" unless you need the "doubly linked list" like property that a BST with parent links can provide. (I.e., were the extra link can prevent an otherwise slow lookup and can make a node "autonomous"/able to delete itself.) Inside a B-tree node intranode it allows B-trees to be dynamically updated cheaply even with very wide nodes 4096 bytes say. That might be 512*(4B payload+4B) links objects. The constant memmove-like shifting of elements around to keep a sorted in-node array can get very expensive if edits are common. So, there you want the links to be small 1..2 byte "pointers" (node numbers, really). In this case, you really want a BST over a memory pool, not over virtual memory. (I.e. a memory pool like this (with type Pool*[T] = object
head: int # head of free list
free: seq[int] # free list
slot*: seq[T] # object array
proc len*[T](p: var Pool[T]): int = p.free.len
proc init*[T](p: var Pool[T], size: int) =
p.head = 0
p.free = newSeq[int](size)
p.slot = newSeq[T](size)
for i in 0 ..< size: p.free[i] = i + 1 # thread `free`
proc alloc*[T](p: var Pool[T], zero=false): int =
result = p.head # `p.len` => OUT OF SPACE
if p.head < p.len:
p.head = p.free[p.head] # return head & hop over it
if zero: zeroMem p.slot[result].addr, T.sizeof
proc free*[T](p: var Pool[T], i: int, destroy=false) =
if destroy: `=destroy`(p.slot[i])
p.free[i] = p.head
p.head = i ) not like thread pools/process pools/multi-threaded pools. (FWIW, that may be a better answer to some recent Forum post, but questioner in said post seemed confused..so, maybe not.) Sorry if this is "too much information", but I feel it was actively solicited. There may be a wrinkle or two I neglected. E.g., there is also this paper about cache effects: https://pdos.csail.mit.edu/papers/masstree:eurosys12.pdf I notice that this PR references Julienne Walker of eternallyconfuzzled. Does anyone happen to know what became of her? Her site has been down for several years now. She didn't get everything right (who does?), but got less wrong than many. |
good point, thanks -- will come back to that when i re-do/replace the underlying tree |
Oh, one other off-the-beaten path thing I did in my adix/btree is a bulk loading API which lets you build a minimum height B-tree from already ordered data. Statistically, "aged trees" are only about 2/3 full, meaning about 3/2 taller and 1.5x more work (and more space) than a bulk loaded tree. The @cheatfate b-tree Zah mentioned is very bare bones, as is the one in the Nim compiler itself and the fusion one. I only do a pool allocator (in the classic 20th century sense as detailed in my last post). This yields a notable limitation that you cannot put GC'd objects in for keys/values in my current version (though I believe otherwise it is algorithmically & code factoring better than most B-trees out there). PRs are welcome to improve it, but I would kind of like to preserve the ability to allocate on disk/memoryfiles as well as in VM. Running right off memory files has always been planned, as I am trying to encourage with https://github.com/c-blake/nio (which takes the "radical" approach that if you want to persist data maybe just..save it to a file? not a pseudo-HDF5 tarball/zipfile or SQL DB). Anyway, spanning both GC'd usage mode and external address/file offset mode seems a good bit of work/re-factoring with |
why: Ref objects made it easier to shuffle around when trying to find useful primitives (aka methods/objects).
odd: that it was accepted only by amd64 compiler
6271944
to
92a5cc3
Compare
why: Something like key-queue reflects better what this data structure is for. KQueue would be the preferred name but is too general and used already in the NIM library.
why: Not all generic functions were pulled while unit testing.
c7dca49
to
f998909
Compare
why: LRU will be needed for tx-pool. So, keequ has all the primitives needed for setting up a LRU queue. This will eventually replace lru_cache (which is over-egineered and ugly.)
GasInt > AccountNonce and AccountNonce > GasInt.
effectiveGasTip >: local/remote > nonce
nimbus/utils/tx_pool.nim
Outdated
Result[void,TxPoolError] | ||
{.inline,gcsafe,raises: [Defect,CatchableError].} = | ||
## Convenience wrapper | ||
var txs = @[tx] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No need for allocating a sequence here, a regular array would be fine.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok
nimbus/utils/tx_pool/tx_base.nim
Outdated
## Walk over item lists grouped by sender addresses. | ||
var rc = xp.bySender.first | ||
while rc.isOK: | ||
let (key, itemList) = (rc.value.key, rc.value.data) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it necessary to copy the data in itemList
here as opposed to yielding directly rc.value.data
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
no need to copy here as this is an internal module; will be important reassign() in the next patch when passing TxItemRef rather than IDs (this one covered in unit tests)
nimbus/utils/tx_pool/tx_base.nim
Outdated
# at your option. This file may not be copied, modified, or distributed except | ||
# according to those terms. | ||
|
||
## Transaction Pool Basig Primitives |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Typo
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lol
# Private, debugging ... | ||
# ------------------------------------------------------------------------------ | ||
|
||
proc joinXX(s: string): string = |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This code seems to be copy/pasted in multiple modules.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
.. and is debugging only, should go away later
data: V ## Some data value, can freely be modified. | ||
prv, nxt: K ## Queue links, read-only. | ||
|
||
KeeQuPair*[K,V] = object ##\ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You can add a TODO item to introduce view types for these return values once we upgrade to Nim 1.6
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, consider using a tuple for this. This would enable the users to use the tuple unpacking syntax.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
will try, ok :) -- applies to similar cases
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry, I was wrong. Returning a tuple gloriously fails within the ok() part of a Result[], see #817
nimbus/utils/tx_pool/tx_item.nim
Outdated
# Public functions, getters | ||
# ------------------------------------------------------------------------------ | ||
|
||
proc id*(item: TxItemRef): auto {.inline.} = |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In Nim 1.2.x this will compile to a function that copies the 32 bytes of the id
each time it's called.
I tend to use templates for such simple accessors because they avoid the problem and they reduce the work the compiler needs to do (less C code to produce in the codegen pass).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok, thanks -- has changed anyway (but good to know for other instances)
nimbus/utils/tx_pool/tx_jobs.nim
Outdated
discard | ||
of txJobsInactiveJobsEviction: | ||
discard | ||
of txJobsJocalTxJournalRotation: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Typo? (Jocal)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lol
nimbus/utils/tx_pool/tx_jobs.nim
Outdated
return id | ||
|
||
|
||
proc txDelete*(t: var TxJobs; id: TxJobID): Result[TxJobPair,void] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do we call deleting a job txDelete
? Calling all deletion functions del
makes sense, but calling them "transaction delete" not so much
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I use it for the 2nd level list/queue drivers as syntactic sugar -- not ment to be seen outside
nimbus/utils/tx_pool/tx_list.nim
Outdated
txListVfyLeafQueue ## Corrupted leaf list | ||
txListVfySize ## Size count mismatch | ||
|
||
TxListMark* = ##\ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If the SLst
is keyed by gas
what are we going to use the TxListMark
for?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
a blind value as commented, the dups queue is a keyed list with a value entry (I do not see that I can have a unified module for (key,value) and key only with no value)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should not be exported, renamed type as BlindValue
nimbus/utils/tx_pool/tx_nonce.nim
Outdated
TxNonceItems* = object ##\ | ||
## Item list indexed by `AccountNonce` > `GasPrice`. | ||
size: int | ||
nonceList: SLst[AccountNonce,TxNoncePriceRef] ##\ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do we need a sorted list by "account nonce"? i.e. in what context do we compare accounts by their nonces?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this one has been removed already, but the go ref implementation defines the interface returning transaction lists sorted by nonces (so this this list will re-appear as a sub-list somewhere)
main points: + Constructors as init(T: type XXX): T rather than initXXX(): XXX + Bury unused queue values for key-only queues in KeeQu implementation (no TxListMark type used, anymore) + Redefined 'auto' return codes
why: All txDB functionality hidden by tx_tabs (formerly tx_base), table drivers in sub-directory other: The module tx_jobs at the same level as tc_tabs and the fancy naming txInit, txAdd etc. has been corrected to init(), add().
why: Cascading operators as in `bySender.eq(someAddr).eq(local = false))` is is preferred over something like `bySenderEq(someAddr, local = false)`
details: per sender address buckets: all -- all entries in the same bucket local/remote -- all entry buckets grouped by locality queued/lending/included -- all entry buckets grouped by status buckets: are sorted by nonces
why: Concurrent pool access supported details: Currently not very smart, only debugging interface. Provision for + out-of-band/priority jobs (just unshift instead of append) + Can stop/resume execution at any job
details: Using 50k blocks network capture for import/replay. This allows for about 30k blocks initial block chain. The remaining stuff is enough for playing with at least 1500 transactions.
why: Background processor cannot directly respond. So expired, unwanted, or wrong transactions along with meta-data will silently be moved to the waste basket for further investigation and/or deletion.
why: Actions on the queue must be serialised, so they run through the job monitor which allows unique access to the database for the call-back function while running.
details: Incoming transaction are classified and tagged queued or pending, or moved to the waste basket
why: bails out on ci for i386, otherwise
why: Both, tx-pool and clique unit tests can use the same dump
why: Less overhead by using semaphores
I'm trying to understand the background for this PR, and the motivation for choosing this approach in particular. From what I understand, this is a port of the geth implementation, but with changes. What I'm missing is the rationale for doing a massive effort porting an existing implementation where it's unclear whether it solves our problem (geth and nimbus priorities don't necessarily align), whether it solves the problem well, whether it is burdened by historical decisions that no longer are relevant, specially in a post-merge world where block packing often is outsourced, and whether it fits nim to begin with, given tooling, collections, threading styles etc - already, there's enough code for an entire collection library in here which speaks to the fact that it might not be the best fit - specially since transaction packing is a necessary but low-priority feature that we don't want to spend sophistication on - block packing will to a large degree be outsourced post-merge. |
It is not a port - just taking some ideas from the go implementation. AFIK there are quite a few pieces missing in our nimbus implementation and comprehensive public documentation is a precious i find hard to come by. |
Nonetheless, what I'm trying to understand, and am asking for, is information / background about why this particular approach was taken - as a study, it's interesting, but it also brings in a complex algorithm depending on multiple custom collections and indices which both represents a long development time (contrary to our needs/requirements to quickly create a prototype of the merge and a light client implementation) and a significant maintenance burden (that may not be beneficial for nimbus for the long term, given that the landscape of block production, and therefore transaction pooling, changes significantly with the merge and projects like flashbots). When embarking on a month-long quest to produce a transaction pool, these are critical questions to answer, before proceeding to the code stage. To give an example, a simple hash table bounded to the 1000 most profitable entries without any further indices would be enough to cover our needs for the next 6 months, and would allow us to focus on more important compatibility problems with goerli and the merge, allowing us to perform the tx pool work with a better understanding of the landscape. What is the rationale for not pursuing a strategy like that? |
why: jobs might be added concurrently
I decided waiting on me to port the ANSI C stuff to Nim was perhaps less than ideal. So, I just published the C version. To my knowledge there is nothing of similar efficiency & generality (in any programming language) in the open source BST arena { although maybe copies will now start to spring up! :-) }. This always struck me as The One True Way to implement binary search trees..low node space overhead, low time overhead (no recursion), high flexibility (dups or not, rank/positional or not, etc.) and high generality all in really very compact code. It's only like 4 or 5 main abstraction decisions, but combined they do a lot of reduction. A lot of the times, I mention a bunch of abstract nice properties and people smile & nod and just do their own thing, though. Also, if someone with some time wants to port to Nim, I'd be happy to include it in adix and maybe weave it into the B-tree node structure there so that B-tree edits can be more scalable. |
Sneak preview like