Adjust the default transaction replay thread pool size #25
I ran some nodes against testnet and mainnet to gather information about the batches being passed to the thread pool. This collection method was very simple; every time
Granted, this is only one day of runtime on one node for each cluster, but I think it is telling. The data on testnet is noisier, but on mainnet:
Aside from gut feeling, this initial datapoint also suggests that the thread pool is over-provisioned.
The numbers show a pretty steep drop-off. I probably lost some precision, but the first 28 threads are doing 99.94% of the work, the first 24 are doing 99.58%, and the first 16 are doing 95.23%. These numbers also suggest that the extra threads are rarely utilized and add overhead for little to no gain (and potentially do more harm than good once the general overhead is considered).
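Cumulative percentages like the ones above can be derived from a histogram of how many tasks each dispatch to the pool contains. The sketch below is hypothetical: it assumes each dispatch hands task j to thread j % pool_size (round-robin), and the histogram is made-up illustration data, not the testnet or mainnet data collected here.

```python
from collections import Counter


def cumulative_work_share(batch_hist: Counter, pool_size: int) -> list[float]:
    """Return shares where shares[n - 1] is the fraction of all work done by
    the first n threads of a pool with `pool_size` threads.

    Toy model (an assumption, not necessarily how the data above was
    collected): a dispatch with b tasks hands task j to thread j % pool_size,
    so thread i runs ceil((b - i) / pool_size) tasks when i < b, else none.
    """
    per_thread = [0] * pool_size  # tasks executed by each thread, in order
    for tasks, dispatches in batch_hist.items():
        for i in range(pool_size):
            # ceil((tasks - i) / pool_size), clamped at 0 for idle threads
            assigned = max(0, (tasks - i + pool_size - 1) // pool_size)
            per_thread[i] += assigned * dispatches
    total = sum(per_thread)
    if total == 0:
        return [0.0] * pool_size
    shares, running = [], 0
    for units in per_thread:
        running += units
        shares.append(running / total)
    return shares


# Hypothetical histogram {tasks per dispatch: number of dispatches}; not real data.
hist = Counter({1: 500, 4: 300, 16: 150, 48: 5})
shares = cumulative_work_share(hist, pool_size=48)
for n in (16, 24, 28):
    print(f"first {n} threads: {shares[n - 1]:.2%} of the work")
```

When most dispatches contain only a handful of tasks, the low-numbered threads absorb nearly all the work and the tail of the pool sits idle, which is the shape of the drop-off described above.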
Codecov Report
Attention: Patch coverage is
Additional details and impacted files
@@ Coverage Diff @@
## master #25 +/- ##
=======================================
Coverage 81.8% 81.8%
=======================================
Files 841 841
Lines 228307 228307
=======================================
+ Hits 186941 186973 +32
+ Misses 41366 41334 -32
For the sake of experimenting, I made transaction replay single-threaded with the included patch. The node was unable to catch up; this is somewhat expected from looking at the following two metrics:
Mainnet metrics for my test nodes show an average of ~450-500ms for
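Why a node falls behind can be reduced to simple arithmetic: it only gains ground if it replays a slot faster than the cluster produces one. The sketch below assumes Solana's nominal 400 ms target slot time and uses hypothetical per-slot replay times; it does not claim which metric the ~450-500ms figure above refers to, since that name is cut off here.

```python
SLOT_MS = 400.0  # assumed nominal target slot duration on Solana


def net_catchup_slots_per_sec(replay_ms_per_slot: float, slot_ms: float = SLOT_MS) -> float:
    """Slots per second the node gains (positive) or loses (negative) vs. the cluster."""
    replayed_per_sec = 1000.0 / replay_ms_per_slot  # slots this node can replay per second
    produced_per_sec = 1000.0 / slot_ms             # slots the cluster produces per second
    return replayed_per_sec - produced_per_sec


def seconds_to_catch_up(slots_behind: int, replay_ms_per_slot: float,
                        slot_ms: float = SLOT_MS) -> float:
    """Time to close a gap of `slots_behind` slots; infinity if the gap never closes."""
    rate = net_catchup_slots_per_sec(replay_ms_per_slot, slot_ms)
    return slots_behind / rate if rate > 0 else float("inf")


for replay_ms in (200.0, 400.0, 500.0):  # hypothetical per-slot replay times
    print(replay_ms, net_catchup_slots_per_sec(replay_ms), seconds_to_catch_up(1000, replay_ms))
```

Any replay time at or above the slot time gives a non-positive catch-up rate, so a node in that regime can never close the gap no matter how long it runs.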
Seems OK; why isn't this merged yet?
Enough time has passed; going to close and re-open this PR.
Problem
The thread pool used for entry verification (i.e. transaction verification) and transaction execution is currently sized to match the number of virtual cores on the machine. For example, a 24-core / 48-thread machine will put 48 threads into this pool.
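The difference between the current sizing and a bounded one can be sketched as follows. This is an illustrative Python analogue, not the validator's actual code: `default_replay_pool_size` and `bounded_replay_pool_size` are hypothetical helpers, and `cap=16` is an arbitrary placeholder rather than the default this PR settles on.

```python
import os
from concurrent.futures import ThreadPoolExecutor


def default_replay_pool_size() -> int:
    """Sizing described above: one thread per logical CPU (hardware thread)."""
    # os.cpu_count() counts logical CPUs, e.g. 48 on a 24-core/48-thread
    # machine; it may return None, so fall back to 1.
    return os.cpu_count() or 1


def bounded_replay_pool_size(logical_cpus: int, cap: int) -> int:
    """Hypothetical bounded sizing: at most `cap` threads, never fewer than 1."""
    return max(1, min(logical_cpus, cap))


# cap=16 is an arbitrary placeholder, not the default chosen by this PR.
workers = bounded_replay_pool_size(default_replay_pool_size(), cap=16)
with ThreadPoolExecutor(max_workers=workers) as pool:
    squares = list(pool.map(lambda x: x * x, range(8)))
print(workers, squares)
```

Bounding the pool decouples its size from the hardware: a 48-thread machine and a 128-thread machine would both get the same capped pool, while smaller machines keep one thread per logical CPU.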
This thread pool is over-provisioned, and the extra threads actually cause more harm than good. When work is sent to the pool, all threads are woken up, even if there is only work for one or two of them. This "thundering herd" effect causes significant general system disruption, and it can easily be mitigated by bounding the thread pool size to more closely fit the workload we send to it.
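A back-of-the-envelope model of the thundering-herd cost: if every dispatch wakes all threads, each dispatch with fewer tasks than threads produces wakeups that find no work. The sketch below counts those wasted wakeups for a few pool sizes. The histogram is made-up illustration data, and treating every wakeup as equally costly is a simplifying assumption.

```python
from collections import Counter


def wasted_wakeups(batch_hist: Counter, pool_size: int) -> int:
    """Toy thundering-herd model: every dispatch wakes all `pool_size` threads,
    but only min(tasks, pool_size) of them find work; the rest go back to
    sleep having done nothing except consume scheduler time."""
    wasted = 0
    for tasks, dispatches in batch_hist.items():
        useful = min(tasks, pool_size)
        wasted += (pool_size - useful) * dispatches
    return wasted


# Hypothetical histogram {tasks per dispatch: number of dispatches}; not real data.
hist = Counter({1: 500, 4: 300, 16: 150, 48: 5})
for size in (48, 24, 16):
    print(f"pool of {size}: {wasted_wakeups(hist, size)} wasted wakeups")
```

Under this model, shrinking the pool removes wakeups that never had work to do, while dispatches that could use every thread lose only a little parallelism; that trade-off is what bounding the pool size relies on.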
Part of the work for #35
Summary of Changes