Introduce SchedulingStateMachine for unified scheduler [NO-MERGE&REVIEW-ONLY-MODE] #35286
@@ -1023,7 +1062,7 @@ mod tests {
 .result,
 Ok(_)
 );
-scheduler.schedule_execution(&(good_tx_after_bad_tx, 0));
+scheduler.schedule_execution(&(good_tx_after_bad_tx, 1));
this is because task_index is now assert!()-ed.
Codecov Report
Attention: Patch coverage is
Additional details and impacted files
@@            Coverage Diff            @@
##           master   #35286     +/-   ##
=========================================
- Coverage    81.8%    81.6%   -0.2%
=========================================
  Files         837      834      -3
  Lines      225922   225589    -333
=========================================
- Hits       184897   184206    -691
- Misses      41025    41383    +358
Did an initial scan through the code, appreciate the well thought out comments! I will have to give this another pass, as well as getting through the tests.
unified-scheduler-logic/src/lib.rs
Outdated
fn default() -> Self {
    Self {
        usage: PageUsage::default(),
        blocked_tasks: VecDeque::with_capacity(1024),
Could you explain how you chose this number? At first glance, it seems quite high as the default allocation for number of blocked-tasks; at least for current blocks on MNB we have maybe several hundred non-vote transactions. So this seems like a huge overkill to me since for replay at max we probably have a couple hundred blocked tasks.
good point. i temporarily reduced it and documented why: 5887a91
@apfitzge thanks for reviewing. I'll address them later. I've added benchmark steps in the pr description. also, I'll share some on-cpu flame-graph by the scheduler thread of rebase pr
the schedule thread of this pr
some random remarks
I think I finally have a reasonably good understanding of how your scheduling works (much easier with trimmed down PR! ❤️ ).
Similarities
Differences
Example
I think visual examples always serve better than words here. Let's take a really simple example with 3 conflicting write transactions. Prio-graph will effectively store a linked list, with edges from 1 to 2 to 3.
graph LR;
Tx1 --> Tx2 --> Tx3
US will initially store the access of Tx1, and a list with order (Tx2, Tx3). Only when Tx1 is descheduled will the conflict between 2 and 3 be "realized".
graph LR;
Tx1 --> Tx2["Tx2(0, w)"] & Tx3["Tx3(1, w)"];
Closing
Not sure anything I commented here has any actual value, just wanted to share my thoughts that these 2 approaches are remarkably similar but different in their implementation details due to different uses (stateful vs lazy).
edit: I actually made a branch in prio-graph to use this AccessList (Page?) pattern instead of directly tracking edges. Interestingly I see cases where the old approach does significantly better, but also cases where an approach similar to yours does significantly better - interesting to see criterion regressions of 500% and also improvements of 50% in the same bench run 😆 I am somewhat curious to try plugging in prio-graph to your benches to see how it compares for exactly the same benches you have. Do you have any benches on just the scheduler logic?
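To make the "lazy" side of the example above concrete, here is a minimal sketch of three conflicting write transactions going through a single per-address queue. The names `UsageQueue`, `schedule`, `deschedule` and `run_order` are illustrative stand-ins, not the PR's actual types or signatures, and read locks are left out entirely.

```rust
use std::collections::VecDeque;

/// Hypothetical, simplified per-address state: the task currently holding the
/// write lock (if any) and the tasks waiting behind it, in arrival order.
#[derive(Default)]
struct UsageQueue {
    current_writer: Option<u32>,
    blocked: VecDeque<u32>,
}

impl UsageQueue {
    /// Tries to take the write lock. On conflict the task is only queued; no
    /// edge between the waiting tasks is recorded at this point.
    fn schedule(&mut self, task: u32) -> bool {
        if self.current_writer.is_none() {
            self.current_writer = Some(task);
            true
        } else {
            self.blocked.push_back(task);
            false
        }
    }

    /// Releases the lock held by `task` and hands it to the next waiter, which
    /// is the moment the next conflict is "realized".
    fn deschedule(&mut self, task: u32) -> Option<u32> {
        assert_eq!(self.current_writer, Some(task));
        self.current_writer = self.blocked.pop_front();
        self.current_writer
    }
}

/// Schedules Tx1..Tx3 on one address and returns the order in which they run.
fn run_order() -> Vec<u32> {
    let mut queue = UsageQueue::default();
    let mut runnable = VecDeque::new();
    for task in [1, 2, 3] {
        if queue.schedule(task) {
            runnable.push_back(task);
        }
    }
    let mut order = Vec::new();
    while let Some(task) = runnable.pop_front() {
        order.push(task);
        if let Some(next) = queue.deschedule(task) {
            runnable.push_back(next);
        }
    }
    order
}
```

In this sketch Tx2 and Tx3 sit in the same list while Tx1 runs; Tx3 only becomes blocked "on Tx2" once Tx1 is descheduled and Tx2 takes the lock.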
thanks for great comparison write-up. I'll check out prio-graph impl later with those good guides in mind. :) (EDIT: I wrote my thoughts here: #35286 (comment))
yeah: #33070 (comment) my dev box:
this is processing 60000 txes with 100 accounts (and 50% conflicts). 970ns per this arb-like (= heavy) tx. and this is the basis for 100ns in the doc comment (assuming a normal tx should touch ~10 accounts; 1/10 of the benched tx). that said, I have very dirty changes (ryoqun#17 (comment)) with these results:
That's ~1.7x faster. and
unified-scheduler-logic/src/lib.rs
Outdated
/// Closure (`page_loader`) is used to delegate the (possibly multi-threaded)
/// implementation of [`Page`] look-up by [`pubkey`](Pubkey) to callers. It's the caller's
/// responsibility to ensure the same instance is returned from the closure, given a particular
/// pubkey.
I am probably missing some context, but I am not following the purpose of this closure, or why the AddressBook is not owned by the internal scheduler.
When would this ever do something other than just loading from AddressBook, which can already be done in a multi-threaded way.
hehe, nice point. hope this clarifies why i don't add dashmap to solana-unified-scheduler-logic's Cargo.toml...: 821dee2
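As a rough illustration of the closure-delegation idea discussed here, the following sketch keeps the address-to-`Page` map on the caller's side and hands the logic code only a `page_loader` closure. `Pubkey`, `PageInner`, `Task` and `create_task` are simplified stand-ins invented for this example, and a plain `HashMap` stands in for whatever concurrent map the caller might really use.

```rust
use std::collections::HashMap;
use std::sync::Arc;

// Stand-in for the real 32-byte public key type.
type Pubkey = [u8; 32];

// Stand-in for the scheduler's per-address state.
#[derive(Debug, Default)]
struct PageInner;

#[derive(Debug, Clone, Default)]
struct Page(Arc<PageInner>);

struct Task {
    pages: Vec<Page>,
}

/// The "logic" side: it never sees the map, only a closure that resolves an
/// address to its `Page`.
fn create_task(addresses: &[Pubkey], page_loader: &mut impl FnMut(Pubkey) -> Page) -> Task {
    Task {
        pages: addresses.iter().map(|address| page_loader(*address)).collect(),
    }
}

/// The "caller" side: it owns the address book and must return the same
/// instance for the same pubkey.
fn same_page_for_same_pubkey() -> bool {
    let mut address_book: HashMap<Pubkey, Page> = HashMap::new();
    let mut loader = |pubkey: Pubkey| address_book.entry(pubkey).or_default().clone();

    let first = create_task(&[[1u8; 32]], &mut loader);
    let second = create_task(&[[1u8; 32], [2u8; 32]], &mut loader);
    Arc::ptr_eq(&first.pages[0].0, &second.pages[0].0)
}
```

With this split, the choice of map (and any dependency it brings in) stays in the calling crate.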
sans the big rename Page => UsageQueue and my take on comparison with prio-graph, i think i should have addressed all review comments so far. so, I'm requesting another review round. not sure you did go in depth for its algo code and my |
unified-scheduler-logic/src/lib.rs
Outdated
/// Scheduler's internal data for each address ([`Pubkey`](`solana_sdk::pubkey::Pubkey`)). Very
/// opaque wrapper type; no methods just with [`::clone()`](Clone::clone) and
/// [`::default()`](Default::default).
#[derive(Debug, Clone, Default)]
pub struct Page(Arc<TokenCell<PageInner>>);
const_assert_eq!(mem::size_of::<Page>(), 8);
Arc has a runtime cost for cloning so I'm actually a bit surprised to see it used here for such a critical item - though perhaps it allows a simpler implementation. Do you know how much of your insertion time is spent doing Arc-clone? I imagine it's small relative to the Pubkey hashing?
Still, I wonder if you have thought about or have maybe even tested using an indexable storage for the inner pages, and then using index-references as the stored kind on the tasks, to avoid Arc-cloning.
To be clear I'm saying some sort of set up like:
pubkey_to_index: DashMap<Pubkey, usize>, // not sure if dashmap or locked hashmap? Just some translation that maps Pubkey -> index.
inner_pages: Vec<Option<InnerPage>>, // use a fixed-capacity vector (or slice) so we never add more pages past some pre-determined limit
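A small sketch of what such an index-based layout could look like, purely to illustrate the proposal: `PageTable`, `with_limit` and `index_for` are hypothetical names, a plain `HashMap` replaces the DashMap or locked map mentioned above, and nothing here was benchmarked.

```rust
use std::collections::HashMap;

// Stand-in for the real 32-byte public key type.
type Pubkey = [u8; 32];

#[derive(Default)]
struct InnerPage {
    blocked_tasks: Vec<u32>,
}

struct PageTable {
    pubkey_to_index: HashMap<Pubkey, usize>,
    // Fixed-length storage: slots are pre-created so no page is ever added
    // past the pre-determined limit.
    inner_pages: Vec<Option<InnerPage>>,
}

impl PageTable {
    fn with_limit(limit: usize) -> Self {
        Self {
            pubkey_to_index: HashMap::new(),
            inner_pages: (0..limit).map(|_| None).collect(),
        }
    }

    /// Returns the index tasks would store instead of an `Arc`, or `None`
    /// once the table is full.
    fn index_for(&mut self, pubkey: Pubkey) -> Option<usize> {
        if let Some(&index) = self.pubkey_to_index.get(&pubkey) {
            return Some(index);
        }
        let index = self.pubkey_to_index.len();
        let slot = self.inner_pages.get_mut(index)?;
        *slot = Some(InnerPage::default());
        self.pubkey_to_index.insert(pubkey, index);
        Some(index)
    }
}

fn example_indices() -> (Option<usize>, Option<usize>, Option<usize>, Option<usize>) {
    let mut table = PageTable::with_limit(2);
    (
        table.index_for([1; 32]),
        table.index_for([2; 32]),
        table.index_for([1; 32]),
        table.index_for([3; 32]),
    )
}
```

Copying a `usize` avoids the reference-count update, at the price of one extra lookup through `inner_pages` whenever the page is touched.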
I don't want to block on this, just sharing my thought on perf here. It's almost certainly a micro-optimization; but in this sensitive code, any small bit can help!
Arc has a runtime cost for cloning so I'm actually a bit surprised to see it used here for such a critical item - though perhaps it allows a simpler implementation.
Yeah, I used Arc here for simplicity and to ship this pr into master asap (it just surpassed 1.5k loc... lol).
Do you know how much of your insertion time is spent doing Arc-clone?
not directly. but, from an unrelated bench, I can indirectly guess it'll take 2-3ns per Arc::clone() when the instance isn't contended.
I imagine it's small relative to the Pubkey hashing?
I haven't tested, but I presume yes. What's more, I'm planning to switch from Arc to Rc. in that case, it's definitely smaller than the hashing.
As I indirectly indicated earlier (#35286 (comment) and #35286 (comment)), that optimization will allow us to trim this down to merely a single inc (without the lock prefix) in x86-64 asm terms, i.e. possibly an auto-vectorizable plain old single ALU instruction. And, mem access is 1 pop with no branching. And there's no work that needs to be done for insertion into the scheduling state machine, as it's not relevant. If you're referring to the insertions happening with the batched cloning in the case of being blocked in try_lock_for_task(), index-references or Rc will need 1 mem copy of 1 word-size for each address no matter what.
Still, I wonder if you have thought about or have maybe even tested using an indexable storage for the inner pages, and then using index-references as the stored kind on the tasks, to avoid Arc-cloning.
To be clear I'm saying some sort of set up like:
...
Yeah, i've thought about it. But, on gut sense, I concluded this pr's approach should be faster. the index approach introduces another pointer-dereference indirection, which i'd like to avoid. However, I haven't benchmarked it.
I don't want to block on this, just sharing my thought on perf here. It's almost certainly a micro-optimization; but in this sensitive code, any small bit can help!
Thanks for sharing your thoughts. happy to share mine. I hope I've squeezed as much as possible.
I was still going to do another round with deep dive. With these big ones I always try to read comments and get high-level overview, then come back for a deeper dive. |
Another short round of comments - I still need to take another round to go through the tests.
unified-scheduler-pool/src/lib.rs
Outdated
        let result_with_timings = result_with_timings.as_mut().unwrap();
        Self::accumulate_result_with_timings(result_with_timings, executed_task);
    },
    recv(dummy_receiver(state_machine.has_unblocked_task())) -> dummy => {
The dummy receiver pattern seems really odd to me, though I see how it is working (mostly? need to cargo expand since not 100% sure WHEN state_machine.has_unblocked_task() is resolved).
edit:
So the has_unblocked_task is resolved before receiver selection, which is what I was guessing. Even if we stick with this dummy-receiver pattern, I'd actually much prefer we make this very clear since I don't think it's a very obvious thing:
let dummy_receiver = dummy_receiver(state_machine.has_unblocked_task());
select! {
    // ...
}
All this in mind, I'm trying to figure out when we could have an unblocked task. We can only have an unblocked task if we just received a finished task, right?
Since on receiving new tasks, we'll immediately attempt to schedule and will schedule it if it is unblocked, that seems to be the case to me.
Why even have this recv for the never case that there is no unblocked task? That's just a wasted select, right? Let's say both our real channels have actual important data - a finished task and a new task. This macro could randomly (because current impl is always random) select this dummy receiver, which will just do nothing useful. At least my initial intuition is that this is wasted time, lmk what you think.
I know you had made a branch for a biased select - is the order here i.e. finished -> dummy -> new the biased order you want?
It could be overkill for the current MNB load we have, but I wonder if you considered just running a hot loop here if we are actively scheduling; not sure if using try_recvs in hot loop would cause more contention on the channels, you're certainly more familiar with crossbeam internals.
But basically I am asking about something along the lines of:
// Blocking recv for an OpenChannel to begin our scheduling session - no other messages can come in until then.
let message = new_task_receiver.recv();
{
    // Code for initialization, checking it's actually OpenSubChannel.
}
while !is_finished {
    if let Ok(executed_task) = finished_task_receiver.try_recv() {
        // Handle the executed task.
        continue; // Move to next iteration without checking other channels.
    }
    if state_machine.has_unblocked_task() {
        // Handle unblocked task.
        continue; // Move to next iteration without checking for new task.
    }
    if let Ok(message) = new_task_receiver.try_recv() {
        // Handle new task message.
    }
}
EDIT: I am an idiot - for some reason I thought never was always just disconnected... not that it could never be selected. Anyway, my initial thought on the oddity of the pattern still stands. So not going to delete my comment, and still request your thoughts.
The dummy receiver pattern seems really odd to me, though I see how it is working
.........
Anyway, my initial thought on the oddity of the pattern still stands. So not going to delete my comment, and still request your thoughts.
Hope I explained the justification clearly: 89b7773. I think you can get used to it. this pattern is the simplest, has no redundant code, and is the most performant. :)
I'd actually much prefer we make this very clear since I don't think it's a very obvious thing:
let dummy_receiver = dummy_receiver(state_machine.has_unblocked_task()); select!{ // ... }
This is done in the commit along with a comment. As I ranted in the comment, ideally, I want to move the dummy_receiver() invocation back into the select!...
I know you had made a branch for a biased select - is the order here i.e. finished -> dummy -> new the biased order you want?
Yes, this is correct.
It could be overkill for the current MNB load we have, but I wonder if you considered just running a hot loop here if we are actively scheduling; not sure if using try_recvs in hot loop would cause more contention on the channels, you're certainly more familiar with crossbeam internals.
As I explicitly commented in commit 89b7773, it's too early to take the busy-loop path, and i haven't fully grokked the perf implications of busy looping. Casual tests did show improved latency, though.
But basically I am asking about something along the lines of:
...
By the way, busy looping can be done with select{,_biased}! pretty easily like this:
loop {
    select{,_biased}! {
        recv(...) => ....,
        ...,
        default => continue, // `default` is a special-cased selector in `select{,_biased}!`
    }
    ...
    if is_finished {
        break;
    }
}
I'm hesitant about manual busy (or non-busy) select construction, considering I'm planning to add a new channel specifically for blocked tasks, as a fast lane to process them.
))
.unwrap();
session_ending = false;
loop {
We never exit this loop except for panic? I think we may have already talked about this in a previous PR, and you're just planning on error-handling in a follow-up. Please correct me if I'm wrong.
We never exit this loop except for panic? I think we may have already talked about this in a previous PR, and you're just planning on error-handling in a follow-up.
This understanding completely aligns with what i said previously. The proper scheduler management code is complicated and flavored to my own taste (so i'm expecting a back-and-forth review session), i.e. yet another beast by itself. so, i wanted to focus on the logic with this pr.
/// behavior when used with [`Token`].
#[must_use]
pub(super) unsafe fn assume_exclusive_mutating_thread() -> Self {
    thread_local! {
As a summary, I think Token and TokenCell were introduced because we have data within an Arc that we want to mutate, and these were added to give some additional safety constraints.
As far as I can tell, right now the mutations occur only in the scheduling thread so this seems fine. These token cells do not protect us against multiple threads accessing the same data if it were called from multiple threads - which requires some care from developers here. I am hopeful the naming of this function and unsafeness is probably sufficient warning!
WRT the ShortCounter Token-gating; the reason we can't or do not want to use an AtomicU32 is simply because we want to have checked operations to handle overflow explicitly?
WRT the ShortCounter Token-gating; the reason we can't or do not want to use an AtomicU32 is simply because we want to have checked operations to handle overflow explicitly?
AtomicU32 is slow, because it always emits atomic operations, necessitating the given cpu core to go to the global bus ;) by the way, std::cell::Cell<ShortCounter> can be used without runtime cost here, thanks to ShortCounter: Copy. But i wanted to dog-food TokenCell and to use TokenCell's api, which is more concise than Cell's; that is made possible by its additional constraints compared to Cell.
Also, note that blocked_page_count will be removed in an upcoming optimization.
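The `Cell<ShortCounter>` alternative mentioned in this reply can be sketched as follows. This `ShortCounter` is a stand-in written for this example (the PR's real type and method names may differ); the point is only that a `Copy` counter inside a `Cell` gives checked, non-atomic updates through a shared reference.

```rust
use std::cell::Cell;

/// Stand-in counter: a plain `u32` with checked arithmetic, so overflow and
/// underflow panic instead of wrapping silently.
#[derive(Clone, Copy, Debug, Default, PartialEq)]
struct ShortCounter(u32);

impl ShortCounter {
    fn increment(self) -> Self {
        Self(self.0.checked_add(1).expect("counter overflow"))
    }

    fn decrement(self) -> Self {
        Self(self.0.checked_sub(1).expect("counter underflow"))
    }
}

/// Updates the counter through `&Cell`, with no atomic instructions involved.
fn count_after_updates() -> u32 {
    let counter = Cell::new(ShortCounter::default());
    counter.set(counter.get().increment());
    counter.set(counter.get().increment());
    counter.set(counter.get().decrement());
    counter.get().0
}
```

`Cell` is `!Sync`, so the compiler rejects sharing it across threads; that is the safety `TokenCell` trades away in exchange for being placeable inside an `Arc` that moves between threads.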
As a summary, ...
As far as I can tell, ...
also, this is correct. thanks for wrapping up your understanding.
@@ -7462,7 +7462,9 @@ dependencies = [
name = "solana-unified-scheduler-logic"
version = "1.19.0"
dependencies = [
"assert_matches",
btw, this will soon no longer be needed, as per the progress at: rust-lang/rust#82775
I just finished the big renaming. still, i have yet to do the quoted homework of prio-graph. but, i think I've addressed all new review comments so far again. Always thanks for quite timely review! |
This repository is no longer in use. Please re-open this pull request in the agave repo: https://github.com/anza-xyz/agave |
unified-scheduler-logic/src/lib.rs
Outdated
/// Returns a mutable reference with its lifetime bound to the mutable reference of the
/// given token.
///
/// In this way, any additional reborrow can never happen at the same time across all
/// instances of [`TokenCell<V>`] conceptually owned by the instance of [`Token<V>`] (a
/// particular thread), unless previous borrow is released. After the release, the used
/// singleton token should be free to be reused for reborrows.
pub(super) fn borrow_mut<'t>(&self, _token: &'t mut Token<V>) -> &'t mut V {
    unsafe { &mut *self.0.get() }
}
}

// Safety: Once after a (`Send`-able) `TokenCell` is transferred to a thread from other
// threads, access to `TokenCell` is assumed to be only from the single thread by proper use of
// Token. Thereby, implementing `Sync` can be thought as safe and doing so is needed for the
// particular implementation pattern in the unified scheduler (multi-threaded off-loading).
//
// In other words, TokenCell is technically still `!Sync`. But there should be no
// legalized usage which depends on real `Sync` to avoid undefined behaviors.
unsafe impl<V> Sync for TokenCell<V> {}
as per #35286 (review):
However. I think we should get a 2nd opinion on all the unsafe code.
@alessandrod This mod contains a bunch of unsafes, and these two highlighted ones are the most important. I think I've extensively doc-ed them. Could you review whether my justification makes sense at all? I appointed you assuming you know Rust better than me from various memory mangling at vm. :) Happy to comment-in-source more to fill in the inner workings of the unified scheduler itself for general context. That said, feel free to defer to someone else. Thanks in advance.
🫡
Co-authored-by: Andrew Fitzgerald <apfitzge@gmail.com>
thanks for the in-depth look into the tests. I've addressed everything and am requesting another review. btw, please r+ on anza-xyz#129...
kk. I've done this as well: #35286 (comment) |
Finished my first pass at this. I really like the general structure and direction! Being completely new to this new scheduler code I've struggled with some of the terminology and I've left some nits about that. Also see the TokenCell issue.
//! [`::schedule_task()`](SchedulingStateMachine::schedule_task) while maintaining the account
//! readonly/writable lock rules. Those returned runnable tasks are guaranteed to be safe to
//! execute in parallel. Lastly, `SchedulingStateMachine` should be notified about the completion
//! of the exeuction via [`::deschedule_task()`](SchedulingStateMachine::deschedule_task), so that
what happens if the caller doesn't call deschedule_task()?
kitten dies. 😿 well, it's a really bad condition that i haven't thought about in depth. it will easily lead to deadlocks.
well, i wonder whether this edge case should be called out in this summary doc.. fyi, I explicitly mentioned this case in the deschedule_task() doc comment: 250edde
/// `solana-unified-scheduler-pool`.
pub struct SchedulingStateMachine {
    unblocked_task_queue: VecDeque<Task>,
    active_task_count: ShortCounter,
comments on what these counters count would be helpful
I find it a bit confusing that this is called active_task_count, because it also includes blocked tasks, which are arguably inactive? Maybe current_task_count?
hehe, i agree on the tastelessness of active here. hmm, current_task_count is a bit ambiguous. how about in_progress_task_count? (or not_(yet)_handled_task_count; i don't like this much because it implies a counter of tasks which have been resolved not to be processed).
I find in_progress confusing, since it implies that something is progressing while blocked tasks technically aren't progressing. @apfitzge as the native speaker, wdyt would be the best term here?
how about pending_task_count, or managed_task_count, then?
... or runnable_task_count?
while let Some((requested_usage, task_with_unblocked_queue)) = unblocked_task_from_queue
{
    if let Some(task) = task_with_unblocked_queue.try_unblock(&mut self.count_token) {
what happens to the task when try_unblock returns None?
another nice question. hope this helps your understanding: ee5c8b6
Can you explain how this works? usage_queue.unlock() pops the task from the usage queue. task.try_unblock() consumes self and returns None. How does the task reappear?
it's cloned here #35286 (comment) to all of the blocked usage_queues.
I see. This is why I dislike hiding Arcs with type aliases :P
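The exchange above can be illustrated with a minimal sketch: a task blocked on two addresses has one reference-counted handle in each queue, and `try_unblock` consumes a handle but yields the task only once its blocked count reaches zero. The types here (`TaskInner`, an `Rc`-based `Task` alias, a free function `try_unblock`) are simplified stand-ins rather than the PR's definitions, which use `Arc` and a token-gated counter.

```rust
use std::cell::Cell;
use std::collections::VecDeque;
use std::rc::Rc;

struct TaskInner {
    id: u32,
    // Number of addresses this task is still blocked on.
    blocked_count: Cell<u32>,
}

// The alias hides the reference counting, which is what made the flow hard to
// follow in review.
type Task = Rc<TaskInner>;

/// Consumes one handle. Returns the task only when the last blocking address
/// has been released; otherwise another clone still sits in another queue.
fn try_unblock(task: Task) -> Option<Task> {
    let remaining = task.blocked_count.get() - 1;
    task.blocked_count.set(remaining);
    (remaining == 0).then_some(task)
}

fn unblock_results() -> Vec<Option<u32>> {
    let task: Task = Rc::new(TaskInner {
        id: 7,
        blocked_count: Cell::new(2),
    });
    // The same task is cloned into every usage queue it is blocked on.
    let mut queue_a = VecDeque::from([Rc::clone(&task)]);
    let mut queue_b = VecDeque::from([Rc::clone(&task)]);
    drop(task);

    let first = try_unblock(queue_a.pop_front().unwrap()).map(|t| t.id);
    let second = try_unblock(queue_b.pop_front().unwrap()).map(|t| t.id);
    vec![first, second]
}
```

So a `None` from `try_unblock` does not lose the task: the handle that was dropped was just one of several clones.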
unified-scheduler-logic/src/lib.rs
Outdated
/// instances of [`TokenCell<V>`] conceptually owned by the instance of [`Token<V>`] (a
/// particular thread), unless previous borrow is released. After the release, the used
/// singleton token should be free to be reused for reborrows.
pub(super) fn borrow_mut<'t>(&self, _token: &'t mut Token<V>) -> &'t mut V {
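I understand the intent, but this is akin to a transmute and can trivially be made to segfault:
struct T {
    v: Vec<u8>,
}
let mut token = unsafe { Token::<T>::assume_exclusive_mutating_thread() };
let c = {
    let cell = TokenCell::new(T { v: vec![42] });
    // The returned reference is tied to the token's lifetime, not the cell's.
    cell.borrow_mut(&mut token)
};
// `cell` has already been dropped here, so this is a use-after-free.
c.v.push(43);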
@alessandrod oh, quite a nice catch.. hope this fixes the segfault with no more ub?: 001b10e (EDIT: force-pushed with updated fix: 02567b1)
seems i was too short-sighted, looking only at aliasing-based ub.. ;)
uggg, actually, i need with_borrow_mut().. I'll soon push this.
uggg, actually, i need with_borrow_mut().. I'll soon push this.
done: 02567b1
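For context on why a closure-based accessor addresses the use-after-free shown earlier in this thread, here is a minimal sketch of that style of API. It is written from scratch for illustration and is not copied from commit 02567b1: the `Token` here has no thread-local singleton check, and the exact signature in the PR may differ.

```rust
use std::cell::UnsafeCell;
use std::marker::PhantomData;

/// Sketch of a token; the raw-pointer `PhantomData` makes it `!Send`/`!Sync`.
struct Token<V>(PhantomData<*mut V>);

impl<V> Token<V> {
    /// Safety (assumed contract for this sketch): the caller creates at most
    /// one token per `V` on the single thread that mutates the cells.
    unsafe fn assume_exclusive_mutating_thread() -> Self {
        Token(PhantomData)
    }
}

struct TokenCell<V>(UnsafeCell<V>);

impl<V> TokenCell<V> {
    fn new(value: V) -> Self {
        Self(UnsafeCell::new(value))
    }

    /// The `&mut V` exists only for the duration of `f`. Because `f` must work
    /// for any lifetime of its argument, the result `R` cannot contain that
    /// reference, so it cannot outlive the cell.
    fn with_borrow_mut<R>(&self, _token: &mut Token<V>, f: impl FnOnce(&mut V) -> R) -> R {
        f(unsafe { &mut *self.0.get() })
    }
}

fn push_through_cell() -> Vec<u8> {
    let mut token = unsafe { Token::<Vec<u8>>::assume_exclusive_mutating_thread() };
    let cell = TokenCell::new(vec![42]);
    cell.with_borrow_mut(&mut token, |v| v.push(43));
    cell.with_borrow_mut(&mut token, |v| v.clone())
}
```

The token stays mutably borrowed for the whole call, so a nested `with_borrow_mut` with the same token inside the closure is rejected by the borrow checker as well.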
Thank you for the contribution! This repository is no longer in use. Please re-open this pull request in the agave repo: https://github.com/anza-xyz/agave
❤️
this feedback is much appreciated. I'll address it in turn to fill in the missing context.
as this is the most important review comment, I've addressed it first and requested review for it...
gg |
Problem
The unified scheduler is still single-threaded, because the most important piece of code is still missing: the scheduling code itself.
Summary of Changes
Implement it in the cleanest and most documented way possible. FINALLY, TIME HAS COME.
Note that there's more work to be done for general availability of unified scheduler after this pr, but it is arguably the most important PR.
numbers
steps
replaying-bench-mainnet-beta-2024-02.tar.zst: https://drive.google.com/file/d/1Jc1cd3pHKkaaT1yJrGeBAjBMJpkKS0hc/view?usp=sharing
solana-ledger-tool --ledger ./path/to/replaying-bench-mainnet-beta-2024-02 verify ...
(A) --block-verification-method blockstore-processor with this pr (reference):
(B) --block-verification-method unified-scheduler with this pr:
~1.3x faster than (A):
(C) (fyi) #33070 (all extra opt is applied):
~1.8x faster than (A), ~1.3x faster than (B):
context
extracted from #33070