Introduce SchedulingStateMachine for unified scheduler #129
Conversation
Codecov Report. Attention: Patch coverage is ...
Additional details and impacted files

@@            Coverage Diff            @@
##           master     #129    +/-   ##
========================================
  Coverage    81.9%    81.9%
========================================
  Files         837      837
  Lines      226823   227455    +632
========================================
+ Hits       185828   186415    +587
- Misses      40995    41040     +45
I've just left my review on solana-labs#35286 (review), which is what I was tagged on. Hopefully this PR matches the same code 😅
I'll keep them in sync. :) fyi, you can confirm my diligence by this:
(force-pushed from 86b27ce to 9e07dcd)
///
/// Note that lifetime of the acquired reference is still restricted to 'self, not
/// 'token, in order to avoid use-after-free undefined behaviors.
pub(super) fn with_borrow_mut<R>(
I think I might be missing something, but what prevents this:

use std::sync::Arc;
// (TokenCell and Token from this PR assumed to be in scope)

struct FakeQueue {
    v: Vec<u8>,
}

let queue = Arc::new(TokenCell::new(FakeQueue { v: Vec::new() }));
let q1 = Arc::clone(&queue);
std::thread::spawn(move || {
    let mut token = unsafe { Token::<FakeQueue>::assume_exclusive_mutating_thread() };
    q1.with_borrow_mut(&mut token, |queue| {
        // this is UB: both threads can hold a live &mut to the same FakeQueue
        let v = &mut queue.v;
    });
});
std::thread::spawn(move || {
    let mut token = unsafe { Token::<FakeQueue>::assume_exclusive_mutating_thread() };
    queue.with_borrow_mut(&mut token, |queue| {
        // this is UB
        let v = &mut queue.v;
    });
});
There's one token per thread, so I'm respecting the Token::assume_blah()
contract. I have a token, so I'm respecting the TokenCell::with_borrow_mut contract.
hmm, the above example is violating this #129 (comment):
/// By extension, it's allowed to create _multiple_ tokens in a _single_ process as long as no
/// instance of [`TokenCell`] is shared by multiple instances of [`Token`].
i.e. the Arc-ed TokenCell is shared by the tokens in the two threads.
as a quick recap, the objective of this fancy TokenCell is to offload creation of these cells to multiple threads. those threads don't need mutable references, so no tokens are needed for them. on the other hand, the scheduler thread needs mutability, and it's the sole thread that ever mutates any of the objects created. in this way, i want to avoid Mutex-ing things.
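a minimal sketch of the intended division of labor (assuming TokenCell/Token from this pr were in scope, and reusing FakeQueue from the example above):

use std::sync::Arc;

fn main() {
    // any number of loader threads may create and hand out cells; they
    // never mutate the contents, so they need no token at all.
    let cell = Arc::new(TokenCell::new(FakeQueue { v: Vec::new() }));
    let for_scheduler = Arc::clone(&cell);

    // exactly one scheduler thread creates the sole token and is the only
    // thread that ever calls with_borrow_mut on any of these cells.
    std::thread::spawn(move || {
        let mut token =
            unsafe { Token::<FakeQueue>::assume_exclusive_mutating_thread() };
        for_scheduler.with_borrow_mut(&mut token, |queue| {
            queue.v.push(1); // exclusive access, per the user contract
        });
    })
    .join()
    .unwrap();
}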
I've re-read that comment a few times and I still don't understand it to mean what you suggest it should mean. I can't create multiple Tokens in the same thread, because the constructor asserts that. And the Arc is shared among threads the same way UsageQueue is shared among threads. I understand what you're saying, but from the API and docs you're providing me, I don't think I'm doing anything wrong by sharing the TokenCell among threads; I thought that was the whole point.
as a quick recap, the objective of this fancy TokenCell is to offload creation of these cells to multiple threads. those threads don't need mutable references, so no tokens are needed for them. on the other hand, the scheduler thread needs mutability, and it's the sole thread that ever mutates any of the objects created. in this way, i want to avoid Mutex-ing things.
Ok, but now you've created TokenCell, which claims to be Send and Sync, but it's in fact not Sync. The documentation claims that the associated Token API is what ensures that there's only ever one live mutable access to the inner value, when it doesn't. In fact, as far as I can tell, the token just acts as an inconvenience/deterrent within one thread, so callers are somehow alerted that something dangerous is happening. But then the same could be achieved without a token at all, just by calling accessors like borrow_but_this_is_really_dangerous() and borrow_mut_this_is_even_more_dangerous().
I don't know enough about how these UsageQueues are being instantiated, used etc (feel free to link to code if you want to), but my feeling is that we should be using a Mutex, wrapped in a wrapper that always uses try_lock and asserts that the Mutex is in fact never contended. That way you create a Sync API that is 100% safe, at the cost of a compare and swap, and - again without currently knowing much about usage patterns - I'd bet 1 BONK that it wouldn't have a significant performance impact.
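Something along these lines (a sketch; the wrapper name and shape here are hypothetical, just to make the suggestion concrete):

use std::sync::Mutex;

// Hypothetical wrapper: a Mutex that is asserted to never be contended, so
// acquisition is one uncontended CAS and contention is a bug, not a wait.
pub struct AssertUncontended<T>(Mutex<T>);

impl<T> AssertUncontended<T> {
    pub fn new(value: T) -> Self {
        Self(Mutex::new(value))
    }

    pub fn with_mut<R>(&self, f: impl FnOnce(&mut T) -> R) -> R {
        // try_lock never blocks; if another thread held the lock, the
        // "single mutating thread" assumption is broken, so fail loudly.
        let mut guard = self.0.try_lock().expect("must never be contended");
        f(&mut *guard)
    }
}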
Alternatively if you don't want to pay the price of CAS at all, I think you should look into moving these UsageQueues by value so that they don't need to be Sync at all. IMO creating an API which looks and sounds safe to be used by multiple threads, but in fact very easily introduces UB when used by multiple threads unless "held correctly", is not something we should do, especially in new code.
Given the hyper-sensitivity of the scheduler, it might be that the Mutex::try_lock CAS does have an impact, but it would be good to measure that impact if we can avoid the unsafe here.
I think the ultimate usage of the unsafe-ness works as currently written, because we only allow token access through &mut self fns on the SchedulingStateMachine. But introducing something that other devs are confused about makes this more challenging to maintain.
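Roughly this shape, i.e. the token never escapes the state machine and all mutation funnels through &mut self (field and method names below are assumptions, not the actual PR code):

pub struct SchedulingStateMachine {
    // the sole token lives here and is never handed out, so the only path
    // to mutation is a &mut self method, which the borrow checker already
    // serializes at every call site.
    usage_queue_token: Token<UsageQueueInner>,
}

impl SchedulingStateMachine {
    pub fn deschedule_task(&mut self, usage_queue: &TokenCell<UsageQueueInner>) {
        usage_queue.with_borrow_mut(&mut self.usage_queue_token, |inner| {
            // exclusive access, guaranteed as long as only one state machine
            // (and thus one token) exists on the mutating thread
        });
    }
}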
@alessandrod and @apfitzge: really, thanks for spending time reviewing this.
I've re-read that comment a few times and I still don't understand it to mean what you suggest it should mean. I can't create multiple Tokens in the same thread, because the constructor asserts that. And the Arc is shared among threads the same way UsageQueue is shared among threads. I understand what you're saying, but from the API and docs you're providing me, I don't think I'm doing anything wrong by sharing the TokenCell among threads; I thought that was the whole point.
okay, in this case, i think the doc is worded badly. I'll try to rearrange it.
as a quick recap, the objective of this fancy TokenCell is to offload creation of these cells to multiple threads. those threads don't need mutable references, so no tokens are needed for them. on the other hand, the scheduler thread needs mutability, and it's the sole thread that ever mutates any of the objects created. in this way, i want to avoid Mutex-ing things.
Ok, but now you've created TokenCell, which claims to be Send and Sync, but it's in fact not Sync. The documentation claims that the associated Token API is what ensures that there's only ever one live mutable access to the inner value, when it doesn't.
as long as the (poorly-written) user contract is properly adhered to, there should only ever be one.
In fact, as far as I can tell, the token just acts as an inconvenience/deterrent within one thread, so callers are somehow alerted that something dangerous is happening. But then the same could be achieved without a token at all, just by calling accessors like borrow_but_this_is_really_dangerous() and borrow_mut_this_is_even_more_dangerous().
yep. it's just a deterrent. however, consider this: the following would pass compilation, yet it is UB because it violates rust's aliasing rules:
// assume arced_token_cell111 and arced_token_cell222 could point to the same instance.
let a = arced_token_cell111.borrow_mut_this_is_even_more_dangerous();
for _ in 0..n {
    // a second live &mut to the same value while `a` is alive: aliasing UB
    let b = arced_token_cell222.borrow_mut_this_is_even_more_dangerous();
}
but, the inconvenient TokenCell can prevent this from happening accidentally. in fact, i did exactly that while coding. that's why i introduced it: it has no runtime cost.
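i.e. the same shape with TokenCell fails to compile, because the outer with_borrow_mut already holds the unique &mut to the token (sketch of the compile error, not runnable by design):

// with the token threaded through, the aliasing bug above becomes a
// compile error instead of silent UB:
cell_a.with_borrow_mut(&mut token, |a| {
    // error[E0499]: cannot borrow `token` as mutable more than once at a time
    cell_b.with_borrow_mut(&mut token, |b| { /* ... */ });
});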
I don't know enough about how these UsageQueues are being instantiated, used etc (feel free to link to code if you want to), but my feeling is that we should be using a Mutex, wrapped in a wrapper that always uses try_lock and asserts that the Mutex is in fact never contended. That way you create a Sync API that is 100% safe, at the cost of a compare and swap, and - again without currently knowing much about usage patterns - I'd bet 1 BONK that it wouldn't have a significant performance impact.
sadly, even a single CAS will have a significant performance impact: solana-labs#35286 (comment)
this piece of code must wade through 1 million transactions or more per second, at the very least for line rate, to find the most-paying transactions once it sits at the front of the banking stage (in the future). and more throughput is better; there's never enough. ;)
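(a rough back-of-envelope, with purely illustrative numbers: at 1 million transactions per second, the budget is about 1µs per transaction; if a CAS costs on the order of tens of nanoseconds and a transaction can lock tens of accounts, per-account CASes alone could eat a sizable fraction of that budget.)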
Alternatively if you don't want to pay the price of CAS at all, I think you should look into moving these UsageQueues by value so that they don't need to be Sync at all. IMO creating an API which looks and sounds safe to be used by multiple threads, but in fact very easily introduces UB when used by multiple threads unless "held correctly", is not something we should do, especially in new code.
UsageQueue::blocked_usages must be some kind of collection; currently a vanilla VecDeque. so, moving by value isn't a viable option, either...
Given the hyper-sensitivity of the scheduler, it might be that the Mutex::try_lock CAS does have an impact, but it would be good to measure that impact if we can avoid the unsafe here.
the non-trivial overhead is measured here, as a CAS in Arc (basically the same): solana-labs#35286 (comment)
I think the ultimate usage of the unsafe-ness works as currently written, because we only allow token access through &mut self fns on the SchedulingStateMachine. But introducing something that other devs are confused about makes this more challenging to maintain.
i admit this is confusing as hell. but i want to ship it for the merits. I'll strive to document TokenCell clearly. also, it's not a general-purpose library; its use is very local to the unified scheduler code only.
VecDeque is actually pretty fast to move? In fact, doesn't this whole problem go away if instead of Arcs to share between other threads and the state machine you move Boxes back and forth?
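For concreteness, the move-by-value shape could look like this sketch (hypothetical types and std mpsc channels, not the PR's actual code):

use std::{collections::VecDeque, sync::mpsc};

struct UsageQueue {
    blocked_usages: VecDeque<u64>, // placeholder element type
}

fn main() {
    let (to_scheduler, from_loaders) = mpsc::channel::<Box<UsageQueue>>();

    // a loader thread creates queues and moves them to the scheduler by
    // value; nothing here needs to be Sync, and no interior mutability.
    std::thread::spawn(move || {
        let queue = Box::new(UsageQueue { blocked_usages: VecDeque::new() });
        to_scheduler.send(queue).unwrap();
    });

    // the scheduler thread owns each queue outright, so plain &mut works.
    for mut queue in from_loaders {
        queue.blocked_usages.push_back(42);
    }
}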
/// By extension, it's allowed to create _multiple_ tokens in a _single_ process as long as no
/// instance of [`TokenCell`] is shared by multiple instances of [`Token`].
What is the advantage of having UsageQueueLoader outside SchedulingStateMachine? In the code that exists right now, UsageQueueLoader is a DashMap, which is a type with horrible performance characteristics. It seems to me that if you move it inside SchedulingStateMachine then it can be a regular hash map, TokenCell isn't needed, and everything gets faster?
you mean complaints like this? solana-labs#34955 (i haven't evaluated it)
this is correct so far... yet... this isn't: I don't think a single-threaded vanilla HashMap can compete with a 20-threaded DashMap-ed loader. The multi-threaded loader only incurs the overhead of 1 .... On the other hand, a tx with 100 accounts needs 100 ....
As for the need for this much low latency, i documented it here: https://github.com/solana-labs/solana/pull/34676/files#r1536585166
This analysis is missing the dashmap overhead, which is pretty terrible. Dashmap partitions the key space into shards, then shards are protected with rwlocks. If you look at dashmap usage in the validator now, it's a futex sleeping fest. Have you explored moving things (e.g. Box) instead of using Arc and inner mutability? Why didn't that work? EDIT: Box doesn't work, because you do want to share outside the state machine. I'll try to play around with the code and get a better feel for the API.
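The sharding being described is roughly the following (a simplified sketch of the general sharded-map design, not DashMap's actual code):

use std::{
    collections::{hash_map::DefaultHasher, HashMap},
    hash::{Hash, Hasher},
    sync::RwLock,
};

// the key space is split across shards, each guarded by its own RwLock.
// every access still takes a shard lock, and any contention on a hot
// shard parks threads on a futex, which is the overhead in question.
struct ShardedMap<K, V> {
    shards: Vec<RwLock<HashMap<K, V>>>,
}

impl<K: Hash + Eq, V: Clone> ShardedMap<K, V> {
    fn shard_for(&self, key: &K) -> &RwLock<HashMap<K, V>> {
        let mut hasher = DefaultHasher::new();
        key.hash(&mut hasher);
        &self.shards[(hasher.finish() as usize) % self.shards.len()]
    }

    fn get_cloned(&self, key: &K) -> Option<V> {
        // read-lock the shard: cheap when uncontended, a futex sleep when not
        self.shard_for(key).read().unwrap().get(key).cloned()
    }
}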
as you said, dashmap doesn't look promising. I wonder why this crate is used in the validator in the first place. ;) so I went crate shopping: how about leapfrog? the bench numbers seem promising: https://github.com/robclu/conc-map-bench?tab=readme-ov-file#read-heavy-std-hasher
oh, really, thanks for the effort.
fyi, i've run the popular bench (...). also, note that ReadHeavy is the appropriate bench scenario for ...
(force-pushed from 8c0c6bd to 6157e84)
btw, I made ...
@alessandrod How's it going with this? I wonder if i can merge the pr at least for now, to unblock my work... i know there's some disagreement about the general design and justification of ... (sorry, i have to ping you here from now on, because i got inquiries from multiple people about the progress of this pr...)
Given miri UB detection, I think I'm fine to merge this if @alessandrod is as well. I was happy with the other parts of the implementation last I reviewed, and mainly had concerns about the ...
(force-pushed from 7b2dd3a to 29066e8)
Co-authored-by: Andrew Fitzgerald <apfitzge@gmail.com>
(force-pushed from 29066e8 to b851631)
(force-pushed from b851631 to f681fde)
Sorry, I've been looking into QUIC congestion issues this week. Yeah, sure, I have no problem merging as-is if it's blocking you.
(force-pushed from bf19014 to ffa1857)
thanks for the reply, and for jumping on a code review of unfamiliar code. your insightful review comments made this pr better in various ways.
@apfitzge thanks for the positive feedback. it seems the time has finally come. :) I'm very much in the ship-it mood.. lol. could you review the new commits since your last review, starting from 9eed18b, if you haven't? btw, I just added negative test cases for miri: f681fde, ffa1857
Looked at commits since my last review. Looks good, like the changes you and @alessandrod made!
Migrated solana-labs#35286.
(let's finish up the review at the original pr, then just r+ on this pr; modulo the unfortunate rebase due to ci, this branch will be kept in sync with the original branch)
perf numbers
before:
after: