Barbara compares some cpp code (and has a performance problem) #144
This code is still problematic, even with the `spawn` below, because you've got blocking work on a non-blocking task. Ideally, it would use `spawn_blocking` to fix this; or, if the library was normally fast, but slow on some (big?) inputs, you might prefer `block_in_place` (which ensures that other tasks run) combined with the `spawn` solution you gave (both shapes are sketched below).

Goes to show how hard this user story is to get right - the fix in this story still doesn't work well. There's a fun sequence here where you add `block_in_place` and it's still bad, and then add the `spawn`, which fixes it.

One that's bitten us in Mononoke (issue #131 here) is that even if the long-running work is also nicely async (i.e. just makes another network request), it can still go wrong.
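A minimal sketch of the two shapes described above - `spawn_blocking` on its own, and `block_in_place` inside a `spawn`ed task. `parse_large_response` is a hypothetical stand-in for the CPU-heavy library call, not code from this PR:

```rust
use tokio::task;

// Hypothetical CPU-heavy work standing in for the blocking library call.
fn parse_large_response(body: &[u8]) -> usize {
    body.iter().filter(|b| **b == b'\n').count()
}

async fn handle_with_spawn_blocking(body: Vec<u8>) -> usize {
    // Move the blocking work onto tokio's dedicated blocking thread pool.
    task::spawn_blocking(move || parse_large_response(&body))
        .await
        .expect("blocking task panicked")
}

async fn handle_with_block_in_place(body: Vec<u8>) -> usize {
    // Run on a fresh task so the caller isn't held up, and tell the runtime
    // this worker is about to block so its queued tasks can be moved off it.
    task::spawn(async move {
        task::block_in_place(|| parse_large_response(&body))
    })
    .await
    .expect("task panicked")
}

#[tokio::main] // multi-thread runtime; block_in_place requires it
async fn main() {
    let body = vec![b'\n'; 1_000_000];
    println!("lines: {}", handle_with_spawn_blocking(body.clone()).await);
    println!("lines: {}", handle_with_block_in_place(body).await);
}
```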
TIL about `block_in_place`, though I find it hard to reason about. I think in this case only `spawn_blocking` would work, but it has its own problems (it can't be cancelled - see the sketch below). (Edit: `block_in_place` + my `spawn` below work as well, as you mentioned, but in practice not having the `block_in_place` has not proven problematic for me - though perhaps I have even more perf improvements lurking.) And, like you mentioned, the cpu work is only expensive if the result is large, which is a runtime property. Not sure how we could detect that and do the correct thing. async-rs/async-std#631 looks like an attempt, but I need to read more into why it was closed.
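To illustrate the "can't be cancelled" point, a small sketch (assuming a multi-threaded tokio runtime): aborting the `JoinHandle` of a `spawn_blocking` task has no effect once the closure has started running.

```rust
use std::time::Duration;
use tokio::task;

#[tokio::main]
async fn main() {
    let handle = task::spawn_blocking(|| {
        // Stand-in for expensive CPU work; once this closure has started,
        // the runtime has no way to interrupt it.
        std::thread::sleep(Duration::from_secs(2));
        println!("blocking work finished anyway");
    });

    // Let the blocking pool pick the closure up before trying to cancel it.
    tokio::time::sleep(Duration::from_millis(100)).await;

    // abort() (or dropping the handle) cannot cancel a blocking task that is
    // already running; it only prevents one that hasn't started yet.
    handle.abort();

    // Wait long enough to observe the closure still completing.
    tokio::time::sleep(Duration::from_secs(3)).await;
}
```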
What does `block_in_place` do? Isn't each tokio task already resilient to starving the others, because they run on independent threads? Or is there some way the work-stealing part of it locks up?
Each worker thread has its own (fixed-size) queue of tasks to service. There's a global queue for overflows, too, and a certain amount of rebalancing done by each worker thread to avoid starving the tasks in the global queue. If a worker thread goes idle, and the global queue is empty, then it will steal work from other workers. `block_in_place` ensures that the local queue is empty.

As a result, it all depends on the balance between thread count (typically one per CPU hart) and runnable task count. If you have (e.g.) a huge AMD EPYC server with 256 harts (128 cores, two threads per core), and typically only 100 tasks to run, then you'll never see the difference between `spawn_blocking`/`block_in_place` and `spawn`, because there will be idle workers that steal work from blocked workers. On the other hand, if you have a small system (say an older laptop) with 4 harts (two cores, two threads per core), and you still have 100 tasks to run, then without `block_in_place` you might see the runtime fail to migrate some of those tasks off the blocked worker and onto a running worker - the result is starvation for the tasks that don't get migrated to another worker thread.

I'm guessing that you've never seen the starvation issue because you normally have lots of worker threads, and not many runnable tasks at a time - as a result, a worker goes idle and steals work from the blocked thread. If you had small numbers of worker threads, but still lots of tasks, you'd see starvation more often.
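A rough sketch of the small-system case, with a runtime deliberately limited to two worker threads and `std::thread::sleep` standing in for the blocking work (not code from this PR):

```rust
use std::time::{Duration, Instant};
use tokio::{runtime, task};

fn main() {
    // Deliberately small runtime: two worker threads stand in for the
    // "older laptop" case described above.
    let rt = runtime::Builder::new_multi_thread()
        .worker_threads(2)
        .build()
        .unwrap();

    rt.block_on(async {
        let start = Instant::now();

        // Two tasks that block their worker threads outright, occupying
        // every worker in the runtime.
        for _ in 0..2 {
            task::spawn(async {
                // Blocking on an async worker; wrapping this in
                // block_in_place (or using spawn_blocking) avoids the
                // starvation shown below.
                std::thread::sleep(Duration::from_secs(2));
            });
        }

        // A cheap task that could run immediately, but is starved until a
        // blocked worker frees up (~2s on this runtime).
        task::spawn(async move {
            println!("light task ran after {:?}", start.elapsed());
        })
        .await
        .unwrap();
    });
}
```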
This sounds like it might make a nice addition to the story. e.g., Barbara is happy for a while, but then she sees starvation, or something like that. That said, it could also be a good FAQ. I wouldn't change the main part of the story, though, since invoking `tokio::spawn` seems to be what "Barbara" actually did here.
Re-reading this comment I was definitely struck by the challenge of "how to jigger this exactly right". I'm curious, @farnz, whether you think that a more adaptive runtime -- similar to what Go does, or what was proposed in this blog post, but never adopted -- would be a better fit for Mononoke?
One of the challenges here seems to be a matter of composition. The "right place" to insert `block_in_place`, for example, is inside of `reqwest`, at least if I understand correctly, but it may not be well positioned to judge whether messages are likely to be large or not.
Oh I guess I was wrong about that last bit. Still, it seems like these kinds of issues could frequently cross library boundaries.
Oh also I see you referenced async-rs/async-std#631 in the FAQ :)
I'm not convinced by more adaptive runtimes - they tend to result in developers not realising when they've introduced accidental blocking, and I have a bias towards engineers knowing the cost of their decisions.
There are two orthogonal problems that I perceive here:
1. Two ways to mark blocking work - `block_in_place` and `spawn_blocking` - with room for semantic confusion. In my ideal world, there would only be one - `block_in_place` - and something suitable done in the back-end so that it interacts nicely with `spawn`, such that `spawn(async { block_in_place(|| thing()) })` is exactly the same as (or better than) `spawn_blocking(|| thing())` today. That way, the guidance would be `block_in_place` for things that might block the CPU, and `spawn` to get parallelism.
2. It's hard to see, at runtime, where a task is blocking its worker or being starved.

For point 2 there, `tracing-futures` provides the `Instrumented` wrapper which allows me to build a chunk of the tracing I'd want. Having the runtime automatically instrument everything so that I can see exactly when a future is polled (or not polled) and how long each poll takes would be better for my needs than an adaptive runtime; I could use `tracing-flame` to both identify blocking (long polls) and starvation (long gaps between a future being ready to wake and being polled), and fix up both.

The missing bit in `tracing-futures` is identifying when a Future is ready to be polled again - I'd love to see some sort of event tied to the future's span, so that I can see that (e.g.) it was ready to be polled again, but not polled for 10 seconds, or that it got polled again 2 µs after the waker was notified. I'd want to skip the first poll for this (the gap between creation and first poll), because that's a delay that I believe most people would predict in code review.
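For context, a minimal sketch of the `Instrumented` wrapper in use, via the `Instrument` extension trait (re-exported by `tracing` as well as provided by `tracing-futures`), with span enter/exit events enabled so poll timing shows up; `fetch_one` is a hypothetical async function:

```rust
use std::time::Duration;
use tracing::{info_span, Instrument};
use tracing_subscriber::fmt::format::FmtSpan;

async fn fetch_one(id: u64) -> u64 {
    // Stand-in for real async work (e.g. a network request).
    tokio::time::sleep(Duration::from_millis(10)).await;
    id
}

#[tokio::main]
async fn main() {
    // Emit span new/enter/exit/close events; a subscriber such as
    // tracing-flame could consume these to spot long polls and long gaps.
    tracing_subscriber::fmt()
        .with_span_events(FmtSpan::FULL)
        .init();

    // Instrumented wraps the future so every poll happens inside the span,
    // which is what lets long polls (blocking) show up in the trace.
    let result = fetch_one(42)
        .instrument(info_span!("fetch_one", id = 42))
        .await;

    tracing::info!(result, "request finished");
}
```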
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @farnz. Helpful.