Async functions and blocking I/O don't mix #58
Hmm.
I agree with the general sentiment.
Yeah, there are some operations that either simply don't exist in both styles, or are much more natural to do in blocking or non-blocking style. Another example is the code in

I think it's inevitable that we will have both sync and async code, but we need to have some kind of principle on where to use which, what the rules are in each component, and be careful at the boundaries.
Things get hairy at the boundaries. Like when a
Yeah. I would like that; I find explicit threads much easier to wrap my head around, and it's easier to debug with 'gdb'. However, I'm sure there will be exceptions where async would still be needed or preferred. I think most of the trouble is coming from mixing the async code in
Hmm, that sounds wrong to me. Maybe I'm just misunderstanding you... Doing I/O under a

* A problem if those other threads are also operated by the async executor, like if you're using the tokio multithreaded runtime.
Some related discussions that might be worth reading: The Golden Rule - Don't Block the Event Loop

Tools to help prevent blocking in an async context don't seem to exist yet, though the rust async-foundations working group seems interested in improving that. Rust's async only became stable a bit over a year ago and Tokio 1.0 only arrived in December, so it's still early days for a lot of projects and users.
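To make the "don't block the event loop" rule concrete, here is a minimal sketch of the anti-pattern those articles warn about. The function names are made up, and a tokio multi-threaded runtime with the usual features is assumed:

```rust
use std::time::Duration;

// BAD: this async fn blocks its executor worker thread for a full second.
// While it sleeps, no other task scheduled on that worker thread can run.
async fn handle_request_blocking() {
    std::thread::sleep(Duration::from_secs(1)); // blocking call inside async code
}

// BETTER: yield back to the executor while waiting, so other tasks make progress.
async fn handle_request_async() {
    tokio::time::sleep(Duration::from_secs(1)).await;
}

#[tokio::main]
async fn main() {
    // Both calls complete, but the first one stalls a worker thread,
    // which is exactly the problem described in the links above.
    handle_request_blocking().await;
    handle_request_async().await;
}
```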
I did a quick survey of messaging channels, to see if any of them can interoperate between sync and async code. There are quite a few differences...
Conclusion: if you need to do messaging between blocking & async threads, use
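For illustration only (not necessarily the channel the survey settled on), here is a minimal sketch of sync/async interop using tokio::sync::mpsc, whose bounded sender and receiver also expose blocking_send/blocking_recv for use from plain OS threads:

```rust
use tokio::sync::mpsc;

#[tokio::main]
async fn main() {
    let (tx, mut rx) = mpsc::channel::<u32>(16);

    // A plain OS thread produces values using the blocking API.
    // blocking_send must not be called from inside the async runtime itself.
    let producer = std::thread::spawn(move || {
        for i in 0..5 {
            tx.blocking_send(i).expect("receiver dropped");
        }
    });

    // The async side consumes them with .await, without blocking a worker thread.
    while let Some(v) = rx.recv().await {
        println!("got {}", v);
    }

    producer.join().unwrap();
}
```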
In #60, which I just created, I refactored the communication between
Let me share my point of view. First of all, I have to say that I am a dummy in Rust and this is actually the first time I am writing code using async/await. But I am certainly familiar with the idea of cooperative multitasking and even implemented my own portable cooperative multitasking library based on setjmp/longjmp for C++ about 30 years ago :) I have always considered the async model now present in most modern programming languages as something of a step backward. But this is just "philosophic" talk.

Let's return to our Zenith problems, and concentrate concretely on the pageserver. In Rust we have to use Tokio if our server has to serve thousands of connections; handling each connection in a separate thread is too costly. If the page server only receives the WAL stream from computation master nodes or wal_acceptors, then the number of connections is not expected to be large. Even in a multi-tenancy model, it is unlikely that one page server will serve thousands of different databases. Most likely there will be 10-100 databases per pageserver, and normal (OS) threads will work perfectly in this case.

But unfortunately the pageserver is not only receiving WAL. Its main goal is to serve pages to computation nodes, and the number of such connections is proportional to the number of backends = users, so it can be very large (thousands). Looks like we can not live without async in this case? I am not sure... Maintaining a large number of connections between client backends and the pageserver is a bad thing in any case, no matter how they are handled by the pageserver. We can multiplex them at the computation node side and have only one connection from each computation node to the page server.

As far as I understand, the main motivation for using tokio in the pageserver was that the Postgres rust API also uses the async model. Certainly it is better to use existing components rather than reinvent the wheel, but I am not sure that in this case the price is justified... From my point of view, mixing the async model with threads is a very bad and error-prone idea (although Stas doesn't think so). It is like mixing signals and threads in Unix applications. I would prefer that our code use either native threads or tokio tasks (with a pool of OS threads). But I am not sure that is possible.

My current experiments with Zenith performance make me think that the pageserver will not be the bottleneck. As in vanilla Postgres, the speed of a typical OLTP workload will be limited by fsyncs, i.e. by the wal_acceptors. Postgres is able to flush the changes of multiple transactions using one fsync, which allows better throughput when there are many concurrent transactions. Another way to improve performance is to enable asynchronous commit. In this case we can lose some number of the last transactions (already confirmed to the users), but speed is significantly increased. Unfortunately we can not use asynchronous commit in Zenith, because with synchronous replication a transaction may be blocked for a long time. So performance with a small number of clients will be awful; and if there is a large number of active clients, then... transaction WAL writes will be efficiently combined even without async commit... Another interesting question is whether we can use asynchronous commit if the user is ready to lose the last N transactions...

So from my point of view, the key to improving Zenith performance will be to somehow increase WAL writing speed, either by sharding WAL (in which case WAL writing will be performed in parallel) or by more efficient grouping of flushed WAL records.
But this seems to be too far from our async discussion... Concerning the advantages of the AIO model, especially the new Linux io_uring API - I do not have an opinion or experience right now. There were some attempts to add AIO support to Postgres, but as far as I know without any real success. I haven't heard about benchmarks that demonstrate advantages of AIO on a typical database workload. Moreover, right now most databases fit in memory, so IO speed mostly matters for writing WAL (this may not be true for cloud databases). Also, modern storage devices (like PMEM) are no longer block oriented, so the classical IO model based on buffers/pages seems obsolete and inefficient for such hardware. In any case, I think it is too early to discuss such aspects (AIO vs. sync IO) at this stage of the project.
So, we have basically two options:

1. Switch to blocking model

Change all current code that uses async to blocking style, with the following exceptions:
2. Switch to async everywhere

Remove the current calls that create explicit threads, replacing them with tokio::spawn instead. Have just one Tokio runtime. Be careful not to use blocking calls anywhere, or if you must, use spawn_blocking().

My opinion

In general, I like blocking style more. Async is just harder: it's harder to debug, and you have to be careful not to call functions that block, or you get weird deadlock or starvation issues. You can run into deadlock issues with explicit threads, too, but it's easier to reason about.

We might need to use async for performance reasons, in two places in particular:

1. In page_service, to serve a large number of connections. Having one thread for each connection will eat a lot of resources if most of them just sit idle.
2. If we have a lot of I/Os in flight.

However, we're nowhere near the scale where that matters. Right now, I think explicit threads would be better. Maybe we'll need to optimize later, if/when those things become a bottleneck.

How hard is it to change the model, or introduce async to the performance-critical parts later? Would it be easier to do everything in async fashion from the start, even if it means harder-to-debug code and slows us down somewhat, or use explicit threads now and rewrite later if required? I'm not sure. My gut feeling is to use explicit threads, but I don't feel strongly about it.
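The spawn_blocking() escape hatch mentioned in option 2 looks roughly like this; a minimal sketch with a made-up helper and file name, not actual pageserver code:

```rust
use tokio::task;

// A hypothetical helper that loads a page image from local disk.
async fn load_page(path: std::path::PathBuf) -> std::io::Result<Vec<u8>> {
    // Move the blocking filesystem call onto tokio's dedicated blocking-thread
    // pool, so it does not stall the async worker threads.
    task::spawn_blocking(move || std::fs::read(path))
        .await
        .expect("blocking task panicked")
}

#[tokio::main]
async fn main() -> std::io::Result<()> {
    // Setup only: create a small dummy file so the sketch runs end to end.
    std::fs::write("page_example", vec![0u8; 8192])?;

    let page = load_page("page_example".into()).await?;
    println!("loaded {} bytes", page.len());
    Ok(())
}
```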
I agree with that sentiment. Async puts the burden on the application developer. If threads were free, no one would use async. (Well, maybe there are exceptions where it really is a nicer model, like in
In a multi-tenant environment, we will put enough load on each page server, until it becomes the bottleneck. If a page server has free capacity, we will scale down and merge it with another page server, so that all page servers are busy again (with some margin for sudden spikes). So it's not really a question of whether the page server is a bottleneck, but rather how many page servers we will need to deploy. (Although it's possible that the page server software is efficient enough that it's bottlenecked by the page server's I/O hardware.)
Remove 'async' usage as much as feasible. Async code is harder to debug, and mixing async and non-async code is a recipe for confusion and bugs.

There are a couple of exceptions:

- The code in walredo.rs, which needs to read and write to the child process simultaneously, still uses async. It's more convenient there. The 'async' usage is carefully limited to just the functions that communicate with the child process.
- Code in walreceiver.rs that uses tokio-postgres to do streaming replication. We have to use async there, because tokio-postgres is async. Most rust-postgres functionality has non-async wrappers, but not the new replication client code. The async usage is very limited here, too: we just use block_on to call the tokio-postgres functions.

The code in 'page_service.rs' now launches a dedicated thread for each connection.

This replaces tokio::sync::watch::channel with std::sync::mpsc in 'seqwait.rs', to make that non-async. It's not a drop-in replacement, though: std::sync::mpsc doesn't support multiple consumers, so we cannot share a channel between multiple waiters. So this removes the code that checks whether an existing channel can be reused, and creates a new one for each waiter. That created another problem: BTreeMap cannot hold duplicates, so I replaced it with BinaryHeap.

Similarly, the tokio::{mpsc, oneshot} channels used between the WAL redo manager and the PageCache are replaced with std::sync::mpsc. (There is no separate 'oneshot' channel in the standard library.)

Fixes github issue #58, and coincidentally also issue #66.
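Since the standard library has no dedicated oneshot channel, a per-request std::sync::mpsc channel can play that role. The following is only an illustrative sketch of that pattern with made-up names, not the actual pageserver code:

```rust
use std::sync::mpsc;
use std::thread;

// A made-up request type standing in for the WAL-redo messages.
struct RedoRequest {
    page_no: u32,
    // Each request carries its own single-use reply channel, playing the role
    // that tokio::sync::oneshot played before.
    reply: mpsc::Sender<Vec<u8>>,
}

fn main() {
    let (req_tx, req_rx) = mpsc::channel::<RedoRequest>();

    // The "WAL redo manager" runs in its own OS thread and answers requests.
    let worker = thread::spawn(move || {
        for req in req_rx {
            // Pretend we replayed WAL and produced an 8 KB page image.
            let page = vec![req.page_no as u8; 8192];
            req.reply.send(page).ok();
        }
    });

    // Caller side: create a fresh reply channel per request and block on the answer.
    let (reply_tx, reply_rx) = mpsc::channel();
    req_tx
        .send(RedoRequest { page_no: 42, reply: reply_tx })
        .unwrap();
    let page = reply_rx.recv().unwrap();
    println!("got a {}-byte page", page.len());

    // Dropping the request sender lets the worker's receive loop end.
    drop(req_tx);
    worker.join().unwrap();
}
```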
This was addressed by commit 3b9e7fc. I'm sure we will bump into async vs sync issues again in the future, and perhaps we should also change walkeeper to use explicit threads for consistency. But I consider this fixed in the pageserver for now. If there are any further issues related to this, let's open new tickets for them.
The pageserver code seems to freely mix async functions and blocking I/O. I first noticed this in the rocksdb change (#54), but I don't have to look far to find other examples:
restore_relfile and write_wal_file (#43) both do blocking I/O and are called from async functions.

I think this is a big problem.
Mixing blocking and async code together just doesn't work. At best, we're wasting effort building async code, only to have it unable to achieve any concurrency because blocking I/O ties up the thread. At worst, the combined binary will have confusing stalls, deadlocks, and performance bugs that are hard to debug.
To make this code async-friendly, we would either need to push the I/O work to standalone threads, or use the async equivalents.
Async file I/O is not too hard: just swap in tokio::fs for std::fs and start propagating .await and async fn as far as needed. But something like rocksdb is a lot harder: as far as I can tell, there's no such thing as an async rocksdb.

Another alternative would be to give up on the async model, and just do everything with explicit threads.
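To illustrate the swap, here is a minimal before/after sketch (the function and file names are made up, not the actual restore_relfile code):

```rust
use std::path::Path;

// Blocking version: fine on a plain OS thread, but it stalls an async worker
// thread if called from inside an async fn.
fn read_relfile_blocking(path: &Path) -> std::io::Result<Vec<u8>> {
    std::fs::read(path)
}

// Async version: same call shape, but the function becomes `async fn` and every
// caller has to propagate `.await` (and `async`) upward.
async fn read_relfile_async(path: &Path) -> std::io::Result<Vec<u8>> {
    tokio::fs::read(path).await
}

#[tokio::main]
async fn main() -> std::io::Result<()> {
    // Create a small dummy file so the sketch actually runs.
    let path = Path::new("relfile_example");
    tokio::fs::write(path, b"some page data").await?;

    let a = read_relfile_blocking(path)?;
    let b = read_relfile_async(path).await?;
    println!("read {} / {} bytes", a.len(), b.len());
    Ok(())
}
```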