Detect and prevent blocking functions in async code #19
While I think this idea has some merit, I don't think it's the right path to take. It forces us to encode things like what is blocking and what is an async context, when in reality we have no abstract way to really do this. Also note the challenges of things like […]. This is a big reason why I think, before we can move ahead with something like this, we need to as a community define:
I think these questions are going to be quite hard to answer, since each runtime kind of does its own thing. I personally think a better approach is not to tackle this from a language perspective but from an education perspective. Why do users make this mistake? How can we teach them properly about blocking? Maybe that includes a very simple lint that links to docs?
Thanks for the feedback! I'll clarify my perspective briefly, and leave the rest for further discussion.
Thanks, added to the list of ambiguous cases.
To be clear, I'm not suggesting type system changes. A lint was my initial thought too. Another idea is at-runtime executor and debugging tools that can report latency spikes.
Emphasis added to highlight the following point: Currently, users have to make this distinction on a case-by-case basis. Our (Fuchsia's) internal research shows that async alone adds significant complexity, even more so than e.g. lifetimes and the type system. Unlike those, making mistakes with blocking has an almost non-existent feedback system - no compiler errors, no test failures. In short, users are on their own.
Education is certainly part of the solution (it's something I'm actively working on), but I'd argue education and automated tooling are not mutually exclusive: lifetimes are thoroughly covered in learning material and also have highly actionable, precise compiler errors.
Disk IO seems to be a clearly blocking operation. But what if it's a read/write somewhere like […]? It probably is good to have tooling to catch potential issues like this, perhaps in something like clippy rather than the compiler. But it is fairly complicated to determine what should trigger the lint.
I think this is pretty much guaranteed to be a lint, since it will be best effort: catching all cases of a blocking function being called would be impractical. The type system is not sufficient for tracking something like this, as @LucioFranco pointed out. As for what an "async context" is, I think the simplest answer is probably the correct one. I suggest we focus on calls that happen directly in an async fn, an async block, and possibly the poll fn of a Future impl. This also means the lint would be significantly more useful with the ability to annotate user functions with an attribute, like […]. We could prototype something like this in clippy with a hardcoded list of functions to warn on. That would be useful, but I don't think we could implement something like the […].
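A sketch of how the combination could look. The `#[may_block]` attribute and the lint itself are hypothetical, not existing features; the function names are made up for illustration:

```rust
// Hypothetical: neither #[may_block] nor the lint exists today.

#[may_block] // a library author opts their blocking helper into the lint
fn query_license_server() { /* synchronous network round-trip */ }

async fn handle_request() {
    // Both calls happen directly inside an async fn, so the lint would
    // warn on both: one via a hardcoded list of std functions, one via
    // the user annotation.
    std::thread::sleep(std::time::Duration::from_secs(1));
    query_license_server();
}
```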
I think different programs (and libraries) will have different "degrees of caring," where for example I might care about disk I/O but not println. On the other hand, some latency-critical applications might care a lot about println. I think we may want individual lints for different categories, with a few "umbrella" lints you can allow/deny all at once.
If you want errors on disk I/O (you've enabled the […] lint) […]
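One possible shape for the "degrees of caring" idea above: per-category lints plus an umbrella lint. All lint names here are hypothetical, purely to illustrate the granularity:

```rust
// Hypothetical lint names, for illustration only.
#![deny(blocking_in_async::disk_io)]  // this app cares about file system stalls
#![allow(blocking_in_async::stdio)]   // ...but tolerates println!/eprintln!

// A latency-critical app could instead flip everything at once:
// #![deny(blocking_in_async::all)]
```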
This seems true only for as long as the alternative is harder to use. If the stdlib included both blocking and non-blocking println implementations, recommending to always use the non-blocking version would be less controversial. I'm not sure how much we'd gain by introducing multiple lints. An issue I've seen crop up several times with async-std users is confusion around using […]. The following functionality I think would be important to apply the attribute to, both in the stdlib and as a recommendation for library authors:
Algorithms seem too hard to categorize broadly and should always be manually assessed. Overall, I'm very much in favor of introducing such a lint; I think this will be incredibly helpful for people working in async Rust!
I would argue that actually calling […]
Using non-async locking is a totally normal thing to do in async applications, but let's say for example I add async locking, then I add a […]. My general worry is people leaning too heavily on this. I feel like it's okay to lean heavily on `must_use` since that generally works quite well, but in this situation it doesn't always work well. I personally think effort should go into documentation around this, and into infrastructure to help runtimes detect/expose slow futures that may be doing blocking work. The lint to me personally feels a bit costly for the benefit, whereas I think we could potentially invest in other areas with a much better return.
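The refactoring hazard described here can be sketched with plain std types (illustrative names; a compile-only sketch, since actually running the deadlock requires a multi-task executor). Both versions compile today, and nothing flags the newly introduced suspension point:

```rust
use std::sync::Mutex;

async fn send_notification() { /* e.g. an async channel send */ }

// Before: a short critical section, no await while the lock is held. Fine.
async fn update(counter: &Mutex<u64>) {
    *counter.lock().unwrap() += 1;
}

// After a refactor: an .await is introduced while the guard is still alive.
// If this task parks here and another task on the same executor thread
// tries to take the lock, the thread can deadlock -- and nothing warns us.
async fn update_and_notify(counter: &Mutex<u64>) {
    let mut guard = counter.lock().unwrap();
    *guard += 1;
    send_notification().await; // guard held across an await point
}
```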
I agree, though we do have limited time and resources. Maybe it's just me who doesn't see this being as useful, but I think I've got that point across enough now. Happy to move forward with this if people generally think it's useful. I personally plan on exploring how, as a runtime, we can expose these implicit relationships.
Here's a relevant blog post by @Darksonn, which suggests a rule of thumb of spending no more than 10-100 microseconds between await points. (It also says that it depends on your application, but I think it's an interesting guideline nonetheless.)
My thinking through this: if I were writing a tool to analyze my async code, I'd want to be able to query the entire function call tree for any call that could potentially take too long for its context, where the deadlines for a given context vary based on the executor in use. So I'd want a way to run some sort of worst-case profiler: what are the worst plausible cases for the functions I ran?

For some of these functions, such as reading from the filesystem or the network, the runtime may not be bounded by the Rust code at all, since a potentially arbitrarily large amount of external work may need to occur. For those, I'd want to ensure they never get called in an async context, full stop, and I'd want that kind of function tagged with something like #[may_block] and then scanned recursively through the call tree.

For other functions, such as ones that do context-dependent amounts of computation inside the Rust process, I'd want a way to characterize the timing variation of that function so that I can ensure it won't surprise me with a long runtime. That could potentially get as complex as a static analysis which attempts to prove a bound on the function's runtime, then warns me if the bound is not reliably short enough or if it was unable to prove halting. However, that's proposing a rather large amount of work - work which would likely be worth it, especially with Z3 in the world to make it possible, but which I certainly don't have the available time to figure out and do at the moment.
This seems like it might support a case for providing a field in the attribute to specify the warning message. I agree that File::read() would usually be fine, but how fine it is varies based on read size and where you're reading from, and it's not hard to get yourself into a situation where it's not fine and async or threaded file IO is needed (of course, last I checked, it's not trivial to set up a filesystem that will support async file IO!). I think ultimately the sort of tool you need to solve this problem needs to be more powerful than local compiler lints, but it could go a long way to tag functions which are known to have high latency worst-cases.

Come to think of it, what if the lint allowed specifying estimated best-case and worst-case bounds? Sketching an unreasonable extreme (I forget what the limitations of attribute syntax are - as flexible as a macro, iirc?):

```rust
#[may_block(
    milliseconds: "fastest I've ever seen it run",
    unbounded: "may potentially take as much as 5 minutes if dumbsynclib's internal connection to irc times out")]
fn expensive_and_unpredictable() {}

#[may_block(microseconds: "simple cases", milliseconds: "bounded by max dependency list size")]
fn run_dependency_solver() {}

// or maybe only an upper bound:
#[may_block(seconds: "constant time cpu bound operation")]
fn calculate_password_hash() {}

#[may_block(milliseconds: "waits for gpu to catch up, guaranteed to be less than 30ms unless driver crashes")]
fn sync_gpu() {}
```

Then in fantasyland, where we can add this complexity to allow(), in main.rs, specify […]
I committed to @yoshuawuyts to add an RFC for this issue. Looking at it again, I would ideally like to see this prototyped in Clippy beforehand, in order to get a feel for the workflow and build a stronger case. An MVP, in my mind, would work the following way:
Basically, such an MVP would have close to zero false positives. I tried a while ago but got sidetracked before I had a working solution. If anyone has a concrete strategy for how to parse the "immediate async context" I'd be delighted! I'm also down to discuss and review anything on this topic.
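As a toy model of that MVP's decision rule (all names hypothetical; a real Clippy prototype would inspect the HIR rather than compare path strings), the core check is just a hardcoded denylist gated on whether the call site sits in an immediate async context:

```rust
// Toy model of the proposed MVP lint. The list and function are made up
// for illustration; only the decision shape matters here.
const DENYLIST: &[&str] = &["std::thread::sleep", "std::fs::read_to_string"];

fn would_lint(callee: &str, in_immediate_async_context: bool) -> bool {
    in_immediate_async_context && DENYLIST.contains(&callee)
}

fn main() {
    // Flagged: known-blocking call directly inside an async fn/block.
    assert!(would_lint("std::thread::sleep", true));
    // Not flagged: the same call in ordinary sync code is fine.
    assert!(!would_lint("std::thread::sleep", false));
    // Not flagged: unknown functions are ignored, keeping false positives
    // near zero at the cost of missing transitive blocking.
    assert!(!would_lint("my_crate::pure_math", true));
}
```

Best-effort by construction: anything not on the list slips through, which matches the "close to zero false positives, many false negatives" trade-off described above.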
Oh, and @tmandry just informed me that there's already an open issue in Clippy for this.
C# lints when there are no `await`s in an `async` method (compiler warning CS1998).
In order to write reliable and performant async code today, the user needs to be aware of which functions are blocking, and either:
Determining if a function may block is non-trivial: there are no compiler errors, warnings or lints, and blocking functions are not isolated to special crates but rather unpredictably interspersed with non-blocking, async-safe synchronous code. As an example, most of `std` can be used in async code, but much of `std::fs` and `std::net` cannot. To make matters worse, this failure mode is notoriously hard to detect: it often compiles and runs fine when the executor is under a small load (such as in unit tests), but can cause severe application-wide bottlenecks when load is increased (such as in production).

For the time being, we tell our users that if a sync function uses IO, IPC, timers or synchronization, it may block, but such advice adds mental overhead and is prone to human error. I believe it's feasible to find an automated solution to this problem, and that such a solution delivers tangible value.
Proposed goal
Possible solutions
I am not qualified to say, and I would like your thoughts! A couple of possibilities:

- An attribute like `#[may_block]` that can be applied on a per-function level.

Challenge 1: Blocking is an ambiguous term
Typically, blocking is either the result of a blocking syscall or an expensive compute operation. This leaves some ambiguous cases:
- […] in `std` involve expensive computation by any reasonable definition.
- `std::sync::Mutex::lock` is blocking only under certain circumstances. The `must_not_await` lint is aimed at preventing those circumstances (instead of discouraging its use altogether).
- `TcpStream::write` blocks by default, but can be overridden to not block using `TcpStream::set_nonblocking`.
- The `println!` and `eprintln!` macros may technically block. Since they lack async equivalents and rarely block in practice, they are widely used and relatively harmless.

Ambiguity aside, there are many clear-cut cases (e.g. `std::thread::sleep`, `std::fs::File::write` and so on) which can benefit from a path forward without the need for bike-shedding.

Challenge 2: Transitivity
If `fn foo() { bar() }` and `bar()` is blocking, `foo()` is also blocking. In other words, blocking is transitive. If we can apply the annotation transitively, more cases can be detected. OTOH, this can be punted for later.

Challenge 3: Traits
When dynamic dispatch is used, the concrete method is erased. Should annotations be applied to the trait method declaration or to the implementation?
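The trade-off can be sketched as follows (the `#[may_block]` attribute is hypothetical, and the trait and types are made-up examples):

```rust
// Hypothetical #[may_block] attribute; not an existing feature.

trait Storage {
    // Option A: annotate the trait declaration. Every call through
    // `dyn Storage` is then conservatively treated as possibly blocking,
    // even for implementations that never block.
    #[may_block]
    fn load(&self, key: &str) -> Vec<u8>;
}

struct DiskStorage;
struct CacheStorage;

impl Storage for DiskStorage {
    // Option B: annotate only the blocking implementation. Statically
    // dispatched calls are caught, but calls through `dyn Storage`
    // erase the concrete method and slip past the lint.
    fn load(&self, key: &str) -> Vec<u8> {
        std::fs::read(key).unwrap_or_default()
    }
}

impl Storage for CacheStorage {
    fn load(&self, _key: &str) -> Vec<u8> {
        Vec::new() // never blocks
    }
}
```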
Challenge 4: What is an "async context"?
The detection should only apply in "async contexts". Does the compiler already have a strict definition for that? Examples off the top of my head:
- The `poll` method of a custom future impl, or the `poll_fn` macro.

Background reading