cancelawait keyword to abort an async function call #5913
Comments
I really don't like the idea of an implicit suspend point. It smells an awful lot like hidden control flow. Perhaps we should require async functions to retrieve their result location explicitly? Wait, no, then that's function colouring. Hmm. (Also, is there a specific reason that the keyword can't just be |
I should clarify, there is already a suspend point at a |
Sounds like a nice proposal, with moving execution into await instead of async for non-suspend functions being quite the change. Was left with a few questions after reading it over: What would cancellation look like, and how would it work, for normal calls to async functions? (e.g. Also, is execution deferred until |
No cancellation possible for these. The result location and the awaiter resume handle are both available from the very beginning of the call. When it gets to
At the end of the compilation process, every function is assigned a calling convention. Async functions have an async calling convention. So the compiler does have to "color" functions internally for code generation purposes. So it's based on compile time analysis. (That's status quo already) |
For the last part, as I understand it now, doing The reason I find this significant is because, if that is true, then it changes the current assumptions on what the |
First, I want to note that result location semantics can already be (and may already be) supported for calls to async functions that do not use the That said, I see two fatal inconsistencies between blocking and async functions with this proposal. I think they are much more subtle and hard to catch than problems with result location semantics, so IMO it would be better for the language not to support result location semantics for async calls than to take on these new problems. These two examples are related but subtly different. Fixing one will not fix the other.
|
In the example
This Proposal
fn main() void {
seq('a');
var frame1 = async foo();
seq('c');
var frame2 = async bar();
seq('e');
const x = await frame1;
seq('k');
const y = await frame2;
seq('m');
}
fn foo() i32 {
defer seq('j');
seq('b');
operationThatSuspends();
seq('f');
return util();
}
fn util() i32 {
seq('g');
operationThatSuspends();
seq('i');
return 1234;
}
fn bar() i32 {
defer seq('l');
seq('d');
operationThatSuspends();
seq('h');
return 1234;
} |
Ah ok, think that was where my misunderstanding was. My last point of confusion was related to how non-suspending async fns are handled:
Is this change in semantics something applied by compile time analysis or through some other observation? If it's compile time defined, what happens to the result values of |
Instead of a new keyword, why couldn't a frame just have a errdefer download_task.cancel(); |
@frmdstryr Nice idea. Would it make sense to extend this to other frame functionality? download_task.resume(); download_task.await(); EDIT: removed |
These aren't methods though -- they're built-in functionality. Writing them as methods is misleading, and breaks the principle of all control flow as keywords. |
@EleanorNB All control flow isn't currently keywords as function calls themselves are a form of control flow and can have control flow inside them as well. If I understand correctly, |
Thought: if |
Another option to maybe consider: |
No good -- not all suspend points are marked with |
I was under the assumption that there are only two ways to introduce a suspend point: The former could return the error as noted earlier, and to mimic current semantics would be to ignore the error: The latter AFAICK has two choices:
In both cases, the marking is at the suspension point rather than at |
A blocking async function call is an implicit
fn foo() u32 {
var x: u32 = 4;
callThatMaySuspend(); // x must be saved to the frame, this call is a suspend point
// equivalent to `await async callThatMaySuspend();`
return x;
}
For cancellation to work, any function that may suspend or await (and supports cancellation) needs to return an error union which includes cancelled. This is the "function colouring, all over again" that Eleanor is describing. |
Hm, forgot about compiler inserted awaits. The first bullet point sounds like the way to go there (the compiler adding At first glance, this makes sense as code which expects a result (e.g. using await) isn't written in a way to handle cancellation. You would then only be able to meaningfully cancel frames which are at suspends that explicitly support/handle cancellation (e.g. suspended in a async socket/channel which has more |
Implicit Since we want to localise any explicitly |
In line with #5277, this should be consistent if we only allow |
@EleanorNB why would implicit |
Discarding all errors from an operation, only if the enclosing function happens to be async, which is nowhere explicitly marked? No thank you. In my eyes, the |
Does zig have something akin to AggregateError? |
Then whether a frame is cancelable or not depends on its current suspend point, which otherwise is completely invisible and unpredictable to the caller. What you get then is people saying that for safety, you should never try to cancel a frame. That's a C problem; Zig is better than that.
This would typically be known by the programmer, so we would trust them not to attempt this. In such functions, the
It's not a request. We don't ask nicely. When we say
An
The semantics of
There is always going to be some semantic difference between synchronous and asynchronous code. That's the whole point. However, the programmer's model doesn't change, and no code needs to be rewritten -- we're still colourblind. Under your proposal, colouring would be a lot worse: asynchronous calls have to have special second error set, synchronous calls * cannot* have that lest it be confused with an ordinary error set. |
I don't really follow.
This has actually been a pain point in Rust futures as well. It requires implementing cancellation at the destructor of the Future/Frame but that is only synchronous. People want asynchronous cancellation (e.g. The latter of not heap-allocating, which is blocking on cancellation, can actually be both an inefficiency + logic error:
Again, not everything can be cancelled. So you end up introducing runtime overhead as stated above in order to accommodate a language semantic. It would be great if we don't end up like Rust in that regard, as it's sacrificing customizability for simplicity without a way to opt out, since it's at the language level.
I think there has been another misunderstanding. My idea of cancellation doesn't include defers or how to run them any differently. It only introduces The latter was what I was suggesting before. Here, The former is also an option (that I just thought of), which could be made more forgiving by
The issue here is that suspend + normal function calls that aren't at the end of the scope or use
Again, this is not the case. |
Without even looking at the called function, standard coding practice is enough to ensure exactly one suspension is paired with one resumption, and one invocation with one completion -- so, if the programmer has done their job well, they should not encounter language-enforced crashes. However, there is no way of inspecting the internal suspension state of a function, so the invoker can't know whether it's suspended directly or awaiting. Thus, any attempt at cancellation, no matter how careful the programmer, has a possibility of crashing the program. (Even worse, the common pattern of calling a function to register the frame with the event loop is guaranteed to crash.) Call me crazy, but if the programmer has done their due diligence, they shouldn't have to worry about language-enforced crashes. As you've pointed out though, my model (actually Andrew's model as well in the relevant places) isn't perfect either -- cancellation would then itself be an asynchronous process, which means it would need its own frame, and that frame would itself need to be cancelable, and how the hell would that work? It seems to me that no implementation of cancellation can ever be guaranteed to succeed, which in my eyes contradicts point 11b of the Zen. In light of this, @andrewrk, I don't believe that cancellation should be implemented at the language level. We may provide a cancel token implementation in the standard library (which is a much better and more flexible solution anyway), but async frames themselves must be awaited to complete. I do believe however that the proposed asynchronous RLS is a worthwhile idea. |
We may implement one language-level feature to make userspace cancellation easier: rather than
// In the suspending function
const action = suspend {
event_loop.registerContinuationAndCancellation(@frame(), continuation_condition, cancellation_condition);
};
switch (action) {
.go => {},
.stop => return error.functionXCancelled,
}
// In the event loop (some details missing)
if (frame.continuation and @atomicRmw(bool, &frame.suspended, .Xchg, false, .SeqCst)) {
resume frame.ptr, .go;
frame.* = null;
}
if (frame.cancellation and @atomicRmw(bool, &frame.suspended, .Xchg, false, .SeqCst)) {
resume frame.ptr, .stop;
frame.* = null;
}
Since
const suspendingFunction = fn (arg: Arg) ReturnType : ContinuationType {
// ...
};
Any function that uses the This not only permits flexible evented userspace cancellation, but also more specialised continuation conditions: a function waiting for multiple files to become available could receive a handle to the first one that does, and combined with a mechanism to check whether a frame has completed, #5263 could be implemented in userspace in the same manner. At first blush, this may appear to be hostile to inlining async functions -- however, allowing that would already require semantic changes (#5277) that actually complement this quite nicely: This is, of course, a separate proposal -- I'll write up a proper one later. |
What if async fn's could return a user defined Then if you can access the result location from within the async fn and have a
pub fn Future(comptime Frame: type, comptime ReturnType: type) type {
return struct {
frame: Frame,
state: enum{Running, Cancelled, Finished} = .Running,
result: ?ReturnType = null,
};
}
pub fn fetchUrl(allocator: *Allocator, url: []const u8) .callconv(.Async=Future) ![]const u8 {
// Do stuff
while (@result().state != .Cancelled ) {
// Keep working
}
// Handle however you want, this can cleanup your allocated resources
if (@result().state == .Cancelled) return error.Cancelled;
@result().state = .Finished;
}
Using
var download_future = async fetchUrl(allocator, "https://example.com/");
errdefer switch (download_future.state) {
.Running => {
download_future.state = .Cancelled; // Should use atomics
cancelawait download_future.frame;
},
.Finished => allocator.free(download_future.result.?),
}
var file_future = async readFile(allocator, "something.txt");
errdefer switch (file_future.state) {
.Running => {
file_future.state = .Cancelled; // Should use atomics
cancelawait file_future.frame;
},
.Finished => allocator.free(file_future.result.?),
}
const download_text = try await download_future.frame;
defer allocator.free(download_text);
const file_text = try await file_future.frame;
defer allocator.free(file_text);
I don't see how a cancel without being able to ignore it is a good idea. Some functions may need to be able to ignore the cancel request if something else fails (eg say a Edit: I guess just adding a state flag to the existing frame would work too. |
The main point of having a state flag that can be referenced from within the async function is so that it can handle cleaning up its own resources which avoids the problem of "side effects". |
|
@frmdstryr Adding a state flag to the frame would be reimplementing the state flags that are already inside the frame. Exposing the state to the user like this specifically means it can't do optimizations like
This also doesn't take into account multi-threaded access to the frame. The state load/check/store there would need to be a CAS, and being able to hide that from the user may allow the compiler to utilize more efficient atomic ops for interacting with the state like atomic swap.
@EleanorNB It must succeed but there's no requirement on when it does so or how it reports success. Arena allocators are a good example as their .free()/.destroy() functions succeed even though they don't actually deallocate the resource. It assumes that the resource will be deallocated in the future from another manner (particularly the allocator's deinit()). |
Ah, so scratch the idea of adding it to the frame itself. I guess I'm just making more noise here... since this is roughly a worse version of #5263 (comment) except the Future/CancelToken is returned by using |
Somehow wound up thinking about this. I like @EleanorNB's suggestion about introducing a cancellation token scheme in stdlib, in part, because, that's what I did, with beam.yield. |
Wanted to vouch for this idea of having a cancellation token scheme in the standard library over requiring new syntax and logic in place for canceling async frames. I've been using cancellation tokens for canceling I/O and arbitrary tasks in my code using a Here are some links to some code I'm working on which contains and makes heavy use of a A single-threaded send(), recv(), read(), write(), accept(), connect(), timeout syscalls that are driven by io_uring which take in a A set of single-threaded synchronization primitives which take in a A cancellable worker loop function which takes in a An async TCP client pool and TCP server with backpressure support which supports cancellation: https://github.com/lithdew/rheia/blob/dde13020d069b6819a5ad8bd0980863009a17195/net.zig A multi-threaded |
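A minimal sketch of what such a cooperative cancellation token could look like, written in Python's asyncio rather than Zig so that it runs as-is. `CancellationToken`, `worker`, and `main` are hypothetical names for illustration and are not taken from any of the linked code. The point it shows is that the token only records a request: the task checks it at a place of its own choosing, runs its cleanup, and is still awaited to completion.

```python
import asyncio


class CancellationToken:
    """Hypothetical cooperative token; it only records that cancellation was requested."""

    def __init__(self):
        self._cancelled = False

    def cancel(self):
        self._cancelled = True

    @property
    def cancelled(self):
        return self._cancelled


async def worker(token, log):
    try:
        for step in range(1000):
            # The task decides where it is safe to stop.
            if token.cancelled:
                return "cancelled"
            log.append(f"step {step}")
            await asyncio.sleep(0)  # suspend point; lets other tasks run
        return "finished"
    finally:
        # Cleanup runs on every exit path, like a defer.
        log.append("cleanup")


async def main():
    token = CancellationToken()
    log = []
    task = asyncio.create_task(worker(token, log))
    await asyncio.sleep(0)  # give the worker a chance to start
    token.cancel()          # request cancellation
    result = await task     # the task is still awaited to completion
    return result, log
```

Calling `asyncio.run(main())` returns the worker's own answer to the request together with its log, so the caller never has to guess whether the task stopped.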
Isn't it adding user data to callee:
fn foo() {
suspend {}
if (@frame().user_data.suspend) {}
}
caller:
var frame = async foo();
frame.user_data.suspend = true; |
Counter argument: I don't think there should be a way to cancel async functions. This shouldn't be a language feature. This is user-space stuff.
Rationale: There are two main mental models for coroutines/CPS/async-await.
One: They are like threads, without using OS threads (e.g. cooperative multitasking). You can't cancel a thread from the outside. You shouldn't be able to cancel an async call from the outside.
Two: They are just "hiding" callbacks, and automagically creating your callback "context" for you. You can only cancel callbacks from the outside.
Since neither "model" has the concept of a generic way to cancel, neither should suspend/resume. Doing the correct cleanup code is so case specific, this shouldn't be a language feature. Maybe sometimes you want to run the waiting code with a flag telling it to exit (e.g. most I/O), sometimes the event loop can just delete the frame and go on its way (most timer callbacks).
Anecdotally, all of the horrible nastiness in other languages' coroutine implementations surrounds cancellation and error propagation when canceling. Just don't do it.
Also anecdotally, I've used coroutines on a number of large projects. The only times I've ever wanted to cancel one is when my code sucked, and I was too lazy to refactor it correctly.
It's also a solved problem. How did you "cancel" I/O when using threads for the past 20 years? Just do that. |
Oh, that's brilliant! The reason why cancellation seems necessary is that there are two fundamental concurrent operations. Given two "futures" / concurrent operations a and b, you might want to run them concurrently and
So:
const a = async update_db();
const b = async update_cache();
await @join(a, b); // Want to update _both_ db and cache

const a = async read_db();
const b = async read_cache();
await @race(a, b); // Wait for _one of_ db and cache, whichever is faster
But So, the second example can be re-written roughly as
const ct: CancelationToken = .{};
const a = async read_db(&ct);
const b = async read_cache(&ct);
await @join(a, b)
where both functions:
I bet this scales to fully-general |
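The "race is just join plus a shared token" idea can be sketched as follows, in Python's asyncio rather than Zig so that it runs as-is. `CancellationToken` and `read_source` are hypothetical stand-ins (for the token type and for `read_db` / `read_cache`), and the tick counts are arbitrary. Each function polls the token while working, and whichever finishes first cancels the token so the other gives up early; the caller still joins both.

```python
import asyncio


class CancellationToken:
    """Hypothetical token shared by all racing tasks."""

    def __init__(self):
        self.cancelled = False

    def cancel(self):
        self.cancelled = True


async def read_source(name, ticks, token):
    """Stand-in for read_db / read_cache: 'works' for a number of ticks."""
    for _ in range(ticks):
        if token.cancelled:
            return None            # the other source already won; stop early
        await asyncio.sleep(0)     # suspend point
    token.cancel()                 # first finisher asks the others to stop
    return f"value from {name}"


async def race_via_join():
    token = CancellationToken()
    # A plain join: both tasks are always awaited, none is abandoned.
    return await asyncio.gather(
        read_source("db", 20, token),
        read_source("cache", 2, token),
    )
```

Here `asyncio.run(race_via_join())` yields one real value and one `None`; the slower task was never torn down from the outside, it simply returned when it noticed the token.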
I believe we do need the ability to cancel async functions. There are many examples, for details: Timeouts and cancellation for humans.
Coroutines aren't like threads. We definitely can "cancel" a process by sending a signal. Threads can't be killed from the outside because they share everything in a process and they are not cooperative. Even so, we can cancel threads if we make them "cooperative" somehow. (e.g. There is a main loop in each thread which checks cancellation requests and handles them.) The detail can be wrapped by languages or libraries so that it looks like we are "cancelling" threads. There's no technical reason that we can't cancel coroutines.
That's not too difficult, a cancellation is just like a specific error. If the cleanup code works on some regular errors, it works on cancellations.
That's because there are few languages/libraries designed with structured concurrency, another reference: Notes on structured concurrency.
That's right! We call them task groups in structured concurrency, it's kind of primitive for concurrency (except that if you can't pass a task group as an argument, it's not easy to spawn background tasks when needed). |
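To illustrate the claim that cleanup written for regular errors also covers cancellation, here is a small sketch in Python's asyncio (used instead of Zig so it can be run). `tracked_resource` and `job` are hypothetical names. In asyncio, cancellation is delivered as an exception at a suspend point, so the same `finally` block runs for an ordinary failure and for a cancelled task.

```python
import asyncio
import contextlib


@contextlib.contextmanager
def tracked_resource(name, log):
    """Acquire/release pair; the release plays the role of a defer."""
    log.append(f"acquire {name}")
    try:
        yield name
    finally:
        log.append(f"release {name}")


async def job(log, fail):
    with tracked_resource("buffer", log):
        if fail:
            raise ValueError("ordinary failure")
        await asyncio.sleep(3600)  # suspended here when cancellation arrives


async def main():
    error_log, cancel_log = [], []

    # Path 1: the job fails with a regular error.
    try:
        await job(error_log, fail=True)
    except ValueError:
        pass

    # Path 2: the job is cancelled while suspended.
    task = asyncio.create_task(job(cancel_log, fail=False))
    await asyncio.sleep(0)  # let the job reach its suspend point
    task.cancel()
    try:
        await task
    except asyncio.CancelledError:
        pass

    return error_log, cancel_log
```

Both logs end up identical, which is the property the comment above relies on. It says nothing about operations that cannot safely stop at an arbitrary suspend point; that objection is taken up in the following comments.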
The issue is that not all async functions are cancellable. Certain operations are atomic to the caller (or stateful) but still use asynchronous operations. This is the idea of cancellation safety. For threads, you can send a signal to either request a cancellation (i.e. SIGTERM) or force it regardless of the thread's decision (i.e. SIGKILL). Regarding semantics, the latter is most likely undesirable as you can't recover (or in zig speak, "run defers"). Cancellation requests should be the solution then IMO. But since it's only a request, the thread has the opportunity to ignore it for various reasons (i.e. it's not cancel-safe). This means you must wait for the thread to complete regardless before you relinquish its resources. If not, you risk leaks (unstructured concurrency) or UAFs (structured concurrency). Cancellation Tokens are a great solution here as they're 1) opt-in for tasks which are cancel-safe and 2) require joining the task anyways to account for those that aren't cancel-safe. That they're shared between tasks in @matklad's proposed API is a composability nicety (each task could as well just have their own Token and a separate construct shared between tasks could cancel each Token separately). |
Here are two specific, simple examples which are useful as an intuition pump and a litmus test for any cancellation mechanism.
Example 1: an asynchronous task submits a read request to io_uring and then gets canceled. To actually cancel the task, what is needed is submitting another cancel request to io_uring (so, another syscall) and then waiting for it to complete. If you don't do this, then the read might still be executing in the kernel while your task is already "canceled", effectively writing to some now-deallocated memory.
Example 2: without anything exotic, an async task offloads some CPU-heavy job (like computing a checksum) to a thread pool. To cancel this task, we also must cancel the thread-pool job, but that doesn't have cancellation built in, as it is deep in some SIMD loop. So the async task just has to wait until the CPU side finishes. If you cancel only the async task, and let the CPU part run its course, you are violating structured concurrency (and potentially memory safety, if the CPU part uses any resources owned by the async part).
That is, cancellation is only superficially similar to error handling: error handling is unilateral and synchronous. General cancellation is an asynchronous communication protocol: first, you request cancellation, then you wait for the job to actually get canceled. A more useful framing is that cancellation is serendipitous success |
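The request-then-wait protocol from Example 2 can be sketched with a plain thread, in Python rather than Zig so that it runs as-is. `checksum_job`, `cancel_and_wait`, and `run_demo` are hypothetical names, and the modulus is an arbitrary choice for the toy checksum. The job only looks at the stop flag between chunks, so the canceller must join it before touching the buffer it reads from.

```python
import threading


def checksum_job(data, stop_requested, log):
    """CPU-bound stand-in; it can only notice the request between chunks."""
    total = 0
    for start in range(0, len(data), 1024):
        if stop_requested.is_set():
            log.append("stopped early")
            return
        total = (total + sum(data[start:start + 1024])) % 65521
    # The job may well finish before it ever sees the request.
    log.append(f"checksum {total}")


def cancel_and_wait(thread, stop_requested):
    stop_requested.set()  # step 1: request cancellation
    thread.join()         # step 2: wait until the job has actually stopped


def run_demo():
    buffer = bytearray(b"x" * 1_000_000)
    stop = threading.Event()
    log = []
    worker = threading.Thread(target=checksum_job, args=(buffer, stop, log))
    worker.start()
    cancel_and_wait(worker, stop)
    # Only after the join is it safe to release or reuse the buffer.
    buffer.clear()
    return log, worker.is_alive()
```

Depending on timing the log holds either "stopped early" or a finished checksum; the caller has to handle both, which is the "serendipitous success" framing above.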
In zig-aio I do cancelation by making async io functions and yield in coroutines return |
I've spent many hours in the past trying to solve this, and never quite tied up all the loose ends, but I think I've done it this time.
Related Proposals:
Problem 1: Error Handling & Resource Management
Typical async await usage when multiple async functions are "in-flight", written naively, looks like this:
Spot the problem? If the first `try` returns an error, the in-flight `file_frame` becomes invalid memory while the `readFile` function is still using the memory. This is nasty undefined behavior. It's too easy to do this on accident.
Problem 2: The Await Result Location
Function calls directly write their return values into the result locations. This is important for pinned memory, and will become more noticeable when these are implemented:
`return` statement result location: ability to refer to the return result location before the `return` statement #2765
However this breaks when using `async` and `await`. It is possible to use the advanced builtin `@asyncCall` and pass a result location pointer to `async`, but there is not a way to do it with `await`. The duality is messy, and a function that relies on pinning its return value will have its guarantees broken when it becomes an async function.
Solution
I've tried a bunch of other ideas before, but nothing could quite give us good enough semantics. But now I've got something that solves both problems. The key insight was making obtaining a result location pointer for the `return` statement of an `async` function, implicitly a suspend point. This suspends the async function at the `return` statement, to be resumed by the `await` site, which will pass it a result location pointer. The crucial point here is that it also provides a suspension point that can be used for `cancelawait` to activate. If an async function is cancelled, then it resumes, but instead of returning a value, it runs the `errdefer` and `defer` expressions that are in scope. So - async functions will simply have to retain the property that idiomatic code already has, which is that all the cleanup that possibly needs to be done is in scope in a defer at a `return` statement.
I think this is the best of both worlds, between automatically running a function up to the first suspend point, and what e.g. Rust does, not running a function until `await` is called. A function can introduce an intentional copy of the result data, if it wishes to run the logic in the return expression before an `await` result pointer is available. It means async function frames can get smaller, because they no longer need the return value in the frame.
Now this leaves the problem of blocking functions which are used with `async`/`await`, and what `cancelawait` does to them. The proposal #782 is open for that purpose, but it has a lot of flaws. Again, here, the key insight of `await` working properly with result location pointers was the answer. If we move the function call of non-suspending functions used with async/await to happen at the await site instead of the async site, then `cancelawait` becomes a no-op. `async` will simply copy the parameters into the frame, and `await` would do the actual function call. Note that function parameters must be copied anyway for all function calls, so this comes at no penalty, and in fact should be better all around because we don't have "undoing" of allocated resources but we have simply not doing extra work in the first place.
Example code:
Now, calling an async function looks like any resource allocation that needs to be cleaned up when returning an error. It works like `await` in that it is a suspend point, however, it discards the return value, and it atomically sets a flag in the function's frame which is observable from within.
Cancellation tokens and propagating whether an async function has been cancelled I think can be out of scope of this proposal. It's possible to build higher level cancellation abstractions on top of this primitive. For example, #5263 (comment) could be improved with the availability of `cancelawait`. But more importantly, `cancelawait` makes it possible to casually use `async`/`await` on arbitrary functions in a maintainable and correct way.
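As a rough analogue of the pattern this proposal is after, here is a sketch in Python's asyncio (not Zig, so that it can be run). It is not the proposed Zig semantics: `fetch_url`, `read_file`, and `cancel_and_await` are hypothetical stand-ins, and `cancel_and_await` only approximates `cancelawait` by requesting cancellation and then waiting for the task to stop while discarding its outcome. What it shows is the resource-management shape: when the first await fails, every in-flight call is cancelled and waited on before the error propagates, so nothing keeps running against memory the caller is about to give up.

```python
import asyncio


async def fetch_url(url):
    await asyncio.sleep(0)
    raise OSError(f"cannot reach {url}")  # the download fails


async def read_file(path, events):
    try:
        await asyncio.sleep(3600)  # still in flight when the download fails
        return "file contents"
    finally:
        events.append("read_file cleaned up")  # cleanup in scope, like a defer


async def cancel_and_await(task):
    """Request cancellation, then wait until the task has really stopped."""
    task.cancel()               # no effect if the task already finished
    await asyncio.wait([task])  # waits for completion; the outcome is discarded


async def typical_usage(events):
    download = asyncio.create_task(fetch_url("https://example.com/"))
    file = asyncio.create_task(read_file("something.txt", events))
    try:
        download_text = await download
        file_text = await file
        return download_text, file_text
    except BaseException:
        # Never return while a started call may still be running.
        await cancel_and_await(download)
        await cancel_and_await(file)
        raise
```

Running `asyncio.run(typical_usage(events))` raises the download error, and by that time `events` already records that `read_file` ran its cleanup.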