C++20 co_await support for Embind promises #20420
Conversation
@tlively, who is the maintainer of the current C/C++ promise integration.
Yeah, I added him as a reviewer, but I'll reiterate, like I did in the issue, that this feature is separate from Asyncify / JSPI / promise.h and doesn't rely on them in any way (by design). Unlike in Asyncify / JSPI, in this case only the local coroutine is paused, and that's done entirely by C++ / LLVM compiling those…
I'll add that one particular motivation for this is that it's impossible to use proxying APIs with Asyncify at the moment; even if it were, it would be difficult because different threads could be competing for Asyncify state, which can't handle queueing. So when I needed to do something like

```cpp
proxySync([] {
  val foo = doSomethingSynchronous();
  val bar = foo.call<val>("someAsyncMethod").await();
  val baz = somethingSynchronousAgain(bar);
  // ...another await...
});
```

it had to be rewritten so that each synchronous call is proxied separately in one… With coroutines this is not a problem, as I can proxy the entire coroutine execution to the main thread in one go and use…
Ping, would love a review on this. I have a project where having…
Not having used C++20 coroutines at all yet, this looks reasonable to me, and like a pretty nice feature for avoiding Asyncify/JSPI. I'd still like to hear @tlively's thoughts since he had started doing something with coroutines.
@brendandahl Btw, just wanted to add that I found it has its place in combination with Asyncify too, not necessarily instead of it. In particular, it helps to write a helper coroutine that does a bunch of… This way, when you need to pause the entire app, you can do so with just one expensive unwind/rewind by Asyncify instead of having lots of unwinds/rewinds for every individual async operation inside such an async-heavy function.
I'm afraid I haven't yet understood C++ coroutines or how they can be useful here. I would like to take some time to better understand this change, but if @tlively or @brendandahl think they have a good handle on it, I'll defer to them.
Meanwhile, I don't understand any of the Embind stuff, nor do I understand the C++20 co_await stuff off the top of my head 😅 I'll have to dig into cppreference to understand the coroutine stuff here. @brendandahl and @sbc100, do you want to schedule some time to review this collaboratively?
Lol, I love where this is going 😅 To be fair, I didn't know anything about C++ coroutines / co_await before trying to implement this either, and the terminology C++ uses for this stuff is confusing / different from other languages, which didn't help... but in the end it works.
FWIW, I'd be happy to join a meeting if that helps, assuming a Europe-friendly time.
Sounds good. How about Friday morning, 9am PST?
Hm, that will be Friday evening here... tentatively yes, but if we can do it between Tuesday and Thursday, that would be a bit better.
Should we give it a try this week to help unblock this?
9am PST tomorrow? SGTM... I'll send out a calendar invite.
This adds support for `co_await`-ing Promises represented by `emscripten::val`. The surrounding coroutine should also return `emscripten::val`, which will be a promise representing the whole coroutine's return value.

Note that this feature uses LLVM coroutines and so doesn't depend on either Asyncify or JSPI. It doesn't pause the entire program, only the coroutine itself, so it serves somewhat different use cases even though all those features operate on promises. Nevertheless, if you are not implementing a syscall that must behave as if it were synchronous, but instead simply want to await some async operations and return a new promise to the user, this feature will be much more efficient.

Here's a simple benchmark measuring runtime overhead from awaiting a no-op Promise repeatedly in a deep call stack:

```cpp
using namespace emscripten;

// clang-format off
EM_JS(EM_VAL, wait_impl, (), {
  return Emval.toHandle(Promise.resolve());
});
// clang-format on

val wait() { return val::take_ownership(wait_impl()); }

val coro_co_await(int depth) {
  co_await wait();
  if (depth > 0) {
    co_await coro_co_await(depth - 1);
  }
  co_return val();
}

val asyncify_val_await(int depth) {
  wait().await();
  if (depth > 0) {
    asyncify_val_await(depth - 1);
  }
  return val();
}

EMSCRIPTEN_BINDINGS(bench) {
  function("coro_co_await", coro_co_await);
  function("asyncify_val_await", asyncify_val_await, async());
}
```

And the JS runner, also comparing with a pure-JS implementation:

```js
import Benchmark from 'benchmark';
import initModule from './async-bench.mjs';

let Module = await initModule();
let suite = new Benchmark.Suite();

function addAsyncBench(name, func) {
  suite.add(name, {
    defer: true,
    fn: (deferred) => func(1000).then(() => deferred.resolve()),
  });
}

for (const name of ['coro_co_await', 'asyncify_val_await']) {
  addAsyncBench(name, Module[name]);
}

addAsyncBench('pure_js', async function pure_js(depth) {
  await Promise.resolve();
  if (depth > 0) {
    await pure_js(depth - 1);
  }
});

suite
  .on('cycle', function (event) {
    console.log(String(event.target));
  })
  .run({async: true});
```

Results with regular Asyncify (I had to bump up `ASYNCIFY_STACK_SIZE` to accommodate said deep stack):

```bash
> ./emcc async-bench.cpp -std=c++20 -O3 -o async-bench.mjs --bind -s ASYNCIFY -s ASYNCIFY_STACK_SIZE=1000000
> node --no-liftoff --no-wasm-tier-up --no-wasm-lazy-compilation --no-sparkplug async-bench-runner.mjs
coro_co_await x 727 ops/sec ±10.59% (47 runs sampled)
asyncify_val_await x 58.05 ops/sec ±6.91% (53 runs sampled)
pure_js x 3,022 ops/sec ±8.06% (52 runs sampled)
```

Results with JSPI (I had to disable `DYNAMIC_EXECUTION` because I was getting "RuntimeError: table index is out of bounds" in random places depending on optimisation mode - JSPI miscompilation?):

```bash
> ./emcc async-bench.cpp -std=c++20 -O3 -o async-bench.mjs --bind -s ASYNCIFY=2 -s DYNAMIC_EXECUTION=0
> node --no-liftoff --no-wasm-tier-up --no-wasm-lazy-compilation --no-sparkplug --experimental-wasm-stack-switching async-bench-runner.mjs
coro_co_await x 955 ops/sec ±9.25% (62 runs sampled)
asyncify_val_await x 924 ops/sec ±8.27% (62 runs sampled)
pure_js x 3,258 ops/sec ±8.98% (53 runs sampled)
```

So the performance is much faster than regular Asyncify, and on par with JSPI.

Fixes emscripten-core#20413.
Decided to add a bunch of comments explaining what those special coroutine types & methods do. They somewhat duplicate the generic docs about C++ coroutines, but I figured they might be useful in the code, since probably few people ever have to implement them.
Can someone push new docs to the website please? @kripken IIRC you had to do that manually in the past - is that still the case?
Yes. Updated now! |