
C++20 co_await support for Embind promises #20420

Merged (6 commits), Nov 3, 2023

Conversation

RReverser (Collaborator)

This adds support for `co_await`-ing Promises represented by `emscripten::val`.

The surrounding coroutine should also return `emscripten::val`, which will be a promise representing the whole coroutine's return value.

Note that this feature uses LLVM coroutines and so doesn't depend on either Asyncify or JSPI. It doesn't pause the entire program, only the coroutine itself, so it serves somewhat different use cases even though all those features operate on promises.

Conversely, if you are not implementing a syscall that must behave as if it were synchronous, but simply want to await some async operations and return a new promise to the user, this feature is much more efficient.

Here's a simple benchmark measuring runtime overhead from awaiting on a no-op Promise repeatedly in a deep call stack:

```cpp
using namespace emscripten;

// clang-format off
EM_JS(EM_VAL, wait_impl, (), {
  return Emval.toHandle(Promise.resolve());
});
// clang-format on

val wait() { return val::take_ownership(wait_impl()); }

val coro_co_await(int depth) {
  co_await wait();
  if (depth > 0) {
    co_await coro_co_await(depth - 1);
  }
  co_return val();
}

val asyncify_val_await(int depth) {
  wait().await();
  if (depth > 0) {
    asyncify_val_await(depth - 1);
  }
  return val();
}

EMSCRIPTEN_BINDINGS(bench) {
  function("coro_co_await", coro_co_await);
  function("asyncify_val_await", asyncify_val_await, async());
}
```

And the JS runner, which also compares against a pure-JS implementation:

```js
import Benchmark from 'benchmark';
import initModule from './async-bench.mjs';

let Module = await initModule();
let suite = new Benchmark.Suite();

function addAsyncBench(name, func) {
  suite.add(name, {
    defer: true,
    fn: (deferred) => func(1000).then(() => deferred.resolve()),
  });
}

for (const name of ['coro_co_await', 'asyncify_val_await']) {
  addAsyncBench(name, Module[name]);
}

addAsyncBench('pure_js', async function pure_js(depth) {
  await Promise.resolve();
  if (depth > 0) {
    await pure_js(depth - 1);
  }
});

suite
  .on('cycle', function (event) {
    console.log(String(event.target));
  })
  .run({async: true});
```

Results with regular Asyncify (I had to bump up `ASYNCIFY_STACK_SIZE` to accommodate said deep stack):

```bash
> ./emcc async-bench.cpp -std=c++20 -O3 -o async-bench.mjs --bind -s ASYNCIFY -s ASYNCIFY_STACK_SIZE=1000000
> node --no-liftoff --no-wasm-tier-up --no-wasm-lazy-compilation --no-sparkplug async-bench-runner.mjs

coro_co_await x 727 ops/sec ±10.59% (47 runs sampled)
asyncify_val_await x 58.05 ops/sec ±6.91% (53 runs sampled)
pure_js x 3,022 ops/sec ±8.06% (52 runs sampled)
```

Results with JSPI (I had to disable `DYNAMIC_EXECUTION` because I was getting "RuntimeError: table index is out of bounds" in random places depending on optimisation mode - JSPI miscompilation?):

```bash
> ./emcc async-bench.cpp -std=c++20 -O3 -o async-bench.mjs --bind -s ASYNCIFY=2 -s DYNAMIC_EXECUTION=0
> node --no-liftoff --no-wasm-tier-up --no-wasm-lazy-compilation --no-sparkplug --experimental-wasm-stack-switching async-bench-runner.mjs

coro_co_await x 955 ops/sec ±9.25% (62 runs sampled)
asyncify_val_await x 924 ops/sec ±8.27% (62 runs sampled)
pure_js x 3,258 ops/sec ±8.98% (53 runs sampled)
```

So this approach is much faster than regular Asyncify, and on par with JSPI.

Fixes #20413.

(Several intermediate comments were marked as outdated or off-topic.)

@sbc100 (Collaborator)

sbc100 commented Oct 9, 2023

@tlively who is the maintainer of the current C/C++ promise integration.

@RReverser (Collaborator, Author)

> who is the maintainer of the current C/C++ promise integration.

Yeah, I added him as a reviewer, but I'll reiterate, as I did in the issue, that this feature is separate from Asyncify / JSPI / promise.h and doesn't rely on them in any way (by design).

Unlike in Asyncify / JSPI, in this case only the local coroutine is paused, and that's done entirely by C++ / LLVM compiling those co_awaits into a state machine, so the Wasm engine doesn't know anything about promises. We're only providing Embind bindings for that operator.

@RReverser RReverser changed the title co_await support for Embind C++20 co_await support for Embind promises Oct 11, 2023
@RReverser (Collaborator, Author)

I'll add that one particular motivation for this is that it's impossible to use the proxying APIs with Asyncify at the moment; even if it were possible, it would be difficult because different threads could compete for Asyncify state, which can't handle queueing.

So when I needed to do something like

```cpp
proxySync([] {
  val foo = doSomethingSynchronous();
  val bar = foo.call<val>("someAsyncMethod").await();
  val baz = somethingSynchronousAgain(bar);
  ...another await..
});
```

it had to be rewritten so that each synchronous call was proxied separately in one proxySync-based helper, and each asynchronous call had to be wrapped into another helper that subscribes to the promise and uses proxySyncWithCtx, so execution kept going back and forth between threads in a rather ugly mix of code.

With coroutines this is not a problem, as I can proxy the entire coroutine execution to the main thread in one go and use co_await for awaiting those intermediate values.

@RReverser (Collaborator, Author)

Ping, would love a review on this. I have a project where having co_await would come in very handy.

@brendandahl (Collaborator) left a comment


Having not used C++20 coroutines at all yet, this seems reasonable and a pretty nice feature for avoiding Asyncify/JSPI. I'd still like to hear @tlively's thoughts, since he had started doing something with coroutines.

Files with resolved review comments:
- site/source/docs/api_reference/val.h.rst
- system/include/emscripten/val.h
- test/embind/test_val_coro.cpp
@RReverser (Collaborator, Author)

RReverser commented Oct 20, 2023

> and a pretty nice feature to avoid asyncify/jspi

@brendandahl Btw, just wanted to add that I found it has its place in combination with Asyncify too, not necessarily instead of.

In particular, it helps to write a helper coroutine that does a bunch of co_awaits (handled by the C++ compiler), and then do a single my_func().await() to wait for all of them to finish, this time via Asyncify.

This way, when you need to pause the entire app, you can do so with just one expensive Asyncify unwind/rewind, instead of lots of unwinds/rewinds for every individual async operation inside such an async-heavy function.
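That pattern could be sketched roughly as follows (hedged: `fetch_json` and `combine` are hypothetical helpers used only for illustration, not existing APIs; this assumes Embind's coroutine support from this PR plus Asyncify enabled):

```cpp
// All co_awaits below are lowered by the C++ compiler into a state machine;
// Asyncify is not involved until the final .await().
val load_all() {
  val a = co_await fetch_json("a.json");  // hypothetical async JS import
  val b = co_await fetch_json("b.json");
  co_return combine(a, b);                // hypothetical synchronous helper
}

void synchronous_caller() {
  // A single Asyncify unwind/rewind for the whole batch,
  // instead of one per individual async operation.
  val result = load_all().await();
}
```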

@sbc100 (Collaborator)

sbc100 commented Oct 22, 2023

I'm afraid I haven't yet understood C++ coroutines or how they can be useful here. I would like to take some time to better understand this change, but if @tlively or @brendandahl think they have a good handle on it, I'll defer to them.

@tlively (Member)

tlively commented Oct 22, 2023

Meanwhile I don't understand any of the embind stuff, nor do I understand the C++20 co_await stuff off the top of my head 😅

I'll have to dig into cppreference to understand the coroutine stuff here. @brendandahl and @sbc100, do you want to schedule some time to review this collaboratively?

@RReverser (Collaborator, Author)

Lol I love where this is going 😅

To be fair, I didn't know anything about C++ coroutines / co_await before trying to implement this either, and the terminology C++ uses for this stuff is confusing / different from other languages, which didn't help... but in the end it works.

> @brendandahl and @sbc100, do you want to schedule some time to review this collaboratively?

FWIW I'd be happy to join a meeting if that helps, assuming a Europe-friendly time.

@sbc100 (Collaborator)

sbc100 commented Oct 23, 2023

> Lol I love where this is going 😅
>
> To be fair, I didn't know anything about C++ coroutines / co_await before trying to implement it either, and the terminology C++ used for this stuff is confusing / different from other languages, which didn't help... but in the end it works.
>
> > @brendandahl and @sbc100, do you want to schedule some time to review this collaboratively?
>
> FWIW I'd be happy to join in for a meeting if that helps, assuming Europe-friendly time.

Sounds good. How about Friday morning. 9am PST?

@RReverser (Collaborator, Author)

RReverser commented Oct 23, 2023

> How about Friday morning. 9am PST?

Hm, so it will be Friday evening here... tentatively yes, but if we can do it between Tuesday and Thursday, that would be a bit better.

@RReverser (Collaborator, Author)

> Sounds good. How about Friday morning. 9am PST?

Should we give it a try this week to help unblock this?

@sbc100 (Collaborator)

sbc100 commented Nov 2, 2023

> > Sounds good. How about Friday morning. 9am PST?
>
> Should we give it a try this week to help unblock this?

9am PST tomorrow? SGTM. I'll send out a calendar invite.

Files with resolved review comments:
- system/include/emscripten/val.h
- test/embind/test_val_coro.cpp
- src/library_sigs.js
- test/test_core.py
@RReverser force-pushed the emval-coro branch 2 times, most recently from 44b2c0d to fba6de7, November 3, 2023 18:59
@RReverser (Collaborator, Author)

Decided to add a bunch of comments explaining what those special coroutine types and methods do. They somewhat duplicate the generic docs about C++ coroutines, but I figured they might be useful in the code, as probably few people ever have to implement them.

@RReverser RReverser enabled auto-merge (squash) November 3, 2023 19:48
@RReverser RReverser merged commit 8ecbdb3 into emscripten-core:main Nov 3, 2023
2 checks passed
@RReverser RReverser deleted the emval-coro branch November 3, 2023 20:46
@RReverser (Collaborator, Author)

Can someone push the new docs to the website, please? @kripken IIRC you had to do that manually in the past - is that still the case?

@kripken (Member)

kripken commented Nov 7, 2023

Yes. Updated now!

@andreamancuso

andreamancuso commented Jul 20, 2024

Hi, this looks great. I'm getting my hands dirty with coroutines and was wondering if you were planning to add coroutine support to -sFETCH?

I have been trying to parallelize HTTP requests and do when_all(); I enabled pthreads but am still getting an error: [image]

Linked issue: C++20 coroutines + Native webassembly promise integration

7 participants