-
Notifications
You must be signed in to change notification settings - Fork 3.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Extend standalone support #18285
base: main
Are you sure you want to change the base?
Extend standalone support #18285
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for working on this!
I left a few nits. Perhaps we can maybe split this up.
Can we add some some new tests based on these? |
I have never developed in C before, I don't really know where to start, can you give me some hints where you want these tests added and what they should test? |
system/lib/standalone/standalone.c
Outdated
@@ -155,24 +462,32 @@ double emscripten_get_now(void) { | |||
return (1000 * clock()) / (double)CLOCKS_PER_SEC; | |||
} | |||
|
|||
__attribute__((__weak__)) | |||
void _emscripten_throw_longjmp() { | |||
REPORT_UNSUPPORTED(do an invalid longjmp); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just call longjmp
would be more accurate I think.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think I could only trigger this by doing an invalid jump, but I think you would know better.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No.. all longjmp's require callout out of JS IIUC. WebAssembly, as it stands, has no support for longjmp so we call out the JS VM to throw
for us.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks! Not that it matters but do you have any C to demonstrate this? Just curious.
In my tests I was only able to trigger this by doing an invalid jmp. I think the jumps were being handled by the invoke_* methods?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Any call to longjmp
from C code will called _emscripten_throw_longjmp
IIUC.
So here I would just do REPORT_UNSUPPORTED(longjmp)
system/include/wasi/api.h
Outdated
* - https://pubs.opengroup.org/onlinepubs/9699919799/functions/V2_chap02.html#tag_15_14 | ||
* - https://github.com/WebAssembly/wasi-libc/blob/wasi-sdk-16/libc-bottom-half/sources/preopens.c#L215 | ||
*/ | ||
#define __WASI_FD_ROOT 3 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe the way it works is that there can be any number of pre-opens, starting at 3: https://github.com/WebAssembly/wasi-libc/blob/30094b6ed05f19cee102115215863d185f2db4f0/libc-bottom-half/sources/preopens.c#L212-L215. And non of these pre-opens are necessarily the root, they can be mounted in various places.
Also, can you move this macros (if you keep it), into standalone.c. api.h is copied from wasi-libc, and I think its better to avoid have local modifications to it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we really needs some tests for this.
The most basic test for the standalone syscalls is in test_standalone_syscalls
in test_other.py
.
For other tests look for @also_with_standalone_wasm
in test_core.py
on behalf of wasm people I know I would like to thank @jerbob92 and @sbc100 for progressing this, as it not only unlocks things like pdfium, but also generic wasm utilities that use emscripten (like wabt). Ex being able to run wat2wasm using any wasi runtime instead of whatever was released and packaged. If any one not currently involved has skills to contribute, please do as this is very likely well into "hobby time" for @jerbob92 and it is at the core a pretty substantial infrastructure change and a lot of work to test it. other wasm people will thank you, but I'll thank in advance! |
If you just wast a wasi version of the wabt tools I think the simplest path would be to use wasi-sdk to build it. Of course that doesn't mean we shouldn't land this change too. |
@sbc100 ps your last comment didn't format right. might want to polish it |
good idea, on the wabt thing and indeed a separate topic if they are open to decoupling from emscripten. |
IIUC wabt is not coupled to emscripten, we just happen to build it with emscripten for the web demo. |
(I guess maybe what you are saying is that it would be nice if the web version of wabt, which is run by emscripten, happened to also be WASI compliant.. in that case I agree that would be very cool). |
yeah sorry it was that we were looking for a wabt compiled to wasm and found the one that was compiled with emscripten for the web. I raised this issue for out of browser wasi binary WebAssembly/wabt#2101 |
Thanks @codefromthecrypt! So in hindsight, I don't think my C skills are good enough to get this PR into a merge-able state, implementing AT_FDCWD, using the preload dirs and not the FD 3, implementing tests. |
@jerbob92 no worries, I think you got things very far. I'll keep recruiting to whatever end on this. |
55abfc0
to
9fbd75d
Compare
@sbc100 I have done some more progress on this PR now:
I do have some questions though:
|
I don't feel strongly about this. Perhaps use the same split that wasi-libc uses? Also I don't think we need
The ideas is that we will be running the full wasi testsuite. I started adding some of it in #12704 and I have some
I don't think its a problem, but please document the origin and try to document any changes from upstream.
Sure. We could even consider adding wasi-libc as a submodule and including certain files directly? |
Sounds good!
Nice, that will make it a lot easier!
If that's a possibility that would be great, it would make it a lot easier because right now I'm just cherry-picking wasi-libc code to get the syscalls/WASI calls working that I require, especially if we want to pass the whole wasi-testsuite then we have to copy a lot from was-libc. Only thing I noticed while copying code is that their WASI function signature is a bit different sometimes, which prevents us from directly using the C code (AFAIK), so we might want to look into making that possible. For example: So Emscripten requires the path length and wasi-libc doesn't. But perhaps we can work around this by completely using wasi-libc for the wasi part. |
return 0; | ||
} | ||
|
||
int rmdirat(int dirfd, intptr_t path) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you make this static
and make it clear this is an internal helper?
A stupid nit - if it was based on wasi libc, shouldn't there be copyright attribution somewhere? (Or is that not necessary?) |
No stupid at all, we should certainly consider how to handle this. I'm not sure why the right answer is. Or how we should deal with keeping the codebases in sync. One easy option wold be to add wasi-libc as a submodule instead of duplicating the code here. |
That makes sense. Do you know if wazero also implements some of the emscripten syscalls? Or is it only WASI syscalls? |
See https://v8.dev/blog/emscripten-standalone-wasm#necessary-api-differences regarding the overhead and other issues that prevented us from moving emscripten to use 100% WASI APIs. But perhaps WASI has improved since then? |
Wazero implements WASI and some Emscripten specific things ( Embind support is possible using wazero-emscripten-embind. Since no other host calls have been implemented, most normal Emscripten builds won't work on Wazero, which is why I added more support for standalone builds. I think it makes sense that Wazero focusses on WASI and not on implementing specific Emscripten host calls, also because they are prone to change between Emscripten versions. |
@sbc100 Sorry for being vague. Yea I was referring to Node's WASI implementation for instance via
That I still don't fully understand. Why would it be less efficient to use native APIs at least when running on Node.js? |
I see, yes that could be conceivable more efficient. However with emscripten-generated JS we implement WASI in JS since we have our own virtual file system. i.e. the JS normally has run on the web as well as under node so it can't /doesn't assume uvwasi support. Perhaps you are right that if we push forward with STANDALONE mode and PURE_WASI mode then we could potentially bind uvwasi directly to the wasm module. Especially for users that specify NODERAWFS (i.e. user that already opt out of emscripten's virtual filesystem). However, given that we have limited time to work on stuff, and given that the direction PURE_WASI seems to be heading in is to essentially copy or import wasi-libc into emscripten I'm inclined to suggest that suggesting folks simply use wasi-sdk where possible. |
Having said that I will try to take another look at this PR today and see if we can get it landed ! |
I see. That's a fair point.
Yea that's what I was thinking if users opt-in to use Node's FS instead of Emscripten's virtual fs.
🎉 🙏 |
Just adding another vote of support to this PR. Currently, I am in the process of compiling whisper.cpp to wasi, and without this PR, it's impossible for me to access the model files from the filesystem. Would highly appreciate if this got merged into main. |
Can you explain more about your use case? Is there some reason wasi-sdk doesn't work for you? What platform/engines are you targetting? |
My objective is to use whisper.cpp in a pure Go environment. There's a CGO API of course but that requires libc. Compiling to WASI allows me to bypass all of that. And as others have alluded, wasi-sdk is nowhere as mature and polished as this project. Documentation is sparse and there's no support for bindings as far as I can see. IMO, it would be far easier and lead to a larger user base if we were to put effort into emscripten to allow wasi compatibility, than to put effort into wasi-sdk and make it better. |
Off topic but are you sure you want to be bypassing libc like that? Wouldn't that mean re-implementing a lot of the stuff the libc provides? What do you gain from that?
I'm not sure I would say its not a polished, more that it has a different set of target users. Users who want to target embedded environment and not the web are often better of using wasi-libc. Users targeting JS and the Web are often better of using emscripten.
Where are you hoping to run the you code? Are you targeting JS environments such as the browser or node? I'm a little confused since pretty much all of the advanced features of emscripten (such as bindings) rely heavily on JavaScript and non-WASI APIs. |
Yeah, this is going slightly off-topic and into the internals of Go, but yes Go has the ability to generate builds that do not depend on libc at all. It has alternate implementations of all libc functionality. The gain is portability. Our software is used in a large variety of customer environments on a plethora of linux flavors (sometimes on very old libc versions). Go gives us the ability to not depend on libc and I'd like to use that leverage.
Server side. Again, all of this is purely in Go. Go generates single binaries that you can simply run.
Yes, as mentioned earlier in this thread, there are libraries which re-implement all the bindings in pure Go, allowing users to use emscripten generated builds in a pure Go environment. Refer to: #18285 (comment) Hope that was clear. Happy to answer any further questions :) I fully realize that emscripten is targeted towards JS based environments. But there's a lot of tooling already built that gets us to 90% of what we want to get it to run on non-JS environments. This small PR would almost take it to 99%. Ultimately, it's your decision of course as a maintainer. But this would simply unlock a lot of functionality that would otherwise require a lot of other workarounds. |
Sounds like your libc re-implementations would then be built on top of the syscalls layer which is then, by definition, not portable between OSes.. meaning you would need to re-implement for each OS. But I guess you decided to trade off those two forms of non-portableness?
I'm not necessarily against moving forward with this PR, but I would to ask a few more clarifying questions. For example, if you are already implementing the emscripten ABI/API, why not just continue to do that.. Why does it matter to you if our API is the same as the WASI API or not? Why not implement all the emscripten __syscall_xxx APIs that you need? Is it out of fear that those APIs will not be stable maybe? |
Correct. Go has proper cross-compilation support and automatically takes care of this.
Oh yeah we could. But from what I understand, the
|
IMHO it's way better to add support for WASI in Emcripten than to add Emscripten ABI/API support for every runtime out there. |
Perhaps, but I think that argument only really makes sense if the resulting binary is WASI-compaible. In this case it sounds like the resulting binary will contain a bunch of emscripten-specific stuff anyway, so won't be WASI-compliant moulde, right? (or am I misunderstanding?) |
Correct, there are still a few things that are needed to run Emscripten binaries:
|
Sure, I'd be OK with moving forward with this. I think it still needs a bit of work though. |
This is far from complete if you look at the WASI FS spec and the syscalls, but at least it gets basic operations like directory listing, file/directory opening and file/directory stat working. It was enough for me to get pdfium working in standalone mode.
I didn't really know what I should do with AT_FDCWD or what the runtime should do with it when passing AT_FDCWD. AT_FDCWD is also negative while I think
fd
in WASI is unsigned?Most of the syscall code was taken from or based on wasi-libc.