-
Notifications
You must be signed in to change notification settings - Fork 691
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Limitations of start function with exported memory #1160
Comments
Reading the linked review, it isn't clear to me what you're trying to execute in Regardless, it seems like segments can do all of the C static global initializers just fine, but the non-trivial C++ global initializers can't generally be done because they might execute arbitrary user code including syscalls. That's indeed troublesome! Do you think the problem is only embedder calls trying to use an exported memory, or do you think these calls might also try to re-enter WebAssembly? I think it's a simple enough thing to do if the fix is to export memory before At the same time, |
What do you mean by "is not available" during start? Presumably the Memory that is created for import could (in principle) be accessed via a closure while start is executing. Or, is there some annoying check that means you can't read/write to the Memory until it's attached to a fully-instantiated Instance? I agree that it makes syscall implementation awkward - I thought of this problem too while I was trying to implement some syscalls for upstream Musl. I wasn't too bothered by the Memory issue, but the problem of imports seems thornier - until the Instance is created (at the end of start) there really isn't any way for imported functions to call back into Wasm. In practice, it's OK, but it could cause problems for someone. The ideal thing would have been that "this" refers to the partially-instantiated Instance if imports are invoked during start, so that an imported function can call exports during start via "this.exports.blah". But that opens up a whole can of safety issues, while the current approach is safe albeit a bit limiting. For syscalls, it shouldn't matter much. By definition, they are in "kernel-mode" and thus don't need to call anything in "user-mode" (Wasm). Most of the syscalls ought to be able to their thing without touching the Module/Instance. @jfbastien - in the LLVM review, there's no particular opinion on what runs during start. Currently LLD's default entrypoint is |
@sbc100 Yes, that does seem problematic in the near term. ES Modules do allow cyclic imports and so, if we had ES Module integration, the JS called by wasm could be an ES module that imported the calling wasm module to get the |
From my POV the problem would be solved by giving the embedder access to the newly created memory before start is called. @jfbastien its not clear to me how you would propose doing this. Can you explain? In terms of what we will use the start section for, the jury is still out on that. What I've been looking at is moving the C/C++ static init function there, and leaving main to be run explicitly by the embedder. But this issue exists whatever we choose to run there. |
I think that what we end up doing should match what ES6 modules do: first they Given this, what I think needs to happen is that we make export availability match what ES6 modules do for their own exports during I think this also happens to fix what you're trying to do. |
But from the JS API POV, how could we allow the developer to get a handle to the new memory before start is called? I was under the impression that the instance would need to be returned in order to get the memory handle, but that the start function is guaranteed to be called before then. As a workaround I could switch all the test cases to import their memory.. which is probably not a bad idea, but obviously others could still run into this issue. |
I'm a bit confused. If you need two-phase initialisation where some init function has to be invoked after the instance has been fully constructed and handed out, then what keeps you from using an ordinary function for that? Or asking the other way round, how would the semantics of a start function whose execution you'd have to initiate explicitly be different from just invoking a normal function? |
@rossberg using a normal function invocation is the alternative option and its what we currently do. Its just that that means our tooling conventions (and clang/lld generated code) won't be able to take advantage of the wasm start section. I don't really feel strongly about using the wasm start section, but I just wanted to raise the issue here in case people see it as problematic that C/C++ won't be using it. |
This came up again in the llmv bug tracker recently: https://bugs.llvm.org/show_bug.cgi?id=37198 |
This is coming up again with WASI: WebAssembly/WASI#19 |
this isn't just a start function problem. if you declare a function in js or wasm, and wasm imports it (esm), that function has no way to get access to the memory of that wasm module. |
This issue relates to the case when the wasm memory is exported from the module. In the case when the wasm memory is imported into the module then is seems reasonable to assume that the embedder has access to the memory since it would have needed it in order to instantiate the module in the first place. There is third case where the memory is neither imported nor exported but of course there is no way to share the memory in that case. |
However I agree this issue should probably be renamed since it relates not just to memory but to anything that wasm module might want to export. |
In retrospect we should have named the start function "init function", to indicate that its purpose is module initialisation, i.e., that the module isn't intended to be accessible before that initialisation is complete. Allowing reentrancy during initialisation would bring about the well-known issues with exposing uninitialised state, and thereby breaking encapsulation guarantees (which is an important feature of Wasm's module system). That said, I believe that all discussed cases can be handled by having the module export an explicit init function. The start function isn't for these cases, but is that a problem? |
The only problem I see is that often seems to confuse people. I don't feel strongly about it but perhaps since the name is part of the binary format, we can still rename it to something else like "init" in the text format and in the spec text? |
I'd be fine with renaming in the spec. Likewise for the text format, but that might be a bit more controversial. We could allow both keywords. |
the start function is specified to run at normal evaluation time in the esm proposal. additionally, exporting something explicit doesn't work for esm at all because nothing is there to import and run it. I'm really confused at this point. is the esm integration intended to exist at all?
even if you do that, random functions a wasm module imports can't magically access that export. |
The ESM integration gives the illusion of cyclic linking by wrapping JS stubs around Wasm exports that don't yet exist. But they throw if invoked too early. Only once the Wasm module is instantiated underneath these stubs become usable. So even with ESM, Wasm module initialisation is not reentrant or cyclic. That's a feature. :) Why would importing a custom initialisation function and explicitly invoking it after linking not work? |
are you suggesting something like this? let id = 0;
const memories = {};
export function register_memory(mem) {
const i = id++;
memories[i] = mem;
return i;
}
export function fd_write(id, fd, ptr, len) {
const mem = memories[id];
// ...
} you'd have to make memory a first class value, and at that point you can just do this: export function fd_write(mem, fd, ptr, len) {
// ...
} |
We thought about doing this, but when it came down to details, that ended up being complex and unworkable. So these stubs do not exist in the current proposal. |
@devsnek Thinking about WASI in particular (which based on the code samples, I think you are?): I think we need to separate the "current unstable" and "future (with reftypes and multi-memory)" cases:
|
@lukewagner its not possible to get access to any of the memory using esm, whether or not its the 0th one. i'm using wasi as a motivating example, but the issue extends much further. |
Ah, okay, is it just uninitialised "live" bindings then? Either way, the effect should be almost the same.
The exports of a module are filled in after the start function has completed. Before you only have the bindings, but they are yet undefined. But I believe that's good enough for those use cases, because there's no need to access the memory yet. |
@devsnek It is, if the module exports it.
By the time the start function executes, the instance has to be fully live/constructed, so I don't see any technical reason that the exports couldn't be written eagerly into their live bindings. I actually vaguely recall discussing this in the past which is why I thought that's what was already specified.
It really depends what people want to do during initialization, but it's not too hard to imagine the need to call out to JS for some glue utility and for that JS glue to need the exported memory to implement its functionality. |
@lukewagner if I export a function intended to be consumed by wasm (from wasm or js), how does that function access the memory of the calling wasm? |
@devsnek The same module exporting the function should also export its memory and the same module importing the function should also import the memory. It's not ideal, of course, but as explained above this isn't the goal state. |
@lukewagner the file with the function doesn't know which wasm module is consuming it, it's not a thing esm provides. (it also doesn't know the name of the exported memory, but that's less relevant) |
True, but only if the modules chooses to initialise an external table that way. This is like handing out
But that is an impossible requirement in general, since you cannot even do the most trivial computation before the start function, like, adding two numbers.
I agree. But why is it necessary to use the start function in such cases? |
Yes, but the same argument applies here: if the module chooses to call an import during the start function, it's on that module to make sure that's a valid thing to do.
I'm saying that the spec/engine-created instance has to be fully functional, not that the application-level state has to be fully initialized.
As an implementation detail of the wasm code that implements the start function. Consider the polyfill of what would otherwise be a builtin-module (implemented by the host): there are plenty of reasons why the wasm start function could need to use builtin functionality, and thus plenty of reasons to call the polyfill. |
I think these are quite different categories. Making a call is something that is unavoidable in general. Using an active segment is entirely avoidable, at least now that the bulk proposal fixes that. As in OO: calling a function from a constructor is the norm, passing out
Now I'm confused. So far I thought we share the goal that modules should enable proper encapsulation of application-level state. Being able to initialise that state reliably is an essential part of this. What alternative are you suggesting?
Not sure I see any difference between built-in and regular modules, they should have the same access. If we conclude that it is, then we should enable passing out references explicitly for modules that desire to do so. Breaking the encapsulation guarantees for all modules that merely need to invoke some function is not a good alternative. It would be a breaking change as well. (Or we introduce a second form of start function that runs afterwards. But that brings us back to the question why an export isn't enough.) |
I think the difference here is that we're seeing start function as being able to do a lot more than just set up internal state, but it seems that's just not the intention? perhaps we can rename start to init and then add a new start entry that runs after exports? ideally we could just allow it to be reordered and keep one... I don't think it's insane to require a module to keep its own encapsulation after it starts running arbitrary code. |
@rossberg Ok, I think I'm seeing your point more now. I was viewing the choice of Also, my initial thinking was: since the general ESM design fills in live bindings super-early, the precedent is already established and we should do likewise with wasm ESM-integration. However, I now see a counterpoint: it would be cool if (with the right bindings...) a single
In the context of ESM-integration, the start function is analogous to a JS module's top-level script and thus if you want to write an ESM in, say, Rust, and run a particular Rust function when a JS module's top-level script would run (which is a reasonable thing to want to do if you are trying to interoperate cleanly with the ESM ecosystem), you must (with the current design) run that function during the start function; if you put that code in an export, you have no control over when it's run. Combining these two requires, I like this idea (which actually matches WASI) of designating a special export as being automatically called by ESM-integration after instantiation (and thus the start function) completes and exports are made live. So the question is how to designate this function. One option is a custom section (validated and semantically meaningful on hosts that support it, similar to what's proposed for bindings) that designates the export by export field name. (This would make it quite easy to polyfill in terms of the current JS API without any binary rewriting.) Another option is to pick a magic export name in the subset of Unicode that is not valid as an ESM export identifier (because all export fields are valid in the JS API, this would also be polyfillable), but maybe that is future-hazardous so the former is better. Either way, WASI should use the same way to designate its |
Have the conventions been worked out for how WASI will be statically linked? Surely static linking of a WASI library as part of a larger application would not want to run the start function of that imported WASI file? Or would it? From a JS integration perspective statically including WASI libraries seems like a really important use case to work out with these considerations, where you wouldn't necessarily want a "run to completion" process model on import, and you want to still maintain the benefits of memory isolation by using the ES module integration. |
@guybedford I don't quite follow what you mean by "statically linking WASI". WASI is an interface, defined as a built-in module. Thus, at static link time, uses of WASI by a wasm module show up as module imports (of a WASI built-in module) that are expected to be resolved by the host, somehow, at module instantiation time. Now the host may choose to implement WASI with some JS or wasm module (e.g., on the Web, using an import-map), but that's a host-specific, instantiation-time detail. |
@littledan What do you think of the above proposed addition to ESM-integration? This might also be a good in-person CG discussion topic during the ESM-integration slot. |
Apologies for my delay. The above idea sounds good to me: we have both the current start function, called before exports are visible, and another function which is called after exports are exposed. Seems like this would meet all of the goals described above. I like how this change is purely local, not requiring a second traversal over the module graph (in addition to not requiring binary rewriting). Bikeshedding: We could call this I will include this in my presentation to the Wasm F2F and get a spec PR out ASAP. Shout-out to current implementers in Rollup, Webpack and Node.js that this change is under consideration: @sokra @guybedford @MylesBorins (I intend to follow up with details in your repositories once we make a decision). |
In the Ethereum use case (ewasm) we have the same problem and our way of solving this is having an exported function named Would it be possible extending the
|
Studing this problem a bit more, I can see that start functions are more complex than what I assumed in #1160 (comment), since:
From here, I think we need to do a more careful study of how various toolchains solve this problem, rather than assuming that a simple solution would as proposed above would work. If you could give pointers to your system's module initialization path, or ideally, summarize more details, in WebAssembly/esm-integration#32, it would be really helpful. |
This continues to confuse folks: https://bugs.llvm.org/show_bug.cgi?id=42713 |
(Returning to this because I think it would be quite valuable to standardize in wasm so as to avoid the current ad hoc embedding conventions.) One way to generalize the Node.js issue @littledan pointed out is: what should the JS API do? If the point of this One idea is that these instantiation functions can take a callback that is passed the just-created exports object and called before the new WebAssembly.Instance(compiled, reflect.imports, { expose(exports) {
for (const expt of Object.keys(exports))
reflect.exports[expt].set(exports[expt]);
if (usesWasi && exports.memory)
wasi.setMemory(exports.memory);
}); Thinking about this from an engine impl pov, this @littledan @binji WDYT? |
i suggested this callback idea somewhere more recently, i was under the impression that the plan to fix this was to support passing live slices of memory ( |
Great question: passing live slices would indeed avoid the need for the |
I like this line of investigation, and it seems like that could solve this particular issue with WASI but I don't really feel confident answering it in the air. I think, to proceed to make this a standard rather than a WASI-specific loader, it'd be best if we could do a survey of Wasm toolchains to figure out their initialization patterns, and see if this would solve their problems. I don't have time to do this sort of survey right now, personally. |
The Else I don't see any limitation of the |
The Wasm ecosystem is currently not consistent in how "constructors" such as C++ static initializers and similar features in other languages are implemented, and the result is users reporting constructs running multiple times, and other users reporting constructors not getting run when they should. WASI has [defined a convention] using an exported function named `_initialize`, however not all users are using WASI conventions. In particular, users of what is sometimes called "wasm32-unknown-unknown" are not expecting to follow WASI conventions. However, they still have a need for constructors working in a reliable way. To address this, I propose moving this out of WASI and defining this as a toolchain-independent ABI, here in tool-conventions. This would recognize the `_initialize` function as the toolchain-independent way to ensure that constructors are properly called before other exports are accessed. In the component model, there is a proposal to add a [second initialization phase]. If that's done, then component-model toolchains could arrange for this `_initialize` function to be called automatically by this second initialization mechanism. It is tempting to use the [Wasm start function] for C++ constructors; this has been [extensively discussed], and the short answer is, the Wasm start function is often called at a time when the outside environment can't access the module's exports, and C++ constructors can run arbitrary user code which may generate calls to things that need to access the module's exports. It's also tempting to propose defining a second initialization phase in core Wasm. I'm not opposed to this, but it is more complex at the core Wasm level than at the component-model level. For example, in Emscripten, Wasm modules depend on JS code being able to run after the exports are available but before the initialization function is called, which wouldn't be possible if we simply call the initilaization function as part of the linking step. Wasm-ld has a [`__wasm_call_ctors` function], and in theory we could use that name instead of `_initialize`, but wasm-ld already does insert some initialization in addition to just constructors, so I think it makes sense to use `_initialize` as the exported function, which may call `__wasm_call_ctors` in its body. We don't have a formal process defined for tool-convention proposals, but because this is proposal has potentially wide-ranging impacts, I propose to follow the following process: - I'm starting by posting this PR here, and people can comment on it. If a better alternative emerges, I'll close this PR. - After discussion here settles, if a better alternative hasn't emerged, I plan to request a CG meeting agenda item to present this topic to the CG, and seek feedback there, to ensure that it has CG-level visibility. - If the CG is in favor of it, then I'd propose we merge this PR. [defined a convention]: https://github.com/WebAssembly/WASI/blob/main/legacy/application-abi.md [second initialization phase]: WebAssembly/component-model#146 [Wasm start function]: https://webassembly.github.io/spec/core/syntax/modules.html#syntax-start [extensively discussed]: WebAssembly/design#1160 [`__wasm_call_ctors` function]: https://github.com/WebAssembly/tool-conventions/blob/main/Linking.md#start-section
The Wasm ecosystem is currently not consistent in how "constructors" such as C++ static initializers and similar features in other languages are implemented, and the result is users reporting constructs running multiple times, and other users reporting constructors not getting run when they should. WASI has [defined a convention] using an exported function named `_initialize`, however not all users are using WASI conventions. In particular, users of what is sometimes called "wasm32-unknown-unknown" are not expecting to follow WASI conventions. However, they still have a need for constructors working in a reliable way. To address this, I propose moving this out of WASI and defining this as a toolchain-independent ABI, here in tool-conventions. This would recognize the `_initialize` function as the toolchain-independent way to ensure that constructors are properly called before other exports are accessed. \#### Related activities In the component model, there is a proposal to add a [second initialization phase]. If that's done, then component-model toolchains could arrange for this `_initialize` function to be called automatically by this second initialization mechanism. \#### Considered alternatives It is tempting to use the [Wasm start function] for C++ constructors; this has been [extensively discussed], and the short answer is, the Wasm start function is often called at a time when the outside environment can't access the module's exports, and C++ constructors can run arbitrary user code which may generate calls to things that need to access the module's exports. It's also tempting to propose defining a second initialization phase in core Wasm. I'm not opposed to this, but it is more complex at the core Wasm level than at the component-model level. For example, in Emscripten, Wasm modules depend on JS code being able to run after the exports are available but before the initialization function is called, which wouldn't be possible if we simply call the initilaization function as part of the linking step. Wasm-ld has a [`__wasm_call_ctors` function], and in theory we could use that name instead of `_initialize`, but wasm-ld already does insert some initialization in addition to just constructors, so I think it makes sense to use `_initialize` as the exported function, which may call `__wasm_call_ctors` in its body. \#### Process We don't have a formal process defined for tool-convention proposals, but because this is proposal has potentially wide-ranging impacts, I propose to follow the following process: - I'm starting by posting this PR here, and people can comment on it. If a better alternative emerges, I'll close this PR. - After discussion here settles, if a better alternative hasn't emerged, I plan to request a CG meeting agenda item to present this topic to the CG, and seek feedback there, to ensure that it has CG-level visibility. - If the CG is in favor of it, then I'd propose we merge this PR. [defined a convention]: https://github.com/WebAssembly/WASI/blob/main/legacy/application-abi.md [second initialization phase]: WebAssembly/component-model#146 [Wasm start function]: https://webassembly.github.io/spec/core/syntax/modules.html#syntax-start [extensively discussed]: WebAssembly/design#1160 [`__wasm_call_ctors` function]: https://github.com/WebAssembly/tool-conventions/blob/main/Linking.md#start-section
The Wasm ecosystem is currently not consistent in how "constructors" such as C++ static initializers and similar features in other languages are implemented, and the result is users reporting constructs running multiple times, and other users reporting constructors not getting run when they should. WASI has [defined a convention] using an exported function named `_initialize`, however not all users are using WASI conventions. In particular, users of what is sometimes called "wasm32-unknown-unknown" are not expecting to follow WASI conventions. However, they still have a need for constructors working in a reliable way. To address this, I propose moving this out of WASI and defining this as a toolchain-independent ABI, here in tool-conventions. This would recognize the `_initialize` function as the toolchain-independent way to ensure that constructors are properly called before other exports are accessed. Related activities ------------------ In the component model, there is a proposal to add a [second initialization phase]. If that's done, then component-model toolchains could arrange for this `_initialize` function to be called automatically by this second initialization mechanism. Considered alternatives ----------------------- It is tempting to use the [Wasm start function] for C++ constructors; this has been [extensively discussed], and the short answer is, the Wasm start function is often called at a time when the outside environment can't access the module's exports, and C++ constructors can run arbitrary user code which may generate calls to things that need to access the module's exports. It's also tempting to propose defining a second initialization phase in core Wasm. I'm not opposed to this, but it is more complex at the core Wasm level than at the component-model level. For example, in Emscripten, Wasm modules depend on JS code being able to run after the exports are available but before the initialization function is called, which wouldn't be possible if we simply call the initilaization function as part of the linking step. Wasm-ld has a [`__wasm_call_ctors` function], and in theory we could use that name instead of `_initialize`, but wasm-ld already does insert some initialization in addition to just constructors, so I think it makes sense to use `_initialize` as the exported function, which may call `__wasm_call_ctors` in its body. Process ------- We don't have a formal process defined for tool-convention proposals, but because this is proposal has potentially wide-ranging impacts, I propose to follow the following process: - I'm starting by posting this PR here, and people can comment on it. If a better alternative emerges, I'll close this PR. - After discussion here settles, if a better alternative hasn't emerged, I plan to request a CG meeting agenda item to present this topic to the CG, and seek feedback there, to ensure that it has CG-level visibility. - If the CG is in favor of it, then I'd propose we merge this PR. [defined a convention]: https://github.com/WebAssembly/WASI/blob/main/legacy/application-abi.md [second initialization phase]: WebAssembly/component-model#146 [Wasm start function]: https://webassembly.github.io/spec/core/syntax/modules.html#syntax-start [extensively discussed]: WebAssembly/design#1160 [`__wasm_call_ctors` function]: https://github.com/WebAssembly/tool-conventions/blob/main/Linking.md#start-section
…203) * Add a toolchain-independent ABI document, and propose `_initialize` The Wasm ecosystem is currently not consistent in how "constructors" such as C++ static initializers and similar features in other languages are implemented, and the result is users reporting constructs running multiple times, and other users reporting constructors not getting run when they should. WASI has [defined a convention] using an exported function named `_initialize`, however not all users are using WASI conventions. In particular, users of what is sometimes called "wasm32-unknown-unknown" are not expecting to follow WASI conventions. However, they still have a need for constructors working in a reliable way. To address this, I propose moving this out of WASI and defining this as a toolchain-independent ABI, here in tool-conventions. This would recognize the `_initialize` function as the toolchain-independent way to ensure that constructors are properly called before other exports are accessed. Related activities ------------------ In the component model, there is a proposal to add a [second initialization phase]. If that's done, then component-model toolchains could arrange for this `_initialize` function to be called automatically by this second initialization mechanism. Considered alternatives ----------------------- It is tempting to use the [Wasm start function] for C++ constructors; this has been [extensively discussed], and the short answer is, the Wasm start function is often called at a time when the outside environment can't access the module's exports, and C++ constructors can run arbitrary user code which may generate calls to things that need to access the module's exports. It's also tempting to propose defining a second initialization phase in core Wasm. I'm not opposed to this, but it is more complex at the core Wasm level than at the component-model level. For example, in Emscripten, Wasm modules depend on JS code being able to run after the exports are available but before the initialization function is called, which wouldn't be possible if we simply call the initilaization function as part of the linking step. Wasm-ld has a [`__wasm_call_ctors` function], and in theory we could use that name instead of `_initialize`, but wasm-ld already does insert some initialization in addition to just constructors, so I think it makes sense to use `_initialize` as the exported function, which may call `__wasm_call_ctors` in its body. Process ------- We don't have a formal process defined for tool-convention proposals, but because this is proposal has potentially wide-ranging impacts, I propose to follow the following process: - I'm starting by posting this PR here, and people can comment on it. If a better alternative emerges, I'll close this PR. - After discussion here settles, if a better alternative hasn't emerged, I plan to request a CG meeting agenda item to present this topic to the CG, and seek feedback there, to ensure that it has CG-level visibility. - If the CG is in favor of it, then I'd propose we merge this PR. [defined a convention]: https://github.com/WebAssembly/WASI/blob/main/legacy/application-abi.md [second initialization phase]: WebAssembly/component-model#146 [Wasm start function]: https://webassembly.github.io/spec/core/syntax/modules.html#syntax-start [extensively discussed]: WebAssembly/design#1160 [`__wasm_call_ctors` function]: https://github.com/WebAssembly/tool-conventions/blob/main/Linking.md#start-section * Rename to "Basic Module ABI". * Update BasicModuleABI.md Co-authored-by: Derek Schuff <dschuff@chromium.org> * Explain when we can and can't use the wasm start function. --------- Co-authored-by: Derek Schuff <dschuff@chromium.org>
We've been experimenting with using the start section in the C/C++/clang/lld world:
https://reviews.llvm.org/D40559
I've run into an issue with exported memory and static initializes (or any code in the start function). Basically no JS function that requires access to memory can be run during start because the exported memory is not available until after the start function completes.
We can switch to importing memory but that doesn't seem like a reasonable requirement.
Am I missing something?
The text was updated successfully, but these errors were encountered: