Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Limitations of start function with exported memory #1160

Open
sbc100 opened this issue Nov 30, 2017 · 57 comments
Open

Limitations of start function with exported memory #1160

sbc100 opened this issue Nov 30, 2017 · 57 comments

Comments

@sbc100
Copy link
Member

sbc100 commented Nov 30, 2017

We've been experimenting with using the start section in the C/C++/clang/lld world:
https://reviews.llvm.org/D40559

I've run into an issue with exported memory and static initializes (or any code in the start function). Basically no JS function that requires access to memory can be run during start because the exported memory is not available until after the start function completes.

We can switch to importing memory but that doesn't seem like a reasonable requirement.

Am I missing something?

@jfbastien
Copy link
Member

Reading the linked review, it isn't clear to me what you're trying to execute in start. Is it everything before main, but excluding main, or up to and including main? FWIW I don't think start should call main, or that was the original idea for start anyways.

Regardless, it seems like segments can do all of the C static global initializers just fine, but the non-trivial C++ global initializers can't generally be done because they might execute arbitrary user code including syscalls. That's indeed troublesome!

Do you think the problem is only embedder calls trying to use an exported memory, or do you think these calls might also try to re-enter WebAssembly? I think it's a simple enough thing to do if the fix is to export memory before start, then export everything else after. If we instead export everything before start then I'm not sure...

At the same time, start is meant to model what ES6 module's "evaluate" does. In a world where WebAssembly can participate in ES6 modules, what should "evaluate" / start do w.r.t. exports?

@NWilson
Copy link

NWilson commented Nov 30, 2017

What do you mean by "is not available" during start? Presumably the Memory that is created for import could (in principle) be accessed via a closure while start is executing. Or, is there some annoying check that means you can't read/write to the Memory until it's attached to a fully-instantiated Instance?

I agree that it makes syscall implementation awkward - I thought of this problem too while I was trying to implement some syscalls for upstream Musl. I wasn't too bothered by the Memory issue, but the problem of imports seems thornier - until the Instance is created (at the end of start) there really isn't any way for imported functions to call back into Wasm. In practice, it's OK, but it could cause problems for someone.

The ideal thing would have been that "this" refers to the partially-instantiated Instance if imports are invoked during start, so that an imported function can call exports during start via "this.exports.blah". But that opens up a whole can of safety issues, while the current approach is safe albeit a bit limiting.

For syscalls, it shouldn't matter much. By definition, they are in "kernel-mode" and thus don't need to call anything in "user-mode" (Wasm). Most of the syscalls ought to be able to their thing without touching the Module/Instance.

@jfbastien - in the LLVM review, there's no particular opinion on what runs during start. Currently LLD's default entrypoint is _start (ie run main as well). I agree that's not ideal, I'm hoping that we could change the default to something else like _start_wasm to run everything before main. It's up to whoever runs the compiler which entrypoint you set, and regarding Sam's issue here I don't think it matters whether main is run or not.

@lukewagner
Copy link
Member

@sbc100 Yes, that does seem problematic in the near term. ES Modules do allow cyclic imports and so, if we had ES Module integration, the JS called by wasm could be an ES module that imported the calling wasm module to get the Memory.

@sbc100
Copy link
Member Author

sbc100 commented Nov 30, 2017

From my POV the problem would be solved by giving the embedder access to the newly created memory before start is called. @jfbastien its not clear to me how you would propose doing this. Can you explain?

In terms of what we will use the start section for, the jury is still out on that. What I've been looking at is moving the C/C++ static init function there, and leaving main to be run explicitly by the embedder. But this issue exists whatever we choose to run there.

@jfbastien
Copy link
Member

From my POV the problem would be solved by giving the embedder access to the newly created memory before start is called. @jfbastien its not clear to me how you would propose doing this. Can you explain?

I think that what we end up doing should match what ES6 modules do: first they link, then evaluate. This can be circular! In our implementation we use the ES6 module machinery, we first link here, resolving exports and setting the corresponding symbol table. After all the ES6 modules are linked (there's only one for WebAssembly at the moment) there's a call to evaluate which sets up element and segment, and at the very end we call the start function. That's the same as running an ES6 module's global code just before returning to the import call (and this is done in an order that's determined by link IIUC).

Given this, what I think needs to happen is that we make export availability match what ES6 modules do for their own exports during evaluate. I'd consult with @domenic or @Constellation about this.

I think this also happens to fix what you're trying to do.

@sbc100
Copy link
Member Author

sbc100 commented Nov 30, 2017

But from the JS API POV, how could we allow the developer to get a handle to the new memory before start is called? I was under the impression that the instance would need to be returned in order to get the memory handle, but that the start function is guaranteed to be called before then.

As a workaround I could switch all the test cases to import their memory.. which is probably not a bad idea, but obviously others could still run into this issue.

@rossberg
Copy link
Member

rossberg commented Dec 5, 2017

I'm a bit confused. If you need two-phase initialisation where some init function has to be invoked after the instance has been fully constructed and handed out, then what keeps you from using an ordinary function for that? Or asking the other way round, how would the semantics of a start function whose execution you'd have to initiate explicitly be different from just invoking a normal function?

@sbc100
Copy link
Member Author

sbc100 commented Dec 5, 2017

@rossberg using a normal function invocation is the alternative option and its what we currently do. Its just that that means our tooling conventions (and clang/lld generated code) won't be able to take advantage of the wasm start section. I don't really feel strongly about using the wasm start section, but I just wanted to raise the issue here in case people see it as problematic that C/C++ won't be using it.

@sbc100
Copy link
Member Author

sbc100 commented May 1, 2018

This came up again in the llmv bug tracker recently: https://bugs.llvm.org/show_bug.cgi?id=37198

@sbc100
Copy link
Member Author

sbc100 commented May 17, 2019

This is coming up again with WASI: WebAssembly/WASI#19

@devsnek
Copy link
Member

devsnek commented May 18, 2019

this isn't just a start function problem. if you declare a function in js or wasm, and wasm imports it (esm), that function has no way to get access to the memory of that wasm module.

@sbc100
Copy link
Member Author

sbc100 commented May 20, 2019

This issue relates to the case when the wasm memory is exported from the module. In the case when the wasm memory is imported into the module then is seems reasonable to assume that the embedder has access to the memory since it would have needed it in order to instantiate the module in the first place.

There is third case where the memory is neither imported nor exported but of course there is no way to share the memory in that case.

@sbc100
Copy link
Member Author

sbc100 commented May 20, 2019

However I agree this issue should probably be renamed since it relates not just to memory but to anything that wasm module might want to export.

@rossberg
Copy link
Member

rossberg commented May 20, 2019

In retrospect we should have named the start function "init function", to indicate that its purpose is module initialisation, i.e., that the module isn't intended to be accessible before that initialisation is complete. Allowing reentrancy during initialisation would bring about the well-known issues with exposing uninitialised state, and thereby breaking encapsulation guarantees (which is an important feature of Wasm's module system).

That said, I believe that all discussed cases can be handled by having the module export an explicit init function. The start function isn't for these cases, but is that a problem?

@sbc100
Copy link
Member Author

sbc100 commented May 20, 2019

The only problem I see is that often seems to confuse people.

I don't feel strongly about it but perhaps since the name is part of the binary format, we can still rename it to something else like "init" in the text format and in the spec text?

@rossberg
Copy link
Member

rossberg commented May 20, 2019

I'd be fine with renaming in the spec. Likewise for the text format, but that might be a bit more controversial. We could allow both keywords.

@devsnek
Copy link
Member

devsnek commented May 20, 2019

the start function is specified to run at normal evaluation time in the esm proposal. additionally, exporting something explicit doesn't work for esm at all because nothing is there to import and run it. I'm really confused at this point. is the esm integration intended to exist at all?

This issue relates to the case when the wasm memory is exported from the module.

even if you do that, random functions a wasm module imports can't magically access that export.

@rossberg
Copy link
Member

The ESM integration gives the illusion of cyclic linking by wrapping JS stubs around Wasm exports that don't yet exist. But they throw if invoked too early. Only once the Wasm module is instantiated underneath these stubs become usable. So even with ESM, Wasm module initialisation is not reentrant or cyclic. That's a feature. :)

Why would importing a custom initialisation function and explicitly invoking it after linking not work?

@devsnek
Copy link
Member

devsnek commented May 20, 2019

@rossberg

Why would importing a custom initialisation function and explicitly invoking it after linking not work?

are you suggesting something like this?

let id = 0;
const memories = {};
export function register_memory(mem) {
  const i = id++;
  memories[i] = mem;
  return i;
}

export function fd_write(id, fd, ptr, len) {
  const mem = memories[id];
  // ...
}

you'd have to make memory a first class value, and at that point you can just do this:

export function fd_write(mem, fd, ptr, len) {
  // ...
}

@littledan
Copy link

The ESM integration gives the illusion of cyclic linking by wrapping JS stubs around Wasm exports that don't yet exist.

We thought about doing this, but when it came down to details, that ended up being complex and unworkable. So these stubs do not exist in the current proposal.

@lukewagner
Copy link
Member

@devsnek Thinking about WASI in particular (which based on the code samples, I think you are?): I think we need to separate the "current unstable" and "future (with reftypes and multi-memory)" cases:

  • For the current unstable case, all offsets are relative to the 0th memory, so you just need the instance to export its default memory (just with a plain memory export statement). IIRC, a wasm module's live exports are filled in before the start function is executed, so if you have a mini-cycle between JS glue code imported and called by wasm that (cyclicly) imports that wasm's memory, then by the time the wasm start function runs and calls the JS glue, the wasm's exported memory's live binding has been filled in. But maybe I'm out of date; @littledan correct me if I'm wrong.
  • For the future case, I think reads/writes from/to linear memory should be expressed in terms of Web IDL Bindings (or something analogous), where ranges of bytes would be passed via some abstract "buffer"/"slice" data type (used in the interface) that get created with the view/copy binding operators (which select their memory via a memory index immediate in the binding expression). Then the receiver of the buffer/slice doesn't have to worry about finding the right memory to use.

@devsnek
Copy link
Member

devsnek commented May 20, 2019

@lukewagner its not possible to get access to any of the memory using esm, whether or not its the 0th one. i'm using wasi as a motivating example, but the issue extends much further.

@rossberg
Copy link
Member

@littledan:

that ended up being complex and unworkable. So these stubs do not exist in the current proposal.

Ah, okay, is it just uninitialised "live" bindings then? Either way, the effect should be almost the same.

@lukewagner:

IIRC, a wasm module's live exports are filled in before the start function is executed

The exports of a module are filled in after the start function has completed. Before you only have the bindings, but they are yet undefined. But I believe that's good enough for those use cases, because there's no need to access the memory yet.

@lukewagner
Copy link
Member

@devsnek It is, if the module exports it.

@rossberg

The exports of a module are filled in after the start function has completed. Before you only have the bindings, but they are yet undefined

By the time the start function executes, the instance has to be fully live/constructed, so I don't see any technical reason that the exports couldn't be written eagerly into their live bindings. I actually vaguely recall discussing this in the past which is why I thought that's what was already specified.

But I believe that's good enough for those use cases, because there's no need to access the memory yet.

It really depends what people want to do during initialization, but it's not too hard to imagine the need to call out to JS for some glue utility and for that JS glue to need the exported memory to implement its functionality.

@devsnek
Copy link
Member

devsnek commented May 21, 2019

@lukewagner if I export a function intended to be consumed by wasm (from wasm or js), how does that function access the memory of the calling wasm?

@lukewagner
Copy link
Member

@devsnek The same module exporting the function should also export its memory and the same module importing the function should also import the memory. It's not ideal, of course, but as explained above this isn't the goal state.

@devsnek
Copy link
Member

devsnek commented May 21, 2019

@lukewagner the file with the function doesn't know which wasm module is consuming it, it's not a thing esm provides. (it also doesn't know the name of the exported memory, but that's less relevant)

@rossberg
Copy link
Member

by the time the start function is called, module elems have been written into imported tables.

True, but only if the modules chooses to initialise an external table that way. This is like handing out this to an external function from your constructor -- you're on your own if you do so. Fortunately, the bulk proposal provides better alternatives.

By the time the start function is called, the instance must be a fully operational battle station!

But that is an impossible requirement in general, since you cannot even do the most trivial computation before the start function, like, adding two numbers.

But if the wasm start function code calls JS in a syscall capacity, and that JS syscall needs the wasm's (exported) memory to do what it needs to do, then we're not actually allowing this.

I agree. But why is it necessary to use the start function in such cases?

@lukewagner
Copy link
Member

True, but only if the modules chooses to initialise an external table that way.

Yes, but the same argument applies here: if the module chooses to call an import during the start function, it's on that module to make sure that's a valid thing to do.

But that is an impossible requirement in general, since you cannot even do the most trivial computation before the start function, like, adding two numbers.

I'm saying that the spec/engine-created instance has to be fully functional, not that the application-level state has to be fully initialized.

I agree. But why is it necessary to use the start function in such cases?

As an implementation detail of the wasm code that implements the start function. Consider the polyfill of what would otherwise be a builtin-module (implemented by the host): there are plenty of reasons why the wasm start function could need to use builtin functionality, and thus plenty of reasons to call the polyfill.

@rossberg
Copy link
Member

Yes, but the same argument applies here: if the module chooses to call an import during the start function, it's on that module to make sure that's a valid thing to do.

I think these are quite different categories. Making a call is something that is unavoidable in general. Using an active segment is entirely avoidable, at least now that the bulk proposal fixes that.

As in OO: calling a function from a constructor is the norm, passing out this or similar is usually advised against (and even made impossible in some languages).

I'm saying that the spec/engine-created instance has to be fully functional, not that the application-level state has to be fully initialized.

Now I'm confused. So far I thought we share the goal that modules should enable proper encapsulation of application-level state. Being able to initialise that state reliably is an essential part of this. What alternative are you suggesting?

As an implementation detail of the wasm code that implements the start function. Consider the polyfill of what would otherwise be a builtin-module (implemented by the host)

Not sure I see any difference between built-in and regular modules, they should have the same access.
But regardless, you are describing something that I call a two-phase initialisation protocol. That is not an implementation detail anymore, because it involves state changes on the other side (at the minimum, storing some references). I agree that the start function alone isn't enough to express that, but I don't think that is a problem either.

If we conclude that it is, then we should enable passing out references explicitly for modules that desire to do so. Breaking the encapsulation guarantees for all modules that merely need to invoke some function is not a good alternative. It would be a breaking change as well. (Or we introduce a second form of start function that runs afterwards. But that brings us back to the question why an export isn't enough.)

@devsnek
Copy link
Member

devsnek commented May 23, 2019

I think the difference here is that we're seeing start function as being able to do a lot more than just set up internal state, but it seems that's just not the intention?

perhaps we can rename start to init and then add a new start entry that runs after exports? ideally we could just allow it to be reordered and keep one... I don't think it's insane to require a module to keep its own encapsulation after it starts running arbitrary code.

@lukewagner
Copy link
Member

@rossberg Ok, I think I'm seeing your point more now. I was viewing the choice of a.wasm to import and call b.js during a.wasm's start function as an explicit choice to treat b.js as an extension of a.wasm's implementation. But actually, answering the below question helped me see that while that is sometimes the case (viz., generated JS glue code), it's not always the case and thus we can't really consider this "explicit".

Also, my initial thinking was: since the general ESM design fills in live bindings super-early, the precedent is already established and we should do likewise with wasm ESM-integration. However, I now see a counterpoint: it would be cool if (with the right bindings...) a single .wasm could run just as well in an ESM-loading host as any other host; if we allow exports to go live pre-start in the ESM world, that will create hazards for interoperable tooling. Case and point, WASI adopted an explicit exported _start function (to sidestep all these issues).

(Or we introduce a second form of start function that runs afterwards. But that brings us back to the question why an export isn't enough.)

In the context of ESM-integration, the start function is analogous to a JS module's top-level script and thus if you want to write an ESM in, say, Rust, and run a particular Rust function when a JS module's top-level script would run (which is a reasonable thing to want to do if you are trying to interoperate cleanly with the ESM ecosystem), you must (with the current design) run that function during the start function; if you put that code in an export, you have no control over when it's run.

Combining these two requires, I like this idea (which actually matches WASI) of designating a special export as being automatically called by ESM-integration after instantiation (and thus the start function) completes and exports are made live.

So the question is how to designate this function. One option is a custom section (validated and semantically meaningful on hosts that support it, similar to what's proposed for bindings) that designates the export by export field name. (This would make it quite easy to polyfill in terms of the current JS API without any binary rewriting.)

Another option is to pick a magic export name in the subset of Unicode that is not valid as an ESM export identifier (because all export fields are valid in the JS API, this would also be polyfillable), but maybe that is future-hazardous so the former is better.

Either way, WASI should use the same way to designate its _start function so that WASI modules could do the Right Thing(tm) in an ESM-loading host.

@guybedford
Copy link

Have the conventions been worked out for how WASI will be statically linked? Surely static linking of a WASI library as part of a larger application would not want to run the start function of that imported WASI file? Or would it? From a JS integration perspective statically including WASI libraries seems like a really important use case to work out with these considerations, where you wouldn't necessarily want a "run to completion" process model on import, and you want to still maintain the benefits of memory isolation by using the ES module integration.

@lukewagner
Copy link
Member

@guybedford I don't quite follow what you mean by "statically linking WASI". WASI is an interface, defined as a built-in module. Thus, at static link time, uses of WASI by a wasm module show up as module imports (of a WASI built-in module) that are expected to be resolved by the host, somehow, at module instantiation time. Now the host may choose to implement WASI with some JS or wasm module (e.g., on the Web, using an import-map), but that's a host-specific, instantiation-time detail.

@lukewagner
Copy link
Member

lukewagner commented May 29, 2019

@littledan What do you think of the above proposed addition to ESM-integration? This might also be a good in-person CG discussion topic during the ESM-integration slot.

@littledan
Copy link

littledan commented Jun 11, 2019

Apologies for my delay.

The above idea sounds good to me: we have both the current start function, called before exports are visible, and another function which is called after exports are exposed. Seems like this would meet all of the goals described above. I like how this change is purely local, not requiring a second traversal over the module graph (in addition to not requiring binary rewriting).

Bikeshedding: We could call this *start*, which is quite invalid as an ES identifier, while being typeable on many keyboards easily.

I will include this in my presentation to the Wasm F2F and get a spec PR out ASAP. Shout-out to current implementers in Rollup, Webpack and Node.js that this change is under consideration: @sokra @guybedford @MylesBorins (I intend to follow up with details in your repositories once we make a decision).

@axic
Copy link

axic commented Jun 12, 2019

In the Ethereum use case (ewasm) we have the same problem and our way of solving this is having an exported function named main. We'd be very much in favour of following a standard agreed here and not inventing our own version.

Would it be possible extending the StartSection to include two start functions:

  • pre-finalisation (aka pre-exports are loaded) – this is the current "start" function
  • post-finalisation – this would be the "new start" function

@littledan
Copy link

Studing this problem a bit more, I can see that start functions are more complex than what I assumed in #1160 (comment), since:

From here, I think we need to do a more careful study of how various toolchains solve this problem, rather than assuming that a simple solution would as proposed above would work. If you could give pointers to your system's module initialization path, or ideally, summarize more details, in WebAssembly/esm-integration#32, it would be really helpful.

@sbc100
Copy link
Member Author

sbc100 commented Aug 7, 2019

This continues to confuse folks: https://bugs.llvm.org/show_bug.cgi?id=42713

@lukewagner
Copy link
Member

(Returning to this because I think it would be quite valuable to standardize in wasm so as to avoid the current ad hoc embedding conventions.)

One way to generalize the Node.js issue @littledan pointed out is: what should the JS API do? If the point of this start_ (better name TBD) function is to be called immediately after exports are visible and WebAssembly.instantiate() or new WebAssembly.Instance() simply calls start_ right after start and before returning the new instance to the caller, the exports are not in fact yet visible.

One idea is that these instantiation functions can take a callback that is passed the just-created exports object and called before the start_ function. So, looking at the Node.js example, you could write:

new WebAssembly.Instance(compiled, reflect.imports, { expose(exports) {
  for (const expt of Object.keys(exports))
    reflect.exports[expt].set(exports[expt]);

  if (usesWasi && exports.memory)
    wasi.setMemory(exports.memory);
});

Thinking about this from an engine impl pov, this expose callback is actually symmetric to how we'd implement ESM-integration: the engine's wasm::Module::instantiate routine would take a native callback that was called between start and start_ and the ESM loader would use this callback to make the exports live in their respective ESM bindings. So really, we'd just be exposing this ESM-integration callback to JS when the lower-level JS API was used.

@littledan @binji WDYT?

@devsnek
Copy link
Member

devsnek commented Mar 5, 2020

i suggested this callback idea somewhere (i honestly have no idea where, it was a while ago) and reasons were presented that it was suboptiomal. i'm trying to find it but i haven't yet. (found it, WebAssembly/WASI#82, on the subject of memory init and resizing, ran into issues with threading, apparently not as relevant as i remembered :P)

more recently, i was under the impression that the plan to fix this was to support passing live slices of memory (fs_read(fd, buf) instead of fs_read(fd, u32IndexToBuf)), or whatever other things are possible to be imported/exported from wasm

@lukewagner
Copy link
Member

Great question: passing live slices would indeed avoid the need for the wasi.setMemory(). And native wasm ESM-integration would remove the rest of it, I expect. But I think the general problem of "I'm using the JS API to instantiate but I need to do something to make the exports "live" (for whatever my definition of "live" is) before the _start function is called" remains.

@littledan
Copy link

I like this line of investigation, and it seems like that could solve this particular issue with WASI but I don't really feel confident answering it in the air. I think, to proceed to make this a standard rather than a WASI-specific loader, it'd be best if we could do a survey of Wasm toolchains to figure out their initialization patterns, and see if this would solve their problems. I don't have time to do this sort of survey right now, personally.

@verbessern
Copy link

verbessern commented Apr 27, 2020

The start function may accept an argument: what to be started. Then the function name will remain. The default argument would be as the specification is defined now: lets say "START_DEFAULT" = "INIT+COMPLETE". Then if you want to break the start to few stages, have "START_INIT", "START_COMPLETE" and so on called separately.

Else I don't see any limitation of the start function as it is now, when its idea is to only initialize the basic functionality of the module for itself. Then, however, practically there is no function that will 'run' the module after its initialization is fully complete in a standard way. One run/main function could be added to facilitate that.

sunfishcode added a commit to sunfishcode/tool-conventions that referenced this issue Mar 6, 2023
The Wasm ecosystem is currently not consistent in how "constructors" such
as C++ static initializers and similar features in other languages are
implemented, and the result is users reporting constructs running multiple
times, and other users reporting constructors not getting run when they
should.

WASI has [defined a convention] using an exported function named
`_initialize`, however not all users are using WASI conventions. In
particular, users of what is sometimes called "wasm32-unknown-unknown"
are not expecting to follow WASI conventions. However, they still have a
need for constructors working in a reliable way.

To address this, I propose moving this out of WASI and defining this as
a toolchain-independent ABI, here in tool-conventions. This would
recognize the `_initialize` function as the toolchain-independent way
to ensure that constructors are properly called before other exports are
accessed.

In the component model, there is a proposal to add a
[second initialization phase]. If that's done, then component-model
toolchains could arrange for this `_initialize` function to be called
automatically by this second initialization mechanism.

It is tempting to use the [Wasm start function] for C++ constructors;
this has been [extensively discussed], and the short answer is, the Wasm
start function is often called at a time when the outside environment
can't access the module's exports, and C++ constructors can run
arbitrary user code which may generate calls to things that need to
access the module's exports.

It's also tempting to propose defining a second initialization phase in
core Wasm. I'm not opposed to this, but it is more complex at the core
Wasm level than at the component-model level. For example, in Emscripten,
Wasm modules depend on JS code being able to run after the exports are
available but before the initialization function is called, which
wouldn't be possible if we simply call the initilaization function as
part of the linking step.

Wasm-ld has a [`__wasm_call_ctors` function], and in theory we could use
that name instead of `_initialize`, but wasm-ld already does insert some
initialization in addition to just constructors, so I think it makes
sense to use `_initialize` as the exported function, which may call
`__wasm_call_ctors` in its body.

We don't have a formal process defined for tool-convention proposals,
but because this is proposal has potentially wide-ranging impacts, I
propose to follow the following process:

 - I'm starting by posting this PR here, and people can comment on it.
   If a better alternative emerges, I'll close this PR.

 - After discussion here settles, if a better alternative hasn't emerged,
   I plan to request a CG meeting agenda item to present this topic to the
   CG, and seek feedback there, to ensure that it has CG-level visibility.

 - If the CG is in favor of it, then I'd propose we merge this PR.

[defined a convention]: https://github.com/WebAssembly/WASI/blob/main/legacy/application-abi.md
[second initialization phase]: WebAssembly/component-model#146
[Wasm start function]: https://webassembly.github.io/spec/core/syntax/modules.html#syntax-start
[extensively discussed]: WebAssembly/design#1160
[`__wasm_call_ctors` function]: https://github.com/WebAssembly/tool-conventions/blob/main/Linking.md#start-section
sunfishcode added a commit to sunfishcode/tool-conventions that referenced this issue Mar 6, 2023
The Wasm ecosystem is currently not consistent in how "constructors" such
as C++ static initializers and similar features in other languages are
implemented, and the result is users reporting constructs running multiple
times, and other users reporting constructors not getting run when they
should.

WASI has [defined a convention] using an exported function named
`_initialize`, however not all users are using WASI conventions. In
particular, users of what is sometimes called "wasm32-unknown-unknown"
are not expecting to follow WASI conventions. However, they still have a
need for constructors working in a reliable way.

To address this, I propose moving this out of WASI and defining this as
a toolchain-independent ABI, here in tool-conventions. This would
recognize the `_initialize` function as the toolchain-independent way
to ensure that constructors are properly called before other exports are
accessed.

\#### Related activities

In the component model, there is a proposal to add a
[second initialization phase]. If that's done, then component-model
toolchains could arrange for this `_initialize` function to be called
automatically by this second initialization mechanism.

\#### Considered alternatives

It is tempting to use the [Wasm start function] for C++ constructors;
this has been [extensively discussed], and the short answer is, the Wasm
start function is often called at a time when the outside environment
can't access the module's exports, and C++ constructors can run
arbitrary user code which may generate calls to things that need to
access the module's exports.

It's also tempting to propose defining a second initialization phase in
core Wasm. I'm not opposed to this, but it is more complex at the core
Wasm level than at the component-model level. For example, in Emscripten,
Wasm modules depend on JS code being able to run after the exports are
available but before the initialization function is called, which
wouldn't be possible if we simply call the initilaization function as
part of the linking step.

Wasm-ld has a [`__wasm_call_ctors` function], and in theory we could use
that name instead of `_initialize`, but wasm-ld already does insert some
initialization in addition to just constructors, so I think it makes
sense to use `_initialize` as the exported function, which may call
`__wasm_call_ctors` in its body.

\#### Process

We don't have a formal process defined for tool-convention proposals,
but because this is proposal has potentially wide-ranging impacts, I
propose to follow the following process:

 - I'm starting by posting this PR here, and people can comment on it.
   If a better alternative emerges, I'll close this PR.

 - After discussion here settles, if a better alternative hasn't emerged,
   I plan to request a CG meeting agenda item to present this topic to the
   CG, and seek feedback there, to ensure that it has CG-level visibility.

 - If the CG is in favor of it, then I'd propose we merge this PR.

[defined a convention]: https://github.com/WebAssembly/WASI/blob/main/legacy/application-abi.md
[second initialization phase]: WebAssembly/component-model#146
[Wasm start function]: https://webassembly.github.io/spec/core/syntax/modules.html#syntax-start
[extensively discussed]: WebAssembly/design#1160
[`__wasm_call_ctors` function]: https://github.com/WebAssembly/tool-conventions/blob/main/Linking.md#start-section
sunfishcode added a commit to sunfishcode/tool-conventions that referenced this issue Mar 6, 2023
The Wasm ecosystem is currently not consistent in how "constructors" such
as C++ static initializers and similar features in other languages are
implemented, and the result is users reporting constructs running multiple
times, and other users reporting constructors not getting run when they
should.

WASI has [defined a convention] using an exported function named
`_initialize`, however not all users are using WASI conventions. In
particular, users of what is sometimes called "wasm32-unknown-unknown"
are not expecting to follow WASI conventions. However, they still have a
need for constructors working in a reliable way.

To address this, I propose moving this out of WASI and defining this as
a toolchain-independent ABI, here in tool-conventions. This would
recognize the `_initialize` function as the toolchain-independent way
to ensure that constructors are properly called before other exports are
accessed.

Related activities
------------------

In the component model, there is a proposal to add a
[second initialization phase]. If that's done, then component-model
toolchains could arrange for this `_initialize` function to be called
automatically by this second initialization mechanism.

Considered alternatives
-----------------------

It is tempting to use the [Wasm start function] for C++ constructors;
this has been [extensively discussed], and the short answer is, the Wasm
start function is often called at a time when the outside environment
can't access the module's exports, and C++ constructors can run
arbitrary user code which may generate calls to things that need to
access the module's exports.

It's also tempting to propose defining a second initialization phase in
core Wasm. I'm not opposed to this, but it is more complex at the core
Wasm level than at the component-model level. For example, in Emscripten,
Wasm modules depend on JS code being able to run after the exports are
available but before the initialization function is called, which
wouldn't be possible if we simply call the initilaization function as
part of the linking step.

Wasm-ld has a [`__wasm_call_ctors` function], and in theory we could use
that name instead of `_initialize`, but wasm-ld already does insert some
initialization in addition to just constructors, so I think it makes
sense to use `_initialize` as the exported function, which may call
`__wasm_call_ctors` in its body.

Process
-------

We don't have a formal process defined for tool-convention proposals,
but because this is proposal has potentially wide-ranging impacts, I
propose to follow the following process:

 - I'm starting by posting this PR here, and people can comment on it.
   If a better alternative emerges, I'll close this PR.

 - After discussion here settles, if a better alternative hasn't emerged,
   I plan to request a CG meeting agenda item to present this topic to the
   CG, and seek feedback there, to ensure that it has CG-level visibility.

 - If the CG is in favor of it, then I'd propose we merge this PR.

[defined a convention]: https://github.com/WebAssembly/WASI/blob/main/legacy/application-abi.md
[second initialization phase]: WebAssembly/component-model#146
[Wasm start function]: https://webassembly.github.io/spec/core/syntax/modules.html#syntax-start
[extensively discussed]: WebAssembly/design#1160
[`__wasm_call_ctors` function]: https://github.com/WebAssembly/tool-conventions/blob/main/Linking.md#start-section
sunfishcode added a commit to WebAssembly/tool-conventions that referenced this issue Oct 16, 2024
…203)

* Add a toolchain-independent ABI document, and propose `_initialize`

The Wasm ecosystem is currently not consistent in how "constructors" such
as C++ static initializers and similar features in other languages are
implemented, and the result is users reporting constructs running multiple
times, and other users reporting constructors not getting run when they
should.

WASI has [defined a convention] using an exported function named
`_initialize`, however not all users are using WASI conventions. In
particular, users of what is sometimes called "wasm32-unknown-unknown"
are not expecting to follow WASI conventions. However, they still have a
need for constructors working in a reliable way.

To address this, I propose moving this out of WASI and defining this as
a toolchain-independent ABI, here in tool-conventions. This would
recognize the `_initialize` function as the toolchain-independent way
to ensure that constructors are properly called before other exports are
accessed.

Related activities
------------------

In the component model, there is a proposal to add a
[second initialization phase]. If that's done, then component-model
toolchains could arrange for this `_initialize` function to be called
automatically by this second initialization mechanism.

Considered alternatives
-----------------------

It is tempting to use the [Wasm start function] for C++ constructors;
this has been [extensively discussed], and the short answer is, the Wasm
start function is often called at a time when the outside environment
can't access the module's exports, and C++ constructors can run
arbitrary user code which may generate calls to things that need to
access the module's exports.

It's also tempting to propose defining a second initialization phase in
core Wasm. I'm not opposed to this, but it is more complex at the core
Wasm level than at the component-model level. For example, in Emscripten,
Wasm modules depend on JS code being able to run after the exports are
available but before the initialization function is called, which
wouldn't be possible if we simply call the initilaization function as
part of the linking step.

Wasm-ld has a [`__wasm_call_ctors` function], and in theory we could use
that name instead of `_initialize`, but wasm-ld already does insert some
initialization in addition to just constructors, so I think it makes
sense to use `_initialize` as the exported function, which may call
`__wasm_call_ctors` in its body.

Process
-------

We don't have a formal process defined for tool-convention proposals,
but because this is proposal has potentially wide-ranging impacts, I
propose to follow the following process:

 - I'm starting by posting this PR here, and people can comment on it.
   If a better alternative emerges, I'll close this PR.

 - After discussion here settles, if a better alternative hasn't emerged,
   I plan to request a CG meeting agenda item to present this topic to the
   CG, and seek feedback there, to ensure that it has CG-level visibility.

 - If the CG is in favor of it, then I'd propose we merge this PR.

[defined a convention]: https://github.com/WebAssembly/WASI/blob/main/legacy/application-abi.md
[second initialization phase]: WebAssembly/component-model#146
[Wasm start function]: https://webassembly.github.io/spec/core/syntax/modules.html#syntax-start
[extensively discussed]: WebAssembly/design#1160
[`__wasm_call_ctors` function]: https://github.com/WebAssembly/tool-conventions/blob/main/Linking.md#start-section

* Rename to "Basic Module ABI".

* Update BasicModuleABI.md

Co-authored-by: Derek Schuff <dschuff@chromium.org>

* Explain when we can and can't use the wasm start function.

---------

Co-authored-by: Derek Schuff <dschuff@chromium.org>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests