Support static constructors (ctors) #1216

Open
daboross opened this issue Jan 30, 2019 · 12 comments
Comments

@daboross

My main motivation here is using typetag, which depends on rust-ctor to function.

rust-ctor uses link sections on Windows, Mac and Linux to allow functions to run before main. Any crate can add these functions, and then they'll all be run without needing central configuration. The problem is that this depends on Windows/Mac/Linux specific link sections, and there is no equivalent for wasm32.
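
For context, the link-section technique looks roughly like the sketch below on native targets. It is an illustration, not rust-ctor's actual macro output: the names `CTOR_RAN`, `before_main`, `BEFORE_MAIN` and `ctor_ran` are made up here, and the `unsafe(...)` attribute form assumes Rust 1.82 or newer.

```rust
use std::sync::atomic::{AtomicBool, Ordering};

// Flag flipped by the constructor so its effect can be observed later.
static CTOR_RAN: AtomicBool = AtomicBool::new(false);

// Runs before `main` on platforms whose runtime walks the section below.
extern "C" fn before_main() {
    CTOR_RAN.store(true, Ordering::SeqCst);
}

// Place a pointer to `before_main` in the platform's initializer section.
// `#[used]` keeps the otherwise unreferenced static from being discarded.
// On targets not listed here (wasm32 included) no attribute applies, so the
// function is simply never called.
#[used]
#[cfg_attr(any(target_os = "linux", target_os = "android"), unsafe(link_section = ".init_array"))]
#[cfg_attr(any(target_os = "macos", target_os = "ios"), unsafe(link_section = "__DATA,__mod_init_func"))]
#[cfg_attr(windows, unsafe(link_section = ".CRT$XCU"))]
static BEFORE_MAIN: extern "C" fn() = before_main;

/// Reports whether the constructor has already run.
pub fn ctor_ran() -> bool {
    CTOR_RAN.load(Ordering::SeqCst)
}
```

Calling `ctor_ran()` from `main` should return `true` on the listed platforms. On wasm32 none of the `cfg_attr` lines match, which is the gap this issue is about.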

I'm opening this issue for discussion of static constructors (also called global or module constructors) and wasm-bindgen. My ideal resolution would be adding some form of ctor support to wasm-bindgen, and then rust-ctor could use that support and crates like typetag would simply work.

I opened stdweb#321 earlier for the same issue, but @koute mentioned that it might be a good idea to coordinate how this is done across the ecosystem rather than having stdweb and wasm-bindgen implement separate measures.


Arbitrary discussions I've found about ctors and WASM which may or may not be relevant:


I don't have a concrete proposal, mostly just opening this up for discussion. Also, if anyone knows any prior work on ctors+rust+wasm that could be linked, I'd greatly appreciate it!

@alexcrichton
Contributor

Thanks for the report! It looks like there's prior art to some degree in LLVM/LLD, there's discussion threaded off https://reviews.llvm.org/D40759 and such, and notably LLD has treatment for a special __wasm_call_ctors symbol.

@sunfishcode would you be able to help us out here? It looks like you may be familiar with the intentions of LLVM/LLD. This issue is about supporting static constructors (like C++ static constructors sorta), and I'm curious if there's an existing convention that Clang is going to use, and if so we should stick to that too!

@sunfishcode

Assuming you're producing LLVM IR, the llvm.global_ctors used on other targets works for wasm too.

@llvm.global_ctors = appending global [1 x { i32, void ()*, i8* }] [{ i32, void ()*, i8* } { i32 65535, void ()* @foo, i8* null }]

is LLVM IR to register "foo" as a static constructor. There's also an @llvm.global_dtors, which works; however, static destructors are usually implemented by registering them with __cxa_atexit instead.

The linker then collects all of the static constructors and synthesizes a function named __wasm_call_ctors which calls them all, so you'll need to arrange for that to be called before main is called.
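
As a rough mental model of what the synthesized function does, the sketch below collects constructor entries and runs them in ascending priority order, which is the order LLVM documents for llvm.global_ctors. It is a plain-Rust simulation, not linker output: `CtorEntry`, `call_ctors`, `early` and `demo_order` are hypothetical names, and the constructors take a log argument only so the call order can be observed.

```rust
/// Model of one `llvm.global_ctors` entry. The real entry holds a priority,
/// a `void ()` function pointer and an associated-data pointer; here the
/// function takes a log argument purely to make the call order visible.
#[derive(Clone, Copy)]
pub struct CtorEntry {
    pub priority: u32,
    pub func: fn(&mut Vec<&'static str>),
}

/// Stand-in for the linker-synthesized `__wasm_call_ctors`: run every
/// registered constructor, lowest priority value first.
pub fn call_ctors(entries: &[CtorEntry], log: &mut Vec<&'static str>) {
    let mut sorted = entries.to_vec();
    // In this model, entries with equal priority keep their input order.
    sorted.sort_by_key(|entry| entry.priority);
    for entry in &sorted {
        (entry.func)(log);
    }
}

fn foo(log: &mut Vec<&'static str>) {
    log.push("foo");
}

fn early(log: &mut Vec<&'static str>) {
    log.push("early");
}

/// Example table: `foo` at priority 65535 (as in the IR above), `early` at 101.
pub fn demo_order() -> Vec<&'static str> {
    let table = [
        CtorEntry { priority: 65535, func: foo },
        CtorEntry { priority: 101, func: early },
    ];
    let mut log = Vec::new();
    call_ctors(&table, &mut log);
    log
}
```

`demo_order()` returns `["early", "foo"]`, since 101 sorts before 65535.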

@alexcrichton
Contributor

Ok thanks @sunfishcode! That all makes sense

As a follow-up question, do you know why LLD doesn't inject a start function which automatically calls __wasm_call_ctors? Or were you thinking there's somewhat of a bundler convention to call this function at the right time?

@sunfishcode

The wasm start function doesn't get called at the right time to call non-ESM JS code. See WebAssembly/design#1160 for some discussion. My understanding is that in an all-ESM context, we could use start functions, so I'd eventually like to have an option to call __wasm_call_ctors and even main automatically from the wasm start function; however, that's not implemented yet.

@alexcrichton
Contributor

Ok cool thanks for the clarifications! We perform a few transformations already to ensure that the start function, when executing, has everything properly hooked up so I think we're covered on that point!

I believe the next items to implement this feature would be:

  • First, this is a feature that'd need to be added to rustc itself. We'd either need an attribute on a function or an attribute on a static which is a function pointer to indicate that it's a global constructor for WebAssembly. This attribute would likely be unstable by default
  • Second, rustc would have to read this attribute and then emit the appropriate llvm.global_ctors value.
  • Finally, in wasm-bindgen, we'd hook up the __wasm_call_ctors function, if present, to just before the start function
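
The last step can be pictured as a small shim wrapped around the user's start function. The sketch below is a plain-Rust model, not wasm-bindgen's implementation: `wasm_call_ctors_stub` stands in for the linker-provided `__wasm_call_ctors`, and `entry_shim` and `user_start` are hypothetical names. The `Once` guard is an assumption added here so the constructors run only once even if the entry point is reached again.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::Once;

static CTORS: Once = Once::new();
static CTOR_CALLS: AtomicUsize = AtomicUsize::new(0);

/// Placeholder for the linker-provided `__wasm_call_ctors`.
/// It only counts how often it was invoked.
fn wasm_call_ctors_stub() {
    CTOR_CALLS.fetch_add(1, Ordering::SeqCst);
}

/// Hypothetical shim a tool could install as the module's entry point:
/// run the constructors exactly once, then hand over to the user's start.
pub fn entry_shim(start: fn() -> usize) -> usize {
    CTORS.call_once(wasm_call_ctors_stub);
    start()
}

/// Example start function: reports how many times the constructors ran
/// by the time it was entered.
pub fn user_start() -> usize {
    CTOR_CALLS.load(Ordering::SeqCst)
}
```

`entry_shim(user_start)` returns 1 on every call, which shows both that the constructors ran before the start function and that they did not run twice.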

@daboross
Author

daboross commented Jan 31, 2019

Awesome! Thanks for the quick reply and addressing this.

As much as I hate to add a potential obstacle, it seems like if we're adding in support for LLVM's global constructors in rustc for WASM, it might make sense to take that further?

First, this is a feature that'd need to be added to rustc itself. We'd either need an attribute on a function or an attribute on a static which is a function pointer to indicate that it's a global constructor for WebAssembly. This attribute would likely be unstable by default

Since @llvm.global_ctors isn't WebAssembly-specific, do you think it would make sense to start an RFC for adding this on all platforms? This could add more across-the-board support for ctors than rust-ctor's current per-target approach.

Global constructors were listed as an explicit non-goal in the design FAQ when it existed, so it might not be a great idea. But they're quite useful for coordinating separate pieces of code without any central list, and I don't know of anything else that can do that right now. If a non-ctor-based alternative to the inventory crate doesn't come up, I imagine more projects will indirectly depend on rust-ctor despite its unrustyness. Official support could avoid splitting the ecosystem between targets supported by rust-ctor and those which aren't.
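
To make the "no central list" use case concrete, here is a minimal sketch of the registration pattern that ctor-based crates enable. It is not the inventory or typetag API: `Plugin`, `register`, `registered_names`, `ctor_a`, `ctor_b` and `run_all_ctors` are hypothetical names, and `run_all_ctors` stands in for what the platform would do automatically if static constructors were supported.

```rust
use std::sync::Mutex;

/// Trait that independent crates implement and register instances of.
pub trait Plugin: Sync {
    fn name(&self) -> &'static str;
}

// Global registry filled in by constructors (requires Rust 1.63+ for the
// const `Mutex::new`).
static REGISTRY: Mutex<Vec<&'static dyn Plugin>> = Mutex::new(Vec::new());

pub fn register(plugin: &'static dyn Plugin) {
    REGISTRY.lock().unwrap().push(plugin);
}

pub fn registered_names() -> Vec<&'static str> {
    REGISTRY.lock().unwrap().iter().map(|p| p.name()).collect()
}

// Imagine this part living in "crate A".
struct Json;
impl Plugin for Json {
    fn name(&self) -> &'static str {
        "json"
    }
}
static JSON: Json = Json;
fn ctor_a() {
    register(&JSON);
}

// And this part living in an unrelated "crate B".
struct Yaml;
impl Plugin for Yaml {
    fn name(&self) -> &'static str {
        "yaml"
    }
}
static YAML: Yaml = Yaml;
fn ctor_b() {
    register(&YAML);
}

/// Stand-in for the runtime invoking every constructor before `main`.
/// Without ctor support, someone has to write this list by hand.
pub fn run_all_ctors() {
    ctor_a();
    ctor_b();
}
```

After `run_all_ctors()`, `registered_names()` yields `["json", "yaml"]`. With real static constructors the hand-written `run_all_ctors` disappears, and the application never has to name crate A or crate B.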

I originally thought this issue would be just adding some specific support to wasm-bindgen with a wasm-bindgen-specific attribute. But if it requires modifying rustc anyways, maybe a broader discussion could be more appropriate?

CC @mmastrac from rust-ctor

(edit: added some remarks I completely forgot I wanted to say until 5 minutes after posting!)

@mmastrac

I'm happy to add support to rust-ctor for wasm if the appropriate environmental support is available. We might be able to use #![feature(asm)] to emit the appropriate IR magic, but I'd have to investigate this further.

Ideally this feature would end up in stdlib, but if we can add support for the wasm platform to rust-ctor, this may demonstrate the value of the feature to the rustc team.

@alexcrichton
Contributor

A good question @daboross!

Over time philosophies shift ever so slightly and such, but nowadays I think it'd be fine to roughly add something like this. We don't want to hide features in rustc that a platform has, and as @mmastrac has shown this is already somewhat supported on platforms today!

I think, though, we probably won't be able to add a first-class feature using @llvm.global_ctors. Most LLVM features like that tend to work well-ish across most platforms, but there's always the odd-few-out that don't work. For example NVPTX or probably bare-metal ARM likely don't support global constructors.

That's ok though! We have a long history of providing features here and there that "do their best" to work on all platforms but are clearly documented as "you shouldn't absolutely rely on this to work everywhere". Such a feature is fine to add and support.

What all that boils down to is:

  • We should investigate the platform portability of llvm.global_ctors on non-WebAssembly platforms. For example, does this just lower to what @mmastrac is already doing? Or maybe there's more fiddly bits involved?
  • In any case having a wasm-specific attribute for this convention should be fine. It'd be unstable to start and we could possibly unify it with other platforms if workable.

@daboross
Author

Thanks for following up on everything so quickly! I've been a bit distracted, but I'm going to try and see what I can do to move forward with this.

I've looked a bit more into what LLVM does with @llvm.global_ctors, but I don't think I have enough background knowledge to find anything useful just searching. I'm not 100% sure if clang uses it, or if it has its own logic for static initializers?

Separate from that, I wrote up my current thoughts on the design in a pre-RFC. I've described it as a cross-platform attribute, but it could easily be downgraded to WASM-only. This might not be at all useful in the long run given how little research I've done into the actual implementation - I mainly wanted to get some thoughts into nicer words before trying to do more with it.


I'm planning on diving into rustc to see if I can take this any further in that direction. If nothing else, hopefully hacking llvm.global_ctors into the LLVM IR output could give insight into portability and differences with mmastrac's current approach.

If anyone else is interested in this part, though, feel free to step in. I will by no means be moving quickly.

@Frizi

Frizi commented Sep 19, 2019

I was just playing with it to confirm that it would actually work on wasm lld. I was able to link some additional LLVM IR into the wasm binary with a build.rs script.

Unfortunately, I still had to call __wasm_call_ctors at the beginning of start manually. This would indeed just need to be supported by bindgen.

First, I created this IR

target triple = "wasm32-unknown-unknown"

declare hidden void @call_ctor()
declare hidden void @call_ctor2()

@llvm.global_ctors = appending global [2 x { i32, void ()*, i8* }] [
  { i32, void ()*, i8* } { i32 4000, void ()* @call_ctor, i8* null },
  { i32, void ()*, i8* } { i32 4000, void ()* @call_ctor2, i8* null }
]

Then, compiled it with llc -filetype=obj define_ctors.ll -o define_ctors.o
And linked it into the binary

// build.rs
use std::env;
fn main() {
    let project_dir = env::var("CARGO_MANIFEST_DIR").unwrap();
    println!("cargo:rustc-link-search={}", project_dir);
    println!("cargo:rustc-cdylib-link-arg=define_ctors.o");
}

Then, the following program successfully calls both ctor functions at the beginning of main.

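use wasm_bindgen::prelude::*;

// Assumed binding for `log` (console.log), which the snippet uses
// without showing its definition.
#[wasm_bindgen]
extern "C" {
    #[wasm_bindgen(js_namespace = console)]
    fn log(s: &str);
}

// Referenced by name from define_ctors.ll, hence #[no_mangle].
#[no_mangle]
pub fn call_ctor() {
    log("Hello, ctor!");
}

#[no_mangle]
pub fn call_ctor2() {
    log("Hello, ctor2!");
}

// Synthesized by the linker; it calls every registered constructor.
extern "C" {
    fn __wasm_call_ctors();
}

#[wasm_bindgen(start)]
pub fn main() {
    // Must be called manually until the tooling injects it.
    unsafe { __wasm_call_ctors() };
    log("Hello, world!");
}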

This should work just fine for ctors defined in other crates as well, as long as the __wasm_call_ctors call is injected into the module entry point.

Of course, having to prepare the ctors list manually is kinda pointless, so we need to find a way to generate that from Rust directly. If generating the object files from the build.rs of every crate is fine, then this could be automated with build scripts reading macros, similarly to the cpp crate. In fact, it should then be possible to use the cpp crate and generate the @llvm.global_ctors symbol with a C++ compiler 😄
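
One way to picture that automation is a helper that emits the IR shown earlier from a list of function names. This is only a sketch under assumptions: `global_ctors_ir` is a hypothetical helper, it reproduces the typed-pointer syntax (`void ()*`) used in the IR above, and newer LLVM releases expect the opaque `ptr` spelling instead.

```rust
/// Builds LLVM IR text that declares each named function and lists it in
/// `@llvm.global_ctors` with the given priority, mirroring the hand-written
/// define_ctors.ll from this comment.
pub fn global_ctors_ir(names: &[&str], priority: u32) -> String {
    let mut ir = String::from("target triple = \"wasm32-unknown-unknown\"\n\n");

    // One external declaration per constructor function.
    for name in names {
        ir.push_str(&format!("declare hidden void @{}()\n", name));
    }

    // One array element per constructor: { priority, function, data }.
    let entries: Vec<String> = names
        .iter()
        .map(|name| {
            format!(
                "  {{ i32, void ()*, i8* }} {{ i32 {}, void ()* @{}, i8* null }}",
                priority, name
            )
        })
        .collect();

    ir.push_str(&format!(
        "\n@llvm.global_ctors = appending global [{} x {{ i32, void ()*, i8* }}] [\n{}\n]\n",
        names.len(),
        entries.join(",\n")
    ));
    ir
}
```

A build script could write `global_ctors_ir(&["call_ctor", "call_ctor2"], 4000)` to a file and pass it through llc as described above. Discovering the function names automatically is the part that still needs a macro or compiler support.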

@wigy-opensource-developer

Okay people. Now that this issue has been silent for more than a year and we need to do platform-specific hacks in the WASM binding layer, I could not hold myself back from giving my 2 satoshis.

I love the generic idea in the Rust Design FAQ that static constructors and destructors are things to avoid, because they complicate lifetime guarantees otherwise checked by Rust. If you check how most libraries use the rust-ctor crate, they are implementing some kind of dependency injection with it.

These techniques solve a single problem: an application crate does not need to depend on each library crate that has structs implementing a trait the application uses. (The application crate can therefore be compiled earlier than those library crates.)

So if the rustc developers do not want to complicate the design with static ctors/dtors (a decision I would completely understand and support), applications need another solution to

  • load library crates into memory
  • and reflect over the loaded crate so they can find all structs implementing a given trait

In that sense, the functions start() or more specifically __wasm_call_ctors() in WASM can be treated as an application entry point that should be able to enumerate some symbols in the loaded code. This approach seems to go against the "no runtime" design decision, but optional reflection data should not affect performance of the running code.

@ghost

ghost commented Jan 1, 2021

I may be wrong since I'm not an LLVM expert: according to this, LLVM uses .init_array on WASM, which is the same as on ELF platforms. So I tried to do the same thing in Rust, but got this error:

error: statics with a custom `#[link_section]` must be a simple list of bytes on the wasm target with no extra levels of indirection such as references

By searching rustc's source code, I found this: https://github.com/rust-lang/rust/blob/f8ab56bf3201b0638e44caf5a484041f22e32d65/compiler/rustc_typeck/src/check/mod.rs#L760-L774

That concern seems unnecessary at least for .init_array (I'm not entirely sure, though): https://github.com/llvm/llvm-project/blob/d5324c052b21741d8d9f980d796604589b85c97a/llvm/lib/MC/WasmObjectWriter.cpp#L452-L456
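
To illustrate the restriction behind that error, the sketch below contrasts the form that is accepted on wasm32 with the rejected one. `DEMO_BYTES` and `demo_section` are made-up names, the attribute is gated on wasm32 so the snippet also builds on native targets, and the `unsafe(...)` attribute form assumes Rust 1.82 or newer.

```rust
// Accepted on wasm32: the static is a plain array of bytes, which the
// toolchain places in a custom section of the module.
#[cfg_attr(target_arch = "wasm32", unsafe(link_section = "demo_section"))]
pub static DEMO_BYTES: [u8; 4] = *b"ctor";

// Rejected on wasm32 with the error quoted above, because a function
// pointer is not a simple list of bytes (shown as a comment so the
// snippet still compiles):
//
//     #[link_section = ".init_array"]
//     static CTOR: extern "C" fn() = my_ctor;
```

This is why the .init_array trick from ELF platforms cannot be expressed through `#[link_section]` on wasm32 today.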

EDIT: I tried to bypass the check using a custom JSON target but hit this:

LLVM ERROR: only one .init_array section fragment supported

I have no idea what that means.
