
Way to ask emcc to pre-reserve a bunch of slots in the function table? #20097

Closed
kg opened this issue Aug 22, 2023 · 16 comments · Fixed by #20149

Comments

kg commented Aug 22, 2023

Sorry if this already exists, I grepped the docs + source + issues and couldn't find anything.

At application startup I'm growing the function table and then filling it with a bunch of safe placeholder functions so that I can swap new ones in dynamically at runtime (pre-allocating like this is necessary for thread safety). The grow operation and Table.set are both very expensive in v8, though, so it would be great if there were some way to ask emcc to reserve a set number of slots for me (so I can count on them being there and being usable). If I could also specify what to populate them with, that would be even better. Would emcc be able to do either of these things via a switch or some other sort of configuration?
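For illustration, the startup pattern described above can be sketched as standalone JavaScript (the sizes here are stand-ins; in the real application the table is Emscripten's `wasmTable` and the placeholders are safe functions exported from the module rather than `null`):

```javascript
// One big grow, then one set() per slot -- the two operations that
// profile as expensive in v8. RESERVED is a stand-in; the real
// application reserves on the order of 100k slots.
const RESERVED = 1000;

const table = new WebAssembly.Table({ initial: 0, element: 'anyfunc' });
const base = table.grow(RESERVED); // grow() returns the previous length

for (let i = 0; i < RESERVED; i++) {
  // Placeholder entries; in practice these would be safe placeholder
  // functions taken from the module's own exports.
  table.set(base + i, null);
}

console.log(base, table.length); // 0 1000
```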

sbc100 (Collaborator) commented Aug 22, 2023

The llvm linker (wasm-ld) doesn't currently have any way to over-allocate the table and I'm not sure it would make sense to add one.

Another approach would be to add an IMPORTED_TABLE option, similar to the existing IMPORTED_MEMORY option that folks can use to control how and when the memory is created:

emscripten/src/settings.js

Lines 1997 to 2007 in ef3e4e3

```js
// Set to 1 to define the WebAssembly.Memory object outside of the wasm
// module. By default the wasm module defines the memory and exports
// it to JavaScript.
// Use of the following settings will enable this settings since they
// depend on being able to define the memory in JavaScript:
// - -pthread
// - RELOCATABLE
// - ASYNCIFY_LAZY_LOAD_CODE
// - WASM2JS (WASM=0)
// [link]
var IMPORTED_MEMORY = false;
```

We already have code for creating the table on the JS side, but it's currently only enabled for dynamic linking:

```js
#if RELOCATABLE
// In RELOCATABLE mode we create the table in JS.
var wasmTable = new WebAssembly.Table({
  'initial': {{{ INITIAL_TABLE }}},
#if !ALLOW_TABLE_GROWTH
  'maximum': {{{ INITIAL_TABLE }}},
#endif
  'element': 'anyfunc'
});
#else
// In regular non-RELOCATABLE mode the table is exported
// from the wasm module and this will be assigned once
// the exports are available.
var wasmTable;
#endif
```

If we added IMPORTED_TABLE then you could size the table however you wanted. Or you could create your own table and pass it in via Module['wasmTable']. However, in both cases one issue you would face is that you would have to account for both the empty slots you need and the slots needed by the core module itself. To find out where the core module's slots begin, you would either need to parse the module or scan the table on startup.
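The second route might be sketched like this (hypothetical: it assumes an IMPORTED_TABLE-style option existed and that a `Module['wasmTable']` override were honored, which is what is being proposed rather than what exists today; the slot counts are made up):

```javascript
// Hypothetical: build an over-sized table up front so no grow/set
// storm is needed at startup.
const MODULE_SLOTS = 5000;   // slots the core module needs -- would have
                             // to be discovered by parsing the module or
                             // scanning the table on startup
const EXTRA_SLOTS = 100000;  // empty slots the application reserves

const table = new WebAssembly.Table({
  initial: MODULE_SLOTS + EXTRA_SLOTS,
  element: 'anyfunc',
});

// Hypothetical hand-off point (only meaningful if Emscripten imported
// its table rather than defining it inside the wasm module):
// Module['wasmTable'] = table;

console.log(table.length); // 105000
```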

sbc100 (Collaborator) commented Aug 22, 2023

Also, when you say "Table.grow" is slow... is that only when it's called in increments of 1? Presumably you could also just do table.grow(N) on startup to perform your pre-allocation. I assume that would be fast enough?

kg (Author) commented Aug 23, 2023

Being able to basically add a constant offset to INITIAL_TABLE would do it for the grow side on my end, I think.

We originally used Module.addFunction, but for thread safety we pivoted to a single big Table.grow operation at startup like you suggest. On my dev machine the call to grow takes 10-15msec. Then the process of doing a set into each of the new slots takes another 10-20ms. Profiling shows all the time is being spent in the implementations of grow and set (plus occasional GCs during the calls). The values I'm setting all come from the module itself. This is much faster in Firefox but still has some overhead there.

(We're reserving around 100k slots because we have a bunch of different function pointer types we need to reserve space for)

sbc100 (Collaborator) commented Aug 23, 2023

I had another idea. We have a setting called GLOBAL_BASE, which is the offset at which we place static data. It defaults to 1024. We could add an additional TABLE_BASE setting (and a corresponding linker flag) that would offset all table slots.

In both cases that allows the application to then use the first N slots for its own purposes. Would this work for you? In both cases the region, of course, can't be grown, as the compiler will place its own static data starting at N.

So you would then link with -sTABLE_BASE=100000 and then you could use the first 100k slots as you please?
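Concretely, under this proposal (a hypothetical flag at the time of this comment) the link step would look something like:

```
emcc main.c -o main.js -sTABLE_BASE=100000
```

and the module's own table entries would then start at index 100000, leaving the lower indices free for the application to manage.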

kg (Author) commented Aug 23, 2023

> I had another idea. We have a setting called GLOBAL_BASE, which is the offset at which we place static data. It defaults to 1024. We could add an additional TABLE_BASE setting (and a corresponding linker flag) that would offset all table slots.
>
> In both cases that allows the application to then use the first N slots for its own purposes. Would this work for you? In both cases the region, of course, can't be grown, as the compiler will place its own static data starting at N.
>
> So you would then link with -sTABLE_BASE=100000 and then you could use the first 100k slots as you please?

I think that would work great, and then I don't need to try to predict where the reserved slots are: they're always at 0. That seems like it could have serious negative code size impacts, though, because now all function pointer i32.consts are 100k larger?

sbc100 (Collaborator) commented Aug 23, 2023

Yeah, it might add one or two extra bytes to each of your i32.const instructions... but do you think you have a lot of those in your code? It depends on how often you statically take the address of functions. (Note that it wouldn't affect places where the addresses live in data, e.g. vtables and such, since those are already 4 bytes wide.)

kg (Author) commented Aug 23, 2023

> Yeah, it might add one or two extra bytes to each of your i32.const instructions... but do you think you have a lot of those in your code? It depends on how often you statically take the address of functions. (Note that it wouldn't affect places where the addresses live in data, e.g. vtables and such, since those are already 4 bytes wide.)

Yeah if it won't affect vtables (since they live in the data segment, as you point out), it's probably fine. The alternative I was considering was to patch the .wasm file myself and add extra entries to the function table in the module at the end, instead of at the beginning. But if emcc can reserve the space for me at a known location, that's really good, and I can always improve on it later by patching those empty slots with the specific values I want.

kg (Author) commented Aug 23, 2023

I just realized I should point out, however, that in some of our test cases our function pointer table already contains 100k+ entries (AOT compilation), so it's actually quite possible that the address of functions is taken a lot... I'm not sure how easily I could verify that. Reserving the slots at the end would avoid a negative impact on compiled size, I think?

sbc100 (Collaborator) commented Aug 23, 2023

> > Yeah, it might add one or two extra bytes to each of your i32.const instructions... but do you think you have a lot of those in your code? It depends on how often you statically take the address of functions. (Note that it wouldn't affect places where the addresses live in data, e.g. vtables and such, since those are already 4 bytes wide.)
>
> Yeah, if it won't affect vtables (since they live in the data segment, as you point out), it's probably fine. The alternative I was considering was to patch the .wasm file myself and add extra entries to the function table at the end, instead of at the beginning. But if emcc can reserve the space for me at a known location, that's really good, and I can always improve on it later by patching those empty slots with the specific values I want.

Yes, that was going to be my next suggestion: write a custom post-link pass (perhaps a Binaryen pass) to do what you want.

sbc100 (Collaborator) commented Aug 23, 2023

> I just realized I should point out, however, that in some of our test cases our function pointer table already contains 100k+ entries (AOT compilation), so it's actually quite possible that the address of functions is taken a lot... I'm not sure how easily I could verify that. Reserving the slots at the end would avoid a negative impact on compiled size, I think?

In that case, though, the majority of your function pointers already require 3 bytes to encode, right? Only the first 2^14 (16384) can be encoded in 2 bytes, so you are already paying the price of 3-byte LEB encoding in most cases.
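The size argument can be sanity-checked with a quick unsigned-LEB128 length calculation (a sketch; wasm encodes function indices as unsigned LEB128 in places like the element section, while i32.const immediates use signed LEB128, where the 2-byte cutoff is lower):

```javascript
// Bytes needed to encode n as unsigned LEB128 (7 payload bits per byte).
function ulebLen(n) {
  let len = 1;
  while (n >= 0x80) {
    n = Math.floor(n / 128);
    len++;
  }
  return len;
}

console.log(ulebLen(16383));  // 2 -- last index that fits in 2 bytes
console.log(ulebLen(16384));  // 3
console.log(ulebLen(100000)); // 3 -- a 100k-entry table is already here
```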

sbc100 (Collaborator) commented Aug 25, 2023

This is the first part of the change: https://reviews.llvm.org/D158892. Once that lands we can look at adding -sTABLE_BASE to emscripten.

kg (Author) commented Aug 25, 2023

Another thing that occurred to me while reading through the spec: If I could access the table.fill instruction that would let me avoid having to pay the cost of invoking WebAssembly.Table.set thousands of times, and it would probably be sufficiently fast. Is there a way to access this from C, by importing some undocumented header maybe? Or an EM_WASM equivalent to EM_ASM?

sbc100 (Collaborator) commented Aug 25, 2023

You should be able to do that with inline assembly, yes. Here is an example, albeit using a separate assembly file:

https://github.com/llvm/llvm-project/blob/c3afa7901d00ca438d64f189cac569938499d7e5/llvm/test/MC/WebAssembly/tables.s#L101-L110

There is also the __builtin_wasm_table_fill builtin: https://github.com/llvm/llvm-project/blob/df112cba034eefb86d0e92e18518f5e944d58c37/clang/test/Sema/builtins-wasm.c#L33-L41

Given that, is -sTABLE_BASE still needed?

kg (Author) commented Aug 25, 2023

-sTABLE_BASE would still allow pre-reserving the space, which is useful for improved startup time I think. __builtin_wasm_table_fill is the ticket for the rest of it if I can manage to integrate it, I suspect! Will let you know if I manage to get a prototype working.

sbc100 added a commit to llvm/llvm-project that referenced this issue Aug 25, 2023
This is similar to `--global-base` but determines where to place the
table segments rather than the data segments.

See emscripten-core/emscripten#20097

Differential Revision: https://reviews.llvm.org/D158892
kg (Author) commented Aug 28, 2023

I'm guessing I would want to do something like:

```c
extern __funcref_t __indirect_function_table[0];
```

to be able to reach Emscripten's indirect function table from my code, right? And then manipulate it like:

```c
// grab an existing function I know exists, using a function pointer I already have
__funcref_t specific_function = __builtin_wasm_table_get_funcref(__indirect_function_table, some_fn_ptr);
// now fill a bunch of table slots with that function
__builtin_wasm_table_fill(__indirect_function_table, idx, specific_function, count);
```

Am I on the right track? It's very hard to find any documentation for any of this :-) Do I use externref here even though it's a function table?

sbc100 (Collaborator) commented Aug 28, 2023

I'm not sure anyone has tried to do what you are trying to do here, so I'm not surprised it's not documented. You can try to get it working at the C level, but if it doesn't work you can also try dropping down to the assembly level (at the assembly level I find this kind of thing easier to grok for some reason).
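For reference, a C-level attempt might look like the following untested sketch. It assumes a wasm-targeting clang with reference-types support enabled, plus the `__funcref` qualifier and table builtins exercised by the clang tests linked above; whether a declaration like this can actually alias Emscripten's own indirect function table is exactly the open question in this thread, so treat every name and signature here as something to verify against your toolchain:

```c
// Untested, wasm-only sketch: fill a range of a function table from C
// using clang's wasm table builtins (compile with a wasm target and
// reference types enabled).
typedef void (*__funcref funcref_t)(void);

// A zero-length array of reference type is the convention clang uses
// for declaring a wasm table in C.
static funcref_t my_table[0];

void fill_reserved_slots(int base, int count, int safe_index) {
  // Pull a known-safe entry out of the table...
  funcref_t placeholder = __builtin_wasm_table_get(my_table, safe_index);
  // ...and splat it across the reserved range with a single table.fill,
  // instead of thousands of individual JS-side Table.set calls.
  __builtin_wasm_table_fill(my_table, base, placeholder, count);
}
```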

willcohen added a commit to willcohen/nixpkgs that referenced this issue Oct 10, 2023
willcohen added a commit to willcohen/nixpkgs that referenced this issue Dec 11, 2023