Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

A host function that calls another host function loses access to the memory #1626

Closed
jerbob92 opened this issue Aug 12, 2023 · 3 comments · Fixed by #1636
Closed

A host function that calls another host function loses access to the memory #1626

jerbob92 opened this issue Aug 12, 2023 · 3 comments · Fixed by #1636
Labels
bug Something isn't working

Comments

@jerbob92
Copy link
Contributor

Describe the bug
So I'm not entirely sure this is a bug report or a feature request. I came across this issue while implementing C++ exceptions support for Emscripten in Wazero. Apparently it's possible that Emscripten uses the generated invoke_xxx methods (which is a host function) to call another host function, without a "proxy" through the guest. I then found out that it's also possible to reproduce it without using C++ exceptions.

This issue causes the second host function not to have access to the memory, because the second host function is being called by invoke_xxx, which is in the env module, and receives the env api.Module, while the invoke_xxx is also in the env module, it does receive the main application memory, and this is also mentioned in the comments of GoModuleFunction:

// This function can be non-deterministic or cause side effects. It also
// has special properties not defined in the WebAssembly Core specification.
// Notably, this uses the caller's memory (via Module.Memory). See
// https://www.w3.org/TR/wasm-core-1/#host-functions%E2%91%A0

While it does make sense to give this the caller's memory (if we didn't, the host functions would be way less usable), but in my case, this means that the second host function call does not have access to the memory, just because it was called from another host method and not from the guest, while it would have access to it if it would have been called by a "proxy" function in the guest.

I'm not sure what the best solution to this would be. Perhaps it would be possible to have an option to override the moduleInstance or memoryInstance in m.Engine.LookupFunction or f.CallWithStack so that we can use it in InvokeFunc.Call() to maintain access to the memory when doing host to host calls.

Given the sample code in the next block, the following two things will happen:

  • invoke_i (host)
  • you_could_say_im_a_host_function_proxy (guest)
  • you_could_say_im_a_host_function (host)
  • m.Memory().ReadByte(200)

Works 🎉

  • invoke_i (host)
  • you_could_say_im_a_host_function (host)
  • m.Memory().ReadByte(200)

Breaks 💣 (failed: runtime error: invalid memory address or nil pointer dereference)

To Reproduce

C code compiled with Emscripten
#include <stdio.h>
#include <stdlib.h>
#include <setjmp.h>

extern int you_could_say_im_a_host_function();

void jmpfunction(jmp_buf env_buf) {
   longjmp(env_buf, 42);
}

int you_could_say_im_a_host_function_proxy() {
  return you_could_say_im_a_host_function();
}

int main () {
   int val;
   jmp_buf env_buffer;

   val = setjmp( env_buffer );

   if( val != 0 ) {
      printf("Returned from a longjmp() with value = %d\n", val);
      you_could_say_im_a_host_function_proxy();
      you_could_say_im_a_host_function();
      exit(0);
   }

   printf("Jump function call\n");
   jmpfunction(env_buffer);

   return(0);
}

emcc -sERROR_ON_UNDEFINED_SYMBOLS=0 -sSUPPORT_LONGJMP=emscripten -g invoke_host.c -o invoke_host.wasm

Go code with the host function implemented
package main

import (
	"context"
	_ "embed"
	"github.com/tetratelabs/wazero"
	"github.com/tetratelabs/wazero/api"
	"github.com/tetratelabs/wazero/experimental"
	"github.com/tetratelabs/wazero/experimental/logging"
	"github.com/tetratelabs/wazero/imports/emscripten"
	"github.com/tetratelabs/wazero/imports/wasi_snapshot_preview1"
	"log"
	"os"
)

//go:embed invoke_host.wasm
var wasm []byte

func main() {
	ctx := context.WithValue(context.Background(), experimental.FunctionListenerFactoryKey{}, logging.NewLoggingListenerFactory(os.Stdout))
	r := wazero.NewRuntime(ctx)

	defer r.Close(ctx)

	if _, err := wasi_snapshot_preview1.Instantiate(ctx, r); err != nil {
		log.Fatal(err)
	}

	compiledModule, err := r.CompileModule(ctx, wasm)
	if err != nil {
		log.Fatal(err)
	}

	functionExporter, err := emscripten.NewFunctionExporterForModule(compiledModule)
	if err != nil {
		log.Fatal(err)
	}

	envModule := r.NewHostModuleBuilder("env")
	functionExporter.ExportFunctions(envModule)
	envModule.NewFunctionBuilder().WithGoModuleFunction(api.GoModuleFunc(func(ctx context.Context, m api.Module, stack []uint64) {
		m.Memory().ReadByte(200)
		stack[0] = 42
	}), []api.ValueType{}, []api.ValueType{api.ValueTypeI32}).Export("you_could_say_im_a_host_function")
	_, err = envModule.Instantiate(ctx)
	if err != nil {
		log.Fatal(err)
	}

	moduleConfig := wazero.NewModuleConfig().
		WithStartFunctions("_start").
		WithStdout(os.Stdout).
		WithStderr(os.Stderr)

	_, err = r.InstantiateModule(ctx, compiledModule, moduleConfig)
	if err != nil {
		log.Fatal(err)
	}
}

Full sample zip:
invoke_host.zip

Environment (please complete the relevant information):

  • Go version: 1.20.2
  • wazero Version: v1.4.0
  • Host architecture: amd64
  • Runtime mode: both

Additional context

This was also discussed a bit on Slack: https://gophers.slack.com/archives/C040AKTNTE0/p1691699092428169

@jerbob92 jerbob92 added the bug Something isn't working label Aug 12, 2023
@ncruces
Copy link
Collaborator

ncruces commented Aug 16, 2023

I've looked a bit into this. I built invoke_host.wasm.zip according to instructions.

With the compiler I can't get into the second call into you_could_say_im_a_host_function, it fails before that. With the interpreter I can.

To fix that I needed to add the following to the top of execWasmFunction:

func (ce *callEngine) execWasmFunction(ctx context.Context, m *wasm.ModuleInstance) {
	codeAddr := ce.initialFn.codeInitialAddress
	modAddr := ce.initialFn.moduleInstance

	if ce.initialFn.parent.goFunc != nil {
		calleeHostFunction := ce.moduleContext.fn
		base := int(ce.stackBasePointerInBytes >> 3)

		// In the compiler engine, ce.stack has enough capacity for the
		// max of param or result length, so we don't need to grow when
		// there are more results than parameters.
		stackLen := calleeHostFunction.funcType.ParamNumInUint64
		if resultLen := calleeHostFunction.funcType.ResultNumInUint64; resultLen > stackLen {
			stackLen = resultLen
		}
		stack := ce.stack[base : base+stackLen]

		fn := calleeHostFunction.parent.goFunc
		switch fn := fn.(type) {
		case api.GoModuleFunction:
			fn.Call(ctx, modAddr, stack)
		case api.GoFunction:
			fn.Call(ctx, stack)
		}
		return
	}

entry:

Then I can repro with both compiler or interpreter. I couldn't get the appropriate modAddr to GoModuleFunction on either (compiler or interpreter). For regular host calls we get that from ce.callerModuleInstance (compiler) which is setup by native code. But we can't get the correct module across CallWithStack in here (nor do I think we should). The correct module is available there (it's the mod argument to that function), though.

This is as far as I got. Can't go much further down, doesn't look like a simple fix.

@jerbob92
Copy link
Contributor Author

jerbob92 commented Aug 17, 2023

If the fix isn't so simple, I have thought of this workaround for now:
3b29107

It at least fixes the issue I had while implementing Emscripten C++ Exceptions because both host functions are within the same package. It won't fix the issue reported in the example because I won't be able to get the parent callee module from the context, but I don't think it will be a common issue, I had to make some specific code to make the invoke_xxx call another host method, while in C++ exception handling it's a common occurrence.

Here is the full implementation:
main...jerbob92:wazero:feature/implement-emscripten-cpp-exceptions

@mathetake
Copy link
Member

verified that #1636 fixes the repro:

$ go run .
Jump function call
Returned from a longjmp() with value = 42

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants