Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Lua error crashes host with CFunction but not typed function. #153

Closed
rpatters1 opened this issue Oct 20, 2023 · 22 comments · Fixed by #154
Closed

Lua error crashes host with CFunction but not typed function. #153

rpatters1 opened this issue Oct 20, 2023 · 22 comments · Fixed by #154
Labels
enhancement New feature or request

Comments

@rpatters1
Copy link
Contributor

rpatters1 commented Oct 20, 2023

I am opening this issue to track a behavior I've encountered. It's rather complicated, so I have not yet tried to create a test case, but I thought opening it might trigger an idea from @kunitoki or anyone else. I have a dialog window class in C++ that calls back into Lua on control events. In order to capture errors in the Lua code, the callback is protected by try/catch.

Here is the scenario:

  • Lua script creates C++ window class and registers a callback function for a control. (C++ sees it as a LuaRef that is a function.)
  • The user activates the control, which triggers an event in C++ code.
  • The C++ code passes the event off to the registered handler by calling the LuaRef function. This call is protected by try/catch, which catches any LuaException.
  • The callback Lua code then calls another C++ function with incorrect parameters.
  • If this new C++ function is a regular typed C++ function with parameters listed, the catch block is activated and the error is handled gracefully.
  • However, if the new C++ function is a CFunction, the exception occurs when trying to access the bad parameter using Stack<>::get().value(). This exception is not caught, and the system crashes.

I'll work on a test case at some point if I can ever think of a way to crystalize it. I still haven't ruled out that it's a bug in my exception handling, but since the typed C++ functions work, it is difficult to know what it might be.

@kunitoki
Copy link
Owner

Sounds there is something missing when invoking a CFunction, will try to setup a repro case

@kunitoki
Copy link
Owner

kunitoki commented Oct 20, 2023

Our expected (returned from Stack::get) will raise BadExpectedAccess exception (much like std::expected see the Exceptions section of value() https://en.cppreference.com/w/cpp/utility/expected/value), not a LuaException, so you maybe need to catch it as well (or catch std::runtime_error, which i would suggest as when running lua and you end up with a c++ exception, it will pass uncatched).

@kunitoki
Copy link
Owner

kunitoki commented Oct 20, 2023

If you have something like this:

vois exposedCPPFunction(int, std::string);

int exposedCFunction(lua_State* L)
{
    auto x = Stack::get<std::string>(L, -2); // This should fail as stack -2 is a int
    x.value(); // This throws as it is a non checked access to expected
}

runLua(R"(
  exposedCPPFunction(1, 2) -- Throws a LuaException
  exposedCFunction(1, 2) -- Throws a BadAccessException
)");

Try something like this instead:

vois exposedCPPFunction(int, std::string);

int exposedCFunction(lua_State* L)
{
    auto x = Stack::get<std::string>(L, -2);

    if (! x)
        throw luabridge::LuaException(L, x.error());

    x.value(); // Now access to value is checked before
}

runLua(R"(
  exposedCPPFunction(1, 2) -- Throws a LuaException
  exposedCFunction(1, 2) -- Throws a LuaException
)");

@rpatters1
Copy link
Contributor Author

rpatters1 commented Oct 20, 2023

I tried rewriting the way you suggested, and it made a slight difference. Explicitly throwing the exception caused the XCode debugger to break on the breakpoint in my catch block. Allowing value() to throw meant it did not break on the breakpoint in my catch block However, either it way it crashes in the same way, so that makes me think the debugger is not breaking (which is not uncommon with try/catch) rather than that there is any fundamental difference in the two approaches.

I am using a CFunction because I need to have an optional parameter. However, the current Lua script I am writing always calls the function with the optional parameter. So I rewrote the C++ function signature like this:

MyReturnType myFunc(const char* filePath, const char* scriptText, lua_State* l)
{

When I run my Lua code with the error, this function hits my catch block and correctly reports the error.

By comparison, my original function gets its parameters like this:

int myFunc(lua_State *l)
{
   const int numArgs = lua_gettop(l);
   const char* filePath = luabridge::Stack<const char*>::get(l, 1) LB3(.value());
   const char* scriptText = (numArgs < 2) ? nullptr : luabridge::Stack<const char*>::get(l, 2) LB3(.value());

When the function call occurs with the bad 2nd parameter, the value() function throws internally, the debugger does not break at my catch block, and the program crashes.

I changed the function to get the 2nd parameter as you suggested:

int myFunc(lua_State *l)
{
   const int numArgs = lua_gettop(l);
   const char* filePath = luabridge::Stack<const char*>::get(l, 1) LB3(.value());
   const char* scriptText = nullptr;
   if (numArgs >= 2)
   {
      auto val = luabridge::Stack<const char*>::get(l, 2);
      if (!val)
         throw luabridge::LuaException(l, val.error());
      scriptText = val.value();
   }

This function does break in my catch block, but it still crashes the program in a seemingly identical manner as the previous case.

@rpatters1
Copy link
Contributor Author

rpatters1 commented Oct 20, 2023

So this problem is definitely in my code, at least to the extent that I have tracked down the difference in behavior. My cleanup routine checks to see if Lua is running and rethrows if it is, in order to let Lua/LuaBridge error handling handle it. The way it checks is this:

   lua_Debug ar;
   const bool isLuaRunning = lua_getstack(e.state(), 0, &ar);

When the exception is thrown in the case of the typed function, isLuaRunning is false. When the exception is thrown in the CFunction, isLuaRunning is true. The problem is, Lua is not running in this case. We are back at the level of the C++ event handling, and there is no Lua on the stack below us to catch this exception.

I would welcome suggestions as to how to handle this.

@kunitoki
Copy link
Owner

There are so many moving parts here that it's a bit difficult to understand where the problem could be. I can try to replicate a similar scenario but i need more information on how you register the methods, execute the lua script and catch the exceptions.

@rpatters1
Copy link
Contributor Author

I'll start working on that. In the meantime, is it your understanding that

   lua_Debug ar;
   const bool isLuaRunning = lua_getstack(e.state(), 0, &ar);

should tell me if there is an instance of Lua on the stack? Perhaps this is not a reliable technique.

@rpatters1
Copy link
Contributor Author

I guess what I've come down to is, it seems like LB3 is managing the stack differently with calls to CFunctions than for calls to typed functions. That seems to be the cause of the difference in behavior.

@rpatters1
Copy link
Contributor Author

rpatters1 commented Oct 20, 2023

Okay, here is a test case that reproduces the problem.

static const char* TypedFunction(const char* val)
{ return val; }
static int CFunction(lua_State* L)
{
   const int numArgs = lua_gettop(L);
   // this next line is where the exception throws:
   const char* value = (numArgs < 1) ? nullptr : luabridge::Stack<const char*>::get(L, 2).value();
   luabridge::Stack<const char*>::push(L, value).throw_on_error();
   return 1;
}
static std::unique_ptr<luabridge::LuaRef> luacallback;
static void RegisterCallback(luabridge::LuaRef callback)
{
   luacallback.reset(new luabridge::LuaRef(callback));
}

      luabridge::getGlobalNamespace(L)
         .addFunction("TypedFunction", TypedFunction)
         .addFunction("CFunction", CFunction)
         .addFunction("RegisterCallback", RegisterCallback);

and Lua code:

function callback()
    local bad_param = {}
    --TypedFunction(bad_param)
    CFunction(bad_param)
end
RegisterCallback(callback)
finenv.RetainLuaState = true

Note that RetainLuaState = true causes my Lua connector to exit and return control to the host program without closing the Lua state. The next time the user invokes the script, it has all the global variables intact exactly as it left them when it returned back to the connector. So, somewhere in the C++ after the Lua script has returned control to C++, I have added the following c++ code. (In this case it is close to immediately following the return from luaL_dofile.)

   if (luacallback && luacallback->isFunction())
   {
      try
      {
         luacallback->operator()();
      }
      catch (luabridge::LuaException& e)
      {
         std::string msg = e.what();
         lua_Debug ar;
         const bool isLuaRunning = lua_getstack(e.state(), 0, &ar);
         if (isLuaRunning)
            throw e; // should not get here.
         // this is sample error recovery. the real code is different
         SetErrorMessage(msg.c_str());
         status = -1; // causes the state to terminate and the error to be reported
      }
   }

With the Lua code as written, the catch block does not catch the exception and the system crashes. If I instead call TypedFunction in the lua code, the catch block activates, and the error recovery happens gracefully. If I modify CFunction as you suggested above:

static int CFunction(lua_State* L)
{
   const int numArgs = lua_gettop(L);
   //const char* value = (numArgs < 1) ? nullptr : luabridge::Stack<const char*>::get(L, 2).value();
   const char* value = nullptr;
   if (numArgs >= 1)
   {
      auto x = luabridge::Stack<const char*>::get(L, 2);
      if (!x) throw luabridge::LuaException(L, x.error());
      value = x.value();
   }
   luabridge::Stack<const char*>::push(L, value).throw_on_error();
   return 1;
}

In this case my catch block activates, but (weirdly) the msg from e.what() is an empty string. And the test for lua_getstack returns true, causing the code to rethrow with nothing there to catch it.

@kunitoki
Copy link
Owner

kunitoki commented Oct 21, 2023

Some side questions: are you sure you are clearing the luaCallback holded LuaRef before tering down the lua state ?

I have reproduced this example:

namespace {
static const char* TypedFunction(const char* val)
{
    return val;
}

static int CFunction(lua_State* L)
{
    const int numArgs = lua_gettop(L);

    // this next line is where the exception throws:
    const char* value = (numArgs < 1) ? nullptr : luabridge::Stack<const char*>::get(L, 1).value();
    
    luabridge::Stack<const char*>::push(L, value).throw_on_error();
    
    return 1;
}

static std::unique_ptr<luabridge::LuaRef> luaCallback;

static void RegisterCallback(luabridge::LuaRef callback)
{
   luaCallback.reset(new luabridge::LuaRef(callback));
}
} // namespace

TEST_F(LuaBridgeTest, Bug153)
{
    luabridge::getGlobalNamespace(L)
        .addFunction("TypedFunction", TypedFunction)
        .addFunction("CFunction", CFunction)
        .addFunction("RegisterCallback", RegisterCallback);
    
    runLua(R"(
        function callback()
            local bad_param = {}

            --TypedFunction(bad_param)
            CFunction(bad_param)
        end

        RegisterCallback(callback)

        -- finenv.RetainLuaState = true
    )");
    
    if (luaCallback && luaCallback->isFunction())
    {
        try
        {
            (*luaCallback)();
        }
        catch (const std::exception& e)
        {
            std::string msg = e.what();

            lua_Debug ar;
            const bool isLuaRunning = lua_getstack(e.state(), 0, &ar);
            if (isLuaRunning)
                throw e; // should not get here.

            // this is sample error recovery. the real code is different
        }
        
        luaCallback.reset();
    }
}

It doesn't crash with neither of the two methods called (C++ or CFunction), and i get the message out correctly. Also isLuaRunning is false in both methods. Are you sure you aren't reentrant twice from Lua ? Like C++ > Lua > C++ > Lua > C++ (here you do exit from the last Lua scope but you are still inside Lua)

@kunitoki
Copy link
Owner

You can also try catching std::exception instead

@kunitoki
Copy link
Owner

kunitoki commented Oct 21, 2023

To clarify the difference between a C++ call and CFunction is that calling C++ will perform argument conversion in luabridge:

struct function
{
    template <class F>
    static int call(lua_State* L, F func)
    {
        Result result;

        try
        {
            result = Stack<ReturnType>::push(L, std::apply(func, make_arguments_list<ArgsPack, Start>(L)));
        }
        catch (const std::exception& e)
        {
            raise_lua_error(L, "%s", e.what());
        }

        if (! result)
            raise_lua_error(L, "%s", result.message().c_str());

        return 1;
    }

where we catch the exceptions raised and rethrow as LuaException.

But CFunction is invoked as is from lua, there is no involvement of luabridge for doing parameter conversion casts, that way you should be the one converting any exception raised there and rethrow as a luabridge::LuaException or just using luaL_error (which will call the panic handler and throw a LuaException anyway).

@rpatters1
Copy link
Contributor Author

I tried both your suggestions (std::exception and resetting the ptr) and they made no difference. As I understand it, your system is not crashing with this test example. For this testcase I am definitely only one level deep: C++->Lua->C++.

If I had to guess the most likely factor that make the difference, it is how Lua is compiled. My system compiles Lua in C and therefore does not use C++ exceptions inside Lua itself. However, one of the nice things about LuaBridge is its ability to bridge between C exception handling and C++ exception handling. I'm guessing the issue is here.

@kunitoki
Copy link
Owner

kunitoki commented Oct 21, 2023

That is why, when executing your cfunction, lua will longjmp because it is compiled as C. You won't be able to mix longjmp and exceptions. You need to compile lua as c++, or avoid leaking exceptions out of lua CFunctions.

@kunitoki
Copy link
Owner

To be sure it works in your case, you have to not let exceptions "leak" out of your CFunction, and instead convert them to a normal lua_error:

static int CFunction(lua_State* L)
{
    const int numArgs = lua_gettop(L);

    try
    {
        const char* value = (numArgs < 1) ? nullptr : luabridge::Stack<const char*>::get(L, 1).value();
    
        luabridge::Stack<const char*>::push(L, value).throw_on_error();
    }
    catch (const std::exception& e)
    {
        luaL_error(L, e.what());
    }
    
    return 1;
}

@kunitoki
Copy link
Owner

I've open a PR that creates an indirection to catch the exceptions in CFunctions and trigger a lua_error instead of leaking the exception (this way a C compiled lua can correctly handle the longjmp).

#154

It has a drawback that i still need to workaround, when lua is compiled as C++, every CFunction is called with an additional CFunction indirection, which is not needed. I will probably allow to define a macro that enables the safe call.

@rpatters1
Copy link
Contributor Author

rpatters1 commented Oct 21, 2023

Thank you for that. I once tried compiling my Lua instance with C++ and it caused a lot of problems, though I don't remember them now. I have tried compiling your PR, but a change that was introduced earlier has destroyed my ability to compile LB3. (!) That is the deleted copy constructors.

Lua itself knows which way it was compiled, but there doesn't seem any way to get that information out of it. A compile-flag in LuaBridge seems like as good an option as any.

(I'm opening a new issue for the compilation problem.)

@rpatters1
Copy link
Contributor Author

I was able to compile by cherry-picking the changes for this PR into the version of LB3 that works for me. It definitely solves the crash.

@kunitoki
Copy link
Owner

kunitoki commented Oct 22, 2023

I've updated #154 where i've added a global define LUABRIDGE_SAFE_LUA_C_EXCEPTION_HANDLING by default set to 0.

If you want the implicit extra safety (at a slight performance cost when executing a luabridge registered lua_CFunction) you can just set this globally to 1. Alternatively you can just make sure no exceptions "leak" outside of your lua_CFunctions and can live with it being 0.

@rpatters1
Copy link
Contributor Author

Will I need to set it to get the protection for Lua compiled in C?

@rpatters1
Copy link
Contributor Author

Never mind, I figured it out.

@kunitoki
Copy link
Owner

Yes set it globally (in cmake or whatever), so it's picked up by LuaBridge3. Alternatively define it before including LuaBridge3 (but beware of not including LuaBridge3 header without the define).

@kunitoki kunitoki linked a pull request Oct 22, 2023 that will close this issue
@kunitoki kunitoki added enhancement New feature or request ready and removed ready labels Mar 3, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants