Skip to content

Commit

Permalink
Lua: switch to recursive mutex in lua_lock()
Browse files Browse the repository at this point in the history
Previously, calling a Lua C API function that throws an error (e.g. with any of
the `lua_error()` family of functions, which use `luaD_throw()` internally) in a
function that also uses our own `StackUnwinder` would deadlock, because:

- `luaD_throw()` calls `lua_lock()` but apparently expects something else to
  call `lua_unlock()`
- `~StackUnwinder()` calls `lua_settop()` to rewind the stack, which internally
  also calls `lua_lock()` and `lua_unlock()`. This is called before the mutex
  can be unlocked.

This was the root cause of the issue behind DFHack#5040, DFHack#5055, DFHack#5056 - `GetVector()`
was called with an invalid index (only when invoked from `gui/launcher`, because
this changed the data on the Lua stack) and threw an exception, which caused the
`~StackUnwinder()` destructor to run while the Lua state was still locked.

Other things to note:
- We control the locking/unlocking implementation - the default, defined in
  `llimits.h`, is a _no-op_.
- @ab9rf points out that the `CRITICAL_SECTION` API we're using on Windows
  already appears to be equivalent to a recursive mutex, so this issue likely
  would not have occurred there.

The issue can be reproduced relatively easily with a simple test command, e.g.:

```diff
@@ -716,6 +723,15 @@ void ls_helper(color_ostream &con, const std::vector<std::string> &params) {
 command_result Core::runCommand(color_ostream &con, const std::string &first_, std::vector<std::string> &parts)
 {
     std::string first = first_;
+    if (first == "luaerror") {
+        auto L = Lua::Core::State;
+        Lua::StackUnwinder top(L);
+
+        luaL_error(L, "test error");
+
+        return CR_OK;
+    }
+
     CommandDepthCounter counter;
     if (!counter.ok())
     {
```

In `gui/launcher`:
- before this change, invoking `luaerror` would hang DF.
- after this change, `luaerror` prints `nil` in red to the native console and
  does nothing. This comes from the last-resort error handler in
  `LuaTools.cpp:report_error()`, and is equivalent to how other unexpected
  errors are handled in code invoked from Lua. Not ideal, but better than a
  crash.

From the native console, invoking `luaerror` crashes DF in both cases (SIGABRT),
and logs `PANIC: unprotected error in call to Lua API (test error)` to
`stderr.log`. The difference in behavior is because `report_error()` above is
part of the error handling system present when code is called _from Lua_,
through variants of `SafeCall`. There is no Lua layer involved in the native
console. (Note: the same crash would have been observed in the original issue
in DFHack#5056 et. al. if the error had occurred when invoked through the console.)
  • Loading branch information
lethosor committed Nov 30, 2024
1 parent b644258 commit 7c1713f
Showing 1 changed file with 11 additions and 2 deletions.
13 changes: 11 additions & 2 deletions depends/lua/include/dfhack_llimits.h
Original file line number Diff line number Diff line change
Expand Up @@ -54,8 +54,17 @@ struct lua_extra_state {
#define lua_lock(L) EnterCriticalSection(luai_mutex(L))
#define lua_unlock(L) LeaveCriticalSection(luai_mutex(L))
#else
#define luai_userstateopen(L) luai_mutex(L) = (mutex_t*)malloc(sizeof(mutex_t)); *luai_mutex(L) = PTHREAD_MUTEX_INITIALIZER
#define luai_userstateclose(L) lua_unlock(L); pthread_mutex_destroy(luai_mutex(L)); free(luai_mutex(L))
#define luai_userstateopen(L) do { \
luai_mutex(L) = (mutex_t*)malloc(sizeof(mutex_t)); \
pthread_mutexattr_t attr; \
pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_RECURSIVE); \
pthread_mutex_init(luai_mutex(L), &attr); \
} while (0)
#define luai_userstateclose(L) do { \
lua_unlock(L); \
pthread_mutex_destroy(luai_mutex(L)); \
free(luai_mutex(L)); \
} while (0)
#define lua_lock(L) pthread_mutex_lock(luai_mutex(L))
#define lua_unlock(L) pthread_mutex_unlock(luai_mutex(L))
#endif

0 comments on commit 7c1713f

Please sign in to comment.