Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Revert "REGRESSION: Fix console command completion help" #5055

Merged
merged 1 commit into from
Nov 29, 2024

Conversation

myk002
Copy link
Member

@myk002 myk002 commented Nov 29, 2024

@myk002 myk002 merged commit 33acc1f into develop Nov 29, 2024
25 of 26 checks passed
@myk002 myk002 deleted the revert-5040-fix_console_autocomplete branch November 29, 2024 21:05
lethosor added a commit to lethosor/dfhack that referenced this pull request Nov 30, 2024
Previously, calling a Lua C API function that throws an error (e.g. with any of
the `lua_error()` family of functions, which use `luaD_throw()` internally) in a
function that also uses our own `StackUnwinder` would deadlock, because:

- `luaD_throw()` calls `lua_lock()` but apparently expects something else to
  call `lua_unlock()`
- `~StackUnwinder()` calls `lua_settop()` to rewind the stack, which internally
  also calls `lua_lock()` and `lua_unlock()`. This is called before the mutex
  can be unlocked.

This was the root cause of the issue behind DFHack#5040, DFHack#5055, DFHack#5056 - `GetVector()`
was called with an invalid index (only when invoked from `gui/launcher`, because
this changed the data on the Lua stack) and threw an exception, which caused the
`~StackUnwinder()` destructor to run while the Lua state was still locked.

Other things to note:
- We control the locking/unlocking implementation - the default, defined in
  `llimits.h`, is a _no-op_.
- @ab9rf points out that the `CRITICAL_SECTION` API we're using on Windows
  already appears to be equivalent to a recursive mutex, so this issue likely
  would not have occurred there.

The issue can be reproduced relatively easily with a simple test command, e.g.:

```diff
@@ -716,6 +723,15 @@ void ls_helper(color_ostream &con, const std::vector<std::string> &params) {
 command_result Core::runCommand(color_ostream &con, const std::string &first_, std::vector<std::string> &parts)
 {
     std::string first = first_;
+    if (first == "luaerror") {
+        auto L = Lua::Core::State;
+        Lua::StackUnwinder top(L);
+
+        luaL_error(L, "test error");
+
+        return CR_OK;
+    }
+
     CommandDepthCounter counter;
     if (!counter.ok())
     {
```

In `gui/launcher`:
- before this change, invoking `luaerror` would hang DF.
- after this change, `luaerror` prints `nil` in red to the native console and
  does nothing. This comes from the last-resort error handler in
  `LuaTools.cpp:report_error()`, and is equivalent to how other unexpected
  errors are handled in code invoked from Lua. Not ideal, but better than a
  crash.

From the native console, invoking `luaerror` crashes DF in both cases (SIGABRT),
and logs `PANIC: unprotected error in call to Lua API (test error)` to
`stderr.log`. The difference in behavior is because `report_error()` above is
part of the error handling system present when code is called _from Lua_,
through variants of `SafeCall`. There is no Lua layer involved in the native
console. (Note: the same crash would have been observed in the original issue
in DFHack#5056 et. al. if the error had occurred when invoked through the console.)
myk002 pushed a commit that referenced this pull request Dec 1, 2024
Previously, calling a Lua C API function that throws an error (e.g. with any of
the `lua_error()` family of functions, which use `luaD_throw()` internally) in a
function that also uses our own `StackUnwinder` would deadlock, because:

- `luaD_throw()` calls `lua_lock()` but apparently expects something else to
  call `lua_unlock()`
- `~StackUnwinder()` calls `lua_settop()` to rewind the stack, which internally
  also calls `lua_lock()` and `lua_unlock()`. This is called before the mutex
  can be unlocked.

This was the root cause of the issue behind #5040, #5055, #5056 - `GetVector()`
was called with an invalid index (only when invoked from `gui/launcher`, because
this changed the data on the Lua stack) and threw an exception, which caused the
`~StackUnwinder()` destructor to run while the Lua state was still locked.

Other things to note:
- We control the locking/unlocking implementation - the default, defined in
  `llimits.h`, is a _no-op_.
- @ab9rf points out that the `CRITICAL_SECTION` API we're using on Windows
  already appears to be equivalent to a recursive mutex, so this issue likely
  would not have occurred there.

The issue can be reproduced relatively easily with a simple test command, e.g.:

```diff
@@ -716,6 +723,15 @@ void ls_helper(color_ostream &con, const std::vector<std::string> &params) {
 command_result Core::runCommand(color_ostream &con, const std::string &first_, std::vector<std::string> &parts)
 {
     std::string first = first_;
+    if (first == "luaerror") {
+        auto L = Lua::Core::State;
+        Lua::StackUnwinder top(L);
+
+        luaL_error(L, "test error");
+
+        return CR_OK;
+    }
+
     CommandDepthCounter counter;
     if (!counter.ok())
     {
```

In `gui/launcher`:
- before this change, invoking `luaerror` would hang DF.
- after this change, `luaerror` prints `nil` in red to the native console and
  does nothing. This comes from the last-resort error handler in
  `LuaTools.cpp:report_error()`, and is equivalent to how other unexpected
  errors are handled in code invoked from Lua. Not ideal, but better than a
  crash.

From the native console, invoking `luaerror` crashes DF in both cases (SIGABRT),
and logs `PANIC: unprotected error in call to Lua API (test error)` to
`stderr.log`. The difference in behavior is because `report_error()` above is
part of the error handling system present when code is called _from Lua_,
through variants of `SafeCall`. There is no Lua layer involved in the native
console. (Note: the same crash would have been observed in the original issue
in #5056 et. al. if the error had occurred when invoked through the console.)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
No open projects
Status: Done
Development

Successfully merging this pull request may close these issues.

1 participant