-
Notifications
You must be signed in to change notification settings - Fork 17.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
runtime: crash since Go 1.22.5 on Alpine (newstack at runtime.printlock) #68285
Comments
Do you have any local patches to the Go standard library? I know that some Alpine systems use local patches. |
You are hitting the fatal error at https://cs.opensource.google/go/go/+/refs/tags/go1.22.5:src/runtime/cgocall.go;l=241 due to an apparently bogus SP.
From the mention of FrankenPHP, I assume this is just a continuation of #62130? In #62130 (comment) and @dunglas' reply, we established that FrankenPHP changes stacks between a Go-to-C call and subsequent C-to-Go callback on the same thread. Go does not support this, and we never made a concrete decision on whether we should support this or not. Seems like this should be closed as a duplicate of #62130? |
@ianlancetaylor we're using the official Docker images for Go (Alpine variants), I assume the standard library is not patched. @prattmic this doesn't seem to be the same problem as #62130. #62130 is triggered only in a special case that is not very popular (Fibers, a new PHP feature that changes stack). Since yesterday, all FrankenPHP installations (including ones not using Fibers) on Alpine are crashing, it wasn't the case before Go 1.22.5, so I suppose this is a problem introduced in this version. |
We have an easy way to reproduce and debug, feel free to contact me in private (kevin@dunglas.dev, X, etc) if you want assistance to setup a reproducer. |
Thanks for the clarification, let's keep this separate then. |
cc @golang/runtime @cherrymui |
Not sure if this is actionable / supported, but assigning for the moment to Michael. |
Here is the stack trace with http://go.dev/cl/585817, as suggested by @prattmic:
To be honest, I've no idea what's going on. It may be a bug in our code, but it's surprising that the problem only occurs when using musl. I tested and can confirm that the problem only happens when using musl (both with Alpine and with a static build) and not with glibc. |
I created a custom Go build that reverts 3560cf0, and this makes our test suite green. So I can confirm that the problem has been introduced by this commit: https://github.com/dunglas/frankenphp/actions/runs/9877453302/job/27279127840?pr=913 Would it be possible to revert it while a better fix isn't available? I'm not confident relying on a custom build, and the latest patch version contains security fixes in |
@dunglas thanks for the update! The reason why this is musl specific is probably that the code getting the stack bounds https://source.corp.google.com/h/go/go/+/release-branch.go1.22:src/runtime/cgo/gcc_stack_unix.c;l=24 uses different code paths for glibc and musl. With glibc, we get accurate stack bounds from pthread. With musl, it goes to the fallback path (the It seems musl also supports But there is still an issue on systems that we couldn't get the accurate bounds. It seems this is a C created thread, which calls into Go, which then calls back into C, which calls back to Go again, which throws. Presumably during the first C->Go call it gets the estimated stack bounds. At the inner C-> Go call, the SP is slightly above the estimated upper bound, which is weird. I don't see how this could happen. The inner callback should be at a lower stack address than the outer callback... @dunglas could you confirm if the code does C->Go->C->Go calls? And does it adjust the stacks in anyway, or using non-local control flow like Perhaps in the case that we can't get the accurate bounds we don't throw if the SP is out of the estimated bounds? But that can also be tricky as if we update the bounds in the inner callback, when it returns, we may have weird stack bounds for the outer callback... |
Maybe what happens is that
The callback at step 4 is at a shallower stack (higher SP), but it sees the stack bounds from step 2, which is at a lower address, then throws. @dunglas could you confirm that the code does C->Go calls like that? Thanks. |
@cherrymui thank you very much for investigating! Yes: FrankenPHP creates a C thread (in C: https://github.com/dunglas/frankenphp/blob/4fab5a3169357f6c6cadee1aa77ff8fa8480419f/frankenphp.c#L810), and the C code calls back to Go (and the Go code then calls the C code, etc). The C code uses Let me know if you want me to try some patches (I can try to use |
Using Regarding the detection problem, a solution could be to check if the function is available (using a shell script similar to I can work on a patch doing this if you agree on the approach. |
Here are my experiments so far: diff --git a/src/make.bash b/src/make.bash
index 76ad51624a..c6c3b87157 100755
--- a/src/make.bash
+++ b/src/make.bash
@@ -138,6 +138,13 @@ if [[ "$(uname -s)" == "GNU/kFreeBSD" ]]; then
export CGO_ENABLED=0
fi
+# Check pthread_getattr_np() availability
+if [[ "${CGO_ENABLED}" != "0" ]] && ${CC:-gcc} -S runtime/cgo/testdata/detect_pthread_getattr_np.c -o /dev/null 2> /dev/null; then
+ export CGO_CFLAGS="-DHAVE_PTHREAD_GETATTR_NP ${CGO_CFLAGS}"
+ export CFLAGS="-DHAVE_PTHREAD_GETATTR_NP ${CFLAGS}"
+ echo $CGO_CFLAGS
+fi
+
# Clean old generated file that will cause problems in the build.
rm -f ./runtime/runtime_defs.go
diff --git a/src/runtime/cgo/gcc_stack_unix.c b/src/runtime/cgo/gcc_stack_unix.c
index 884281dc15..39d3402eca 100644
--- a/src/runtime/cgo/gcc_stack_unix.c
+++ b/src/runtime/cgo/gcc_stack_unix.c
@@ -21,7 +21,7 @@ x_cgo_getstackbound(uintptr bounds[2])
// Needed before pthread_getattr_np, too, since before glibc 2.32
// it did not call pthread_attr_init in all cases (see #65625).
pthread_attr_init(&attr);
-#if defined(__GLIBC__) || (defined(__sun) && !defined(__illumos__))
+#if defined(HAVE_PTHREAD_GETATTR_NP) || defined(__GLIBC__) || (defined(__sun) && !defined(__illumos__))
// pthread_getattr_np is a GNU extension supported in glibc.
// Solaris is not glibc but does support pthread_getattr_np
// (and the fallback doesn't work...). Illumos does not.
diff --git a/src/runtime/cgo/testdata/detect_pthread_getattr_np.c b/src/runtime/cgo/testdata/detect_pthread_getattr_np.c
new file mode 100644
index 0000000000..f15fcd0e25
--- /dev/null
+++ b/src/runtime/cgo/testdata/detect_pthread_getattr_np.c
@@ -0,0 +1,10 @@
+/* Dummy program used to check if pthread_getattr_np() is available. */
+
+#define _GNU_SOURCE
+#include <pthread.h>
+
+int main(void) {
+ pthread_getattr_np(pthread_self(), NULL);
+
+ return 0;
+} The detection code of |
Thanks for the update! I don't think we want to introduce any autoconf-like things into Either way, I think we still want to make the fallback path (if
|
PHP doesn't manage threads itself, it's up to the webserver to create and manage them. In our case, we use simple ad-hoc code: https://github.com/dunglas/frankenphp/blob/323edefc4b7cb69bcb318745efac4dec9b7ac488/frankenphp.c#L810. It indeed only
This seems indeed a good fix. By the way, @withinboredom instigated #62130, and it appears that having more permissive checks fixes this issue too: dunglas/frankenphp#46 (comment) |
I'm hitting the same issue on Windows with go1.23.2 and go-gl/glfw in glfw.PoolEvents that calls to C code which can generate events consumed by callbacks in Go. It only happens if my project is build with @cherrymui Applying CL 600296 fixes the issue so far. |
Currently, at a cgo callback where there is already a Go frame on the stack (i.e. C->Go->C->Go), we require that at the inner Go callback the SP is within the g0's stack bounds set by a previous callback. This is to prevent that the C code switches stack while having a Go frame on the stack, which we don't really support. But this could also happen when we cannot get accurate stack bounds, e.g. when pthread_getattr_np is not available. Since the stack bounds are just estimates based on the current SP, if there are multiple C->Go callbacks with various stack depth, it is possible that the SP of a later callback falls out of a previous call's estimate. This leads to runtime throw in a seemingly reasonable program. This CL changes it to save the old g0 stack bounds at cgocallback, update the bounds, and restore the old bounds at return. So each callback will get its own stack bounds based on the current SP, and when it returns, the outer callback has the its old stack bounds restored. TODO: not sure if this works if inner callback panics and outer callback recovers. Maybe we need to do something in unwindm? TODO: do we do this only when we cannot get the accurate bounds, and still throw if we can? This makes it not throw if the C code switches stacks. What else could go wrong? For golang#68285. Change-Id: I3422badd5ad8ff63e1a733152d05fb7a44d5d435
This reverts commit 7ea28b0. This commit was originally added to pin the golang build version to 1.20.7 because of an issue in golang related to CGO: golang/go#62130 (comment) This issue was fixed for most systems, although the issue is still open because it appears to have opened up a separate issue with MUSL-based systems (such as alpine): golang/go#68285 Since the fluent-bit container is on a glibc-based system (Amazon Linux), we should be safe to revert back to building with the latest version of golang.
This reverts commit 7ea28b0. This commit was originally added to pin the golang build version to 1.20.7 because of an issue in golang related to CGO: golang/go#62130 (comment) This issue was fixed for most systems, although the issue is still open because it appears to have opened up a separate issue with MUSL-based systems (such as alpine): golang/go#68285 Since the fluent-bit container is on a glibc-based system (Amazon Linux), we should be safe to revert back to building with the latest version of golang.
Ran into a similar issue with boosts fibers (from c++) where applying |
Change https://go.dev/cl/626275 mentions this issue: |
It is defined in bionic libc since at least API level 3. Use it. Updates #68285. Change-Id: I215c2d61d5612e7c0298b2cb69875690f8fbea66 Reviewed-on: https://go-review.googlesource.com/c/go/+/626275 LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Ian Lance Taylor <iant@google.com> Run-TryBot: Cherry Mui <cherryyz@google.com> TryBot-Result: Gopher Robot <gobot@golang.org>
Same observation on windows. DLL panics but EXE works. As a workaround I wrapped a call doing the Go -> C syscall in a goroutine and somehow it works. |
There's a bug in >= go1.22.5 that breaks gio apps on android: golang/go#68285 Pin to nixpkgs with go.1.22.4 until there is a released fix for this.
There's a bug in >= go1.22.5 that breaks gio apps on android: golang/go#68285 Pin to nixpkgs with go.1.22.3 until there is a released fix for this. The rev here has an existing hydra build available
There's a bug in >= go1.22.5 that breaks gio apps on android: golang/go#68285 Pin to nixpkgs with go.1.22.3 until there is a released fix for this. The rev here has an existing hydra build available
There's a bug in >= go1.22.5 that breaks gio apps on android: golang/go#68285 Pin to nixpkgs with go.1.22.3 until there is a released fix for this. The rev here has an existing hydra build available
There's a bug in >= go1.22.5 that breaks gio apps on android: golang/go#68285 Pin to nixpkgs with go.1.22.3 until there is a released fix for this. The rev here has an existing hydra build available
Change https://go.dev/cl/635775 mentions this issue: |
…t cgocallback Currently, at a cgo callback where there is already a Go frame on the stack (i.e. C->Go->C->Go), we require that at the inner Go callback the SP is within the g0's stack bounds set by a previous callback. This is to prevent that the C code switches stack while having a Go frame on the stack, which we don't really support. But this could also happen when we cannot get accurate stack bounds, e.g. when pthread_getattr_np is not available. Since the stack bounds are just estimates based on the current SP, if there are multiple C->Go callbacks with various stack depth, it is possible that the SP of a later callback falls out of a previous call's estimate. This leads to runtime throw in a seemingly reasonable program. This CL changes it to save the old g0 stack bounds at cgocallback, update the bounds, and restore the old bounds at return. So each callback will get its own stack bounds based on the current SP, and when it returns, the outer callback has the its old stack bounds restored. Also, at a cgo callback when there is no Go frame on the stack, we currently always get new stack bounds. We do this because if we can only get estimated bounds based on the SP, and the stack depth varies a lot between two C->Go calls, the previous estimates may be off and we fall out or nearly fall out of the previous bounds. But this causes a performance problem: the pthread API to get accurate stack bounds (pthread_getattr_np) is very slow when called on the main thread. Getting the stack bounds every time significantly slows down repeated C->Go calls on the main thread. This CL fixes it by "caching" the stack bounds if they are accurate. I.e. at the second time Go calls into C, if the previous stack bounds are accurate, and the current SP is in bounds, we can be sure it is the same stack and we don't need to update the bounds. This avoids the repeated calls to pthread_getattr_np. If we cannot get the accurate bounds, we continue to update the stack bounds based on the SP, and that operation is very cheap. On a Linux/AMD64 machine with glibc: name old time/op new time/op delta CgoCallbackMainThread-8 96.4µs ± 3% 0.1µs ± 2% -99.92% (p=0.000 n=10+9) Updates #68285. Updates #68587. Fixes #69988. Change-Id: I3422badd5ad8ff63e1a733152d05fb7a44d5d435 Reviewed-on: https://go-review.googlesource.com/c/go/+/600296 LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Michael Pratt <mpratt@google.com> (cherry picked from commit 76a8409) Reviewed-on: https://go-review.googlesource.com/c/go/+/635775
…t cgocallback Currently, at a cgo callback where there is already a Go frame on the stack (i.e. C->Go->C->Go), we require that at the inner Go callback the SP is within the g0's stack bounds set by a previous callback. This is to prevent that the C code switches stack while having a Go frame on the stack, which we don't really support. But this could also happen when we cannot get accurate stack bounds, e.g. when pthread_getattr_np is not available. Since the stack bounds are just estimates based on the current SP, if there are multiple C->Go callbacks with various stack depth, it is possible that the SP of a later callback falls out of a previous call's estimate. This leads to runtime throw in a seemingly reasonable program. This CL changes it to save the old g0 stack bounds at cgocallback, update the bounds, and restore the old bounds at return. So each callback will get its own stack bounds based on the current SP, and when it returns, the outer callback has the its old stack bounds restored. Also, at a cgo callback when there is no Go frame on the stack, we currently always get new stack bounds. We do this because if we can only get estimated bounds based on the SP, and the stack depth varies a lot between two C->Go calls, the previous estimates may be off and we fall out or nearly fall out of the previous bounds. But this causes a performance problem: the pthread API to get accurate stack bounds (pthread_getattr_np) is very slow when called on the main thread. Getting the stack bounds every time significantly slows down repeated C->Go calls on the main thread. This CL fixes it by "caching" the stack bounds if they are accurate. I.e. at the second time Go calls into C, if the previous stack bounds are accurate, and the current SP is in bounds, we can be sure it is the same stack and we don't need to update the bounds. This avoids the repeated calls to pthread_getattr_np. If we cannot get the accurate bounds, we continue to update the stack bounds based on the SP, and that operation is very cheap. On a Linux/AMD64 machine with glibc: name old time/op new time/op delta CgoCallbackMainThread-8 96.4µs ± 3% 0.1µs ± 2% -99.92% (p=0.000 n=10+9) Updates golang#68285. Updates golang#68587. Fixes golang#69988. Change-Id: I3422badd5ad8ff63e1a733152d05fb7a44d5d435 Reviewed-on: https://go-review.googlesource.com/c/go/+/600296 LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Michael Pratt <mpratt@google.com> (cherry picked from commit 76a8409) Reviewed-on: https://go-review.googlesource.com/c/go/+/635775
…t cgocallback Currently, at a cgo callback where there is already a Go frame on the stack (i.e. C->Go->C->Go), we require that at the inner Go callback the SP is within the g0's stack bounds set by a previous callback. This is to prevent that the C code switches stack while having a Go frame on the stack, which we don't really support. But this could also happen when we cannot get accurate stack bounds, e.g. when pthread_getattr_np is not available. Since the stack bounds are just estimates based on the current SP, if there are multiple C->Go callbacks with various stack depth, it is possible that the SP of a later callback falls out of a previous call's estimate. This leads to runtime throw in a seemingly reasonable program. This CL changes it to save the old g0 stack bounds at cgocallback, update the bounds, and restore the old bounds at return. So each callback will get its own stack bounds based on the current SP, and when it returns, the outer callback has the its old stack bounds restored. Also, at a cgo callback when there is no Go frame on the stack, we currently always get new stack bounds. We do this because if we can only get estimated bounds based on the SP, and the stack depth varies a lot between two C->Go calls, the previous estimates may be off and we fall out or nearly fall out of the previous bounds. But this causes a performance problem: the pthread API to get accurate stack bounds (pthread_getattr_np) is very slow when called on the main thread. Getting the stack bounds every time significantly slows down repeated C->Go calls on the main thread. This CL fixes it by "caching" the stack bounds if they are accurate. I.e. at the second time Go calls into C, if the previous stack bounds are accurate, and the current SP is in bounds, we can be sure it is the same stack and we don't need to update the bounds. This avoids the repeated calls to pthread_getattr_np. If we cannot get the accurate bounds, we continue to update the stack bounds based on the SP, and that operation is very cheap. On a Linux/AMD64 machine with glibc: name old time/op new time/op delta CgoCallbackMainThread-8 96.4µs ± 3% 0.1µs ± 2% -99.92% (p=0.000 n=10+9) Updates golang#68285. Updates golang#68587. Fixes golang#69988. Change-Id: I3422badd5ad8ff63e1a733152d05fb7a44d5d435 Reviewed-on: https://go-review.googlesource.com/c/go/+/600296 LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Michael Pratt <mpratt@google.com> (cherry picked from commit 76a8409) Reviewed-on: https://go-review.googlesource.com/c/go/+/635775
Go version
go version go1.22.5 linux/arm64
Output of
go env
in your module/workspace:What did you do?
Reproducer:
What did you see happen?
Full stack trace: https://github.com/dunglas/frankenphp/actions/runs/9778553362/job/26995650580?pr=898
What did you expect to see?
No crash.
The test suite wasn't crashing with previous Go versions.
It also doesn't crash on Debian and macOS, only Alpine (musl?) is affected.
The project (FrankenPHP) heavily relies on cgo.
We change the default Alpine stack trace
-ldflags "-w -s -extldflags '-Wl,-z,stack-size=0x80000'
.The binary is compressed with UPX.
The text was updated successfully, but these errors were encountered: