runtime: traceback stuck in runtime.systemstack #55851
CC @golang/runtime
I wouldn't prioritise this super high right now; it appears to be very rare. It has only happened once in the past month fleet-wide for us, whereas we were seeing multiple crashes a day with #54332.
Thanks for the information. The signal PC is the entry PC of
For a stop-gap solution, I think we should not use the jump stack logic if the SP delta is 0. I'll try sending a CL. As mentioned in a few other issues, using a per-PC SPWRITE marker may be a better fix: if we haven't written the SP, we haven't switched the stack, so we can just unwind like a normal function. I think this may also make recursive
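The stop-gap rule can be modeled as a single predicate. This is a hypothetical sketch: frameInfo, shouldJumpStack, and their fields are invented for illustration and are not the runtime's actual traceback code, which tracks far more state.

```go
package main

import "fmt"

// frameInfo is a hypothetical, heavily simplified view of what an unwinder
// knows about one frame. It is not the runtime's real frame type.
type frameInfo struct {
	funcName string // function the frame's PC belongs to
	onG0     bool   // true if the unwinder is walking the g0 stack
	spDelta  int    // SP delta at the frame's PC; 0 at function entry
}

// shouldJumpStack models the stop-gap rule: follow systemstack's switch back
// to the user goroutine's stack only when walking g0 and the SP delta is
// non-zero. A zero delta is treated as being at systemstack's entry, where
// the switch cannot have happened yet, so the frame is unwound normally.
func shouldJumpStack(f frameInfo) bool {
	if f.funcName != "runtime.systemstack" || !f.onG0 {
		return false
	}
	return f.spDelta != 0
}

func main() {
	atEntry := frameInfo{funcName: "runtime.systemstack", onG0: true, spDelta: 0}
	pastPrologue := frameInfo{funcName: "runtime.systemstack", onG0: true, spDelta: 16}
	fmt.Println(shouldJumpStack(atEntry))      // false
	fmt.Println(shouldJumpStack(pastPrologue)) // true
}
```

A non-zero delta does not by itself prove that the stack switch happened, which is why a per-PC SPWRITE marker is suggested as the more precise signal.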
Change https://go.dev/cl/437299 mentions this issue:
Just saw another instance of this crash on go1.19.1, which confirms that it is infrequent but has now happened more than once. I can try backporting the patch into our runtime, but it will take several months to know whether it worked ;)
Yeah, it probably makes sense to backport, as it can cause runtime crashes, albeit rare ones. I'll make a CL. Thanks.
@gopherbot please backport this to previous releases. This may cause runtime crashes. Thanks.
Backport issue(s) opened: #56635 (for 1.18), #56636 (for 1.19). Remember to create the cherry-pick CL(s) as soon as the patch is submitted to master, according to https://go.dev/wiki/MinorReleases.
Change https://go.dev/cl/448516 mentions this issue:
Change https://go.dev/cl/448517 mentions this issue:
…emstack
The traceback code has special "jump stack" logic to trace back stack switches through systemstack. If we're at the entry of systemstack, the stack switch hasn't happened, so don't jump to the user stack.
The jump stack logic is only used if we're on the g0 stack. It can happen that we're at the entry of a recursive systemstack call on the g0 stack. If we jump stack here, there will be two problems:
1. There are frames between entering the g0 stack and this recursive systemstack call. Those frames will be lost.
2. Worse, we switched frame.sp but the frame.fp calculation will use the entry SP delta (0), which will be wrong, which in turn leads to a wrong frame.lr, and things go wrong from there.
For now, don't jump stack if we're at the entry of systemstack (SP delta is 0).
Using a per-PC SPWRITE marker may be a better fix. If we haven't written the SP, we haven't switched the stack, so we can just unwind like a normal function.
Updates #55851.
Fixes #56636.
Change-Id: I2b624c8c086b235b34d9c7d3cebd4a37264f00f8
Reviewed-on: https://go-review.googlesource.com/c/go/+/437299
TryBot-Result: Gopher Robot <gobot@golang.org>
Run-TryBot: Cherry Mui <cherryyz@google.com>
Reviewed-by: Michael Pratt <mpratt@google.com>
(cherry picked from commit 500bc6b)
Reviewed-on: https://go-review.googlesource.com/c/go/+/448516
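Problem 1 in the commit message (lost frames) can be reproduced with a toy unwinder. This is a hypothetical model: toyFrame, unwind, and the frame runtime.helperOnG0 are invented for illustration, each stack is just a slice of frames listed innermost first, and the frame-pointer and LR corruption described in problem 2 is not modeled at all.

```go
package main

import (
	"fmt"
	"strings"
)

// toyFrame is a hypothetical, simplified frame: a function name plus the
// SP delta at the frame's PC. It is not the runtime's real frame type.
type toyFrame struct {
	name    string
	spDelta int
}

// unwind walks the g0 frames innermost first. When jump reports true for a
// runtime.systemstack frame, it stops walking g0 and continues with the user
// goroutine's frames, mimicking the "jump stack" logic.
func unwind(g0, user []toyFrame, jump func(toyFrame) bool) []string {
	var out []string
	for _, f := range g0 {
		out = append(out, f.name)
		if f.name == "runtime.systemstack" && jump(f) {
			for _, u := range user {
				out = append(out, u.name)
			}
			return out
		}
	}
	return out
}

func main() {
	// g0 stack, innermost first: a recursive systemstack call caught at its
	// entry (SP delta 0), a hypothetical g0-only helper, then the outer
	// systemstack that actually switched stacks (non-zero SP delta).
	g0 := []toyFrame{
		{"runtime.systemstack", 0},
		{"runtime.helperOnG0", 32},
		{"runtime.systemstack", 16},
	}
	user := []toyFrame{{"main.work", 48}, {"main.main", 64}}

	alwaysJump := func(toyFrame) bool { return true }
	stopGap := func(f toyFrame) bool { return f.spDelta != 0 }

	fmt.Println("old:  ", strings.Join(unwind(g0, user, alwaysJump), ", "))
	fmt.Println("fixed:", strings.Join(unwind(g0, user, stopGap), ", "))
}
```

With the old behavior, the walk jumps at the recursive call's entry and prints only runtime.systemstack, main.work, main.main, silently dropping the helper and the outer systemstack frame. With the stop-gap rule, all five frames are reported.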
…emstack
The traceback code has special "jump stack" logic to trace back stack switches through systemstack. If we're at the entry of systemstack, the stack switch hasn't happened, so don't jump to the user stack.
The jump stack logic is only used if we're on the g0 stack. It can happen that we're at the entry of a recursive systemstack call on the g0 stack. If we jump stack here, there will be two problems:
1. There are frames between entering the g0 stack and this recursive systemstack call. Those frames will be lost.
2. Worse, we switched frame.sp but the frame.fp calculation will use the entry SP delta (0), which will be wrong, which in turn leads to a wrong frame.lr, and things go wrong from there.
For now, don't jump stack if we're at the entry of systemstack (SP delta is 0).
Using a per-PC SPWRITE marker may be a better fix. If we haven't written the SP, we haven't switched the stack, so we can just unwind like a normal function.
Updates #55851.
Fixes #56635.
Change-Id: I2b624c8c086b235b34d9c7d3cebd4a37264f00f8
Reviewed-on: https://go-review.googlesource.com/c/go/+/437299
TryBot-Result: Gopher Robot <gobot@golang.org>
Run-TryBot: Cherry Mui <cherryyz@google.com>
Reviewed-by: Michael Pratt <mpratt@google.com>
(cherry picked from commit 500bc6b)
Reviewed-on: https://go-review.googlesource.com/c/go/+/448517
Sorry for the delay; we've just deployed go1.19.4 and will report back on whether we continue to see crashes. We previously saw 7 crashes in 30 days, so if we go two weeks with no crashes, that'll be pretty indicative.
This also happens to us, even after upgrading to 1.19.5. It happens in various environments and components, around 30 times per week, with different stack frames each time. Everything is running on Ubuntu on arm64. Here are some samples:
Please open a new issue; the cause of your problem may be different.
Similar to #54332
What version of Go are you using (go version)?
Does this issue reproduce with the latest release?
Yes, with go1.19.1
What operating system and processor architecture are you using (go env)?
What did you do?
Ran continuous profiling for over one month.
https://github.com/golang/go/blob/go1.19.1/src/runtime/asm_arm64.s#L206
What did you expect to see?
No "traceback stuck" assertion failure.
What did you see instead?
A "traceback stuck" assertion failure.