-
Notifications
You must be signed in to change notification settings - Fork 109
Feature request: don't die on panic #625
Comments
I remember had this functionality with previous versions of vitess server, maybe they implemented that in another way in newer versions. @kuba-- could you have a look? |
Looks, it's still there: https://github.com/src-d/go-vitess/blob/34b2260c9dc53f8eb12195c8eb7a2bd273e548c6/mysql/server.go#L257 But I'll take a look closer. |
After testing looks that recover still works fine. SELECT
r.repository_id, SUM(ARRAY_LENGTH(SPLIT(b.blob_content, '\n'))) AS lines_count
FROM refs r
NATURAL JOIN commit_blobs ct
NATURAL JOIN blobs b
GROUP BY r.repository_id
ORDER BY lines_count DESC mysql server panicked:
but the gitbase is still running. |
on this issue src-d/gitbase#705 , maybe the panic is thrown in another go routine? could you check if you are able to stop gitbase working introducing a panic on the go-git lru-cache? |
Yes, everything what goes on go routines (e.g. squash joins, go-git) may crash gitbase. This issue needs to be transferred to mysql project. |
Be careful about swallowing panics high up in the call stack. You may avert a process termination, but generally the heap is now in an unknown and generally invalid state. You are likely to resume only to crash again later much further away from the original problem, and it is very hard to debug these. As a rule, try to avoid handing panics that you didn't throw. If code indirects a nil pointer or reads off the end of a slice, the program is broken, and crashing is actually preferable. Otherwise the experience for the user is potentially much worse: It could lead to weird data corruptions, broken transactions, and other infelicities. I mention this because I've had to debug this, and it's much worse than the problem it is meant to solve IMO. |
As @creachadair says, I'd rather catch the panic earlier and have a proper stack trace than keep it running potentially incorrectly. |
We already have the main I'm not a big fan of Personally I would do nothing (so far), just integrate go-git with @jfontan 's fix: src-d/go-git#1076 and monitor situation if something like that happen again. |
So right now the problem is on partitions. I think if we panic in one of the partitions, we should stop the whole query. this will mimic the previous query behavior without parallelization. Problems related to inconsistencies should be minimal. WDYT? |
Using a panic to handle a controlled unwinding event—e.g., a parse or query that fails deep inside, and where the only thing you need to do is get control back to the top so you can report and resume—seems like a reasonable use. I'd generally advise defining your own panic signal type, e.g. type queryFailed string
…
// hypothetical interface, I have no idea what the real API looks like.
func query(ctx context.Context, q *Query) (_ *Result, err error) {
defer func() {
if x := recover(); x != nil {
if s, ok := x.(queryFailed); ok {
err = errors.New(string(s))
} else {
panic(x)
}
}
}()
return q.Run(ctx, new(Result)) // whatever
}
…
if badThing(happened) {
panic(queryFailed("they're destroying our city!"))
} or words to that effect. That way you only get the panic you care about. |
@creachadair - what you proposed, is what we already have (by vitess). |
As I understand it, Vitess made the same mistake that a lot of HTTP implementations do: It logs and swallows any panic, not just controlled ones.
Right, and that's specifically not the kind of panic I'm talking about. That's a program integrity violation, and swallowing that panic will leave the heap in a corrupt state with unknown future effects.
Obviously it's not my decision, but I would not recommend this approach at all. I don't know what the trigger for these panics is, but swallowing them isn't going to prevent the program from being broken, it will only change where it breaks (and perhaps make it less obvious). |
Personally I think in case of partition iterator it should be fine if we just close the iterator. The only concern I have is the fact that func foo() {
defer func() {
if x := recover(); x != nil {
fmt.Println("recover:", x)
syscall.Kill(syscall.Getpid(), syscall.SIGUSR2)
}
}()
panic("foo")
}
func fork() {
exe, _ := os.Executable()
dir := filepath.Dir(exe)
// Spawn child process.
os.StartProcess(exe, []string{exe}, &os.ProcAttr{
Dir: dir,
Files: []*os.File{
os.Stdin,
os.Stdout,
os.Stderr,
},
Sys: &syscall.SysProcAttr{},
})
}
func main() {
signalCh := make(chan os.Signal, 1)
signal.Notify(signalCh, syscall.SIGUSR2)
go foo()
for {
select {
case s := <-signalCh:
fork()
return
}
}
} Let me know what do you think. |
If you don't want to stop unwinding, you can just panic again in the handler: if x := recover(); x != nil {
cleanup(whatever)
panic(x) // resume panicking
} maybe that helps simplify things a bit? |
It would be nice to wrap query handler in
defer recover()
so in case a query panics it would return an error to the client instead of server exiting.Similar to how it usually works with http servers:
The text was updated successfully, but these errors were encountered: