Reliability issues in remote-core http server #3228

Closed
2opremio opened this issue Nov 17, 2020 · 3 comments

@2opremio
Contributor

While chaos-monkey testing captive-core in the remote HTTP configuration, I killed the core child process to simulate a crash.

Instead of respawning the core process, the captive core HTTP server:

  1. Panicked due to a channel double-close (see the panic log at the end of the description).
  2. Got stuck, which is more worrisome. Instead of dying, the captive core server process lingered, giving the supervisor (Kubernetes in this case) no opportunity to respawn it:
root@horizon-with-remote-core-6bddf785b9-r9vlg:/# killall stellar-core
root@horizon-with-remote-core-6bddf785b9-r9vlg:/# ps aux
USER       PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
root         1 20.5  1.5 1539976 32040 ?       Ssl  18:13  12:26 /captivecore --port=8080
root      7161  0.0  0.1  18504  3372 pts/0    Ss   19:11   0:00 bash
root      7220  0.0  0.1  34400  2792 pts/0    R+   19:14   0:00 ps aux
root@horizon-with-remote-core-6bddf785b9-r9vlg:/# ps aux
USER       PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
root         1 18.6  1.3 1539976 28016 ?       Ssl  18:13  12:27 /captivecore --port=8080
root      7161  0.0  0.1  18504  3372 pts/0    Ss   19:11   0:00 bash
root      7221  0.0  0.1  34400  2748 pts/0    R+   19:20   0:00 ps aux
root@horizon-with-remote-core-6bddf785b9-r9vlg:/#

Panic log:

Panic: close of closed channel
goroutine 418366 [running]:
runtime/debug.Stack(0x1f, 0x0, 0x0)
	/usr/local/go/src/runtime/debug/stack.go:24 +0x9d
runtime/debug.PrintStack()
	/usr/local/go/src/runtime/debug/stack.go:16 +0x22
github.com/go-chi/chi/middleware.Recoverer.func1.1(0xc001431600, 0xf89860, 0xc001450380)
	/go/pkg/mod/github.com/go-chi/chi@v4.0.3+incompatible/middleware/recoverer.go:28 +0x1e3
panic(0xc70f60, 0xf61e60)
	/usr/local/go/src/runtime/panic.go:969 +0x166
github.com/stellar/go/ingest/ledgerbackend.(*stellarCoreRunner).close(0xc00011e0c0, 0x40ec08, 0x20)
	/go/src/github.com/stellar/go/ingest/ledgerbackend/stellar_core_runner.go:285 +0x192
github.com/stellar/go/ingest/ledgerbackend.(*CaptiveStellarCore).Close(0xc0003e8180, 0xc0014bffa0, 0x0)
	/go/src/github.com/stellar/go/ingest/ledgerbackend/captive_core_backend.go:502 +0x77
github.com/stellar/go/ingest/ledgerbackend.(*CaptiveStellarCore).GetLedger(0xc0003e8180, 0xc00004e6da, 0xb95222, 0xc0014bd980, 0xd70a63, 0xc, 0x0)
	/go/src/github.com/stellar/go/ingest/ledgerbackend/captive_core_backend.go:467 +0x552
github.com/stellar/go/exp/services/captivecore/internal.(*CaptiveCoreAPI).GetLedger(0xc0003ec040, 0x4e6da, 0xc001283e00, 0x0, 0x0, 0x0, 0x0)
	/go/src/github.com/stellar/go/exp/services/captivecore/internal/api.go:170 +0xbf
github.com/stellar/go/exp/services/captivecore/internal.Handler.func2(0x7fafe80781b8, 0xc0000e2d00, 0xc001431800)
	/go/src/github.com/stellar/go/exp/services/captivecore/internal/server.go:54 +0x152
net/http.HandlerFunc.ServeHTTP(0xc0003ea520, 0x7fafe80781b8, 0xc0000e2d00, 0xc001431800)
	/usr/local/go/src/net/http/server.go:2041 +0x44
github.com/go-chi/chi.(*Mux).routeHTTP(0xc0000bde00, 0x7fafe80781b8, 0xc0000e2d00, 0xc001431800)
	/go/pkg/mod/github.com/go-chi/chi@v4.0.3+incompatible/mux.go:425 +0x278
net/http.HandlerFunc.ServeHTTP(0xc0003ea510, 0x7fafe80781b8, 0xc0000e2d00, 0xc001431800)
	/usr/local/go/src/net/http/server.go:2041 +0x44
github.com/stellar/go/support/http.LoggingMiddleware.func1(0xf89860, 0xc001450380, 0xc001431800)
	/go/src/github.com/stellar/go/support/http/logging_middleware.go:40 +0x392
net/http.HandlerFunc.ServeHTTP(0xc0003ca620, 0xf89860, 0xc001450380, 0xc001431700)
	/usr/local/go/src/net/http/server.go:2041 +0x44
github.com/stellar/go/support/http.SetLoggerMiddleware.func1.1(0xf89860, 0xc001450380, 0xc001431600)
	/go/src/github.com/stellar/go/support/http/logging_middleware.go:20 +0x16c
net/http.HandlerFunc.ServeHTTP(0xc0003ca640, 0xf89860, 0xc001450380, 0xc001431600)
	/usr/local/go/src/net/http/server.go:2041 +0x44
github.com/go-chi/chi/middleware.Recoverer.func1(0xf89860, 0xc001450380, 0xc001431600)
	/go/pkg/mod/github.com/go-chi/chi@v4.0.3+incompatible/middleware/recoverer.go:35 +0x83
net/http.HandlerFunc.ServeHTTP(0xc0003ca660, 0xf89860, 0xc001450380, 0xc001431600)
	/usr/local/go/src/net/http/server.go:2041 +0x44
github.com/go-chi/chi/middleware.RequestID.func1(0xf89860, 0xc001450380, 0xc001431500)
	/go/pkg/mod/github.com/go-chi/chi@v4.0.3+incompatible/middleware/request_id.go:76 +0x1df
net/http.HandlerFunc.ServeHTTP(0xc0003ca680, 0xf89860, 0xc001450380, 0xc001431500)
	/usr/local/go/src/net/http/server.go:2041 +0x44
github.com/go-chi/chi.(*Mux).ServeHTTP(0xc0000bde00, 0xf89860, 0xc001450380, 0xc001431400)
	/go/pkg/mod/github.com/go-chi/chi@v4.0.3+incompatible/mux.go:82 +0x2b2
net/http.serverHandler.ServeHTTP(0xc0001fa0e0, 0xf89860, 0xc001450380, 0xc001431400)
	/usr/local/go/src/net/http/server.go:2836 +0xa3
net/http.(*conn).serve(0xc001a28960, 0xf8d5a0, 0xc0003ed7c0)
	/usr/local/go/src/net/http/server.go:1924 +0x86c
created by net/http.(*Server).Serve
	/usr/local/go/src/net/http/server.go:2962 +0x35c
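
The trace above shows the panic coming from a `close` call reached through `GetLedger`, and chi's `Recoverer` middleware catching it, so the process itself survives the panic. A common way to make a close path safe to call more than once is to guard the channel close with `sync.Once`. The sketch below is only an illustration under that assumption. The `runner` type and its fields are hypothetical and are not the actual `stellarCoreRunner` implementation or the fix applied upstream.

```go
package main

import (
	"fmt"
	"sync"
)

// runner is a hypothetical stand-in for a type that owns a shutdown channel.
type runner struct {
	shutdown  chan struct{}
	closeOnce sync.Once
}

func newRunner() *runner {
	return &runner{shutdown: make(chan struct{})}
}

// Close is safe to call any number of times and from any goroutine:
// sync.Once guarantees the channel is closed exactly once.
func (r *runner) Close() {
	r.closeOnce.Do(func() { close(r.shutdown) })
}

func main() {
	r := newRunner()
	r.Close()
	r.Close() // without the sync.Once guard this second call would panic
	<-r.shutdown // a receive on a closed channel returns immediately
	fmt.Println("closed without panicking")
}
```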
@bartekn
Contributor

bartekn commented Nov 17, 2020

Can you try it again on master or release-horizon-v1.12.0? I believe I fixed it in #3213 (at least the close-of-closed-channel issue).

@2opremio
Contributor Author

Uhm, unfortunately remote-captive-core seems to be broken in master (b361462).

I will create a separate ticket for that.

@2opremio
Contributor Author

Done: #3230
