Panic when the number of replicas is bigger than max_wal_senders #176

Open
rugwirobaker opened this issue Mar 21, 2023 · 4 comments
Labels
bug Something isn't working

Comments

@rugwirobaker

rugwirobaker commented Mar 21, 2023

When you try to add one replica beyond max_wal_senders it panics on boot.

2023-03-20T22:09:14Z app[21781770c63089] den [info]Provisioning standby
2023-03-20T22:09:15Z app[21781770c63089] den [info]repmgr -h fdaa:0:c688:a7b:d5a6:6646:f15a:2 -p 5433 -d repmgr -U repmgr -f /data/repmgr.conf standby clone -c -F
2023-03-20T22:09:22Z app[21781770c63089] den [info]panic: failed to clone primary: failed to clone primary: exit status 1
2023-03-20T22:09:22Z app[21781770c63089] den [info]goroutine 1 [running]:
2023-03-20T22:09:22Z app[21781770c63089] den [info]main.panicHandler({0x9a0c40?, 0xc0004303e0})
2023-03-20T22:09:22Z app[21781770c63089] den [info]	/go/src/github.com/fly-examples/fly-postgres/cmd/start/main.go:100 +0x55
2023-03-20T22:09:22Z app[21781770c63089] den [info]main.main()
2023-03-20T22:09:22Z app[21781770c63089] den [info]	/go/src/github.com/fly-examples/fly-postgres/cmd/start/main.go:34 +0xadd

We should handle this gracefully by logging an error, or even returning one to the user. Perhaps we should even check the max_wal_senders setting automatically and, optionally, raise it before adding the new replica.

@rugwirobaker rugwirobaker added the bug Something isn't working label Mar 21, 2023
@davissp14
Contributor

davissp14 commented Mar 21, 2023

This only impacts the new replica coming up, right?

@rugwirobaker
Author

Yep, only the new replica goes into a restart loop, because it can't clone the primary with repmgr.

@guillaumervls guillaumervls added the pg:resiliency Helps to strengthen Fly Postgres label Mar 30, 2023
@guillaumervls

guillaumervls commented Apr 3, 2023

I'm adding the "not now" label since this looks like a pain only when you have >10 replicas (by default, max_wal_senders is 10, right?).

Maybe this can be "temp fixed" by a note in the docs?

@guillaumervls guillaumervls added the not now Low priority: should be reassessed later label Apr 3, 2023
@rugwirobaker
Author

Yep, this only becomes an issue when you have >=10 replicas.

@davissp14 davissp14 removed not now Low priority: should be reassessed later pg:resiliency Helps to strengthen Fly Postgres labels Jun 19, 2024