Make SIGINT default stopsignal #714

thijs-obj · 2020-04-16T09:58:27Z

This is directly related to these issue that were (IMHO incorrectly) closed:

There is a misconception that sending "SIGTERM is more graceful". Generally that is true, however considering how Postgres interprets signals (https://www.postgresql.org/docs/11/server-shutdown.html), that simply doesn't guarantee to shutdown Postgres cleanly.

According to the docs the only way to guarantee a graceful shutdown without dataloss is using SIGINT.

Note that in a lot of production environments (e.g. in Kubernetes clusters) containers might be stopped at unexpected moments using the default signal, and in a lot of those environments connections to the server will not be closed by the clients in time (since they generally don't know the server is shutting down). Ergo, there are a lot of production setups in which the default image is not usable because it will lead to corrupted data.

I'd be happy to create a PR for this, but I have a feeling that some discussion is needed first.

The text was updated successfully, but these errors were encountered:

tianon · 2020-05-19T22:12:24Z

I'm confused, and I'm not sure we're reading the same documentation (or perhaps we're interpreting it differently).

From the link you've listed (https://www.postgresql.org/docs/11/server-shutdown.html):

SIGTERM

This is the Smart Shutdown mode. After receiving SIGTERM, the server disallows new connections, but lets existing sessions end their work normally. It shuts down only after all of the sessions terminate. If the server is in online backup mode, it additionally waits until online backup mode is no longer active. While backup mode is active, new connections will still be allowed, but only to superusers (this exception allows a superuser to connect to terminate online backup mode). If the server is in recovery when a smart shutdown is requested, recovery and streaming replication will be stopped only after all regular sessions have terminated.

vs:

SIGINT

This is the Fast Shutdown mode. The server disallows new connections and sends all existing server processes SIGTERM, which will cause them to abort their current transactions and exit promptly. It then waits for all server processes to exit and finally shuts down. If the server is in online backup mode, backup mode will be terminated, rendering the backup useless.

(emphasis mine, on both)

From my reading, SIGTERM does sound much more graceful than SIGINT (allows transactions to finish, allows backups to finish, etc), which is the reason we made the switch?

In my opinion, this makes more sense as documentation changes, because it sounds like what's happening is that SIGTERM is too graceful in some cases, but that which signal is "right" is usage dependent (for some users, SIGQUIT is going to be the right answer, which is why PostgreSQL supports all three). Additionally, many (most?) users will need to adjust what Docker calls the "stop timeout" such that PostgreSQL has more time to clean itself up gracefully, and unfortunately that's not something we can control or even influence in the Docker image.

borft · 2020-05-20T07:12:25Z

I see your point. SIGTERM is definitely more graceful, but only if it is allowed to complete.

If you run this container in a Kubernetes setup, it may not be able to complete, for example due to applications using persistent connections, or very long queries. This will result in Kubernetes killing the container altogether. This results in a much less graceful shutdown, and occasion can actually lead to data corruption.

Because SIGINT forces a faster shutdown (because it's more aggressive), the shutdown is always completed fast enough, which means that in the end, it's actually more robust, as data corruption doesn't occur, and shutdowns / restarts complete a lot faster.

Therefore one could argue that SIGINT is perhaps a better choice for the default behaviour. I do agree though, that it may differ per setup / application of the container.

tianon · 2020-05-20T07:35:53Z

I believe terminationGracePeriodSeconds is the k8s version of Docker's "stop timeout", and I think it should always be very generous for any database application (I usually set my databases as high as 120 seconds or more), to give plenty of time even if we have to flush to a slow disk.

borft · 2020-05-20T07:45:48Z

@tianon 2 minutes won't be enough if your application keeps database connections open, or if you have a connection pooler between your application and PG. The problem with SIGTERM is that it won't close connections, it will just wait indefinitely, I've observed this behaviour in a number of cases. Sometimes it helps to manually intervene, sometimes K8S does it for you (if you're not quick enough). Either case is not desirable I think.

thijs-obj · 2020-05-20T07:57:42Z

The parts you bolded out in the documentation for SIGINT like aborting transactions are bad, but these are all things the clients can and should handle. After all, transactions can fail for other reasons too, and backups can fail for all sorts of reasons.

The postgres container on the other hand cannot do anything to make sure it gets shut down properly with SIGTERM. There is nothing stopping clients from keeping connections open, and then if eventually the container gets SIGKILL, Postgres has no chance to clean anything up. This can lead to data-corruption. I think that the primary duty of the postgres container should be to make sure the postgres container gets shutdown properly, without any data loss. Whether transactions get stopped gracefully is a secondary concern imho. Also note that if the Postgres container gets SIGKILL, then all the bad things from SIGINT still happen, just x seconds later.

Now above I wrote that 'there is nothing stopping clients from keeping connections open'. Thing is: the clients don't even know that the postgres container has received SIGTERM and they should disconnect (in typical setups). Of course there are all sorts of notification mechanisms possible, but that would be incredibly complex, and then whether Postgres correctly saves its data before shutting down is still in the hands of the clients. Settings terminationGracePeriodSeconds to a long period doesn't really change that much if your clients use persistent connections, and again that still means the clients control whether the server can shutdown gracefully.

GameScripting · 2020-09-18T15:35:35Z

Is there anyone who thinks, this proposed change should NOT be made?
If not, what is the process to make this change happen?

--

I guess, as this is a breaking change, it must be documented (where?) and should only be applied to new versions of postgres e.g. the upcoming 13 release?

tianon · 2020-09-18T15:37:53Z

Is there anyone who thinks, this proposed change should NOT be made?

Yes, that's why this issue is still open and under discussion instead of closed/fixed.

GameScripting · 2020-09-18T15:58:22Z

@tianon alright, thank you for the quick response.

--

I would like to add another argument for changing the default stopsignal. I looked up what the default stopsignal is, when you install postgres under systemd. The offical postgres documentation suggests to use SIGINT as the systemd KillSignal: https://www.postgresql.org/docs/devel/server-start.html (in the example systemd unit file).

GameScripting · 2020-09-18T16:10:54Z

Also, in our case we have no control (as in: its not our app, we can not change the source code) over the application using the postgres database. We cant control whether or not the client application will close the db connections when the database server must shut down. The biggest problem is, that the client application has no way to know that the database server will be shut down. So the client has no way to stop whatever it is doing.

Our specific case:

The database was shut down, while the application was performing a huge database migration.
What we expected: Migration runs in a big transaction. As the database was shutdown mid-transaction, we expected the transaction to be aborted and rolled back
What happend: Data corruption, database could not start again. We had to restore from backup and try again.

tianon · 2020-09-18T17:59:56Z

I would like to add another argument for changing the default stopsignal. I looked up what the default stopsignal is, when you install postgres under systemd. The offical postgres documentation suggests to use SIGINT as the systemd KillSignal: https://www.postgresql.org/docs/devel/server-start.html (in the example systemd unit file).

This is the winning argument 😄
(I love to see an official upstream recommendation!)

The setting from the service file of the package we install also corroborates this:

$ docker pull postgres
Using default tag: latest
latest: Pulling from library/postgres
Digest: sha256:91462e8207eadf217fe72822163277d189215b1f3792719f29352de5beb0ad53
Status: Image is up to date for postgres:latest
docker.io/library/postgres:latest

$ docker run --rm postgres grep 'ExecStop' /lib/systemd/system/postgresql@.service
ExecStop=/usr/bin/pg_ctlcluster --skip-systemctl-redirect -m fast %i stop

(That -m fast amounts to SIGINT, which is interesting that they explicitly overwrite the pg_ctlcluster default of -m smart 😅👍)

…postgres#714)

psorowka · 2022-02-24T15:13:10Z

As far as I see, this can be closed, because it has been already merged: bfc5d81

thijs-obj · 2022-02-24T15:24:03Z

Yes, this can be closed afaik.

Thanks for fixing this 👏

Wasn't sure whether the issue creator should close issues, so left this open. Will close now

See: - docker-library/postgres#714 Fixes #492

See docker-library/postgres#714 Fixes #492

wglambert added the Request Request for image modification or feature label Apr 16, 2020

GameScripting mentioned this issue Sep 18, 2020

Changed default stopsignal from SIGTERM to SIGINT #763

Merged

gcmalloc added a commit to enermaps/enermaps that referenced this issue Oct 29, 2020

make sigint the default signal for stopping psql (see docker-library/…

921ec1a

…postgres#714)

gcmalloc added a commit to enermaps/enermaps that referenced this issue Oct 29, 2020

make sigint the default signal for stopping psql (see docker-library/…

8bdd207

…postgres#714)

gcmalloc mentioned this issue Oct 29, 2020

Use SIGINT as signal to docker-compose stop db enermaps/enermaps#22

Merged

thijs-obj closed this as completed Feb 24, 2022

dgeissl mentioned this issue Jul 10, 2023

[bitnami/postgresql] Make SIGINT default stopsignal bitnami/containers#40519

Open

tianon mentioned this issue Mar 5, 2024

Stopping a PostgreSQL container immediately after starting it kills PostgreSQL with SIGKILL, possibly corrupting the database #1207

Open

saolof mentioned this issue Oct 1, 2024

SIGINT as the default stopsignal timescale/timescaledb-docker-ha#492

Closed

graveland added a commit to timescale/timescaledb-docker-ha that referenced this issue Oct 1, 2024

Set STOPSIGNAL to SIGINT

d0ee2f2

See: - docker-library/postgres#714 Fixes #492

graveland added a commit to timescale/timescaledb-docker-ha that referenced this issue Oct 1, 2024

Set STOPSIGNAL to SIGINT

55e42e5

See docker-library/postgres#714 Fixes #492

graveland mentioned this issue Oct 1, 2024

Set STOPSIGNAL to SIGINT timescale/timescaledb-docker-ha#493

Merged

graveland added a commit to timescale/timescaledb-docker-ha that referenced this issue Oct 2, 2024

Set STOPSIGNAL to SIGINT

5e15ac9

See docker-library/postgres#714 Fixes #492

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Make SIGINT default stopsignal #714

Make SIGINT default stopsignal #714

thijs-obj commented Apr 16, 2020

tianon commented May 19, 2020

borft commented May 20, 2020

tianon commented May 20, 2020

borft commented May 20, 2020

thijs-obj commented May 20, 2020 •

edited

Loading

GameScripting commented Sep 18, 2020

tianon commented Sep 18, 2020

GameScripting commented Sep 18, 2020 •

edited

Loading

GameScripting commented Sep 18, 2020 •

edited

Loading

tianon commented Sep 18, 2020

psorowka commented Feb 24, 2022

thijs-obj commented Feb 24, 2022

Make SIGINT default stopsignal #714

Make SIGINT default stopsignal #714

Comments

thijs-obj commented Apr 16, 2020

tianon commented May 19, 2020

borft commented May 20, 2020

tianon commented May 20, 2020

borft commented May 20, 2020

thijs-obj commented May 20, 2020 • edited Loading

GameScripting commented Sep 18, 2020

tianon commented Sep 18, 2020

GameScripting commented Sep 18, 2020 • edited Loading

GameScripting commented Sep 18, 2020 • edited Loading

tianon commented Sep 18, 2020

psorowka commented Feb 24, 2022

thijs-obj commented Feb 24, 2022

thijs-obj commented May 20, 2020 •

edited

Loading

GameScripting commented Sep 18, 2020 •

edited

Loading

GameScripting commented Sep 18, 2020 •

edited

Loading