Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix uWSGI configuration for clean shutdown #46

Merged
merged 1 commit into from
Jan 24, 2024
Merged

Fix uWSGI configuration for clean shutdown #46

merged 1 commit into from
Jan 24, 2024

Conversation

amCap1712
Copy link
Member

In absence of the exit-on-reload configuration option, uWSGI does not exit cleanly when consul-template attempts to restart it. This manifests as an error like: bind(): Address already in use [core/socket.c line 769]. Fix the configuration appropriately.

@mwiencek
Copy link
Member

mwiencek commented Nov 21, 2023

uWSGI does not exit cleanly when consul-template attempts to restart it

Just making sure I understand the patch correctly: why is consul-template attempting to restart the whole process?

AFAIK, when a template changes, consul-template should only send reload_signal, which is defined as SIGHUP in our configuration: https://github.com/metabrainz/artwork-redirect/blob/master/docker/prod/consul-template-redirect.conf

when uwsgi gets a sighup, quit completely and let runit restart us

Won't that cause requests to fail intermittently while uWSGI is restarting? uWSGI apparently supports graceful reloads via SIGHUP, and this would seem to bypass that.

@metabrainz metabrainz deleted a comment from brainzbot Nov 21, 2023
@amCap1712
Copy link
Member Author

There was a gap in my understanding and the terminology I used in the above explanation was a bit vague. Here's a clearer and more accurate explanation:

  1. When the consul-template finds that some dependent keys in consul kv store have been updated, it renders new template files.
  2. Once templates have been rendered, it issues reload signal to uWSGI.
  3. uWSGI responds to the signal and gracefully reloads the worker processes (note that this might take a while because uWSGI will wait for existing requests to be served before reloading subject to a max timeout).
  4. When uWSGI reloads its http workers, consul-template misguided believes that the uWSGI process has exited.
    Gracefully killing worker 5 (pid: 50)...
    Gracefully killing worker 9 (pid: 54)...
    2023-11-22T15:43:41.749Z [ERR] (cli) child process died with exit code -1
     ...gracefully killing workers...
    
    Note the 3rd line of the log, it has a different format because it is directly logged by consul-template. The remaining lines are being logged by uWSGI.
  5. Since consul-template is running in exec mode and believes that the child process it was managing has exited, it ends itself too.
  6. runit/systemd/systemv sees that consul-template has died so it restarts the consul-template process. This is evident from the following line of the log.
    2023/11/22 15:43:41 [DEBUG] (logging) enabling syslog on LOCAL0
    
    This is usually the first line logged when a container starts up (probably logged by consul-template).
  7. The newly started consul-template process starts a new uWSGI process that now competes with the existing uWSGI process for the port and errors out.
    probably another instance of uWSGI is running on the same address (0.0.0.0:13032).
    bind(): Address already in use [core/socket.c line 769]
    

This is fixed by adding exit-on-reload = true because then the old uWSGI process exits cleanly with consul-template and everything is restarted by runit.

However now I am curious why consul-template misleading believes that uWSGI has exited when it is reloading.

@amCap1712
Copy link
Member Author

For record, we figured why this was happening: https://chatlogs.metabrainz.org/libera/metabrainz/2023-11-22/?msg=5253309&page=1

amCap1712 added a commit to metabrainz/listenbrainz-server that referenced this pull request Nov 27, 2023
We fixed the `bind(): Address already in use [core/socket.c line 769]` error in LB
using `exit-on-reload = true` in uWSGI configuration. However, this prevents graceful
reloading.

I did some further investigation as noted in metabrainz/artwork-redirect#46
and have a better fix in mind now.

For the command field in `exec` block of consul template configuration, use
array format. This executes the command directly without spawning a shell wrapper
or setting a process group id. Both of which will interfere with forwarding
signals to the uWSGI process.
amCap1712 added a commit to metabrainz/critiquebrainz that referenced this pull request Nov 27, 2023
We fixed the `bind(): Address already in use [core/socket.c line 769]` error in LB
using `exit-on-reload = true` in uWSGI configuration. However, this prevents graceful
reloading.

I did some further investigation as noted in metabrainz/artwork-redirect#46 and have
a better fix in mind now.

For the command field in exec block of consul template configuration, use array format.
This executes the command directly without spawning a shell wrapper or setting a process
group id. Both of which will interfere with forwarding signals to the uWSGI process.
amCap1712 added a commit to metabrainz/critiquebrainz that referenced this pull request Nov 27, 2023
We fixed the `bind(): Address already in use [core/socket.c line 769]` error in LB
using `exit-on-reload = true` in uWSGI configuration. However, this prevents graceful
reloading.

I did some further investigation as noted in metabrainz/artwork-redirect#46 and have
a better fix in mind now.

For the command field in exec block of consul template configuration, use array format.
This executes the command directly without spawning a shell wrapper or setting a process
group id. Both of which will interfere with forwarding signals to the uWSGI process.
amCap1712 added a commit to metabrainz/critiquebrainz that referenced this pull request Nov 27, 2023
We fixed the `bind(): Address already in use [core/socket.c line 769]` error in LB
using `exit-on-reload = true` in uWSGI configuration. However, this prevents graceful
reloading.

I did some further investigation as noted in metabrainz/artwork-redirect#46 and have
a better fix in mind now.

For the command field in exec block of consul template configuration, use array format.
This executes the command directly without spawning a shell wrapper or setting a process
group id. Both of which will interfere with forwarding signals to the uWSGI process.
amCap1712 added a commit to metabrainz/critiquebrainz that referenced this pull request Nov 27, 2023
We fixed the `bind(): Address already in use [core/socket.c line 769]` error in LB
using `exit-on-reload = true` in uWSGI configuration. However, this prevents graceful
reloading.

I did some further investigation as noted in metabrainz/artwork-redirect#46 and have
a better fix in mind now.

For the command field in exec block of consul template configuration, use array format.
This executes the command directly without spawning a shell wrapper or setting a process
group id. Both of which will interfere with forwarding signals to the uWSGI process.
amCap1712 added a commit to metabrainz/listenbrainz-server that referenced this pull request Nov 27, 2023
We fixed the `bind(): Address already in use [core/socket.c line 769]` error in LB
using `exit-on-reload = true` in uWSGI configuration. However, this prevents graceful
reloading.

I did some further investigation as noted in metabrainz/artwork-redirect#46
and have a better fix in mind now.

For the command field in `exec` block of consul template configuration, use
array format. This executes the command directly without spawning a shell wrapper
or setting a process group id. Both of which will interfere with forwarding
signals to the uWSGI process.
amCap1712 added a commit to metabrainz/metabrainz.org that referenced this pull request Nov 27, 2023
We fixed the `bind(): Address already in use [core/socket.c line 769]` error using
`exit-on-reload = true` in uWSGI configuration. However, this prevents graceful
reloading.

I did some further investigation as noted in metabrainz/artwork-redirect#46 and have
a better fix in mind now.

For the command field in exec block of consul template configuration, use array
format. This executes the command directly without spawning a shell wrapper or
setting a process group id. Both of which will interfere with forwarding signals
to the uWSGI process.

This also needs a newer version of consul-template. So upgrading to a newer
base image.
amCap1712 added a commit to metabrainz/metabrainz.org that referenced this pull request Nov 28, 2023
We fixed the `bind(): Address already in use [core/socket.c line 769]` error using
`exit-on-reload = true` in uWSGI configuration. However, this prevents graceful
reloading.

I did some further investigation as noted in metabrainz/artwork-redirect#46 and have
a better fix in mind now.

For the command field in exec block of consul template configuration, use array
format. This executes the command directly without spawning a shell wrapper or
setting a process group id. Both of which will interfere with forwarding signals
to the uWSGI process.

This also needs a newer version of consul-template. So upgrading to a newer
base image.
@amCap1712 amCap1712 requested a review from mwiencek November 28, 2023 05:56
Copy link
Member

@mwiencek mwiencek left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Thanks for your detailed investigation into this!

docker/prod/uwsgi.ini Outdated Show resolved Hide resolved
As per investigation, we can remove exit-on-reload and use the array form of
the command to ensure proper signal handling on reloads.
@mwiencek mwiencek merged commit 32de201 into master Jan 24, 2024
1 check passed
@mwiencek mwiencek deleted the uwsgi-fix branch January 24, 2024 19:02
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants