Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

stale pidfiles not removed #11

Closed
kcgthb opened this issue Nov 21, 2018 · 4 comments · Fixed by #36
Closed

stale pidfiles not removed #11

kcgthb opened this issue Nov 21, 2018 · 4 comments · Fixed by #36
Assignees
Milestone

Comments

@kcgthb
Copy link
Contributor

kcgthb commented Nov 21, 2018

Hi!

I just noticed that if a user PUN goes away, its pidfile in /var/run/nginx/{%user%}/passenger.pid stays there, which causes 503 "Service unavailable" errors from the user's perspective.

I would have expected nginx_stage nginx_clean to remove those stale pidfiles, but that's not the case:

# /opt/ood/nginx_stage/sbin/nginx_stage nginx_clean
stale pid file: /var/run/nginx/kilian/passenger.pid
# /opt/ood/nginx_stage/sbin/nginx_stage nginx_clean
stale pid file: /var/run/nginx/kilian/passenger.pid

Is there a way to automatically remove those pidfiles, so the PUN could be restarted on-demand, or should I just implement a cron job to remove them?

This is with ondemand-1.4.3-2.el7.x86_64 on CentOS.

Thanks!

@kcgthb
Copy link
Contributor Author

kcgthb commented Nov 21, 2018

Quick update: this situation can happen if the nginx PUN process gets killed with SIGKILL: in that case the passenger.pidand passenger.sock files are not removed from /var/run/nginx/{%user%}/, which causes the 503 errors.

@ericfranz
Copy link
Contributor

Oh you are correct:

pid_path = PidFile.new NginxStage.pun_pid_path(user: u)
raise StalePidFile, "stale pid file: #{pid_path}" unless pid_path.running_process?

The nginx_clean command has two optional flags which imho are not well named:

nginx_stage nginx_clean --force # better to be named --all, kills all PUNs both inactive and active

nginx_stage nginx_clean --skip-nginx # better to be named --dry-run, because it only displays the inactive users whose PUNs would be shut down if `--skip-nginx` was omitted

So it seems like we should have another flag, something like --rm-stale-files to remove pid and sock files if the corresponding process is not found?

Even if that was the case, I think that from the user's perspective instead of seeing a 503 "Service unavailable error it would be preferable to see a more helpful error message with a way to rectify the problem, similar to the error page that users see when accessing an app for the first time (do you want to initialize this app?)

I guess if this was happening we would at least want to know that stale pid files were automatically deleted - like log a warning somewhere.

@ericfranz
Copy link
Contributor

Since you are using the rpms from the /latest/ I can make sure to include a fix for this before we generate a new version.

@kcgthb
Copy link
Contributor Author

kcgthb commented Nov 26, 2018

That'd be great, yes, thanks!
For now, I added another cron job that remove those leftover .{sock,pid} files, but a more integrated cleanup function within nginx_clean would be better indeed.

@ericfranz ericfranz added this to the OOD1.5.6 milestone Feb 26, 2019
@ericfranz ericfranz modified the milestones: OOD1.5.6, OOD1.6 Mar 25, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants