stale pidfiles not removed #11

kcgthb · 2018-11-21T22:34:54Z

Hi!

I just noticed that if a user PUN goes away, its pidfile in /var/run/nginx/{%user%}/passenger.pid stays there, which causes 503 "Service unavailable" errors from the user's perspective.

I would have expected nginx_stage nginx_clean to remove those stale pidfiles, but that's not the case:

# /opt/ood/nginx_stage/sbin/nginx_stage nginx_clean
stale pid file: /var/run/nginx/kilian/passenger.pid
# /opt/ood/nginx_stage/sbin/nginx_stage nginx_clean
stale pid file: /var/run/nginx/kilian/passenger.pid

Is there a way to automatically remove those pidfiles, so the PUN could be restarted on-demand, or should I just implement a cron job to remove them?

This is with ondemand-1.4.3-2.el7.x86_64 on CentOS.

Thanks!

The text was updated successfully, but these errors were encountered:

kcgthb · 2018-11-21T23:00:51Z

Quick update: this situation can happen if the nginx PUN process gets killed with SIGKILL: in that case the passenger.pidand passenger.sock files are not removed from /var/run/nginx/{%user%}/, which causes the 503 errors.

ericfranz · 2018-11-22T16:52:53Z

Oh you are correct:

ondemand/nginx_stage/lib/nginx_stage/generators/nginx_clean_generator.rb

Lines 45 to 46 in 01731b7

    
           pid_path = PidFile.new NginxStage.pun_pid_path(user: u) 
        
           raise StalePidFile, "stale pid file: #{pid_path}" unless pid_path.running_process?

The nginx_clean command has two optional flags which imho are not well named:

nginx_stage nginx_clean --force # better to be named --all, kills all PUNs both inactive and active

nginx_stage nginx_clean --skip-nginx # better to be named --dry-run, because it only displays the inactive users whose PUNs would be shut down if `--skip-nginx` was omitted

So it seems like we should have another flag, something like --rm-stale-files to remove pid and sock files if the corresponding process is not found?

Even if that was the case, I think that from the user's perspective instead of seeing a 503 "Service unavailable error it would be preferable to see a more helpful error message with a way to rectify the problem, similar to the error page that users see when accessing an app for the first time (do you want to initialize this app?)

I guess if this was happening we would at least want to know that stale pid files were automatically deleted - like log a warning somewhere.

ericfranz · 2018-11-22T16:53:33Z

Since you are using the rpms from the /latest/ I can make sure to include a fix for this before we generate a new version.

kcgthb · 2018-11-26T18:47:17Z

That'd be great, yes, thanks!
For now, I added another cron job that remove those leftover .{sock,pid} files, but a more integrated cleanup function within nginx_clean would be better indeed.

ericfranz self-assigned this Nov 22, 2018

ericfranz added the community request label Nov 22, 2018

ericfranz added this to the OOD1.5.6 milestone Feb 26, 2019

ericfranz modified the milestones: OOD1.5.6, OOD1.6 Mar 25, 2019

MorganRodgers mentioned this issue Apr 30, 2019

Remove stale pid and socket files #36

Merged

ericfranz closed this as completed in #36 May 8, 2019

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

stale pidfiles not removed #11

stale pidfiles not removed #11

kcgthb commented Nov 21, 2018

kcgthb commented Nov 21, 2018

ericfranz commented Nov 22, 2018

ericfranz commented Nov 22, 2018

kcgthb commented Nov 26, 2018

stale pidfiles not removed #11

stale pidfiles not removed #11

Comments

kcgthb commented Nov 21, 2018

kcgthb commented Nov 21, 2018

ericfranz commented Nov 22, 2018

ericfranz commented Nov 22, 2018

kcgthb commented Nov 26, 2018