Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Service start / stop not working correctly #313

Closed
JackNewman12 opened this issue Nov 10, 2022 · 15 comments
Closed

Service start / stop not working correctly #313

JackNewman12 opened this issue Nov 10, 2022 · 15 comments

Comments

@JackNewman12
Copy link
Contributor

When attempting to stop or restart a service they can get into wonky states where finit doesn't seem to correctly stop or start them again.

Some side information:

  • All of our daemons run in the foreground

  • Our daemons create their own PID file as they start. The documentation seems to indicate finit should create/destroy PID files for the services automatically but if I make the daemon write to a different PID file finit seems to automatically detect this as displayed by the pid path in initctl status myservice. I attempted to use the forking mode and manually selecting the PID file but it does not seem to fix the issue below.

  • Most of the services that seem to have issues are using wrapper scripts:

#!/bin/bash

some-operation
another-operation
exec myservice

Sometimes certain services seem to have trouble starting and stopping.

... stopping the service ...
initctl stop myservice

finit: api_cb():svc query: myservice
finit: svc_parse_jobstr():Got str:'myservice'
finit: svc_parse_jobstr():Got token:'myservice'
finit: api_cb():stop myservice
finit: svc_parse_jobstr():Got str:'myservice'
finit: svc_parse_jobstr():Got token:'myservice'
finit: service_step():             myservice(2416):  running disabled/clean   cond:on
finit: service_stop():Sending SIGTERM to pid:2416 name:jimsh
finit: Stopping myservice[2416], sending SIGTERM ...
finit: service_stop():kill(-2416, 15) => rc 0, errno 2
finit: svc_set_state():myservice is stopping, wait 3 sec before sending SIGKILL ...
finit: cond_clear():service/myservice/
finit: cond_clear_noupdate():service/myservice/
finit: cond_set_path():/var/run/finit/cond/service/myservice/ <= 0
finit: service_step():             myservice(2416): -> stopping
finit: service_step():             myservice(2416): stopping disabled/clean   cond:on
finit: generic_io_cb():Calling I/O pidfile from runloop...
finit: pidfile_update_conds():path: /var/run/myservice.pid, mask: 00000008
finit: pidfile_update_conds():Found svc myservice for /var/run/myservice.pid with pid 2416
finit: pidfile_update_conds():Setting myservice PID file to /var/run/myservice.pid
finit: cond_clear():service/myservice/
finit: cond_clear_noupdate():service/myservice/
finit: cond_set_path():/var/run/finit/cond/service/myservice/ <= 0
finit: cond_set_oneshot():service/myservice/running
finit: cond_set_oneshot_noupdate():service/myservice/running => /var/run/finit/cond/service/myservice/running
finit: cond_update():service/myservice/running
finit: cond_set():pid/myservice
finit: cond_set_noupdate():pid/myservice
finit: cond_set_path():/var/run/finit/cond/pid/myservice <= 2
finit: cond_set():service/myservice/ready
finit: cond_set_noupdate():service/myservice/ready
finit: cond_set_path():/var/run/finit/cond/service/myservice/ready <= 2
finit: cond_update():service/myservice/ready
finit: sigchld_cb():Collected child 2416, status: 0
finit: service_monitor():collected myservice(2416), normal exit: 1, signaled: 0, exit code: 0
finit: Stopped myservice[2416]
finit: service_step():             myservice(   0):  running  enabled/clean   cond:on
finit: cond_clear():service/myservice/
finit: cond_clear_noupdate():service/myservice/
finit: cond_set_path():/var/run/finit/cond/service/myservice/ <= 0
finit: cond_set_oneshot():service/myservice/restart
finit: cond_set_oneshot_noupdate():service/myservice/restart => /var/run/finit/cond/service/myservice/restart
finit: cond_update():service/myservice/restart
finit: service_step():delayed restart of myservice
finit: sm_step():state: running, runlevel: 2, newlevel: -1, teardown: 0, reload: 0
finit: generic_io_cb():Calling I/O pidfile from runloop...
finit: pidfile_update_conds():path: /var/run/myservice.pid, mask: 00000200
finit: pidfile_update_conds():No matching svc for /var/run/myservice.pid
finit: service_retry():myservice crashed, trying to start it again, attempt 5
finit: Service myservice[2416] died, restarting in 2000 msec (5/10)
finit: service_step():             myservice(   0):   halted  enabled/clean   cond:on
finit: cond_clear():service/myservice/
finit: cond_clear_noupdate():service/myservice/
finit: cond_set_path():/var/run/finit/cond/service/myservice/ <= 0
finit: service_step():             myservice(   0): ->  waiting
finit: service_step():             myservice(   0):  waiting  enabled/clean   cond:on
finit: cond_clear():service/myservice/
finit: cond_clear_noupdate():service/myservice/
finit: cond_set_path():/var/run/finit/cond/service/myservice/ <= 0
finit: service_step():             myservice(   0): ->  UNKNOWN
finit: service_step():             myservice(   0):  UNKNOWN  enabled/clean   cond:on
finit: Starting myservice[2635]
finit: cond_clear():service/myservice/
finit: cond_clear_noupdate():service/myservice/
finit: cond_set_path():/var/run/finit/cond/service/myservice/ <= 0
finit: cond_set_oneshot():service/myservice/running
finit: cond_set_oneshot_noupdate():service/myservice/running => /var/run/finit/cond/service/myservice/running
finit: cond_update():service/myservice/running
finit: service_step():             myservice(2635): ->  running
finit: service_step():             myservice(2635):  running  enabled/clean   cond:on
finit: generic_io_cb():Calling I/O pidfile from runloop...
finit: pidfile_update_conds():path: /var/run/myservice.pid, mask: 00000002
finit: pidfile_update_conds():Found svc myservice for /var/run/myservice.pid with pid 2635
finit: pidfile_update_conds():Setting myservice PID file to /var/run/myservice.pid
finit: cond_clear():service/myservice/
finit: cond_clear_noupdate():service/myservice/
finit: cond_set_path():/var/run/finit/cond/service/myservice/ <= 0
finit: cond_set_oneshot():service/myservice/running
finit: cond_set_oneshot_noupdate():service/myservice/running => /var/run/finit/cond/service/myservice/running
finit: cond_update():service/myservice/running
finit: cond_set():pid/myservice
finit: cond_set_noupdate():pid/myservice
finit: cond_set_path():/var/run/finit/cond/pid/myservice <= 2
finit: cond_set():service/myservice/ready
finit: cond_set_noupdate():service/myservice/ready
finit: cond_set_path():/var/run/finit/cond/service/myservice/ready <= 2
finit: cond_update():service/myservice/ready

... service is started again and is not up and running ...

The alternative is that after an application has crashed, or I have changed runlevels (which has the same not-stopping/starting bug), it seems like I can't get certain services to start again:

initctl restart myservice

init: api_cb():svc query: myservice
finit: svc_parse_jobstr():Got str:'myservice'
finit: svc_parse_jobstr():Got token:'myservice'
finit: api_cb():start myservice
finit: svc_parse_jobstr():Got str:'myservice'
finit: svc_parse_jobstr():Got token:'myservice'
finit: service_step():             myservice(   0):   halted  enabled/clean   cond:on
finit: cond_clear():service/myservice/
finit: cond_clear_noupdate():service/myservice/
finit: cond_set_path():/var/run/finit/cond/service/myservice/ <= 0
finit: service_step():             myservice(   0): ->  waiting
finit: service_step():             myservice(   0):  waiting  enabled/clean   cond:on
finit: cond_clear():service/myservice/
finit: cond_clear_noupdate():service/myservice/
finit: cond_set_path():/var/run/finit/cond/service/myservice/ <= 0
finit: service_step():             myservice(   0): ->  UNKNOWN
finit: service_step():             myservice(   0):  UNKNOWN  enabled/clean   cond:on
finit: Starting myservice[4727]
finit: cond_clear():service/myservice/
finit: cond_clear_noupdate():service/myservice/
finit: cond_set_path():/var/run/finit/cond/service/myservice/ <= 0
finit: cond_set_oneshot():service/myservice/running
finit: cond_set_oneshot_noupdate():service/myservice/running => /var/run/finit/cond/service/myservice/running
finit: cond_update():service/myservice/running
finit: service_step():             myservice(4727): ->  running
finit: service_step():             myservice(4727):  running  enabled/clean   cond:on
finit: sigchld_cb():Collected child 4727, status: 256
finit: service_monitor():collected myservice(4727), normal exit: 1, signaled: 0, exit code: 1
finit: service_step():             myservice(   0):  running  enabled/clean   cond:on
finit: cond_clear():service/myservice/
finit: cond_clear_noupdate():service/myservice/
finit: cond_set_path():/var/run/finit/cond/service/myservice/ <= 0
finit: cond_set_oneshot():service/myservice/restart
finit: cond_set_oneshot_noupdate():service/myservice/restart => /var/run/finit/cond/service/myservice/restart
finit: cond_update():service/myservice/restart
finit: service_step():delayed restart of myservice
finit: sm_step():state: running, runlevel: 2, newlevel: -1, teardown: 0, reload: 0
finit: service_retry():myservice crashed, trying to start it again, attempt 1
finit: Service myservice[1106] died, restarting in 2000 msec (1/10)
finit: service_step():             myservice(   0):   halted  enabled/clean   cond:on
finit: cond_clear():service/myservice/
finit: cond_clear_noupdate():service/myservice/
finit: cond_set_path():/var/run/finit/cond/service/myservice/ <= 0
finit: service_step():             myservice(   0): ->  waiting
finit: service_step():             myservice(   0):  waiting  enabled/clean   cond:on
finit: cond_clear():service/myservice/
finit: cond_clear_noupdate():service/myservice/
finit: cond_set_path():/var/run/finit/cond/service/myservice/ <= 0
finit: service_step():             myservice(   0): ->  UNKNOWN
finit: service_step():             myservice(   0):  UNKNOWN  enabled/clean   cond:on
finit: Starting myservice[4830]
finit: cond_clear():service/myservice/
finit: cond_clear_noupdate():service/myservice/
finit: cond_set_path():/var/run/finit/cond/service/myservice/ <= 0
finit: cond_set_oneshot():service/myservice/running
finit: cond_set_oneshot_noupdate():service/myservice/running => /var/run/finit/cond/service/myservice/running
finit: cond_update():service/myservice/running
finit: service_step():             myservice(4830): ->  running
finit: service_step():             myservice(4830):  running  enabled/clean   cond:on
finit: sigchld_cb():Collected child 4830, status: 256
finit: service_monitor():collected myservice(4830), normal exit: 1, signaled: 0, exit code: 1
finit: service_step():             myservice(   0):  running  enabled/clean   cond:on
finit: cond_clear():service/myservice/
finit: cond_clear_noupdate():service/myservice/
finit: cond_set_path():/var/run/finit/cond/service/myservice/ <= 0
finit: cond_set_oneshot():service/myservice/restart
finit: cond_set_oneshot_noupdate():service/myservice/restart => /var/run/finit/cond/service/myservice/restart
finit: cond_update():service/myservice/restart
finit: service_step():delayed restart of myservice
finit: sm_step():state: running, runlevel: 2, newlevel: -1, teardown: 0, reload: 0
finit: service_retry():myservice crashed, trying to start it again, attempt 2
finit: Service myservice[1106] died, restarting in 2000 msec (2/10)
finit: service_step():             myservice(   0):   halted  enabled/clean   cond:on
finit: cond_clear():service/myservice/
finit: cond_clear_noupdate():service/myservice/
finit: cond_set_path():/var/run/finit/cond/service/myservice/ <= 0
finit: service_step():             myservice(   0): ->  waiting
finit: service_step():             myservice(   0):  waiting  enabled/clean   cond:on
finit: cond_clear():service/myservice/
finit: cond_clear_noupdate():service/myservice/
finit: cond_set_path():/var/run/finit/cond/service/myservice/ <= 0
finit: service_step():             myservice(   0): ->  UNKNOWN
finit: service_step():             myservice(   0):  UNKNOWN  enabled/clean   cond:on
finit: Starting myservice[4835]
finit: cond_clear():service/myservice/
finit: cond_clear_noupdate():service/myservice/
finit: cond_set_path():/var/run/finit/cond/service/myservice/ <= 0
finit: cond_set_oneshot():service/myservice/running
finit: cond_set_oneshot_noupdate():service/myservice/running => /var/run/finit/cond/service/myservice/running
finit: cond_update():service/myservice/running
finit: service_step():             myservice(4835): ->  running
finit: service_step():             myservice(4835):  running  enabled/clean   cond:on
finit: sigchld_cb():Collected child 4835, status: 256
finit: service_monitor():collected myservice(4835), normal exit: 1, signaled: 0, exit code: 1
finit: service_step():             myservice(   0):  running  enabled/clean   cond:on
finit: cond_clear():service/myservice/
finit: cond_clear_noupdate():service/myservice/
finit: cond_set_path():/var/run/finit/cond/service/myservice/ <= 0
finit: cond_set_oneshot():service/myservice/restart
finit: cond_set_oneshot_noupdate():service/myservice/restart => /var/run/finit/cond/service/myservice/restart
finit: cond_update():service/myservice/restart
finit: service_step():delayed restart of myservice
finit: sm_step():state: running, runlevel: 2, newlevel: -1, teardown: 0, reload: 0
finit: service_retry():myservice crashed, trying to start it again, attempt 3
finit: Service myservice[1106] died, restarting in 2000 msec (3/10)
finit: service_step():             myservice(   0):   halted  enabled/clean   cond:on
finit: cond_clear():service/myservice/
finit: cond_clear_noupdate():service/myservice/
finit: cond_set_path():/var/run/finit/cond/service/myservice/ <= 0
finit: service_step():             myservice(   0): ->  waiting
finit: service_step():             myservice(   0):  waiting  enabled/clean   cond:on
finit: cond_clear():service/myservice/
finit: cond_clear_noupdate():service/myservice/
finit: cond_set_path():/var/run/finit/cond/service/myservice/ <= 0
finit: service_step():             myservice(   0): ->  UNKNOWN
finit: service_step():             myservice(   0):  UNKNOWN  enabled/clean   cond:on

... finit will give up after 10 restarts ...

Running the application manually works perfectly. Watching ps while finit attempts to restart shows that nothing ever gets started.

This happens on both v4.3 with the restart fix patch and master. I can't remember if I saw this on the pure v4.3 branch

@troglobit
Copy link
Owner

Hmm, I'll look into the start/stop/restarting issue. Haven't seen anything like that myself yet, but I'll get back to you.

Re: PID files. By default, Finit expects services to maintain their own PID files. It monitors /run for new files (with some heuristics applied) to sniff new PID files and match them to the PID it recently forked off. PID files are actually the primary synchronization/readiness mechanism, which you've probably already seen traces of.

E.g., if service B depends on A you can set a condition like this:

service <pid/A> B

Only trouble is that service A must reassert (touch) it's PID file after the following:

initctl edit A
initctl reload    # Sends SIGHUP by default to A

Most daemons today don't rewrite/touch their PID file on SIGHUP, so PID condition A will then remain in "flux" and B will not be restarted properly. I don't know if this could be your problem, since you don't mention how your services are started by Finit, if you use any conditions for instance. But I thought it worth mentioning since I write so crappy docs.

Finit has support for creating PID files on its own, as well as figuring out stuff from forking daemons (like was the practise with SysV init scripts): pid:/path/to/pidfile. The syntax is described further in the documentation.

Since most daemons don't behave properly a few other standards for "readiness notification" were invented. For example, in systemd the application is expected to write READY=1\n on a socket to systemd after start and restart/reload. (For details, see issue #299 which was just added to Finit master for v4.4.)

@troglobit
Copy link
Owner

Sorry, cannot reproduce. I'm running Finit master (on myLinux) with the following config for my service:

root@anarchy:~# initctl show mdnsd
# Launch mDNS-SD daemon
service [2345789] /usr/sbin/mdnsd -sn -- mDNS-SD daemon

When I stop it I get the following trace. Worth noting is the log message mdnsd( 0): stopping disabled/clean cond:on, which tells me that Finit has received the message to stop the service. In your case above it still says 'enabled' which is quite puzzling:

root@anarchy:~# initctl stop mdnsd
finit[1]: api_cb():svc query: mdnsd
finit[1]: svc_parse_jobstr():Got str:'mdnsd'
finit[1]: svc_parse_jobstr():Got token:'mdnsd'
finit[1]: api_cb():stop mdnsd
finit[1]: svc_parse_jobstr():Got str:'mdnsd'
finit[1]: svc_parse_jobstr():Got token:'mdnsd'
finit[1]: service_step():               mdnsd(11528):  running disabled/clean   cond:on  
finit[1]: service_stop():Sending SIGTERM to pid:11528 name:mdnsd
finit[1]: Stopping mdnsd[11528], sending SIGTERM ...
finit[1]: service_stop():kill(-11528, 15) => rc 0, errno 2
finit[1]: svc_set_state():mdnsd is stopping, wait 3 sec before sending SIGKILL ...
finit[1]: cond_clear():service/mdnsd/
finit[1]: cond_clear_noupdate():service/mdnsd/
finit[1]: cond_set_path():/run/finit/cond/service/mdnsd/ <= 0
finit[1]: service_step():               mdnsd(11528): -> stopping
finit[1]: service_step():               mdnsd(11528): stopping disabled/clean   cond:on  
root@anarchy:~# finit[1]: generic_io_cb():Calling I/O pidfile from runloop...
finit[1]: pidfile_update_conds():path: /run/mdnsd.pid, mask: 00000200
finit[1]: pidfile_update_conds():Found svc mdnsd for /run/mdnsd.pid with pid 11528
finit[1]: cond_clear():pid/mdnsd
finit[1]: cond_clear_noupdate():pid/mdnsd
finit[1]: cond_set_path():/run/finit/cond/pid/mdnsd <= 0
finit[1]: cond_update():pid/mdnsd
finit[1]: cond_set():service/mdnsd/ready
finit[1]: cond_set_noupdate():service/mdnsd/ready
finit[1]: cond_set_path():/run/finit/cond/service/mdnsd/ready <= 2
finit[1]: cond_update():service/mdnsd/ready
finit[1]: sigchld_cb():Collected child 11528, status: 0
finit[1]: service_monitor():collected mdnsd(11528), normal exit: 1, signaled: 0, exit code: 0
finit[1]: service_step():               mdnsd(   0): stopping disabled/clean   cond:on  
finit[1]: service_step():mdnsd: stopped, cleaning up timers and conditions ...
finit[1]: cond_clear():pid/mdnsd
finit[1]: cond_clear_noupdate():pid/mdnsd
finit[1]: cond_set_path():/run/finit/cond/pid/mdnsd <= 0
finit[1]: cond_clear():service/mdnsd/
finit[1]: cond_clear_noupdate():service/mdnsd/
finit[1]: cond_set_path():/run/finit/cond/service/mdnsd/ <= 0
finit[1]: cond_set_oneshot():service/mdnsd/stopped
finit[1]: cond_set_oneshot_noupdate():service/mdnsd/stopped => /run/finit/cond/service/mdnsd/stopped
finit[1]: cond_update():service/mdnsd/stopped
finit[1]: service_step():               mdnsd(   0): ->  stopped
finit[1]: service_step():               mdnsd(   0):  stopped disabled/clean   cond:on  
finit[1]: sm_step():state: running, runlevel: 2, newlevel: -1, teardown: 0, reload: 0
finit[1]: cgroup_handle_event():event: '/sys/fs/cgroup/system/mdnsd/cgroup.events', mask: 00000002

When I start it up again I can see Finit registering my wish to enable it:

root@anarchy:~# initctl restart mdnsd
finit[1]: api_cb():svc query: mdnsd
finit[1]: svc_parse_jobstr():Got str:'mdnsd'
finit[1]: svc_parse_jobstr():Got token:'mdnsd'
finit[1]: api_cb():restart mdnsd
finit[1]: svc_parse_jobstr():Got str:'mdnsd'
finit[1]: svc_parse_jobstr():Got token:'mdnsd'
finit[1]: service_step():               mdnsd(   0):   halted  enabled/clean   cond:on  
finit[1]: cond_clear():service/mdnsd/
finit[1]: cond_clear_noupdate():service/mdnsd/
finit[1]: cond_set_path():/run/finit/cond/service/mdnsd/ <= 0
finit[1]: service_step():               mdnsd(   0): ->  waiting
finit[1]: service_step():               mdnsd(   0):  waiting  enabled/clean   cond:on  
finit[1]: cond_clear():service/mdnsd/
finit[1]: cond_clear_noupdate():service/mdnsd/
finit[1]: cond_set_path():/run/finit/cond/service/mdnsd/ <= 0
finit[1]: service_step():               mdnsd(   0): ->  UNKNOWN
finit[1]: service_step():               mdnsd(   0):  UNKNOWN  enabled/clean   cond:on  
finit[1]: cgroup_leaf_init():group system, name mdnsd, pid 11836, cfg 
finit[1]: group_init():path /sys/fs/cgroup/system/mdnsd, leaf 1, cfg 
finit[1]: iwatch_add():adding new watcher for path /sys/fs/cgroup/system/mdnsd/cgroup.events
finit[1]: Starting mdnsd[11836]
finit[1]: cond_clear():service/mdnsd/
finit[1]: cond_clear_noupdate():service/mdnsd/
finit[1]: cond_set_path():/run/finit/cond/service/mdnsd/ <= 0
finit[1]: cond_set_oneshot():service/mdnsd/running
finit[1]: cond_set_oneshot_noupdate():service/mdnsd/running => /run/finit/cond/service/mdnsd/running
finit[1]: cond_update():service/mdnsd/running
finit[1]: service_step():               mdnsd(11836): ->  running
finit[1]: service_step():               mdnsd(11836):  running  enabled/clean   cond:on  
root@anarchy:~# finit[1]: generic_io_cb():Calling I/O pidfile from runloop...
finit[1]: pidfile_update_conds():path: /run/mdnsd.pid, mask: 00000002
finit[1]: pidfile_update_conds():Found svc mdnsd for /run/mdnsd.pid with pid 11836
finit[1]: pidfile_update_conds():Setting mdnsd PID file to /run/mdnsd.pid
finit[1]: cond_set():pid/mdnsd
finit[1]: cond_set_noupdate():pid/mdnsd
finit[1]: cond_set_path():/run/finit/cond/pid/mdnsd <= 2
finit[1]: cond_update():pid/mdnsd
finit[1]: cond_set():service/mdnsd/ready
finit[1]: cond_set_noupdate():service/mdnsd/ready
finit[1]: cond_set_path():/run/finit/cond/service/mdnsd/ready <= 2
finit[1]: cond_update():service/mdnsd/ready
finit[1]: pidfile_update_conds():path: /run/mdnsd.pid, mask: 00000008
finit[1]: pidfile_update_conds():Found svc mdnsd for /run/mdnsd.pid with pid 11836
finit[1]: pidfile_update_conds():Setting mdnsd PID file to /run/mdnsd.pid
finit[1]: cond_set():pid/mdnsd
finit[1]: cond_set_noupdate():pid/mdnsd
finit[1]: cond_set_path():/run/finit/cond/pid/mdnsd <= 2
finit[1]: cond_set():service/mdnsd/ready
finit[1]: cond_set_noupdate():service/mdnsd/ready
finit[1]: cond_set_path():/run/finit/cond/service/mdnsd/ready <= 2

So ... unfortunately I cannot reproduce your issue, or even speculate further what it could be. Sorry :-/

@troglobit
Copy link
Owner

Ah, one thing that might be interesting for you, don't know if you've noticed it yourself, but in your restart case:

finit: service_monitor():collected myservice(4727), normal exit: 1, signaled: 0, exit code: 1

Your application exits with exit code 1, that's why it's constantly restarting.

@JackNewman12
Copy link
Contributor Author

JackNewman12 commented Nov 11, 2022

Sorry I should have stated my service file. It was really simple so I just brushed over it

service name:myservice /sbin/myservice

I didn't like that I saw what looks like a "pid file found" message as part of the stopping so I deleted the pidfile.so plugin from the disk. Now starting/stopping all of my services is reliable.

I am not sure if removing this plugin will break my service<->service dependencies.
What is the difference between <service/myservice> vs <pid/myservice>? I guess the former doesn't wait for the myservice to be "ready" but rather just starts the child service as soon as myservice is enabled.

For reference here is a good vs bad daemon log. The service files are identical service /sbin/XXX with different binaries.
To make the diff a little easier on the eyes I replaced the names and PIDs in the logs to match. (Does github support side-by-side diff markdown? It would make this so much easier to parse).

It seems the good service does a cond_set(), but the misbehaving service does a cond_clear() before the cond_set()

EDIT: Looking at the code I wonder if finit thinks these are forking services? Is that automatically detected? Either way none of these daemons are forking, just a simple mainloop with while(true).

Good:

init: api_cb():svc query: myservice
finit: svc_parse_jobstr():Got str:'myservice'
finit: svc_parse_jobstr():Got token:'myservice'
finit: api_cb():stop myservice
finit: svc_parse_jobstr():Got str:'myservice'
finit: svc_parse_jobstr():Got token:'myservice'
finit: service_step():               myservice(2300):  running disabled/clean   cond:on
finit: service_stop():Sending SIGTERM to pid:2300 name:myservice
finit: Stopping myservice[2300], sending SIGTERM ...
finit: service_stop():kill(-2300, 15) => rc 0, errno 2
finit: svc_set_state():myservice is stopping, wait 3 sec before sending SIGKILL ...
finit: cond_clear():service/myservice/
finit: cond_clear_noupdate():service/myservice/
finit: cond_set_path():/var/run/finit/cond/service/myservice/ <= 0
finit: service_step():               myservice(2300): -> stopping
finit: service_step():               myservice(2300): stopping disabled/clean   cond:on
finit: generic_io_cb():Calling I/O pidfile from runloop...
finit: pidfile_update_conds():path: /var/run/myservice.pid, mask: 00000008
finit: pidfile_update_conds():Found svc myservice for /var/run/myservice.pid with pid 2300
finit: pidfile_update_conds():Setting myservice PID file to /var/run/myservice.pid
finit: cond_set():pid/myservice
finit: cond_set_noupdate():pid/myservice
finit: cond_set_path():/var/run/finit/cond/pid/myservice <= 2
finit: cond_set():service/myservice/ready
finit: cond_set_noupdate():service/myservice/ready
finit: cond_set_path():/var/run/finit/cond/service/myservice/ready <= 2
finit: cond_update():service/myservice/ready
finit: sigchld_cb():Collected child 2300, status: 0
finit: service_monitor():collected myservice(2300), normal exit: 1, signaled: 0, exit code: 0
finit: service_step():               myservice(   0): stopping disabled/clean   cond:on
finit: service_step():myservice: stopped, cleaning up timers and conditions ...
finit: cond_clear():pid/myservice
finit: cond_clear_noupdate():pid/myservice
finit: cond_set_path():/var/run/finit/cond/pid/myservice <= 0
finit: cond_update():pid/myservice
finit: cond_clear():service/myservice/
finit: cond_clear_noupdate():service/myservice/
finit: cond_set_path():/var/run/finit/cond/service/myservice/ <= 0
finit: cond_set_oneshot():service/myservice/stopped
finit: cond_set_oneshot_noupdate():service/myservice/stopped => /var/run/finit/cond/service/myservice/stopped
finit: cond_update():service/myservice/stopped
finit: service_step():               myservice(   0): ->  stopped
finit: service_step():               myservice(   0):  stopped disabled/clean   cond:on
finit: sm_step():state: running, runlevel: 2, newlevel: -1, teardown: 0, reload: 0
finit: generic_io_cb():Calling I/O pidfile from runloop...
finit: pidfile_update_conds():path: /var/run/myservice.pid, mask: 00000200
finit: pidfile_update_conds():No matching svc for /var/run/myservice.pid

Bad:

init: api_cb():svc query: myservice
finit: svc_parse_jobstr():Got str:'myservice'
finit: svc_parse_jobstr():Got token:'myservice'
finit: api_cb():stop myservice
finit: svc_parse_jobstr():Got str:'myservice'
finit: svc_parse_jobstr():Got token:'myservice'
finit: service_step():               myservice(2300):  running disabled/clean   cond:on
finit: service_stop():Sending SIGTERM to pid:2300 name:myservice
finit: Stopping myservice[2300], sending SIGTERM ...
finit: service_stop():kill(-2300, 15) => rc 0, errno 2
finit: svc_set_state():myservice is stopping, wait 3 sec before sending SIGKILL ...
finit: cond_clear():service/myservice/
finit: cond_clear_noupdate():service/myservice/
finit: cond_set_path():/var/run/finit/cond/service/myservice/ <= 0
finit: service_step():               myservice(2300): -> stopping
finit: service_step():               myservice(2300): stopping disabled/clean   cond:on
finit: generic_io_cb():Calling I/O pidfile from runloop...
finit: pidfile_update_conds():path: /var/run/myservice.pid, mask: 00000008
finit: pidfile_update_conds():Found svc myservice for /var/run/myservice.pid with pid 2300
finit: pidfile_update_conds():Setting myservice PID file to /var/run/myservice.pid
finit: cond_clear():service/myservice/
finit: cond_clear_noupdate():service/myservice/
finit: cond_set_path():/var/run/finit/cond/service/myservice/ <= 0
finit: cond_set_oneshot():service/myservice/running
finit: cond_set_oneshot_noupdate():service/myservice/running => /var/run/finit/cond/service/myservice/running
finit: cond_update():service/myservice/running
finit: cond_set():pid/myservice
finit: cond_set_noupdate():pid/myservice
finit: cond_set_path():/var/run/finit/cond/pid/myservice <= 2
finit: cond_set():service/myservice/ready
finit: cond_set_noupdate():service/myservice/ready
finit: cond_set_path():/var/run/finit/cond/service/myservice/ready <= 2
finit: cond_update():service/myservice/ready
finit: sigchld_cb():Collected child 2300, status: 0
finit: service_monitor():collected myservice(2300), normal exit: 1, signaled: 0, exit code: 0
finit: Stopped myservice[2300]
finit: service_step():               myservice(   0):  running  enabled/clean   cond:on
finit: cond_clear():service/myservice/
finit: cond_clear_noupdate():service/myservice/
finit: cond_set_path():/var/run/finit/cond/service/myservice/ <= 0
finit: cond_set_oneshot():service/myservice/restart
finit: cond_set_oneshot_noupdate():service/myservice/restart => /var/run/finit/cond/service/myservice/restart
finit: cond_update():service/myservice/restart
finit: service_step():delayed restart of myservice
finit: sm_step():state: running, runlevel: 2, newlevel: -1, teardown: 0, reload: 0
finit: generic_io_cb():Calling I/O pidfile from runloop...
finit: pidfile_update_conds():path: /var/run/myservice.pid, mask: 00000200
finit: pidfile_update_conds():No matching svc for /var/run/myservice.pid
finit: generic_io_cb():Calling I/O pidfile from runloop...
finit: generic_io_cb():Calling I/O pidfile from runloop...
finit: api_cb():get runlevel
finit: generic_io_cb():Calling I/O pidfile from runloop...
finit: api_cb():get runlevel
finit: service_retry():myservice crashed, trying to start it again, attempt 3
finit: Service myservice[2300] died, restarting in 2000 msec (3/10)
finit: service_step():               myservice(   0):   halted  enabled/clean   cond:on
finit: cond_clear():service/myservice/
finit: cond_clear_noupdate():service/myservice/
finit: cond_set_path():/var/run/finit/cond/service/myservice/ <= 0
finit: service_step():               myservice(   0): ->  waiting
finit: service_step():               myservice(   0):  waiting  enabled/clean   cond:on
finit: cond_clear():service/myservice/
finit: cond_clear_noupdate():service/myservice/
finit: cond_set_path():/var/run/finit/cond/service/myservice/ <= 0
finit: service_step():               myservice(   0): ->  UNKNOWN
finit: service_step():               myservice(   0):  UNKNOWN  enabled/clean   cond:on
finit: Starting myservice[3689]
finit: cond_clear():service/myservice/
finit: cond_clear_noupdate():service/myservice/
finit: cond_set_path():/var/run/finit/cond/service/myservice/ <= 0
finit: cond_set_oneshot():service/myservice/running
finit: cond_set_oneshot_noupdate():service/myservice/running => /var/run/finit/cond/service/myservice/running
finit: cond_update():service/myservice/running
finit: service_step():               myservice(3689): ->  running
finit: service_step():               myservice(3689):  running  enabled/clean   cond:on
finit: generic_io_cb():Calling I/O pidfile from runloop...
finit: pidfile_update_conds():path: /var/run/myservice.pid, mask: 00000002
finit: pidfile_update_conds():Found svc myservice for /var/run/myservice.pid with pid 3689
finit: pidfile_update_conds():Setting myservice PID file to /var/run/myservice.pid
finit: cond_clear():service/myservice/
finit: cond_clear_noupdate():service/myservice/
finit: cond_set_path():/var/run/finit/cond/service/myservice/ <= 0
finit: cond_set_oneshot():service/myservice/running
finit: cond_set_oneshot_noupdate():service/myservice/running => /var/run/finit/cond/service/myservice/running
finit: cond_update():service/myservice/running
finit: cond_set():pid/myservice
finit: cond_set_noupdate():pid/myservice
finit: cond_set_path():/var/run/finit/cond/pid/myservice <= 2
finit: cond_set():service/myservice/ready
finit: cond_set_noupdate():service/myservice/ready
finit: cond_set_path():/var/run/finit/cond/service/myservice/ready <= 2
finit: cond_update():service/myservice/ready

As for the restarting bug, I think you are correct and I just missed the fact the daemon was crashing due to me stopping some other services. Whoops.


To add to the fire, I didn't mention yesterday that I also see services get stuck in a "running" state (both with and without the pidfile plugin). I did a bit more work and I believe I have found the reason. If myservice is crashing and I call initctl stop myservice it gets into this incorrect state.

The service shows as "running" with a PID of 0. ps shows the service is not actually running.

# initctl status myservice
     Status : running
   Identity : myservice
Description : myservice
     Origin : /etc/init.d/myservice.conf
    Command : /sbin/myservice
   PID file : none
        PID : 0
       User : root
      Group : root
     Uptime : N/A
   Restarts : 1 (0/10)
  Runlevels : [--234-----]

Calling stop is basically a nop and the service is still listed as "running".

# initctl stop myservice
                                                                                                  finit: api_cb():svc query: myservice
finit: svc_parse_jobstr():Got str:'myservice'
finit: svc_parse_jobstr():Got token:'myservice'
finit: api_cb():stop myservice
finit: svc_parse_jobstr():Got str:'myservice'
finit: svc_parse_jobstr():Got token:'myservice'
finit: service_step():              myservice(   0):  running disabled/clean   cond:on

# initctl status myservice
     Status : running
   Identity : myservice
Description : myservice
     Origin : /etc/init.d/myservice.conf
    Command : /sbin/myservice
   PID file : none
        PID : 0
       User : root
      Group : root
     Uptime : N/A
   Restarts : 1 (0/10)
  Runlevels : [--234-----]

How to recreate the state:

  1. myservice crashes, the respawn handler gets kicked into gear
finit: service_monitor():collected myservice(4293), normal exit: 1, signaled: 0, exit code: 1
finit: service_step():              myservice(   0):  running  enabled/clean   cond:on
finit: cond_clear():service/myservice/
finit: cond_clear_noupdate():service/myservice/
finit: cond_set_path():/var/run/finit/cond/service/myservice/ <= 0
finit: cond_set_oneshot():service/myservice/restart
finit: cond_set_oneshot_noupdate():service/myservice/restart => /var/run/finit/cond/service/myservice/restart
finit: cond_update():service/myservice/restart
finit: service_step():delayed restart of myservice
finit: sm_step():state: running, runlevel: 2, newlevel: -1, teardown: 0, reload: 0
  1. initctl stop myservice
init: api_cb():svc query: myservice
finit: svc_parse_jobstr():Got str:'myservice'
finit: svc_parse_jobstr():Got token:'myservice'
finit: api_cb():stop myservice
finit: svc_parse_jobstr():Got str:'myservice'
finit: svc_parse_jobstr():Got token:'myservice'
finit: service_step():              myservice(   0):  stopped disabled/clean   cond:on
# initctl status myservice
     Status : stopped (code=exited, status=1/FAILURE)
   Identity : myservice
Description : myservice
     Origin : /etc/init.d/myservice.conf
    Command : /sbin/myservice
   PID file : none
        PID : 0
       User : root
      Group : root
     Uptime : N/A
   Restarts : 7 (6/10)
  Runlevels : [--234-----]
  1. The service restart callback gets triggered a few seconds later:
finit: cond_clear():service/myservice/
finit: cond_clear_noupdate():service/myservice/
finit: cond_set_path():/var/run/finit/cond/service/myservice/ <= 0
finit: cond_set_oneshot():service/myservice/running
finit: cond_set_oneshot_noupdate():service/myservice/running => /var/run/finit/cond/service/myservice/running
finit: cond_update():service/myservice/running
finit: Successfully restarted crashing service myservice.
  1. finit now believes the application is "running"
# initctl status myservice
     Status : running
   Identity : myservice
Description : myservice
     Origin : /etc/init.d/myservice.conf
    Command : /sbin/myservice
   PID file : none
        PID : 0
       User : root
      Group : root
     Uptime : N/A
   Restarts : 7 (6/10)
  Runlevels : [--234-----]

So it looks like doing a service stop also needs cancel the retry timer.

@JackNewman12
Copy link
Contributor Author

Some more debugging later and sure enough in pidfile.c in pidfile_update_conds() the svc_is_forking() check is true.

No idea how that happens. I'll keep searching

@JackNewman12
Copy link
Contributor Author

So these daemons take some time to boot as they rely on certain things. I guess this causes finit to detect they are forking from pidfile.c doing a pid_file_set(svc, fn, 1);

I am happy with that, no harm done. Except the service_forked(svc); causes the state machine to reset.

With my very limited knowledge, I think one of these fixes the issue: (probably missing a bunch of edge cases I don't know about yet)

void service_forked(svc_t *svc)
 	/* Stop forking_retry() timer */
 	service_timeout_cancel(svc);
 
-	/* Not crashing, restore RUNNING state */
-	svc_unblock(svc);
-	svc_set_state(svc, SVC_RUNNING_STATE);
+	if (!svc_is_stopping(svc)) {
+		/* Not crashing, restore RUNNING state */
+		svc_unblock(svc);
+		svc_set_state(svc, SVC_RUNNING_STATE);
+	}
}

Although I guess the mask check earlier should be different. A closed pid file probably shouldn't trigger a svc_started(), service_forked() and service_ready() at all.

@@ -81,29 +81,37 @@ static void pidfile_update_conds(char *dir, char *name, uint32_t mask)
 
 	dbg("Found svc %s for %s with pid %d", svc->name, fn, svc->pid);
 	mkcond(svc, cond, sizeof(cond));
-	if (mask & (IN_CLOSE_WRITE | IN_ATTRIB | IN_MODIFY | IN_MOVED_TO)) {
+	if (mask & (IN_ATTRIB | IN_MODIFY | IN_MOVED_TO)) {
 		svc_started(svc);
 		if (!svc_has_pidfile(svc)) {
 			dbg("Setting %s PID file to %s", svc->name, fn);
 			pid_file_set(svc, fn, 1);
 		}

+ Add some sort of dedicated IN_CLOSE_WRITE  check later? Perhaps its just part of the IN_DELETE check?

@JackNewman12
Copy link
Contributor Author

To fix the restarting/stop race condition I threw in this quick fix:

--- a/user/init/src/api.c
+++ b/user/init/src/api.c
@@ -63,6 +63,7 @@ static int stop(svc_t *svc, void *user_data)
 	if (!svc)
 		return 1;
 
+	service_timeout_cancel(svc);
 	svc_stop(svc);
 	service_step(svc);

Although I wouldn't be surprised if you prefer this cancel logic to be inside the service_step() state machine. I just didn't want to try and wrap my head around it 😂

Let us know if I can help further and if you want me to make a pull request, or if you are happy to smash out some (probably much better) fixes.

@troglobit
Copy link
Owner

This is very interesting, I guess the results above are with latest Finit master? Anyway, I'll be going over this all today (CET) to see if I can reproduce, thank you! If you have any other input on smallest possible setup to reproduce, I'd be most grateful. I try to encode problems like these in test cases, many of them use the stupid test/common/serv.c daemon, which I'll use two of with different name:.. to reproduce.

Some quick comments/replies to your questions above:

  • There are two ways for Finit to detect if a service is forking, both are declarative, none of which you use so there's definitely something fishy going on:
    1. service type:forking ...
    2. serivce pid:!/wait/for/this/pid.file ...
  • Never seen anyone do a side-by-side diff in GitHub, that would indeed be very helpful :)
  • Cancelling any outstanding service timer on stop action sounds like a good idea!
  • "these daemons take some time to boot ..." <-- Finit should not care about that, you do you :)
  • The mask in pidfile_update_conds() is admittedly weird, but iirc it comes from how Quagga/FRR create/update PID files -- I'll include this too in the testing/reproduction
  • Re, pull-request: as long as you're OK with my name on the GIT commits I can handle it from here (if I manage to reproduce), I'll make sure to credit you (and Cohda Wireless?) in the ChangeLog of course.

@JackNewman12
Copy link
Contributor Author

I guess the results above are with latest Finit master?

Yes. f6b5b02

There are two ways for Finit to detect if a service is forking, both are declarative, none of which you use so there's definitely something fishy going on:

From my debug prints, it seems like:

  1. service_register() gets called. svc->forking == 0 and pidfile == NULL
  2. The service gets started
  3. pid_file_set(_,_,1) gets called from pidfile_update_conds() which sets the pid path to !/run/myservice.pid (TBD if /run/myservice.pid exists by now or not)
  4. service_register() gets called (again?) this time svc->forking == 0 but since the pidfile has been set to !/run/myservice.pid forking gets overridden to true.

The mask in pidfile_update_conds() is admittedly weird

Is this an issue? All of our daemons just close the pid file and leave it alone. I think init should clean these up IIRC.

Re, pull-request

Cheers. You can just credit me directly. I should probably update my github profile since I don't work at Cohda Wireless anymore 😅

I feel like a finit.debug would help, but I don't seem to get any extra prints. I know the rescue cmdline works so either finit is ignoring me or I need to add some extra verbosity flags :/

@troglobit
Copy link
Owner

I guess the results above are with latest Finit master?
Yes. f6b5b02

Great, thanks!

There are two ways for Finit to detect if a service is forking, both are declarative, none of which you use so there's definitely something fishy going on:
From my debug prints, it seems like:
1. service_register() gets called. svc->forking == 0 and pidfile == NULL
2. The service gets started
3. pid_file_set(_,_,1) gets called from pidfile_update_conds() which sets the pid path to !/run/myservice.pid (TBD if /run/myservice.pid exists by now or not)
4. service_register() gets called (again?) this time svc->forking == 0 but since the pidfile has been set to !/run/myservice.pid forking gets overridden to true.

Ah right, PID file detection is overloaded with the legacy forking option ... ouch. OK, I'll have a look at this, gives me a better idea also what to focus on when reproducing.

The service_register() function is called only at boot, runlevel change, and initctl reload.

The mask in pidfile_update_conds() is admittedly weird
Is this an issue? All of our daemons just close the pid file and leave it alone. I think init should clean these up IIRC.

What do you mean, clean up PID files? If an application creates them it's their responsibility, or their start script (SysV init).

Re, pull-request
Cheers. You can just credit me directly. I should probably update my github profile since I don't work at Cohda Wireless anymore sweat_smile

OK 😃

I feel like a finit.debug would help, but I don't seem to get any extra prints. I know the rescue cmdline works so either finit is ignoring me or I need to add some extra verbosity flags :/

Not following here. But just fyi, there are no more verbosity flags. -- finit.debug on kernel command line enables debugging at boot time and initctl debug toggles it at runtime.

@troglobit
Copy link
Owner

Status update: have reproduced both the running+pid:0 issue and the weird change from non-forking type to forking (pretty sure service_register() is the culprit for the latter).

Just to clarify, that pidfile.so sets the PID file is just the pidfile auto-discovery mechanism. So that we want to keep, since it's used by the process synchronization with <pid/foo> conditions.

I'll start working on a regression test now to trigger these bugs and then commit fixes to them. Your restart callback cancellation being one of them.

troglobit added a commit that referenced this issue Nov 11, 2022
First set of svc_t properties not possible to debug otherwise.  See
issue #313 for a background.

Signed-off-by: Joachim Wiberg <troglobit@gmail.com>
troglobit added a commit that referenced this issue Nov 11, 2022
Signed-off-by: Joachim Wiberg <troglobit@gmail.com>
troglobit added a commit that referenced this issue Nov 11, 2022
When a service crashes Finit launches a restart timer that periodically
will try restart the service.  If a user calls `initctl stop foo` finit
must cancel this timer callback, otherwise we may end up with a weird
state where Finit thinkgs the service is running, but pid: 0, i.e., not
started.

Note: this fix has been expanded upon from the proof of concept
      submitted by Jack Newman.

Bug #313

Signed-off-by: Joachim Wiberg <troglobit@gmail.com>
troglobit added a commit that referenced this issue Nov 11, 2022
When a service starts up the pidfile.so plugin automatically detects any
new pidfiles that show up in the system.  If it finds a matching service
which does not have a custom configured pidfile, it auto-assigns the
detected pidfile to the service.

However, when the user later calls `initctl reload` to update the
running configuration set, Finit service_register() mistakenly will
set the service to forking since it now suddenly has a pidfile ...

The new style way of configuring a forking service is to set the

     type:forking

option.  The old compat way was to set `pid:!/path/to/pidfile`, so
in service_register() we must be very careful not to interpret the
registered pidfile as confiugred by the user -- i.e., check the pid
option.

Bug #313

Signed-off-by: Joachim Wiberg <troglobit@gmail.com>
@troglobit
Copy link
Owner

troglobit commented Nov 11, 2022

There, regression test and fixes in place. Hope I nailed all your various issues down ... leaving this issue open until you've had the time to verify.

(Don't mind the failed CI run for ❌ 3b601d0, just me having messed up the chroot for tests in Azure. Works now as intended.)

@JackNewman12
Copy link
Contributor Author

Perfect! Looks like this issue is solved. Manually starting/stopping/crashing is all good and I can't fault it.
My handful of daemons that got the forking status are no longer getting that flag and also seem to be behaving.

🎉🎉 Thank you


Not sure if I should make a new ticket, but I found the same race condition with crashing and conditions.

service <service/parent/running> /bin/myservice
## or
service <pid/parent> /bin/myservice
  1. Stop parent service
  2. myservice crashes due to parent disappearing (I guess before finit has had a chance to stop it? TBD I think finit should probably kill "dependent" services before stopping the user requested one.)
  3. myservice retry timer starts
  4. initctl cond shows <service/parent/running> as off
  5. myservice timer gets hit and the service becomes running with pid of 0 again.

I can't seem to get this same race condition using the /usr feature. I guess the SIGHUP must cover it up.

If I do a initctl stop myservice; initctl start myservice then I correctly get the waiting status on myservice

The log:

finit: api_cb():svc query: parent
finit: svc_parse_jobstr():Got str:'parent'
finit: svc_parse_jobstr():Got token:'parent'
finit: api_cb():stop parent
finit: svc_parse_jobstr():Got str:'parent'
finit: svc_parse_jobstr():Got token:'parent'
finit: service_step():                 parent(7633):  running disabled/clean   cond:on
finit: service_stop():Sending SIGTERM to pid:7633 name:parent
finit: Stopping parent[7633], sending SIGTERM ...
finit: service_stop():kill(-7633, 15) => rc 0, errno 2
finit: svc_set_state():parent is stopping, wait 3 sec before sending SIGKILL ...
finit: cond_clear():service/parent/
finit: cond_clear_noupdate():service/parent/
finit: cond_set_path():/var/run/finit/cond/service/parent/ <= 0
finit: service_step():                 parent(7633): -> stopping
finit: service_step():                 parent(7633): stopping disabled/clean   cond:on
finit: generic_io_cb():Calling I/O pidfile from runloop...
finit: pidfile_update_conds():path: /var/run/parent.pid, mask: 00000008
finit: pidfile_update_conds():Found svc parent for /var/run/parent.pid with pid 7633
finit: pidfile_update_conds():Setting parent PID file to /var/run/parent.pid
finit: cond_set():pid/parent
finit: cond_set_noupdate():pid/parent
finit: cond_set_path():/var/run/finit/cond/pid/parent <= 2
finit: cond_set():service/parent/ready
finit: cond_set_noupdate():service/parent/ready
finit: cond_set_path():/var/run/finit/cond/service/parent/ready <= 2
finit: cond_update():service/parent/ready
finit: pidfile_update_conds():path: /var/run/myservice.pid, mask: 00000008
finit: pidfile_update_conds():Found svc myservice for /var/run/myservice.pid with pid 7634
finit: pidfile_update_conds():Setting myservice PID file to /var/run/myservice.pid
finit: cond_set():pid/myservice
finit: cond_set_noupdate():pid/myservice
finit: cond_set_path():/var/run/finit/cond/pid/myservice <= 2
finit: cond_set():service/myservice/ready
finit: cond_set_noupdate():service/myservice/ready
finit: cond_set_path():/var/run/finit/cond/service/myservice/ready <= 2
finit: cond_update():service/myservice/ready
finit: sigchld_cb():Collected child 7633, status: 256
finit: service_monitor():collected parent(7633), normal exit: 1, signaled: 0, exit code: 1
finit: service_step():                 parent(   0): stopping disabled/clean   cond:on
finit: service_step():parent: stopped, cleaning up timers and conditions ...
finit: cond_clear():pid/parent
finit: cond_clear_noupdate():pid/parent
finit: cond_set_path():/var/run/finit/cond/pid/parent <= 0
finit: cond_update():pid/parent
finit: cond_clear():service/parent/
finit: cond_clear_noupdate():service/parent/
finit: cond_set_path():/var/run/finit/cond/service/parent/ <= 0
finit: cond_set_oneshot():service/parent/stopped
finit: cond_set_oneshot_noupdate():service/parent/stopped => /var/run/finit/cond/service/parent/stopped
finit: cond_update():service/parent/stopped
finit: service_step():                 parent(   0): ->  stopped
finit: service_step():                 parent(   0):  stopped disabled/clean   cond:on
finit: sm_step():state: running, runlevel: 2, newlevel: -1, teardown: 0, reload: 0
finit: sigchld_cb():Collected child 7634, status: 0
finit: service_monitor():collected myservice(7634), normal exit: 1, signaled: 0, exit code: 0
finit: service_step():               myservice(   0):  running  enabled/clean   cond:off
finit: cond_clear():service/myservice/
finit: cond_clear_noupdate():service/myservice/
finit: cond_set_path():/var/run/finit/cond/service/myservice/ <= 0
finit: cond_set_oneshot():service/myservice/restart
finit: cond_set_oneshot_noupdate():service/myservice/restart => /var/run/finit/cond/service/myservice/restart
finit: cond_update():service/myservice/restart
finit: service_step():delayed restart of myservice
finit: sm_step():state: running, runlevel: 2, newlevel: -1, teardown: 0, reload: 0
finit: generic_io_cb():Calling I/O pidfile from runloop...
finit: pidfile_update_conds():path: /var/run/parent.pid, mask: 00000200
finit: pidfile_update_conds():No matching svc for /var/run/parent.pid
finit: pidfile_update_conds():path: /var/run/myservice.pid, mask: 00000200
finit: pidfile_update_conds():No matching svc for /var/run/myservice.pid
finit: service_retry():myservice crashed, trying to start it again, attempt 5
finit: Service myservice[7634] died, restarting in 2000 msec (5/10)
finit: service_step():               myservice(   0):   halted  enabled/clean   cond:off
finit: cond_clear():service/myservice/
finit: cond_clear_noupdate():service/myservice/
finit: cond_set_path():/var/run/finit/cond/service/myservice/ <= 0
finit: service_step():               myservice(   0): ->  waiting
finit: service_step():               myservice(   0):  waiting  enabled/clean   cond:off
finit: generic_io_cb():Calling I/O pidfile from runloop...
finit: cond_clear():service/myservice/
finit: cond_clear_noupdate():service/myservice/
finit: cond_set_path():/var/run/finit/cond/service/myservice/ <= 0
finit: cond_set_oneshot():service/myservice/running
finit: cond_set_oneshot_noupdate():service/myservice/running => /var/run/finit/cond/service/myservice/running
finit: cond_update():service/myservice/running
finit: Successfully restarted crashing service myservice.

@troglobit
Copy link
Owner

Great to hear! :-)

Yes, please create a new issue with this issue. It's super hard to track different problems in the same issue. I agree with your suggestion in 2. above, and it'll likely solve the problem.

@troglobit
Copy link
Owner

Awesome, thank you!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants