Bug when service is crashed and restarted initctl shows wrong pid #226
Comments
We extend the service.sh to emulate a well-behaved service that creates and removes its own PID file. The new test emulates a crash by sending SIGKILL to service.sh. We then verify that Finit restarts it, and eventually registers the new PID when the service recreates the PID file. Issue #226 Signed-off-by: Joachim Wiberg <troglobit@gmail.com>
I'm afraid this must be your kernel again. I just added a new test¹ for this particular case and I cannot reproduce the problem. Allow me to explain a little more about the monitoring in Finit: when a well-behaved service (A) starts up in the foreground, Finit knows its PID, but to be able to safely start any dependent services (B and C) it waits for the service (A) to create its PID file. Finit reads all PID files created in /run on every inotify event from the kernel. If it finds the PID it is waiting for in the expected PID file, the service's (A) pid condition is asserted. Hence, if inotify is not working properly, that mechanism is broken. There may be unexpected behavior/artifacts in internal structs when this occurs, e.g. the wrong PID being shown.
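For readers following along, the check described above can be sketched roughly as follows. This is only an illustration, not Finit's code (Finit is written in C and reacts to inotify events): the helper names `read_pid` and `pid_condition_met` are hypothetical, and the sketch shows just the comparison that would run each time an event arrives for the PID file directory.

```python
import os
import tempfile


def read_pid(path):
    """Return the PID stored in a PID file, or None if missing or malformed."""
    try:
        with open(path) as f:
            return int(f.read().strip())
    except (OSError, ValueError):
        return None


def pid_condition_met(run_dir, pidfile, expected_pid):
    """True only when `pidfile` exists in `run_dir` and holds `expected_pid`.

    Hypothetical stand-in for the check made after a file event: the
    condition is asserted only if the PID on disk matches the PID the
    supervisor recorded when it started the service.
    """
    pid = read_pid(os.path.join(run_dir, pidfile))
    return pid is not None and pid == expected_pid


def demo():
    """Walk through the start and restart cases using a temporary directory."""
    with tempfile.TemporaryDirectory() as run_dir:
        path = os.path.join(run_dir, "myapp.pid")
        before = pid_condition_met(run_dir, "myapp.pid", 100)  # no file yet
        with open(path, "w") as f:
            f.write("100\n")
        started = pid_condition_met(run_dir, "myapp.pid", 100)
        # After a crash and restart the service has a new PID; the old file
        # content must not satisfy the condition for the new process.
        stale = pid_condition_met(run_dir, "myapp.pid", 200)
    return before, started, stale
```

If the file event for the rewritten PID file is lost, the last comparison is never repeated with the new content, which is one way a stale PID could linger.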
At startup (and reconf) of systems with lots of services there is a risk of losing inotify events, e.g., PID file creation/delete events. This patch increases the receive buffer (doubles it). On Linux the getsockopt() for SO_RCVBUF returns double the set size, due to housekeeping in the kernel. So we don't have to do any adjustments when setting it. Issue #226 Signed-off-by: Joachim Wiberg <troglobit@gmail.com>
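The SO_RCVBUF behavior mentioned in the commit message can be observed with a small experiment. This is a sketch on a throwaway Unix datagram socket, not the descriptor Finit adjusts (which is not stated here), and the helper name `set_rcvbuf` is made up for illustration. On Linux the reported value is typically twice the requested one, subject to system limits such as `net.core.rmem_max`; other systems may report the requested size unchanged.

```python
import socket


def set_rcvbuf(sock, size):
    """Request a receive buffer of `size` bytes; return what the kernel reports."""
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF, size)
    return sock.getsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF)


def demo(requested=8192):
    """Return (size before, size reported after requesting `requested`)."""
    with socket.socket(socket.AF_UNIX, socket.SOCK_DGRAM) as sock:
        before = sock.getsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF)
        reported = set_rcvbuf(sock, requested)
    return before, reported


if __name__ == "__main__":
    before, reported = demo()
    print(f"before={before} reported={reported}")
```

Because of that doubling, passing the value read from getsockopt() straight back to setsockopt() is enough to double the buffer, which matches the commit's remark that no adjustment is needed.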
So, I have to retract my previous statement ... I got a very similar report (privately) from a client. They had spotted a behavior just as you described, but with dnsmasq, when reconfiguring their system at runtime. Finit refused to restart the dying service, hanging on to its old PID. I've been attempting to recreate this problem using the test case I mentioned previously. I've had several theories over the last few weeks, but none really panned out until this morning, when I managed to enable logging in a reasonable way and found that Finit does indeed detect the process crashing (so signals aren't lost), but it thinks the process is a forking service (sysv start script) and exits early, waiting for the daemon/script to create its PID file ... Tweaking the classification of what is a forking service seems to be the solution. I've now rerun the test (100000 laps) twice without a problem! So I'll be adding some more tests to also verify forking services with this tweak, but it's looking very promising. Thanks for reporting this, and sorry for being so dismissive earlier!
No problem, thank you so much for investigating this further. I too thought it may just have been some strange bug in an earlier inotify implementation in my kernel. I did think it was a bit strange, though, as I have used another inotify daemon implementation on my system and, with the tiny patch I mentioned in another thread, I'm able to have (I think) very reliable pid detection, including delete, update, create etc., so that's why I was a bit confused.
The slay script checks for common errors and gives some logs and status of Finit when something goes wrong. Helps detect issue #226 when the start-kill-service.sh test runs at 100000 laps. Signed-off-by: Joachim Wiberg <troglobit@gmail.com>
I have the following service:
Using
initctl status myapp:media
gives the correct result. However, if I kill the process using the pid provided above (I am simulating a crash), the app is restarted by Finit, but the pid shown by initctl status is not updated afterwards.
It's correct in the pid file because that's managed by my service; it just seems like Finit doesn't reread the file (or update its internal db) when restarting the service.
This is a problem for me because I'm using my workaround command in #225 to send signals, and I would rather have initctl tell me the correct pid than read the pid file in my own script, because then I need to know what the pid file name is.