Service restart delay not working #310
Comments
Interesting! I've been troubleshooting a very similar issue. What version of Finit are you using, or are you on the bleeding edge like me?
The latest release, v4.3. I must note that I had to create the /var/run/finit/cond/usr folder myself to get usr conditions working. I wonder if something similar is happening with the service folder. I'll have a peek today. Any thoughts as to where I should be looking, or extra debug to enable?
OK. There are changes related to this coming up in v4.4 that I hope to release soon. The usr conditions folder is created by the Before we rush ahead with going into the code. Maybe you can tell me a little bit more about MYSERVICE and what you expect Finit to do, and also what are you doing from the outside? Is it a forking daemon, is it expected to crash (I see the The particular code in question starts here: Lines 1970 to 1981 in f483a8a
Yes, although I would expect it to create that folder when required instead of failing. Adding the -create flag to the tool does not fix this. Either way adding a mkdir to a startup script is an easy fix.
This bug actually applies to all of our daemons. I've been attempting to switch our (very old) init system over to Finit to give us some more powerful features. In this case you can imagine a simple daemon that just sits there and does some Tx/Rx of data from Ethernet/Serial/etc. No forking. It was only when I noticed an application was crashing (i.e. the network was down and we didn't handle it) that I saw all 10 retries would happen instantly.
For 99% of cases if the daemon is not running, restart it (with a small delay). Nothing too fancy going on, although with finit I expect we will add some extra conditions like network up/down, service to service dependency, etc.
In the example above I was just killing the daemon manually using I didn't have much time today but it seems to do what I would expect. A service_timeout(service_retry) gets set for X seconds in the future, but the bottom of the service_step() loop causes it to keep jumping into the next state: running -> restart -> halted -> ready. Lines 2063 to 2067 in f483a8a
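To make the observation above concrete, here is a minimal sketch of a step loop that either falls straight through the states or stops while a retry timer is pending. It is not Finit's code: `struct svc`, `next_state()`, `svc_step()` and the `honor_timer` switch are hypothetical names invented for illustration, and only the state names are taken from the comment above.

```c
#include <stdbool.h>

/* Hypothetical model of the states named above, not Finit's own types. */
enum svc_state { SVC_RUNNING, SVC_RESTART, SVC_HALTED, SVC_READY };

struct svc {
    enum svc_state state;
    bool alive;          /* is the process still running? */
    bool timer_pending;  /* retry timer armed but not yet fired */
};

/* Compute the state that follows the current one. */
static enum svc_state next_state(const struct svc *s)
{
    switch (s->state) {
    case SVC_RUNNING: return s->alive ? SVC_RUNNING : SVC_RESTART;
    case SVC_RESTART: return SVC_HALTED;
    case SVC_HALTED:  return SVC_READY;
    case SVC_READY:   return SVC_RUNNING;  /* process is started again */
    }
    return s->state;
}

/*
 * Keep stepping until the state stops changing and return the number of
 * transitions taken.  With honor_timer == false the loop runs straight
 * through restart -> halted -> ready -> running even though a timer was
 * armed.  With honor_timer == true it stops in SVC_RESTART until the
 * caller clears timer_pending, i.e. until the timer has fired.
 */
int svc_step(struct svc *s, bool honor_timer)
{
    int transitions = 0;

    for (;;) {
        enum svc_state next;

        if (honor_timer && s->timer_pending && s->state == SVC_RESTART)
            break;                      /* wait for the timer callback */

        next = next_state(s);
        if (next == s->state)
            break;                      /* settled, nothing more to do */

        if (next == SVC_RESTART)
            s->timer_pending = true;    /* arm the retry timer */
        if (next == SVC_RUNNING)
            s->alive = true;            /* process restarted */

        s->state = next;
        transitions++;
    }

    return transitions;
}
```

With `honor_timer` set to false a dead service goes all the way back to running in a single call, which matches the instant restarts reported here. With it set to true the loop parks in the restart state until the timer callback clears `timer_pending` and calls `svc_step()` again.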
Thanks for the quick follow-up, was not expecting that! Very much appreciate some background and where-you're-at status 😃
OK, I'll look into the usr condition thing as a separate issue if I can reproduce it. The initial delay, like you saw, is very short. But then it should work as you say ... the
We are using 5.4 IIRC. Is there a list of these kernel features I need to enable? We run a pretty minimal build. Thanks for the support!
That remains to be documented unfortunately, but all the new eventfd/signalfd/etc. and DEVTMPFS ... the best I can do meanwhile is to give you this: https://github.com/troglobit/myLinux/blob/main/board/amd64/linux_defconfig Finit can take care of bootstrapping a pretty bare rootfs using, e.g., the bootmisc.so plugin (many plugins started out as optional but are now mandatory, this is one such). There are some more pointers in myLinux if you're curious. and https://github.com/troglobit/myLinux/blob/main/board/common/busybox_defconfig, if you're on an embedded system with BusyBox. Notice BusyBox must not be built with
I believe I have reproduced this issue now. There seem to be at least two cooperating bugs at play here. I'll try to get to the bottom of this over the weekend.
There, finally found the root cause(s) of this! Thank you for taking the time to report it.
Thanks for fixing the restart issue! I managed to test it today and it is working. If I kill an application it restarts instantly; however, if I kill it again within a few seconds, it gets the expected delay.
Thank you for taking the time to test again! :) Yeah, this is by design. We don't want to penalize a single crash, which could be attributed to startup issues. The intention is to increment the delay if the service crashes continuously -- so we track the number of crashes (per some period of time).
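For readers who want to picture the policy described above, here is a minimal sketch of crash tracking with an increasing restart delay. It is an illustration only and not Finit's implementation: the struct, the function name, and every constant (window length, delay step, delay cap) are assumptions. Only the limit of 10 retries comes from this report.

```c
#include <assert.h>
#include <stddef.h>
#include <time.h>

/* Hypothetical tuning values, NOT Finit's actual constants. */
#define CRASH_WINDOW_SEC   10   /* crashes closer together than this count as continuous */
#define RETRY_MAX          10   /* give up after this many crashes in a row */
#define DELAY_STEP_MS    2000   /* extra delay added per consecutive crash */
#define DELAY_MAX_MS    10000   /* upper bound on the delay */

struct svc_restart {
    int    crashes;     /* consecutive crashes seen inside the window */
    time_t last_crash;  /* time of the previous crash, 0 if none yet */
};

/*
 * Record a crash at time 'now' and return how long to wait before
 * restarting, in milliseconds.  Returns 0 for an immediate restart and
 * -1 when the retry limit is exceeded and the service should be stopped.
 */
int restart_delay_ms(struct svc_restart *r, time_t now)
{
    int delay;

    assert(r != NULL);

    /* An isolated crash starts a fresh count, so it is not penalized. */
    if (r->last_crash == 0 || now - r->last_crash > CRASH_WINDOW_SEC)
        r->crashes = 0;

    r->last_crash = now;
    r->crashes++;

    if (r->crashes > RETRY_MAX)
        return -1;

    /* First crash: no delay.  Each further crash adds one step, capped. */
    delay = (r->crashes - 1) * DELAY_STEP_MS;
    return delay > DELAY_MAX_MS ? DELAY_MAX_MS : delay;
}
```

Under these assumed values, a first crash restarts immediately, a second crash a couple of seconds later waits 2000 ms, and a crash after a long quiet period is treated as a first crash again. That is the same pattern as in the test report above.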
When a service is crashing I find that finit seems to restart the app instantly.
This means all 10 retries can be used up within a few milliseconds, which ends up disabling the service.
Interestingly I can still see the event timer is running as X seconds later finit will print the "Successfully restarted" line.
The conf file is very simple:
Sorry I didn't have timestamps on the log, hopefully my comments are clear enough. Tomorrow I should get a chance to recapture this with timestamps.
Thanks!