Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix shutdown hooks not running #338

Merged
merged 1 commit into from
Jan 9, 2023

Conversation

JackNewman12
Copy link
Contributor

The shutdown hooks do not have any registered callbacks. So calling plugin_run_hooks() ends up doing nothing.

I think it makes sense to handle these special case hook scripts like this, but not sure if skipping some of the extra checks in plugin_run_hooks() makes a difference. I am guessing we don't care about the cond_set() and service_step() calls when everything has already exited.

sig.c was calling plugin_run_hooks for the shutdown hooks, but the
shutdown hooks do not have registered callbacks in hook-scripts.c

Instead directly call plugin_script_run()
@troglobit
Copy link
Owner

Curious, when I recently implemented this I made sure to test all these cases, and it still works for me without your change. There's a piece of code to ensure scripts are called this late (or early at bootstrap):

finit/src/plugin.c

Lines 243 to 252 in 910dac5

void plugin_run_hook(hook_point_t no, void *arg)
{
plugin_t *p, *tmp;
#ifdef HAVE_HOOK_SCRIPTS_PLUGIN
if (!cond_is_available() && !plugloaded) {
dbg("conditions not available, calling script based hooks only!");
plugin_script_run(no);
}
#endif

I wanted to ensure all hooks are called using the same API, in case of future changes. Having different entry points is hard to manage over time.

However, I'm inclined to pull in your patch anyway. Perhaps even extend upon it to do the same for the early hook points as well, for symmetry. But first I'm very curious to understand why the current code doesn't work in your case -- feels like there's something else hidden here.

@JackNewman12
Copy link
Contributor Author

Just had a random thought about this and will have to check on Monday.
I bet I mindlessly added a /var/run/finit/cond/usr folder on the root filesystem as part of the build. I mkdir this folder at startup since finit didn't seem to create it for me and it is required to get usr conditions working (TBD if this is fixed now).

This probably goes unnoticed since /var gets mounted over it immediately (and then the startup scripts run) so everything continues as normal. Hence once all the filesystems are unmounted, that cond_is_available() check doesn't work.

If my hunch is correct it's probably a my-bad, although I'd argue that a bad folder structure, or failure to unmount /var shouldn't stop the shutdown scripts from running.

@troglobit
Copy link
Owner

troglobit commented Jan 8, 2023

Ah, yeah I think I ran into the same issue. Both usr and sys condition sub-directories were missing in some cases, seems to have been the root cause of the failing auto tests.

The /var/run (or just /run these days) will be given a ramdisk by Finit if it's not already mounted by the user in /etc/fstab. The order of the plugins that create all sub-directories is important, but the order had been lost. Commits 8bbae3d and c22d123 (yesterday) should have it covered. Sorry about that!

@JackNewman12
Copy link
Contributor Author

Ahh. So /var/run gets unmounted. /var fails to umount for whatever reason. I can't see anything in lsof and I removed all the kernel modules, TBD why this happens. If I do a umount /var/run before reboot the /var umount is successful. 🤷‍♂️ Beyond me.

If I manually umount /var/run I can see the underlying /var has a hook symlink?

# ls -lha /var/run/finit/cond/hook/mount/
lrwxrwxrwx    1      26 Jan  1  1970 post -> /var/run/finit/cond/reconf

(this is the only file in /var/run/**)

I am guessing this file is getting created before /var/run gets mounted and so it ends up in /var? Hence the cond_is_available() check is always true during a shutdown, even after unmounting /var/run. Is the mount() command blocking? Is this race condition between the mount and this symlink getting created?

Note adding the commits 8bbae3d and c22d123 mentioned above don't fix this.

I would guess

finit/src/plugin.c

Lines 247 to 252 in 02a110e

#ifdef HAVE_HOOK_SCRIPTS_PLUGIN
if (!cond_is_available() && !plugloaded) {
dbg("conditions not available, calling script based hooks only!");
plugin_script_run(no);
}
#endif
needs a return in it, or

finit/src/plugin.c

Lines 261 to 266 in 02a110e

/* Conditions are stored in /run, so don't try to signal
* conditions for any hooks before filesystems have been
* mounted. */
if (no >= HOOK_MOUNT_ERROR && no <= HOOK_SHUTDOWN)
cond_set_oneshot(hook_cond[no]);
needs to check if cond_is_available() so that this symlink doesn't get created by cond_set_oneshot().
But making those changes will stop /var/run getting mounted in the first place.

Its a bit above my paygrade now. But either way having a umount failure shouldn't stop the shutdown scripts from running.

@troglobit
Copy link
Owner

Awesome, that's exactly what I was looking for, thanks! Explains a lot that this happens on a system where /var/run is not a symlink to /run, all my own systems have migrated to a separate RAM disk on /run, which Finit mounts automatically if missing from /etc/fstab.

Yeah you're right, I'll have to look into those places, and build on this PR as well to improve robustness at shutdown, as well as boot (in case someone starts with a messed up rootsfs ...)

Thank you so much for helping find the root cause!

@troglobit troglobit merged commit 89772bf into troglobit:master Jan 9, 2023
troglobit added a commit that referenced this pull request Jan 9, 2023
As pointed out by Jack Newman in issue #338, when conditions are not
available (at boot before cond_init(), or shutdown when /var or /var/run
have been unmounted) we should not try to create any conditions.

Signed-off-by: Joachim Wiberg <troglobit@gmail.com>
@troglobit
Copy link
Owner

There, should work better now. I've adjusted the boundaries for calling cond_set_oneshot(), and also added a check for cond_is_available(), just like you suggested.

Turns out cond_set_oneshot() calls a couple of intermediate functions that in turn do a cond_checkpath(), which creates any missing directory, including /var/run/finit, if missing! So that was the root of it all.

Great to find and nail this one down before the next release!

@JackNewman12
Copy link
Contributor Author

Does that change work for you?
I made the same change locally and while it fixed the symlink issue it caused /var/run to never get mounted by finit. (It's not in my fstab)

@troglobit
Copy link
Owner

What does your filesystem layout look like? The recommended one is "new" style dedicated mount point for /run and a compat symlink back to it, like this /var/run -> ../run.

Finit only mounts /run, look here:

finit/src/finit.c

Lines 390 to 405 in 02a110e

/*
* Modern systems use tmpfs for /run. Fallback to /var/run if
* /run doesn't exist is handled by the bootmisc plugin. It
* also sets up compat symlinks.
*
* The unconditional mount of /run/lock is for DoS prevention.
* To override any of this behavior, add entries to /etc/fstab
* for /run (and optionally /run/lock).
*/
if (fisdir("/run") && !fismnt("/run")) {
fs_mount("tmpfs", "/run", "tmpfs", MS_NOSUID | MS_NODEV | MS_NOEXEC | MS_RELATIME, "mode=0755,size=10%");
/* This prevents user DoS of /run by filling /run/lock at the expense of another tmpfs, max 5MiB */
makedir("/run/lock", 1777);
fs_mount("tmpfs", "/run/lock", "tmpfs", MS_NOSUID | MS_NODEV | MS_NOEXEC | MS_RELATIME, "mode=0777,size=5252880");
}

@troglobit
Copy link
Owner

Updated the docs a bit to detail the fs recommendations. Hope it helps:

https://github.com/troglobit/finit/tree/master/doc#filesystem-layout

@JackNewman12
Copy link
Contributor Author

Found it. Looks like a few years ago whoever created the /var/run -> /run symlink actually created it as /run -> /var/run in our system. 🤦‍♂️
I guess having finit accidently create those /var/run/finit/cond ... directories was then letting it correctly mount /run later.

Swapped the symlink around and now everything mounts as expected. Cheers

@troglobit
Copy link
Owner

Nice find, and excellent news!

Yup, that sounds very likely from what I saw myself.

Glad to hear it all worked out 😃

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants