Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix ctrlaltdel and 1 for stopit and reboot files #50

Merged
merged 3 commits into from
Jul 20, 2020

Conversation

ericonr
Copy link
Member

@ericonr ericonr commented Jul 13, 2020

Fixes #47

Since the halt program seems to use init directly, I don't think anything else needs to be changed.

Needs proper testing as well.

Ping @Duncaen and @leahneukirchen

@ahesford
Copy link
Member

The biggest problem, to me, is that runit/3 does chmod 100 /run/runit/reboot if that file exists, which has the effect of moving the reboot trigger from "only when executable" to "whenever the file is present". I completely overlooked that and wondered why runit seemed to be ignoring permissions until @ericonr pointed out the stage-three chmod.

There is no reason for /run/runit/reboot to exist in stage 1, so it doesn't not need to be created there. Likewise, if we are changing the default behavior of /run/runit/stopit to a no-op by removing the execute bit, that file doesn't need to be created in runit/1. Those files can be created in ctrlaltdel with permissions 0100.

Note that runit-init will create the stopit or reboot files with the right permissions, so shutdown and reboot will still work as expected using shutdown, halt (or its symlinks reboot and poweroff) or init (which is really what's called by halt).

I propose:

  1. Remove the two install lines in 1 that you currently have.
  2. Add two lines to ctrlaltdel:
install -m 0100 /dev/null /run/runit/stopit
install -m 0100 /dev/null /run/runit/reboot
  1. Remove the chmod 100 /run/runit/reboot in 3.

This should make our use of runit consistent with the documentation.

NOTE: If we're not creating /run/runit/stopit with permissions 0100 in stage 1, we should be certain that nothing depends on the default behavior of runit halting or rebooting the system when sent SIGCONT.

@ericonr
Copy link
Member Author

ericonr commented Jul 18, 2020

This is now WIP, I added commits with @ahesford's changes, will need to rebase them properly before merging. I agree with the suggestions regarding how we should do it.

NOTE: If we're not creating /run/runit/stopit with permissions 0100 in stage 1, we should be certain that nothing depends on the default behavior of runit halting or rebooting the system when sent SIGCONT.

grepping for SIGCONT and -CONT in void-runit and void-packages didn't yield anything of interest. I guess it could make sense to have an INSTALL.msg for it, though. I would expect a user who depends on this behavior to have an idea of what they are doing, but it's good to be sure. Something like "runit now requires 100 perms in /run/runit/reboot in order to reboot`.

Copy link
Member

@ahesford ahesford left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this should be fine. I definitely would not add an INSTALL.msg about this. It's bound to create confusion among the majority who are not relying on the existing permissions and, again, normal reboot/shutdown/ctrl-alt-del behavior should be unaffected.

Like you said, people relying on this configuration are probably knowledgeable enough to debug any changes.

ctrlaltdel Outdated Show resolved Hide resolved
ctrlaltdel Outdated Show resolved Hide resolved
Copy link
Member

@ahesford ahesford left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The changes look good if they are squashed.

@ericonr
Copy link
Member Author

ericonr commented Jul 18, 2020

Squashed.

@Duncaen
Copy link
Member

Duncaen commented Jul 18, 2020

The upstream examples all seem to create at least the stopit file on boot. Not sure if there is a good reason, I can only think of issues with creating new files in some edge cases, but not sure how relevant this is for tmpfs /run, chmodding an existing file has less ways of failing.

@ericonr
Copy link
Member Author

ericonr commented Jul 19, 2020

So install stopit with 000 perms?

I would prefer it if both files got configured the same way, so if stopit is created during 1, so should reboot.

@ahesford
Copy link
Member

ahesford commented Jul 19, 2020

I would either create both stopit and reboot files with mode 0 in1 as originally proposed, or create neither as currently implemented in this branch. No need to create one ahead of time but leave the other to be created on demand.

Even though I advocated for the on-demand scheme you put together, I don't' have strong convictions about the choice.

The 100 permission in /run/runit/stopit made it so that signaling runit
with SIGCONT would shut the system down. To achieve the correct
behavior, we should create the stopit and reboot files with 000 perms,
and allow their permissions to be set correctly by calls to `init 0` or
`init 6` or by /etc/runit/ctrlaltdel.

1
runit uses the permissions in the /etc/runit/reboot file to determine
whether it's going to perform a halt or reboot action. This conditional
in 3 meant that touching the reboot file, even with 000 perms, would
lead to a reboot, which goes agains what is expected according to the
runit documentation.
Copy link
Member

@ahesford ahesford left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this should do the trick. @Duncaen, are you OK with this?

/run/runit/{stopit,reboot} should have their perms set to 100 by
ctrlaltdel, before runit signals itself with SIGCONT, in order to
trigger a reboot. This script can be changed by a user to perform
different actions. The comments in it explain what each file is for.
@ericonr
Copy link
Member Author

ericonr commented Jul 20, 2020

Fixed typo in commit message.

@ahesford ahesford merged commit 05c6e86 into void-linux:master Jul 20, 2020
@sbromberger
Copy link

sbromberger commented Dec 26, 2023

Sorry to resurrect this thread, but I'm wondering what the rationale was for changing stopit perms to 000 from 100. This breaks control of void containers when SIGCONT is used on stop commands. Is there a particular reason this change was made?

That is:

NOTE: If we're not creating /run/runit/stopit with permissions 0100 in stage 1, we should be certain that nothing depends on the default behavior of runit halting or rebooting the system when sent SIGCONT.

Management of containers depends on runit halting when sent SIGCONT (unless there's another way to do it that I'm missing....)

@leahneukirchen
Copy link
Member

SIGCONT is not the proper signal to shutdown a container. SIGINT should work as well.

Note that you need --cap-add=sys_boot for this to work.

@sbromberger
Copy link

sbromberger commented Jan 1, 2024

SIGCONT is not the proper signal to shutdown a container. SIGINT should work as well.

Do you have a reference for this? lxc/lxc-ci#408 discusses it in a bit of detail, and it was (at least when I started using LXD with void) the only way to get lxc stop to work properly.

@leahneukirchen
Copy link
Member

SIGINT will trigger a reboot, which results in a shutdown of a container. You can test this easily.

@sbromberger
Copy link

I just want to make sure we're talking about the same thing here: I'm not referring to Docker containers; I'm referring to incus containers. SIGINT indeed attempts a reboot with incus (but fails the reboot stage, so it has the effect of a shutdown, but with error logs). There is no --cap-add option for starting incus containers, which leads me to think you're talking about Docker.

@leahneukirchen
Copy link
Member

Can you share the error? If it terminates, it has the appropriate capabilities, else it would hang forever.

@sbromberger
Copy link

sbromberger commented Jan 1, 2024

[root@elemental ~]# incus start test
[root@elemental ~]# incus stop test
Error: Failed restarting instance: The instance is already running
Try `incus info --show-log test` for more info
...
lxc test 20240101184607.290 INFO     error - ../src/lxc/error.c:lxc_error_set_and_log:34 - Child <15487> ended on signal Hangup(1)
lxc test 20240101184607.310 INFO     conf - ../src/lxc/conf.c:run_script_argv:340 - Executing script "/usr/local/bin/incusd callhook /var/lib/incus "default" "test" stopns" for container "test"
lxc test 20240101184607.429 INFO     conf - ../src/lxc/conf.c:lxc_map_ids:3603 - Caller maps host root. Writing mapping directly
lxc test 20240101184607.429 NOTICE   utils - ../src/lxc/utils.c:lxc_drop_groups:1368 - Dropped supplimentary groups
lxc test 20240101184607.455 INFO     conf - ../src/lxc/conf.c:run_script_argv:340 - Executing script "/usr/local/bin/incusd callhook /var/lib/incus "default" "test" stop" for container "test"
lxc test 20240101184607.513 ERROR    conf - ../src/lxc/conf.c:run_buffer:321 - Script exited with status 1
lxc test 20240101184607.513 ERROR    start - ../src/lxc/start.c:lxc_end:985 - Failed to run lxc.hook.post-stop for container "test"
lxc test 20240101184607.513 WARN     start - ../src/lxc/start.c:lxc_end:987 - Container will be stopped instead of rebooted
lxc test 20240101184607.514 ERROR    af_unix - ../src/lxc/af_unix.c:lxc_abstract_unix_recv_fds_iov:218 - Connection reset by peer - Failed to receive response
lxc test 20240101184607.514 ERROR    commands - ../src/lxc/commands.c:lxc_cmd_rsp_recv_fds:128 - Failed to receive file descriptors for command "get_state"

It's a standard incus image with the following config applied:

incus config set test raw.lxc="lxc.signal.halt=SIGINT"

@leahneukirchen
Copy link
Member

Indeed lxd doesn't seem to like that it tries to reboot. You can make ctrlaltdel not do chmod 100 /run/runit/reboot, then it works just as well as SIGCONT.

@sbromberger
Copy link

The way I currently do it is to chmod 100 /run/runit/stopit which allows me to use what I've used from the start; namely SIGCONT. I noticed that new images weren't shutting down properly, which is why I investigated and when I discovered this PR which is the source of the breakage. I was hoping that we might be able to revisit this particular PR so that things will "just work" properly again.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Wrong information in ctrlaltdel
5 participants