Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

can we run prolog/epilog/housekeeping in the flux systemd instance? #6619

Open
garlick opened this issue Feb 10, 2025 · 4 comments
Open

can we run prolog/epilog/housekeeping in the flux systemd instance? #6619

garlick opened this issue Feb 10, 2025 · 4 comments

Comments

@garlick
Copy link
Member

garlick commented Feb 10, 2025

Problem: in #6616, a connection to the system dbus is established and two systemd instances are monitored because we start jobs in the flux one and prolog/epilog/housekeeping in the system one.

Is there any reason we couldn't run them all in the flux systemd instance and avoid the extra dbus connection?

@garlick
Copy link
Member Author

garlick commented Feb 18, 2025

Just did an experiment with the following change

diff --git a/src/modules/job-manager/housekeeping.c b/src/modules/job-manager/housekeeping.c
index 82b376fc6..04faf76da 100644
--- a/src/modules/job-manager/housekeeping.c
+++ b/src/modules/job-manager/housekeeping.c
@@ -191,7 +191,7 @@ static struct allocation *allocation_create (struct housekeeping *hk,
                                                    allocation_timeout,
                                                    a))
         || !(a->bulk_exec = bulk_exec_create (&bulk_ops,
-                                              "rexec",
+                                              "sdexec",
                                               id,
                                               "housekeeping",
                                                a))
@@ -698,6 +698,8 @@ static void housekeeping_kill_cb (flux_t *h,
     while (a) {
         if (a->id == jobid || jobid == FLUX_JOBID_ANY) {
             if (a->bulk_exec) {
+                if (signum == SIGKILL)
+                    signum = SIGUSR1;
                 f = bulk_exec_kill (a->bulk_exec, ids, signum);
                 if (flux_future_then (f, -1, kill_continuation, hk) < 0)
                     flux_future_destroy (f);
@@ -817,6 +819,11 @@ static int housekeeping_parse_config (const flux_conf_t *conf,
             flux_cmd_destroy (cmd);
             return errprintf (error, "error duplicating IMP path");
         }
+        if (flux_cmd_setopt (cmd, "SDEXEC_PROP_KillMode", "process") < 0
+            || flux_cmd_setopt (cmd, "SDEXEC_PROP_SendSIGKILL", "off") < 0) {
+            flux_cmd_destroy (cmd);
+            return errprintf (error, "error setting sdexec options");
+        }
     }
 done:
     flux_cmd_destroy (hk->cmd);

and a config change

[run.housekeeping]
allowed-users = [ "flux" ]
allowed-environment = [ "FLUX_*" ]
# path = "/usr/lib/aarch64-linux-gnu/flux/cmd/flux-run-housekeeping"
path = "/etc/flux/system/housekeeping"

Somewhat shockingly, that worked pretty well!

● picl0
    State: running
    Units: 218 loaded (incl. loaded aliases)
     Jobs: 0 queued
   Failed: 0 units
    Since: Tue 2025-02-11 06:44:05 PST; 1 week 0 days ago
  systemd: 252.31-1~deb12u1
   CGroup: /user.slice/user-500.slice/user@500.service
           ├─app.slice
           │ └─housekeeping-0-fBJRuApn45M.service
           │   ├─749731 /usr/lib/aarch64-linux-gnu/flux/flux-imp run housekeeping
           │   ├─749733 /bin/sh /etc/flux/system/housekeeping
           │   ├─749740 /bin/sh /etc/flux/system/housekeeping.d/stress.sh
           │   ├─749741 stress -c 4 -t 60
           │   ├─749742 stress -c 4 -t 60
           │   ├─749743 stress -c 4 -t 60
           │   ├─749744 stress -c 4 -t 60
           │   └─749745 stress -c 4 -t 60
           ├─init.scope
           │ ├─1128 /lib/systemd/systemd --user
           │ └─1150 "(sd-pam)"
           └─session.slice
             └─dbus.service
               └─1283 /usr/bin/dbus-daemon --session --address=systemd: --nofork --nopidfile --systemd-activation --syslog-only

The one thing that didn't work was the SIGUSR1 hack. I haven't dug into it yet but I guess SIGUSR1 isn't translated to SIGKILL in the IMP for the run case like it is for exec.

Also, since the unit file was what drained a node on failure, that would have to be added to housekeeping.

@garlick
Copy link
Member Author

garlick commented Feb 19, 2025

Hmm, it looks like signals (including the SIGUSR1 translation) are handled the same for both flux imp run and flux imp exec. What I'm observing when the IMP is signaled with SIGUSR1, it appears that the IMP exits and the transient unit is marked failed, but the other processes spawned for housekeeping keep going:

[  +3.112423] job-manager.debug[0]: Sending SIGUSR1 to IMP for job ƒjpXCZedGfVQ
[  +3.112560] sdexec.debug[0]: kill main housekeeping-0-fBJRyBTRMod.service (signal 10)
[  +3.125774] sdexec.debug[0]: housekeeping-0-fBJRyBTRMod.service: active.running
[  +3.125856] sdexec.debug[0]: housekeeping-0-fBJRyBTRMod.service: failed.failed
[  +3.125862] sdexec.debug[0]: reset-failed housekeeping-0-fBJRyBTRMod.service
[  +3.126582] sdexec.debug[0]: housekeeping-0-fBJRyBTRMod.service: inactive.dead
[ +39.879922] job-manager.info[0]: housekeeping: picl0 (rank 0) ƒBJRyBTRMod: stress: info: [912871] successful run completed in 60s
[ +39.879994] sdexec.debug[0]: unwatch housekeeping-0-fBJRyBTRMod.service
[ +39.880152] job-manager.err[0]: housekeeping: picl0 (rank 0) ƒBJRyBTRMod: Killed
[ +39.880184] job-manager.debug[0]: housekeeping: ƒBJRyBTRMod complete

Puzzling.

@garlick
Copy link
Member Author

garlick commented Feb 19, 2025

sdexec may be waiting for stdout which is held open by the lingering processes and that is why the job manager is not notified that the unit has ended until all the processes are gone. That is OK I guess. The question is why isn't the IMP delivering SIGKILL to those other processes and then why is it exiting. It must be shooting itself for some reason.

@garlick
Copy link
Member Author

garlick commented Feb 19, 2025

Yeah the IMP seems to have SIGKILLed itself.

Feb 18 15:51:07 picl0 systemd[1128]: housekeeping-0-fBJRyBTRMod.service: Sent signal SIGUSR1 to main process 912861 (flux-imp) on client request.
Feb 18 15:51:07 picl0 systemd[1128]: housekeeping-0-fBJRyBTRMod.service: Main process exited, code=killed, status=9/KILL
Feb 18 15:51:07 picl0 systemd[1128]: housekeeping-0-fBJRyBTRMod.service: Failed with result 'signal'.
Feb 18 15:51:07 picl0 systemd[1128]: housekeeping-0-fBJRyBTRMod.service: Unit process 912870 (stress.sh) remains running after unit stopped.
Feb 18 15:51:07 picl0 systemd[1128]: housekeeping-0-fBJRyBTRMod.service: Unit process 912871 (stress) remains running after unit stopped.
Feb 18 15:51:07 picl0 systemd[1128]: housekeeping-0-fBJRyBTRMod.service: Unit process 912872 (stress) remains running after unit stopped.
Feb 18 15:51:07 picl0 systemd[1128]: housekeeping-0-fBJRyBTRMod.service: Unit process 912873 (stress) remains running after unit stopped.
Feb 18 15:51:07 picl0 systemd[1128]: housekeeping-0-fBJRyBTRMod.service: Unit process 912874 (stress) remains running after unit stopped.
Feb 18 15:51:07 picl0 systemd[1128]: housekeeping-0-fBJRyBTRMod.service: Unit process 912875 (stress) remains running after unit stopped.

This change to the IMP fixes it.

diff --git a/src/imp/cgroup.c b/src/imp/cgroup.c
index 9375ec2..14e1a99 100644
--- a/src/imp/cgroup.c
+++ b/src/imp/cgroup.c
@@ -197,7 +197,8 @@ struct cgroup_info *cgroup_info_create (void)
     /* Note: GNU basename(3) never modifies its argument. (_GNU_SOURCE
      * is defined in config.h.)
      */
-    if (strncmp (basename (cgroup->path), "imp-shell", 9) == 0)
+    if (strncmp (basename (cgroup->path), "imp-shell", 9) == 0
+        || strncmp (basename (cgroup->path), "housekeeping", 12) == 0)
         cgroup->use_cgroup_kill = true;
 
     return cgroup;

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant