can we run prolog/epilog/housekeeping in the flux systemd instance? #6619

garlick · 2025-02-10T14:20:17Z

Problem: in #6616, a connection to the system dbus is established and two systemd instances are monitored because we start jobs in the flux one and prolog/epilog/housekeeping in the system one.

Is there any reason we couldn't run them all in the flux systemd instance and avoid the extra dbus connection?

garlick · 2025-02-18T22:30:21Z

Just did an experiment with the following change

diff --git a/src/modules/job-manager/housekeeping.c b/src/modules/job-manager/housekeeping.c
index 82b376fc6..04faf76da 100644
--- a/src/modules/job-manager/housekeeping.c
+++ b/src/modules/job-manager/housekeeping.c
@@ -191,7 +191,7 @@ static struct allocation *allocation_create (struct housekeeping *hk,
                                                    allocation_timeout,
                                                    a))
         || !(a->bulk_exec = bulk_exec_create (&bulk_ops,
-                                              "rexec",
+                                              "sdexec",
                                               id,
                                               "housekeeping",
                                                a))
@@ -698,6 +698,8 @@ static void housekeeping_kill_cb (flux_t *h,
     while (a) {
         if (a->id == jobid || jobid == FLUX_JOBID_ANY) {
             if (a->bulk_exec) {
+                if (signum == SIGKILL)
+                    signum = SIGUSR1;
                 f = bulk_exec_kill (a->bulk_exec, ids, signum);
                 if (flux_future_then (f, -1, kill_continuation, hk) < 0)
                     flux_future_destroy (f);
@@ -817,6 +819,11 @@ static int housekeeping_parse_config (const flux_conf_t *conf,
             flux_cmd_destroy (cmd);
             return errprintf (error, "error duplicating IMP path");
         }
+        if (flux_cmd_setopt (cmd, "SDEXEC_PROP_KillMode", "process") < 0
+            || flux_cmd_setopt (cmd, "SDEXEC_PROP_SendSIGKILL", "off") < 0) {
+            flux_cmd_destroy (cmd);
+            return errprintf (error, "error setting sdexec options");
+        }
     }
 done:
     flux_cmd_destroy (hk->cmd);

and a config change

[run.housekeeping]
allowed-users = [ "flux" ]
allowed-environment = [ "FLUX_*" ]
# path = "/usr/lib/aarch64-linux-gnu/flux/cmd/flux-run-housekeeping"
path = "/etc/flux/system/housekeeping"

Somewhat shockingly, that worked pretty well!

● picl0
    State: running
    Units: 218 loaded (incl. loaded aliases)
     Jobs: 0 queued
   Failed: 0 units
    Since: Tue 2025-02-11 06:44:05 PST; 1 week 0 days ago
  systemd: 252.31-1~deb12u1
   CGroup: /user.slice/user-500.slice/user@500.service
           ├─app.slice
           │ └─housekeeping-0-fBJRuApn45M.service
           │   ├─749731 /usr/lib/aarch64-linux-gnu/flux/flux-imp run housekeeping
           │   ├─749733 /bin/sh /etc/flux/system/housekeeping
           │   ├─749740 /bin/sh /etc/flux/system/housekeeping.d/stress.sh
           │   ├─749741 stress -c 4 -t 60
           │   ├─749742 stress -c 4 -t 60
           │   ├─749743 stress -c 4 -t 60
           │   ├─749744 stress -c 4 -t 60
           │   └─749745 stress -c 4 -t 60
           ├─init.scope
           │ ├─1128 /lib/systemd/systemd --user
           │ └─1150 "(sd-pam)"
           └─session.slice
             └─dbus.service
               └─1283 /usr/bin/dbus-daemon --session --address=systemd: --nofork --nopidfile --systemd-activation --syslog-only

The one thing that didn't work was the SIGUSR1 hack. I haven't dug into it yet but I guess SIGUSR1 isn't translated to SIGKILL in the IMP for the run case like it is for exec.

Also, since the unit file was what drained a node on failure, that would have to be added to housekeeping.

garlick · 2025-02-19T01:48:39Z

Hmm, it looks like signals (including the SIGUSR1 translation) are handled the same for both flux imp run and flux imp exec. What I'm observing when the IMP is signaled with SIGUSR1, it appears that the IMP exits and the transient unit is marked failed, but the other processes spawned for housekeeping keep going:

[  +3.112423] job-manager.debug[0]: Sending SIGUSR1 to IMP for job ƒjpXCZedGfVQ
[  +3.112560] sdexec.debug[0]: kill main housekeeping-0-fBJRyBTRMod.service (signal 10)
[  +3.125774] sdexec.debug[0]: housekeeping-0-fBJRyBTRMod.service: active.running
[  +3.125856] sdexec.debug[0]: housekeeping-0-fBJRyBTRMod.service: failed.failed
[  +3.125862] sdexec.debug[0]: reset-failed housekeeping-0-fBJRyBTRMod.service
[  +3.126582] sdexec.debug[0]: housekeeping-0-fBJRyBTRMod.service: inactive.dead
[ +39.879922] job-manager.info[0]: housekeeping: picl0 (rank 0) ƒBJRyBTRMod: stress: info: [912871] successful run completed in 60s
[ +39.879994] sdexec.debug[0]: unwatch housekeeping-0-fBJRyBTRMod.service
[ +39.880152] job-manager.err[0]: housekeeping: picl0 (rank 0) ƒBJRyBTRMod: Killed
[ +39.880184] job-manager.debug[0]: housekeeping: ƒBJRyBTRMod complete

Puzzling.

garlick · 2025-02-19T01:51:11Z

sdexec may be waiting for stdout which is held open by the lingering processes and that is why the job manager is not notified that the unit has ended until all the processes are gone. That is OK I guess. The question is why isn't the IMP delivering SIGKILL to those other processes and then why is it exiting. It must be shooting itself for some reason.

garlick · 2025-02-19T14:00:58Z

Yeah the IMP seems to have SIGKILLed itself.

Feb 18 15:51:07 picl0 systemd[1128]: housekeeping-0-fBJRyBTRMod.service: Sent signal SIGUSR1 to main process 912861 (flux-imp) on client request.
Feb 18 15:51:07 picl0 systemd[1128]: housekeeping-0-fBJRyBTRMod.service: Main process exited, code=killed, status=9/KILL
Feb 18 15:51:07 picl0 systemd[1128]: housekeeping-0-fBJRyBTRMod.service: Failed with result 'signal'.
Feb 18 15:51:07 picl0 systemd[1128]: housekeeping-0-fBJRyBTRMod.service: Unit process 912870 (stress.sh) remains running after unit stopped.
Feb 18 15:51:07 picl0 systemd[1128]: housekeeping-0-fBJRyBTRMod.service: Unit process 912871 (stress) remains running after unit stopped.
Feb 18 15:51:07 picl0 systemd[1128]: housekeeping-0-fBJRyBTRMod.service: Unit process 912872 (stress) remains running after unit stopped.
Feb 18 15:51:07 picl0 systemd[1128]: housekeeping-0-fBJRyBTRMod.service: Unit process 912873 (stress) remains running after unit stopped.
Feb 18 15:51:07 picl0 systemd[1128]: housekeeping-0-fBJRyBTRMod.service: Unit process 912874 (stress) remains running after unit stopped.
Feb 18 15:51:07 picl0 systemd[1128]: housekeeping-0-fBJRyBTRMod.service: Unit process 912875 (stress) remains running after unit stopped.

This change to the IMP fixes it.

diff --git a/src/imp/cgroup.c b/src/imp/cgroup.c
index 9375ec2..14e1a99 100644
--- a/src/imp/cgroup.c
+++ b/src/imp/cgroup.c
@@ -197,7 +197,8 @@ struct cgroup_info *cgroup_info_create (void)
     /* Note: GNU basename(3) never modifies its argument. (_GNU_SOURCE
      * is defined in config.h.)
      */
-    if (strncmp (basename (cgroup->path), "imp-shell", 9) == 0)
+    if (strncmp (basename (cgroup->path), "imp-shell", 9) == 0
+        || strncmp (basename (cgroup->path), "housekeeping", 12) == 0)
         cgroup->use_cgroup_kill = true;
 
     return cgroup;

This was referenced Feb 19, 2025

imp: forward signals to cgroup for housekeeping, prolog, epilog flux-framework/flux-security#199

Closed

WIP: run housekeeping, prolog, epilog in the flux systemd instance #6662

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

can we run prolog/epilog/housekeeping in the flux systemd instance? #6619

can we run prolog/epilog/housekeeping in the flux systemd instance? #6619

garlick commented Feb 10, 2025

garlick commented Feb 18, 2025

garlick commented Feb 19, 2025

garlick commented Feb 19, 2025 •

edited

Loading

garlick commented Feb 19, 2025 •

edited

Loading

can we run prolog/epilog/housekeeping in the flux systemd instance? #6619

can we run prolog/epilog/housekeeping in the flux systemd instance? #6619

Comments

garlick commented Feb 10, 2025

garlick commented Feb 18, 2025

garlick commented Feb 19, 2025

garlick commented Feb 19, 2025 • edited Loading

garlick commented Feb 19, 2025 • edited Loading

garlick commented Feb 19, 2025 •

edited

Loading

garlick commented Feb 19, 2025 •

edited

Loading