Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

WIP: run housekeeping, prolog, epilog in the flux systemd instance #6662

Open
wants to merge 6 commits into
base: master
Choose a base branch
from

Conversation

garlick
Copy link
Member

@garlick garlick commented Feb 24, 2025

as discussed in #6619, the proposed sdmon service (#6616) could be simplified if it didn't have to monitor the system systemd instance for prolog/epilog/housekeeping, in addition to monitoring the flux one for jobs.

This moves prolog/epilog/housekeeping to the flux systemd instance. The flux-run-prolog type scripts now just exec the user scripts directly to give the admins time to convert the IMP run config to run the user scripts instead of the wrappers.

I'm marking this as a WIP pending

I've tested this manually on my test cluster, including signaling, but haven't really tried to write tests yet.

Problem: the scripts flux-run-{epilog,prolog,housekeeping} and their
templated systemd unit files are no longer needed.

The job manager will now start the site-provided scripts directly
with the IMP in a transient systemd unit in the Flux systemd instance.

Remove the systemd unit files.

Instead of removing the run scripts and forcing sites to update the IMP
config to run the site scripts immediately, retain the run scripts for now
and modify them to exec the site scripts (making them a no-op).  We can
remove the run scripts later.
Problem: a housekeeping comment references 'flux imp exec' when
flux 'flux imp run' was intended.

Fix the comment.
Problem: when housekeeping (re-)parses its configuration, it recreates
the subprocess command template within the config parsing function,
which will become unwieldy given planned changes.

Without altering functionality, move the command template creation
to a separate function.  Also, drop the unused hk->imp_path member.
Problem: we'd like to start housekeeping in the flux systemd instance
but it currently hard codes the "rexec" subprocess exec service.

If exec.service is set in configuration, use that for housekeeping too.
If using the IMP with sdexec, set command options for proxy signaling.

Change the bulk exec name from "housekeeping" to "flux-housekeeping".
This becomes the unit name under sdexec, which must begin with "flux-"
to get the required IMP signal forwarding behavior
Problem: perilog_procdesc.uses_imp is unused

Drop unused struct member.
Problem: we'd like to start prolog/epilog in the flux systemd instance
but it currently hard codes the "rexec" subprocess exec service.

If exec.service is set in configuration, use that for perilog too.
If using the IMP with sdexec, set command options for proxy signaling.

Change the bulk exec name from prolog/epilog to flux-prolog/flux-epilog".
This becomes the unit name under sdexec, which must begin with "flux-"
to get the required IMP signal forwarding behavior

Although perilog never sends SIGKILL after it was disabled in flux-framework#6330,
set a kill_signal to SIGUSR1 or SIGKILL in case we want to revert
that change in the future.
Copy link

codecov bot commented Feb 24, 2025

Codecov Report

Attention: Patch coverage is 62.90323% with 23 lines in your changes missing coverage. Please review.

Project coverage is 83.88%. Comparing base (381bd09) to head (51fa4c9).

Files with missing lines Patch % Lines
src/modules/job-manager/housekeeping.c 68.18% 14 Missing ⚠️
src/modules/job-manager/plugins/perilog.c 50.00% 9 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##           master    #6662      +/-   ##
==========================================
- Coverage   83.88%   83.88%   -0.01%     
==========================================
  Files         533      533              
  Lines       88536    88576      +40     
==========================================
+ Hits        74266    74299      +33     
- Misses      14270    14277       +7     
Files with missing lines Coverage Δ
src/modules/job-manager/plugins/perilog.c 81.62% <50.00%> (-0.92%) ⬇️
src/modules/job-manager/housekeeping.c 81.19% <68.18%> (-1.34%) ⬇️

... and 8 files with indirect coverage changes

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant