after an emergency restart, flux doesn't know about user processes that are still running #6590

Open
garlick opened this issue Jan 30, 2025 · 1 comment · May be fixed by #6616

Comments

garlick commented Jan 30, 2025

Problem: If flux is not shut down cleanly, running jobs can escape its control.

When flux restarts, it should detect any running jobs, prolog, epilog, or housekeeping systemd units, and not release those execution targets to the scheduler until they are clean.

Perhaps this could be handled by the resource module's "monitoring" subsystem.

garlick commented Feb 1, 2025

It might be fairly easy to implement this by adding a broker.idle group that sdexec joins on each rank when it knows there are no systemd units running there.

Then maybe the resource module on rank 0 could require execution targets to join that group before they can be declared initially "up" to the scheduler. (Perhaps after that the group could just be ignored.) Some tooling could also be added to query the group membership.
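To make the idea concrete, here is a minimal toy model of that gating, not Flux code. The class `ResourceGate` and its method names are hypothetical; the only things taken from the comment above are the two conditions (a rank must be online and must have joined the idle group before it is first declared "up") and the idea that idle membership can be ignored afterwards.

```python
class ResourceGate:
    """Toy model: rank 0 withholds a target until it is online AND idle.

    Hypothetical illustration only; this is not the resource module's API.
    """

    def __init__(self):
        self.online = set()  # ranks that joined the online group
        self.idle = set()    # ranks that joined the (proposed) idle group
        self.up = set()      # ranks declared "up" to the scheduler

    def join_online(self, rank):
        self.online.add(rank)
        self._maybe_raise(rank)

    def join_idle(self, rank):
        self.idle.add(rank)
        self._maybe_raise(rank)

    def leave_idle(self, rank):
        # Once a rank is up, later changes to idle membership are ignored.
        self.idle.discard(rank)

    def leave_online(self, rank):
        # A rank that goes offline is no longer up.
        self.online.discard(rank)
        self.up.discard(rank)

    def _maybe_raise(self, rank):
        # Initial "up" requires membership in both groups.
        if rank in self.online and rank in self.idle:
            self.up.add(rank)
```

In this sketch a rank that comes online with leftover units stays out of `up` until `join_idle()` is called for it, which is the behavior being proposed.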

It would be nice if we could also query the root systemd instance for prolog/epilog/housekeeping units. I wonder if the sdbus module could be modified so it could be loaded twice, with an instance assigned to root and another assigned to user? (I vaguely remember that a regular user might not be able to authenticate to the root dbus, but systemctl seems to be able to even though it's not setuid.)

A quick strace of `systemctl list-units` run as a regular user shows it connecting to the system bus socket:

connect(3, {sa_family=AF_UNIX, sun_path="/run/dbus/system_bus_socket"}, 30) = 0
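For anyone who wants to repeat that check without strace, a rough Python probe is sketched below. It only tests whether the current user can open the socket at the path seen in the strace output; it does not perform the D-Bus authentication handshake, so a `True` result does not prove that queries would be authorized. The function name `can_connect` is made up for this example, and the use of `SOCK_STREAM` is an assumption since the strace line does not show the socket type.

```python
import socket

# Path copied from the strace output above.
SYSTEM_BUS_PATH = "/run/dbus/system_bus_socket"


def can_connect(path=SYSTEM_BUS_PATH, timeout=1.0):
    """Return True if an AF_UNIX stream connection to path succeeds.

    This checks socket-level access only, not D-Bus authentication.
    """
    sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    sock.settimeout(timeout)
    try:
        sock.connect(path)
        return True
    except OSError:
        # Missing socket, permission denied, timeout, and so on.
        return False
    finally:
        sock.close()


if __name__ == "__main__":
    print("system bus reachable:", can_connect())
```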

garlick added a commit to garlick/flux-core that referenced this issue Feb 6, 2025
Problem: nodes are not checked for untracked running work when a
Flux instance starts up.

This might happen, for example, if
- job-exec deems job shell(s) unkillable
- housekeeping/prolog/epilog gets stuck on a hung file system

When systemd is enabled, the new sdmon module joins the 'sdmon.idle'
broker group on startup.  If there are any running flux units, this is
delayed until those units are no longer running.

Change the resource module so that it monitors sdmon.idle instead of
broker.online when systemd is enabled.  This will withhold "busy" nodes
from the scheduler until they become idle.

Fixes flux-framework#6590
garlick added a commit to garlick/flux-core that referenced this issue Feb 7, 2025
Problem: nodes are not checked for untracked running work when a
Flux instance starts up.

This might happen, for example, if
- job-exec deems job shell(s) unkillable
- housekeeping/prolog/epilog gets stuck on a hung file system

When systemd is enabled, the new sdmon module joins the 'sdmon.idle'
broker group on startup.  However, if there are any running flux units,
this is delayed until those units are no longer running.

Change the resource module so that it monitors sdmon.idle instead of
broker.online when systemd is enabled.  This will withhold "busy" nodes
from the scheduler until they become idle.

Fixes flux-framework#6590
garlick added a commit to garlick/flux-core that referenced this issue Feb 7, 2025
Problem: nodes are not checked for untracked running work when a
Flux instance starts up.

This might happen, for example, if
- job-exec deems job shell(s) unkillable
- housekeeping/prolog/epilog gets stuck on a hung file system
- the broker exits without proper shutdown

When systemd is enabled, the new sdmon module joins the 'sdmon.online'
broker group on startup.  However, if there are any running flux units,
this is delayed until those units are no longer running.

Change the resource module so that it monitors sdmon.online instead of
broker.online when systemd is enabled.  This will withhold "busy" nodes
from the scheduler until they become idle.

Fixes flux-framework#6590
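
The per-rank behavior described in the commit message can be sketched roughly as follows. This is an illustrative model, not the sdmon module itself: the class `SdmonSketch`, the `join_group` callback, and the unit name in the usage note are all hypothetical. The only parts taken from the commit message are the group name `sdmon.online` and the rule that the join is delayed until no flux units are running.

```python
class SdmonSketch:
    """Toy model of delaying a group join until leftover units are gone.

    Hypothetical illustration; not the real sdmon implementation.
    """

    GROUP = "sdmon.online"  # group name from the commit message

    def __init__(self, running_units, join_group):
        self.running = set(running_units)  # units found at startup
        self.join_group = join_group       # callback standing in for a group join
        self.joined = False
        self._maybe_join()

    def unit_exited(self, name):
        """Record that a unit is no longer running."""
        self.running.discard(name)
        self._maybe_join()

    def _maybe_join(self):
        # Join exactly once, and only when nothing is still running.
        if not self.joined and not self.running:
            self.join_group(self.GROUP)
            self.joined = True
```

For example, constructing `SdmonSketch({"some-job.service"}, print)` prints nothing until `unit_exited("some-job.service")` is called, while `SdmonSketch(set(), print)` joins immediately.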