shrinkable flux instance #6645

Open
10 tasks
garlick opened this issue Feb 14, 2025 · 0 comments

Problem: Flux instances can mark nodes offline but cannot permanently remove (and free) them.

This issue is for the "shrink" operation, the easier half of "grow and shrink" as described in the original flux design document.

The initial solution could be constrained to

  • operate at execution target (broker rank) granularity
  • only support the removal of non-critical nodes
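As a rough illustration of those two constraints, here is a minimal sketch of how a shrink request might be validated at rank granularity. The function name `plan_shrink` and its arguments are hypothetical and are not part of any Flux API; the set of critical ranks is assumed to be supplied by the caller.

```python
def plan_shrink(valid_ranks, critical_ranks, requested):
    """Return the ranks that would remain after removing `requested`.

    Hypothetical helper: all names here are illustrative only.
    Raises ValueError if the request names an unknown or critical rank.
    """
    valid = set(valid_ranks)
    critical = set(critical_ranks)
    req = set(requested)

    # Every requested rank must currently be a valid execution target.
    unknown = req - valid
    if unknown:
        raise ValueError(f"not valid execution targets: {sorted(unknown)}")

    # Constraint from the issue: only non-critical nodes may be removed.
    blocked = req & critical
    if blocked:
        raise ValueError(f"cannot remove critical ranks: {sorted(blocked)}")

    return valid - req


if __name__ == "__main__":
    # Assumed example: 8 ranks, with ranks 0 and 1 treated as critical.
    print(sorted(plan_shrink(range(8), {0, 1}, {6, 7})))
```

Rejecting the whole request when any rank is invalid or critical keeps the operation all-or-nothing, which is one possible design; a real implementation might instead report per-rank results.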

Here's a quick pass at a work breakdown

  • provide a mechanism to request removal of a set of execution targets from the flux instance
  • post a (TBD) resource event to modify the resource set used in core
  • make sure core tools reflect the current resource set
  • send a (TBD) acquire shrink response to the scheduler
  • modify schedulers to handle this response
  • implement partial release of discarded nodes
  • auto shrink on offline? (see "flux assumes down nodes can come back, but this is only true for bootstrap from configuration", #6641)
  • mechanism to shed nodes as they become idle (maybe a shrink option to flux queue drain|idle)
  • rename the current fixed size attribute to tbon.universe-size or similar to nix the implication that it's a node count
  • provide a new execution-targets or similar attribute that's an idset of valid ranks
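To make the distinction in the last two items concrete, the sketch below separates a fixed universe size from a shrinking set of valid ranks. It is only an illustration: `encode_idset` is a hypothetical helper, and the range notation it produces is assumed to resemble idset strings rather than being taken from the Flux implementation.

```python
def encode_idset(ranks):
    """Encode a collection of integer ranks as a compact range string.

    Hypothetical helper for illustration; not the Flux idset API.
    """
    ids = sorted(set(ranks))
    parts = []
    i = 0
    while i < len(ids):
        # Extend j to the end of the current run of consecutive ids.
        j = i
        while j + 1 < len(ids) and ids[j + 1] == ids[j] + 1:
            j += 1
        parts.append(str(ids[i]) if i == j else f"{ids[i]}-{ids[j]}")
        i = j + 1
    return ",".join(parts)


if __name__ == "__main__":
    universe_size = 8                      # fixed for the life of the instance
    targets = set(range(universe_size))    # valid ranks, initially all of them
    targets -= {2, 6, 7}                   # ranks removed by a shrink
    print(universe_size, encode_idset(targets))
```

After the removal, the universe size is still 8 while the set of valid ranks encodes as `0-1,3-5`, which is why a rank count alone no longer describes the instance.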

See also:

Edit: I removed the reference to "malleable jobs" and the TODO on a general way to shrink jobs, as those items may pull us off track from this issue, which seems a bit narrower to me.

garlick changed the title from "malleability of flux instance: implement shrink" to "shrinkable flux instance" on Feb 14, 2025