support "pause" of kdcluster #616

Open
joel-bluedata opened this issue Jun 8, 2022 · 3 comments
@joel-bluedata (Member)

We should support a boolean "switch" in the kdcluster spec that can be used to pause/stop the kdcluster. The effect of stopping would be:

  • all role statefulsets scaled down to 0, regardless of what the kdapp says is legal
  • member PVCs are NOT deleted

And the effect of restarting would be:

  • all roles scaled back up to their spec'd statefulset count

While stopped, some things to consider:

  • How should this affect the reported member states, member rollup status, and/or overall kdcluster state?
  • Having 0-size statefulsets while members still exist will violate some assumptions in the code, so we'll need to handle that with care.
  • We will also need extra validation and/or behavior for kdcluster spec edits. For example, it would simplify things to disallow changes to role member counts while stopped.
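The edit-validation idea in the last bullet could be sketched roughly as follows. This is purely illustrative, not KubeDirector's actual admission logic; the `paused`, `roles`, `id`, and `members` field names are assumptions for the sake of the example:

```python
# Illustrative sketch of spec-edit validation while a kdcluster is paused.
# Field names here are hypothetical, not the real KubeDirector schema.

def validate_paused_edit(old_spec: dict, new_spec: dict) -> list:
    """Return a list of validation errors for an edit made while paused."""
    errors = []
    if not old_spec.get("paused", False):
        return errors  # not paused: normal validation applies elsewhere
    old_roles = {r["id"]: r for r in old_spec.get("roles", [])}
    new_roles = {r["id"]: r for r in new_spec.get("roles", [])}
    if set(old_roles) != set(new_roles):
        errors.append("cannot add or remove roles while paused")
    for role_id, new_role in new_roles.items():
        old_role = old_roles.get(role_id)
        if old_role and new_role.get("members") != old_role.get("members"):
            errors.append(
                "cannot change member count of role %s while paused" % role_id
            )
    return errors
```

A webhook built along these lines would simply reject the update when the returned list is non-empty.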

Of course if none of the roles in a kdcluster use PVCs, this feature isn't terribly useful ... you could just delete the kdcluster and then re-create it later. But there's no reason we should particularly try to block using this feature on such kdclusters.
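As a sketch, the proposed switch might surface in the kdcluster spec as a single boolean. The `paused` field name below is hypothetical; only the `KubeDirectorCluster` kind and its existing `app`/`roles` structure come from KubeDirector itself:

```yaml
apiVersion: kubedirector.hpe.com/v1beta1
kind: KubeDirectorCluster
metadata:
  name: my-cluster
spec:
  app: my-app
  # Hypothetical pause switch: when true, all role statefulsets are
  # scaled down to 0 while member PVCs are retained.
  paused: true
  roles:
    - id: worker
      members: 3
```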

@joel-bluedata (Member, Author)

Other considerations:

  • Should we allow the kdcluster to be initially created in a paused state? What about pausing while it is in the middle of some other reconfiguration? It would simplify things if we could say that a pause is not allowed until/unless the kdcluster is in a configured state. (This is similar to the discussion about when live upgrades are allowed.)
  • Do we need a lifecycle event for this in the app startscripts? Basically "I just woke you back up". Maybe not since it should be effectively the same as any pod restart.
  • Speaking of which, how concerned do we need to be about having a graceful/coordinated pause? Some apps could get quite upset if all their pods go down at once (and they may have specific scheduling affinities to try to avoid this). Does an app need to declare whether or not it is pause-able?

@joel-bluedata (Member, Author)

I think we can actually lift a lot of the ideas/decisions from the live-upgrade feature to answer those questions: only allow pause for a stable, configured kdcluster, and don't allow other changes while paused; let kdapps declare whether they are pausable, but allow old kdapps (that haven't declared this) to be edited to "pausable=true" even while in use.

Not sure about the lifecycle event but I'm inclined to not have that for now.
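Under that approach, a kdapp might declare pausability with a single flag. The `pausable` field name is hypothetical, a sketch of the declaration rather than KubeDirector's actual app schema:

```yaml
apiVersion: kubedirector.hpe.com/v1beta1
kind: KubeDirectorApp
metadata:
  name: my-app
spec:
  # Hypothetical declaration: this app can tolerate all of its pods
  # being scaled down at once. Old kdapps without this field could
  # still be edited to set it, even while in use.
  pausable: true
```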

@joel-bluedata joel-bluedata modified the milestones: on deck, 0.11.0 Jun 15, 2022
@joel-bluedata (Member, Author)

Going to see if I can look at this for a near-term release like 0.11.0.

@joel-bluedata joel-bluedata self-assigned this Jun 15, 2022
@joel-bluedata joel-bluedata modified the milestones: 0.11.0, 0.12.0 Aug 17, 2022