Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Backport the pause feature for MachineHealthChecks #827

Merged
merged 4 commits into from
Jul 22, 2021

Conversation

slintes
Copy link
Member

@slintes slintes commented Jul 8, 2021

We'd like to backport the pause feature for MachineHealthChecks from upstream SIG Cluster API,
see https://github.com/kubernetes-sigs/cluster-api/blob/37c862b5c9256d7ae65cd8ee8ddd7a103d51fe90/controllers/machinehealthcheck_controller.go#L134


```yaml
apiVersion: machine.openshift.io/v1beta1
kind: MachineHealthCheck
metadata:
name: example
namespace: openshift-machine-api
annotations:
cluster.x-k8s.io/paused": "clusterUpgrading"
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure about what values are good for the annotation. It's not being checked, but we might want to utilize it to sync between multiple parties who want to set this annotation?

@dhellmann
Copy link
Contributor

This is a pretty subtle change to a doc to imply a new feature. Is there more to cover about testing or other aspects of the work that might be more clearly described in a new document?

@JoelSpeed
Copy link
Contributor

I don't know that we need a new document, but certainly a new paragraph within the implementation details would be good to explain the pause feature

### Pause Annotation

To allow ... to achieve ... . We will implement ...

We do not care about the value, but the value can be used to ... 

@slintes
Copy link
Member Author

slintes commented Jul 8, 2021

fair point, good suggestion @JoelSpeed, thanks

@dhellmann would that work for you?

@dhellmann
Copy link
Contributor

Sure, adding more details to the implementation section (and any others that might need to be expanded) here addresses my concerns.

Signed-off-by: Marc Sluiter <msluiter@redhat.com>
Signed-off-by: Marc Sluiter <msluiter@redhat.com>
slintes added a commit to slintes/machine-api-operator that referenced this pull request Jul 12, 2021
See openshift/enhancements#827

Signed-off-by: Marc Sluiter <msluiter@redhat.com>
@JoelSpeed
Copy link
Contributor

Do we know upstream if the convention is to treat the annotation value as a singleton or a list? I could imagine it being a list (like finalizers) where each party is expected to add and remove its own annotation from the list, would be good to see if there's a note upstream about this behaviour so we can ensure we mimic that when we build controllers that will pause MHCs. That behaviour should be briefly identified here IMO

@slintes
Copy link
Member Author

slintes commented Jul 12, 2021

Unfortunately annotations are a map (while finalizers are a string array), so you can't compare them directly. Will ask upstream what their idea for the value is, and how to sync between multiple users.

@JoelSpeed
Copy link
Contributor

Unfortunately annotations are a map (while finalizers are a string array), so you can't compare them directly

My thought was more the value itself could be a list, eg:

annotations:
  cluster-x.k8s.io/paused: "controller-1,controller-2,controller-3"

I don't know if this kind of thing is supported upstream or not, but I suspect they've documented the appropriate behaviour here

slintes added a commit to slintes/machine-api-operator that referenced this pull request Jul 12, 2021
See openshift/enhancements#827

Signed-off-by: Marc Sluiter <msluiter@redhat.com>
@slintes
Copy link
Member Author

slintes commented Jul 12, 2021

I asked on the cluster API Slack.
Today the value is meant to be empty.
The list of users for the annotation might be interesting, although it would require that patching is always strongly consistent, which can cause a lot of extra reconciliations to our controllers.

I'd suggest to also just leave the value empty for now, and handle a user list in a potential follow up. Does that make sense?

- removed suggested usage of the annotation value, to be aligned with Cluster API
- added remark of the annotation key (be aligned with upstream)
- improved reasoning (no remediation without the need to delete MHCs)

Signed-off-by: Marc Sluiter <msluiter@redhat.com>
in contrast to just not starting remediation

Signed-off-by: Marc Sluiter <msluiter@redhat.com>
@JoelSpeed
Copy link
Contributor

/approve

@openshift-ci
Copy link
Contributor

openshift-ci bot commented Jul 21, 2021

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: JoelSpeed

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci openshift-ci bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Jul 21, 2021
@JoelSpeed
Copy link
Contributor

/hold for now, just want to double check with the rest of the Cluster Infra team that they're happy with this

@openshift-ci openshift-ci bot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Jul 21, 2021
@JoelSpeed
Copy link
Contributor

/hold cancel
/lgtm

@openshift-ci openshift-ci bot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Jul 22, 2021
@openshift-ci openshift-ci bot added the lgtm Indicates that a PR is ready to be merged. label Jul 22, 2021
@openshift-merge-robot openshift-merge-robot merged commit 941286b into openshift:master Jul 22, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. lgtm Indicates that a PR is ready to be merged.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants